instructions 2
### **Description**:
You are tasked with building **SceneScoutAI**, a sophisticated macOS application for video content analysis and
management. SceneScoutAI leverages cutting-edge video processing technologies for efficient video library
management, including features like video transcription, object recognition, metadata creation, scene detection, and
detailed processing tracking. SceneScoutAI is intended to exceed Apple's design standards with a focus on accessibility,
advanced technology integration, and an engaging user experience.
**Your goal is to deliver a feature-rich, visually appealing, and robust macOS application** using **Swift** and
**SwiftUI**, ensuring that it meets the functional and aesthetic requirements outlined below.
SceneScoutAI should allow users to:
- Drag and drop video files into a designated area for automatic processing.
- Undergo a guided onboarding experience to set up the OpenAI API key and initial configurations.
- Extract audio from videos and transcribe it using OpenAI Whisper, with formatting and metadata generation via GPT.
- Perform scene and object detection, tagging objects with timecodes.
- Automatically update the video library with relevant metadata, including generating a transcript text file and a CSV
with detected objects and timestamps.
- Generate video thumbnails, manage settings, and log application actions for user transparency.
1. **Drag and Drop Video Input**: Users should be able to drag and drop video files to initiate processing. Implement a
drop zone interface using SwiftUI.
**Code Snippet**:
```swift
struct DropZoneView: View {
    @State private var isDragging = false
    // ...
}
```
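A drop zone also has to decide which dropped items to accept. The following is a minimal sketch of that filtering step, assuming a hypothetical allow-list of extensions (`supportedVideoExtensions`), since the brief does not say which formats are supported. The helper name `videoURLs(from:)` is illustrative as well.

```swift
import Foundation

// Hypothetical allow-list; the brief does not name the supported formats.
let supportedVideoExtensions: Set<String> = ["mp4", "mov", "m4v"]

/// Keeps only local file URLs whose extension is in the allow-list (case-insensitive).
func videoURLs(from droppedURLs: [URL]) -> [URL] {
    droppedURLs.filter { url in
        url.isFileURL && supportedVideoExtensions.contains(url.pathExtension.lowercased())
    }
}
```

A drop handler could pass the URLs it receives through this function and reject the drop when the result is empty.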
2. **Onboarding Process**: Create a multi-screen onboarding experience that guides first-time users through setting up the OpenAI API key and initial configuration.
**Code Snippet**:
```swift
struct OnboardingView: View {
    @State private var currentStep = 0
    // ...
}
```
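The step logic behind a multi-screen onboarding flow can be kept separate from the views. The sketch below is one possible shape, not the brief's design: the step names in `OnboardingStep` and the helpers `isUsableAPIKey` and `advance(from:apiKey:)` are assumptions. The key check only rejects empty input and does not contact OpenAI.

```swift
import Foundation

// Hypothetical step list; the brief only says the flow is multi-screen.
enum OnboardingStep: Int, CaseIterable {
    case welcome, apiKey, preferences, finished

    /// The step after this one, or nil once onboarding is complete.
    var next: OnboardingStep? { OnboardingStep(rawValue: rawValue + 1) }
}

/// Sanity check only: the key must be non-empty after trimming whitespace.
func isUsableAPIKey(_ key: String) -> Bool {
    !key.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
}

/// Advances one step, but stays on the API key screen until a key is entered.
func advance(from step: OnboardingStep, apiKey: String) -> OnboardingStep {
    if step == .apiKey && !isUsableAPIKey(apiKey) { return step }
    return step.next ?? step
}
```

A view could store the current `OnboardingStep` in place of the integer `currentStep` and call `advance` from its "Continue" button.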
3. **Video Processing**:
- Extract the audio from video files and send it to OpenAI Whisper for transcription.
- Send the transcript to GPT for proper formatting, and generate a title and overview of the content.
- Perform macOS native scene detection to identify scenes and analyze their content.
- Conduct object recognition using the native Vision framework, for people, buildings, landmarks, and other named
entities, tagging the identified items with timestamps.
- Update the video library with metadata, including a `transcript.txt` and CSV.
**Code Snippet**:
```swift
func processVideo(url: URL) {
    DispatchQueue.global(qos: .userInitiated).async {
        let audioURL = extractAudio(from: url)
        let transcription = transcribeAudio(audioURL)
        let formattedText = formatTranscript(transcription)
        let scenes = detectScenes(in: url)
        let objects = recognizeObjects(in: scenes)
        // ...
    }
}
```
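The final step, writing detected objects and their timestamps to a CSV, can be sketched as follows. The `DetectedObject` type, the `object,timecode` header, and the `HH:MM:SS.mmm` timecode format are assumptions, since the brief does not define the column layout. Fields are quoted in the usual CSV manner when they contain commas, quotes, or newlines.

```swift
import Foundation

// Hypothetical model for one recognition result.
struct DetectedObject {
    let label: String
    let seconds: Double
}

/// Formats a time offset in seconds as HH:MM:SS.mmm.
func timecode(_ seconds: Double) -> String {
    let totalMillis = Int((seconds * 1000).rounded())
    let h = totalMillis / 3_600_000
    let m = (totalMillis / 60_000) % 60
    let s = (totalMillis / 1000) % 60
    let ms = totalMillis % 1000
    return String(format: "%02d:%02d:%02d.%03d", h, m, s, ms)
}

/// Quotes a field when it contains a comma, a double quote, or a line break.
func csvField(_ value: String) -> String {
    guard value.contains(where: { $0 == "," || $0 == "\"" || $0.isNewline }) else { return value }
    return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Builds the CSV text, sorted by time, with a header row.
func objectsCSV(_ objects: [DetectedObject]) -> String {
    let rows = objects
        .sorted { $0.seconds < $1.seconds }
        .map { "\(csvField($0.label)),\(timecode($0.seconds))" }
    return (["object,timecode"] + rows).joined(separator: "\n")
}
```

The resulting string can be written next to `transcript.txt` with `String.write(to:atomically:encoding:)`.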
4. **Settings**: Provide a settings view for the OpenAI API key and related preferences.
**Code Snippet**:
```swift
struct SettingsView: View {
    @AppStorage("apiKey") var apiKey: String = ""
    @AppStorage("mergeCSV") var mergeCSV: Bool = false
    // ...
}
```
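The brief does not say what the `mergeCSV` setting controls. One plausible reading, and it is only an assumption, is that it combines the per-video object CSVs into a single file. The sketch below illustrates that reading with a hypothetical `mergeCSVs` helper. It assumes every input shares the same header, that no field contains an embedded line break, and that video names contain no commas.

```swift
import Foundation

/// Combines per-video CSV texts into one: the header is kept once and each
/// data row is prefixed with the name of the video it came from.
func mergeCSVs(_ inputs: [(video: String, csv: String)]) -> String {
    var output: [String] = []
    for input in inputs {
        let lines = input.csv.split(whereSeparator: \.isNewline).map(String.init)
        guard let header = lines.first else { continue }  // skip empty files
        if output.isEmpty { output.append("video," + header) }
        output += lines.dropFirst().map { "\(input.video),\($0)" }
    }
    return output.joined(separator: "\n")
}
```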
5. **Thumbnail Generation**: Extract thumbnails from videos. Handle any errors gracefully, with clear fallback
mechanisms.
**Code Snippet**:
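```swift
func generateThumbnail(for url: URL) -> NSImage? {
    let asset = AVAsset(url: url)
    let imageGenerator = AVAssetImageGenerator(asset: asset)
    do {
        let cgImage = try imageGenerator.copyCGImage(at: CMTime(seconds: 1.0, preferredTimescale: 600),
                                                     actualTime: nil)
        return NSImage(cgImage: cgImage, size: .zero) // NSImage, since UIImage belongs to UIKit
    } catch {
        return nil // Caller falls back to a placeholder image
    }
}
```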
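The snippet above requests a frame at a fixed 1.0 second offset, which can fail for clips shorter than that. One possible fallback, sketched here as an assumption rather than part of the brief, is to pick the capture time from the clip's duration. The helper name `thumbnailTime(duration:preferred:)` is hypothetical.

```swift
/// Chooses a capture time in seconds: the preferred offset when the clip is
/// long enough, otherwise the clip's midpoint. Returns nil for unusable durations.
func thumbnailTime(duration: Double, preferred: Double = 1.0) -> Double? {
    guard duration.isFinite, duration > 0 else { return nil }
    return preferred < duration ? preferred : duration / 2
}
```

The returned value can be wrapped in `CMTime(seconds:preferredTimescale:)` before it is passed to the image generator. A `nil` result signals that a placeholder image should be shown instead.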
6. **Library Management**:
- A dedicated library view should showcase all processed videos with indicators like a red dot for errors or green for
success.
- Each video entry should feature clickable icons to access the transcript, spreadsheet, or perform translations.
**Code Snippet**:
```swift
struct LibraryView: View {
    @State private var videos: [VideoItem] = []
    // ...
}
```
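`VideoItem` is referenced above but not defined in the brief. A minimal sketch of what it might hold follows. The fields and the `ProcessingStatus` enum are assumptions, chosen so the library row can derive its red or green indicator from the item's state.

```swift
// Hypothetical processing states for a library entry.
enum ProcessingStatus: Equatable {
    case processing
    case succeeded
    case failed(reason: String)
}

// Hypothetical model; the brief names the type but not its fields.
struct VideoItem {
    let fileName: String
    var status: ProcessingStatus

    /// Indicator colour per the brief: green for success, red for errors.
    /// Returns nil while processing is still running.
    var indicatorColorName: String? {
        switch status {
        case .succeeded: return "green"
        case .failed: return "red"
        case .processing: return nil
        }
    }
}
```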
7. **Translation Feature**:
- Allow translations for the transcript into Spanish, French, German, Chinese, or English.
- Use GPT for translation, followed by OpenAI TTS to generate audio.
- Display the availability of translated content using icons with green checkmarks.
**Code Snippet**:
```swift
func translateTranscript(_ transcript: String, to language: String) -> String {
    // Call GPT API for translation
    // Return translated text
    return transcript // Placeholder until the GPT call is implemented
}
```
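The GPT call itself needs a JSON request body. The sketch below builds one in the `model` plus `messages` shape used by OpenAI's Chat Completions endpoint. The prompt wording, the type names, and the `model` argument are assumptions, and the caller supplies whichever model it actually uses. The language list comes from the brief.

```swift
import Foundation

// Target languages listed in the brief.
enum TranslationLanguage: String, CaseIterable {
    case spanish = "Spanish", french = "French", german = "German"
    case chinese = "Chinese", english = "English"
}

struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

struct ChatRequest: Codable, Equatable {
    let model: String
    let messages: [ChatMessage]
}

/// Encodes a translation request; the system prompt text is illustrative only.
func translationRequestBody(transcript: String,
                            to language: TranslationLanguage,
                            model: String) throws -> Data {
    let request = ChatRequest(model: model, messages: [
        ChatMessage(role: "system",
                    content: "Translate the user's text into \(language.rawValue). Return only the translation."),
        ChatMessage(role: "user", content: transcript)
    ])
    return try JSONEncoder().encode(request)
}
```

Using an enum for the target language, rather than a free-form `String`, keeps unsupported languages out of the request at compile time.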
8. **Error Handling**: Implement extensive error handling, using safe optional binding (`guard let` or `if let`) instead of force-unwrapping so that unexpected `nil` values cannot crash the app.
**Code Snippet**:
```swift
func safeProcessVideo(url: URL?) {
    guard let validURL = url else {
        print("Invalid URL provided.")
        return
    }
    // Proceed with video processing
}
```
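Beyond optional binding, a typed error lets callers tell failure causes apart and show a specific message for each. The following is a minimal sketch under that assumption. `VideoInputError` and `validatedVideoURL` are hypothetical names, not part of the brief.

```swift
import Foundation

// Hypothetical error cases for validating a dropped or selected video.
enum VideoInputError: Error, Equatable {
    case missingURL
    case notAFileURL
    case fileNotFound(path: String)
}

/// Returns the URL when it points at an existing local file, otherwise throws.
func validatedVideoURL(_ url: URL?) throws -> URL {
    guard let url = url else { throw VideoInputError.missingURL }
    guard url.isFileURL else { throw VideoInputError.notAFileURL }
    guard FileManager.default.fileExists(atPath: url.path) else {
        throw VideoInputError.fileNotFound(path: url.path)
    }
    return url
}
```

A caller can wrap this in `do`/`catch` and map each case to a user-facing message or log entry.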
9. **Processing View**:
- A detailed log view with auto-scrolling should show the step-by-step video processing status.
- A visual thumbnail of the currently processed frame should be included to give immediate visual feedback.
- Users can cancel processing anytime, and progress should be saved for resumption.
**Code Snippet**:
```swift
struct ProcessingView: View {
    @State private var logMessages: [String] = []
    @State private var thumbnail: NSImage? // NSImage, since UIImage belongs to UIKit
    // ...
}
```
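Saving progress for later resumption can be done by persisting which pipeline steps have finished. The sketch below stores that as JSON. The step names mirror the function calls in the `processVideo` snippet, while the `ProcessingProgress` type and the save and load helpers are assumptions.

```swift
import Foundation

// Steps named after the calls in the brief's processVideo snippet.
enum ProcessingStep: String, Codable, CaseIterable {
    case extractAudio, transcribe, formatTranscript, detectScenes, recognizeObjects
}

// Hypothetical record of how far one video has been processed.
struct ProcessingProgress: Codable, Equatable {
    let videoPath: String
    var completedSteps: [ProcessingStep]

    /// First step that has not completed yet, or nil when everything is done.
    var nextStep: ProcessingStep? {
        ProcessingStep.allCases.first { !completedSteps.contains($0) }
    }
}

/// Writes the progress record atomically so a crash cannot leave a partial file.
func saveProgress(_ progress: ProcessingProgress, to url: URL) throws {
    try JSONEncoder().encode(progress).write(to: url, options: .atomic)
}

func loadProgress(from url: URL) throws -> ProcessingProgress {
    try JSONDecoder().decode(ProcessingProgress.self, from: Data(contentsOf: url))
}
```

On cancel, the app would save the record. On resume, it would load the record and continue from `nextStep`.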
10. **Logging Framework**: Replace all debug `print()` statements with a unified `log` function, using `#if DEBUG`
for conditional compilation.
**Code Snippet**:
```swift
func log(_ message: String) {
    #if DEBUG
    print("[DEBUG] \(message)")
    #endif
}
```
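If the same log entries should also feed the processing log view, the function above could be extended to keep recent lines in memory. This is a sketch of one way to do that: `AppLog`, `LogLevel`, and the capacity limit are assumptions. The class is not thread-safe as written, so calls from background queues would need to be funnelled through one queue.

```swift
import Foundation

// Hypothetical severity levels.
enum LogLevel: String {
    case debug = "DEBUG", info = "INFO", error = "ERROR"
}

final class AppLog {
    /// Most recent lines, oldest first, capped at `capacity`.
    private(set) var lines: [String] = []
    private let capacity: Int

    init(capacity: Int = 500) { self.capacity = capacity }

    func log(_ message: String, level: LogLevel = .info) {
        let line = "[\(level.rawValue)] \(message)"
        lines.append(line)
        // Drop the oldest entries once the buffer is full.
        if lines.count > capacity { lines.removeFirst(lines.count - capacity) }
        #if DEBUG
        print(line)
        #endif
    }
}
```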
11. **Background Processing**:
- Ensure that video processing occurs in the background using `DispatchQueue` for responsiveness.
- Allow users to navigate the app while video processing is ongoing in the background.
**Code Snippet**:
```swift
func startBackgroundProcessing(for videoURL: URL) {
    DispatchQueue.global(qos: .background).async {
        processVideo(url: videoURL)
    }
}
```
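Background work also has to honour the requirement that users can cancel at any time. One common approach, sketched here as an assumption rather than the brief's design, is a thread-safe flag that the pipeline checks between steps. `CancellationFlag` and `runSteps` are hypothetical names.

```swift
import Foundation

/// Thread-safe flag that the UI can set and the worker can poll.
final class CancellationFlag {
    private let lock = NSLock()
    private var cancelled = false

    func cancel() {
        lock.lock()
        cancelled = true
        lock.unlock()
    }

    var isCancelled: Bool {
        lock.lock()
        defer { lock.unlock() }
        return cancelled
    }
}

/// Runs steps in order and stops before the next step once cancellation is
/// requested. Returns how many steps completed.
func runSteps(_ steps: [() -> Void], flag: CancellationFlag) -> Int {
    var completed = 0
    for step in steps {
        if flag.isCancelled { break }
        step()
        completed += 1
    }
    return completed
}
```

`runSteps` can be called from inside the `DispatchQueue.global(...).async` block shown above, with the Cancel button calling `flag.cancel()` from the main thread. The returned count tells the app how many steps finished, which is what it would save for resumption.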
1. **App Launch**:
- If first launch: Show **Onboarding View** for initial setup.
- Otherwise: Display a main window with a drop zone, library icon, and settings icon.
**Code Snippet**:
```swift
@main
struct SceneScoutAIApp: App {
    @AppStorage("hasCompletedOnboarding") var hasCompletedOnboarding: Bool = false
    // ...
}
```
- **Delight and Fun**: Transform routine tasks into engaging activities. Example: Progress bars with animations during
processing.
- **High Aesthetic Quality**: Make the app visually stunning. Include intuitive icons and animations that exceed
Apple's design quality.
- **Keep Users Engaged**: Blend functionality with creative design that brings joy to frequent users. Ensure all UI
elements are consistent, crisp, and follow a logical flow.
- **Accessibility and Inclusivity**: Maintain usability for users with a wide range of preferences and abilities.
- **Innovation and Creativity**: Integrate advanced technologies, like AI-driven object recognition, and provide users
with meaningful insights into their video content.
A fully functional macOS app named **SceneScoutAI**, including all core features and meeting all design standards.
The app should:
Generate code with modular, easily maintainable components, making sure that each feature is well-documented,
follows Swift best practices, and has comprehensive error handling and logging for debugging purposes.