instructions 2

The document outlines the development of SceneScoutAI, a macOS application designed for video content analysis and management, featuring advanced capabilities such as video transcription, object recognition, and metadata generation. It emphasizes user experience through a guided onboarding process, accessibility, and adherence to Apple's design standards while utilizing Swift and SwiftUI for implementation. The final deliverables include a fully functional app with seamless video processing, an engaging interface, and comprehensive error handling.

Uploaded by

justintylermoore

**Title: Develop macOS Application SceneScoutAI**

### **Description**:

You are tasked with building **SceneScoutAI**, a sophisticated macOS application for video content analysis and
management. SceneScoutAI leverages cutting-edge video processing technologies for efficient video library
management, including features like video transcription, object recognition, metadata creation, scene detection, and
detailed processing tracking. SceneScoutAI is intended to exceed Apple's design standards with a focus on accessibility,
advanced technology integration, and an engaging user experience.

**Your goal is to deliver a feature-rich, visually appealing, and robust macOS application** using **Swift** and
**SwiftUI**, ensuring that it meets the functional and aesthetic requirements outlined below.

### **App Overview**:

SceneScoutAI is a macOS tool that enables users to:

- Drag and drop video files into a designated area for automatic processing.
- Undergo a guided onboarding experience to set up the OpenAI API key and initial configurations.
- Extract audio from videos and transcribe it using OpenAI Whisper, with formatting and metadata generation via GPT.
- Perform scene and object detection, tagging objects with timecodes.
- Automatically update the video library with relevant metadata, including generating a transcript text file and a CSV
with detected objects and timestamps.
- Generate video thumbnails, manage settings, and log application actions for user transparency.

### **Core Features**:

1. **Drag and Drop Video Input**: Users should be able to drag and drop video files to initiate processing. Implement a
drop zone interface using SwiftUI.

**Code Snippet**:

```swift
import SwiftUI
import UniformTypeIdentifiers

struct DropZoneView: View {
    @State private var isDragging = false

    var body: some View {
        ZStack {
            RoundedRectangle(cornerRadius: 10)
                .fill(isDragging ? Color.gray.opacity(0.5) : Color.blue.opacity(0.3))
                .frame(height: 200)
                .overlay(Text("Drag & Drop Video Here").foregroundColor(.white))
        }
        // UTType.fileURL replaces the deprecated string identifier "public.file-url".
        .onDrop(of: [UTType.fileURL], isTargeted: $isDragging) { providers in
            handleDrop(providers)
            return true
        }
    }

    private func handleDrop(_ providers: [NSItemProvider]) {
        // Handle dropped video URL here
    }
}
```

2. **Onboarding Process**: Create a multi-screen onboarding experience guiding first-time users to:

- Enter their OpenAI API key.
- Select a default output folder.
- Get familiar with SceneScoutAI’s capabilities, using clear illustrations or animations.
- Track onboarding completion with `UserDefaults`.

**Code Snippet**:

```swift
import SwiftUI

struct OnboardingView: View {
    @State private var currentStep = 0
    // Editable storage for the key; a .constant("") binding would make the field read-only.
    @State private var apiKey = ""

    var body: some View {
        VStack {
            if currentStep == 0 {
                Text("Welcome to SceneScoutAI")
                Button("Next") {
                    currentStep += 1
                }
            } else if currentStep == 1 {
                Text("Enter OpenAI API Key")
                TextField("API Key", text: $apiKey)
                Button("Next") {
                    currentStep += 1
                }
            }
            // More steps here...
        }
    }
}
```
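The onboarding list asks for completion to be tracked with `UserDefaults`. A minimal sketch of that step follows, assuming a hypothetical `OnboardingState` helper (not part of the spec) and reusing the `hasCompletedOnboarding` key that the app entry point reads via `@AppStorage`. The defaults store is injectable so the flag can be exercised without touching the real user settings.

```swift
import Foundation

/// Hypothetical helper wrapping the onboarding flag. The key matches the
/// "hasCompletedOnboarding" name read with @AppStorage in the app entry point.
struct OnboardingState {
    static let completedKey = "hasCompletedOnboarding"
    let defaults: UserDefaults

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    /// False until markCompleted() has been called at least once.
    var isCompleted: Bool {
        defaults.bool(forKey: OnboardingState.completedKey)
    }

    /// Call this when the user finishes the last onboarding step.
    func markCompleted() {
        defaults.set(true, forKey: OnboardingState.completedKey)
    }
}
```

The final onboarding screen's button would call `OnboardingState().markCompleted()`, after which the next launch can skip onboarding.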

3. **Video Processing Pipeline**:

- Extract the audio from video files and send it to OpenAI Whisper for transcription.
- Send the transcript to GPT for proper formatting, and generate a title and overview of the content.
- Perform macOS native scene detection to identify scenes and analyze their content.
- Conduct object recognition using the native Vision framework, for people, buildings, landmarks, and other named
entities, tagging the identified items with timestamps.
- Update the video library with metadata, including a `transcript.txt` and CSV.

**Code Snippet**:

```swift
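import Foundation

// formatTranscript, detectScenes, recognizeObjects, and updateLibrary are
// assumed to be defined elsewhere in the project.
func processVideo(url: URL) {
    // Run the heavy work off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        let audioURL = extractAudio(from: url)
        let transcription = transcribeAudio(audioURL)
        let formattedText = formatTranscript(transcription)
        let scenes = detectScenes(in: url)
        let objects = recognizeObjects(in: scenes)

        // Hop back to the main thread before touching UI-facing state.
        DispatchQueue.main.async {
            updateLibrary(with: formattedText, scenes: scenes, objects: objects)
        }
    }
}

private func extractAudio(from url: URL) -> URL {
    // Audio extraction logic here
    fatalError("extractAudio(from:) is not implemented yet")
}

private func transcribeAudio(_ audioURL: URL) -> String {
    // Call OpenAI Whisper API here
    fatalError("transcribeAudio(_:) is not implemented yet")
}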
```
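The last pipeline step writes a CSV of detected objects and their timestamps. One way that export could look is sketched below. The `DetectedObject` type, the `label,timecode` column layout, and the `HH:MM:SS.mmm` timecode format are all assumptions made for illustration; the spec does not fix any of them.

```swift
import Foundation

/// Hypothetical record for one detection; not a type defined by the spec.
struct DetectedObject {
    let label: String
    let seconds: Double   // offset from the start of the video
}

/// Formats an offset in seconds as HH:MM:SS.mmm.
func timecode(_ seconds: Double) -> String {
    let totalMillis = Int((seconds * 1000).rounded())
    let ms = totalMillis % 1000
    let s = (totalMillis / 1000) % 60
    let m = (totalMillis / 60_000) % 60
    let h = totalMillis / 3_600_000
    return String(format: "%02d:%02d:%02d.%03d", h, m, s, ms)
}

/// Characters that force a CSV field to be quoted.
let csvSpecialScalars: Set<Unicode.Scalar> = [",", "\"", "\n", "\r"]

/// Quotes a field if it contains a comma, quote, or line break,
/// doubling any embedded quotes.
func csvField(_ value: String) -> String {
    guard value.unicodeScalars.contains(where: { csvSpecialScalars.contains($0) }) else {
        return value
    }
    return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Builds the CSV text, one row per detection, ordered by time.
func makeCSV(from objects: [DetectedObject]) -> String {
    let header = "label,timecode"
    let rows = objects
        .sorted { $0.seconds < $1.seconds }
        .map { "\(csvField($0.label)),\(timecode($0.seconds))" }
    return ([header] + rows).joined(separator: "\n") + "\n"
}
```

For example, `timecode(3661.5)` yields `01:01:01.500`, and a label such as `Eiffel Tower, Paris` is wrapped in quotes so the comma does not split the row.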

4. **Settings Management**: Create a **Settings View** in SwiftUI where users can:

- Update the OpenAI API key.
- Adjust sensitivity settings for object and scene detection.
- Toggle options such as "Merge CSV" for merged metadata management.

**Code Snippet**:

```swift
import SwiftUI

struct SettingsView: View {
    // @AppStorage persists these values in UserDefaults under the given keys.
    @AppStorage("apiKey") var apiKey: String = ""
    @AppStorage("mergeCSV") var mergeCSV: Bool = false

    var body: some View {
        Form {
            Section(header: Text("API Settings")) {
                TextField("OpenAI API Key", text: $apiKey)
            }
            Section(header: Text("Preferences")) {
                Toggle("Merge CSV", isOn: $mergeCSV)
            }
        }
        .navigationTitle("Settings")
    }
}
```
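The settings list also mentions detection sensitivity, which the view above does not cover. A possible interpretation, sketched here under assumptions, is that a sensitivity value between 0 and 1 maps to a minimum confidence that detections must reach. The `Detection` type and the linear mapping are hypothetical; the spec does not say how sensitivity should be applied.

```swift
/// Hypothetical detection result: a label plus a confidence in 0...1.
struct Detection {
    let label: String
    let confidence: Double
}

/// Maps a sensitivity value (clamped to 0...1) to a minimum confidence.
/// Higher sensitivity gives a lower threshold, so more detections pass.
/// The linear mapping is an assumption, not something the spec defines.
func minimumConfidence(forSensitivity sensitivity: Double) -> Double {
    let clamped = min(max(sensitivity, 0.0), 1.0)
    return 1.0 - clamped
}

/// Keeps only the detections that meet the threshold for this sensitivity.
func filter(_ detections: [Detection], sensitivity: Double) -> [Detection] {
    let threshold = minimumConfidence(forSensitivity: sensitivity)
    return detections.filter { $0.confidence >= threshold }
}
```

A slider in the settings view could store its value with `@AppStorage` and pass it to `filter(_:sensitivity:)` before results are written to the library.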

5. **Thumbnail Generation**: Extract thumbnails from videos. Handle any errors gracefully, with clear fallback
mechanisms.

**Code Snippet**:

```swift
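import AVFoundation
import AppKit

// NSImage is used because UIImage is UIKit-only and unavailable in a native macOS target.
func generateThumbnail(for url: URL) -> NSImage? {
    let asset = AVURLAsset(url: url)
    let imageGenerator = AVAssetImageGenerator(asset: asset)
    // Respect the video's rotation metadata.
    imageGenerator.appliesPreferredTrackTransform = true
    do {
        let time = CMTime(seconds: 1.0, preferredTimescale: 600)
        let cgImage = try imageGenerator.copyCGImage(at: time, actualTime: nil)
        // size: .zero makes NSImage use the CGImage's pixel dimensions.
        return NSImage(cgImage: cgImage, size: .zero)
    } catch {
        print("Error generating thumbnail: \(error.localizedDescription)")
        // Fall back to a generic SF Symbol placeholder (macOS 11+).
        return NSImage(systemSymbolName: "photo", accessibilityDescription: "Thumbnail unavailable")
    }
}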
```

6. **Library Management**:

- A dedicated library view should showcase all processed videos with indicators like a red dot for errors or green for
success.
- Each video entry should feature clickable icons to access the transcript, spreadsheet, or perform translations.

**Code Snippet**:

```swift
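import SwiftUI

// Minimal model assumed by the view; the real type may carry more fields.
struct VideoItem: Identifiable {
    let id = UUID()
    let title: String
    let hasError: Bool
}

struct LibraryView: View {
    @State private var videos: [VideoItem] = []

    var body: some View {
        List(videos) { video in
            HStack {
                Text(video.title)
                Spacer()
                // Red for errors, green for success.
                Image(systemName: video.hasError ? "exclamationmark.triangle" : "checkmark.circle")
                    .foregroundColor(video.hasError ? .red : .green)
                Button(action: { viewTranscript(for: video) }) {
                    Image(systemName: "doc.text")
                }
                Button(action: { viewSpreadsheet(for: video) }) {
                    Image(systemName: "tablecells")
                }
            }
        }
    }

    private func viewTranscript(for video: VideoItem) {
        // Open the transcript file here
    }

    private func viewSpreadsheet(for video: VideoItem) {
        // Open the CSV file here
    }
}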
```

7. **Translation Feature**:

- Allow the transcript to be translated into Spanish, French, German, Chinese, or English.
- Use GPT for translation, followed by OpenAI TTS to generate audio.
- Display the availability of translated content using icons with green checkmarks.

**Code Snippet**:

```swift
import Foundation

func translateTranscript(_ transcript: String, to language: String) -> String {
    // Call GPT API for translation
    // Return translated text
    fatalError("translateTranscript(_:to:) is not implemented yet")
}

func generateAudio(from text: String) {
    // Call OpenAI TTS model
}
```

8. **Error Handling**: Implement extensive error management with safe optional binding (`guard let` or `if let`) so that
`nil` values are handled explicitly instead of being force-unwrapped.

**Code Snippet**:

```swift
func safeProcessVideo(url: URL?) {
    guard let validURL = url else {
        print("Invalid URL provided.")
        return
    }
    // Proceed with video processing
}
```
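Optional binding covers the `nil` case, but callers often need to know why an input was rejected. A sketch of the same check using a typed error follows. The `VideoInputError` cases and the list of accepted extensions are illustrative assumptions, not requirements from the spec.

```swift
import Foundation

/// Hypothetical error cases for the input-validation step.
enum VideoInputError: Error, Equatable {
    case missingURL
    case notAFileURL
    case unsupportedExtension(String)
}

/// Extensions accepted here are an illustrative assumption.
let supportedExtensions: Set<String> = ["mp4", "mov", "m4v"]

/// Returns the URL unchanged if it looks like a local video file,
/// otherwise throws an error describing what was wrong.
func validateVideoURL(_ url: URL?) throws -> URL {
    guard let url = url else { throw VideoInputError.missingURL }
    guard url.isFileURL else { throw VideoInputError.notAFileURL }
    let ext = url.pathExtension.lowercased()
    guard supportedExtensions.contains(ext) else {
        throw VideoInputError.unsupportedExtension(ext)
    }
    return url
}
```

A caller can wrap this in `do`/`catch` and show a specific message in the library view for each case, rather than a generic failure.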

9. **Processing View**:

- A detailed log view with auto-scrolling should show the step-by-step video processing status.
- A visual thumbnail of the currently processed frame should be included to give immediate visual feedback.
- Users can cancel processing anytime, and progress should be saved for resumption.

**Code Snippet**:

```swift
import SwiftUI
import AppKit

struct ProcessingView: View {
    @State private var logMessages: [String] = []
    // NSImage is the AppKit image type; UIImage is unavailable in a native macOS target.
    @State private var thumbnail: NSImage?

    var body: some View {
        HStack {
            VStack {
                if let thumbnail = thumbnail {
                    Image(nsImage: thumbnail)
                        .resizable()
                        .frame(width: 100, height: 100)
                }
                Text("Filename.mp4")
            }
            List(logMessages, id: \.self) { log in
                Text(log)
            }
            .onAppear {
                // Simulate log updates
                DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
                    logMessages.append("Started processing...")
                }
            }
        }
    }
}
```
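The requirement that users can cancel at any time and resume later implies some persisted record of which stages have finished. The sketch below shows one possible shape for that record. The stage names, the `Checkpoint` type, and the JSON file format are assumptions for illustration, and the `perform` closure stands in for the real work of each stage.

```swift
import Foundation

/// Pipeline stages in execution order; the names are illustrative assumptions.
enum ProcessingStep: Int, Codable, CaseIterable {
    case extractAudio, transcribe, detectScenes, recognizeObjects, updateLibrary
}

/// What gets persisted so a cancelled job can resume later.
struct Checkpoint: Codable, Equatable {
    var videoPath: String
    var completed: [ProcessingStep] = []

    /// First stage that has not finished yet, or nil when the job is done.
    var nextStep: ProcessingStep? {
        ProcessingStep.allCases.first { !completed.contains($0) }
    }
}

/// Runs the remaining stages in order, stopping when all are done or when
/// `isCancelled` returns true. Cancellation is checked between stages only.
func run(_ checkpoint: inout Checkpoint,
         isCancelled: () -> Bool,
         perform: (ProcessingStep) -> Void) {
    while let step = checkpoint.nextStep, !isCancelled() {
        perform(step)
        checkpoint.completed.append(step)
    }
}

/// Writes the checkpoint as JSON, replacing any previous file atomically.
func save(_ checkpoint: Checkpoint, to fileURL: URL) throws {
    try JSONEncoder().encode(checkpoint).write(to: fileURL, options: .atomic)
}

/// Reads a checkpoint previously written by save(_:to:).
func loadCheckpoint(from fileURL: URL) throws -> Checkpoint {
    try JSONDecoder().decode(Checkpoint.self, from: Data(contentsOf: fileURL))
}
```

On cancel, the app would call `save` with the current checkpoint. On the next attempt it would call `loadCheckpoint` and pass the result back to `run`, which skips every stage already listed in `completed`.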

10. **Logging Framework**: Replace all debug `print()` statements with a unified `log` function, using `#if DEBUG`
for conditional compilation.

**Code Snippet**:

```swift
func log(_ message: String) {
    #if DEBUG
    print("[DEBUG] \(message)")
    #endif
}
```
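Because the spec also calls for a user-visible processing log, the unified `log` function could write to a shared store in addition to the debug console. The following is a rough sketch of that idea. `LogStore` and `LogLevel` are hypothetical names, and keeping entries in memory is an assumption; a real app might write to a file or use Apple's unified logging instead.

```swift
import Foundation

/// Hypothetical severity levels for log lines.
enum LogLevel: String {
    case debug = "DEBUG"
    case info = "INFO"
    case error = "ERROR"
}

/// Hypothetical in-memory log store that a log view could read from.
final class LogStore {
    private var entries: [String] = []
    private let lock = NSLock()

    /// Records the line for the UI and echoes it to the console in debug builds.
    func log(_ message: String, level: LogLevel = .info) {
        let line = "[\(level.rawValue)] \(message)"
        lock.lock()
        entries.append(line)
        lock.unlock()
        #if DEBUG
        print(line)
        #endif
    }

    /// Returns a copy of all lines recorded so far.
    func snapshot() -> [String] {
        lock.lock()
        defer { lock.unlock() }
        return entries
    }
}
```

The lock lets background processing code append lines while the UI reads a snapshot on the main thread.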

11. **Background Processing**:

- Ensure that video processing occurs in the background using `DispatchQueue` for responsiveness.
- Allow users to navigate the app while video processing is ongoing in the background.

**Code Snippet**:

```swift
func startBackgroundProcessing(for videoURL: URL) {
    DispatchQueue.global(qos: .background).async {
        processVideo(url: videoURL)
    }
}
```

### **Detailed Workflow**:

1. **App Launch**:

- If first launch: Show **Onboarding View** for initial setup.
- Otherwise: Display a main window with a drop zone, library icon, and settings icon.

**Code Snippet**:

```swift
import SwiftUI

// ContentView (the main window) is assumed to be defined elsewhere.
@main
struct SceneScoutAIApp: App {
    @AppStorage("hasCompletedOnboarding") var hasCompletedOnboarding: Bool = false

    var body: some Scene {
        WindowGroup {
            if hasCompletedOnboarding {
                ContentView()
            } else {
                OnboardingView()
            }
        }
    }
}
```

### **User Experience Considerations**:

- **Engagement**: Make the onboarding fun with visual aids and clear, interactive steps.
- **Guidance**: Provide clear instructions at each stage to avoid overwhelming users.
- **Accessibility**: Ensure features are approachable, with a focus on diverse abilities and preferences.
- **Aesthetic Excellence**: Exceed Apple's design standards; make SceneScoutAI visually delightful and simple to use.
- **Gamification**: Add small gamified elements, such as rewards for completing onboarding or successfully
processing a certain number of videos.
- **Efficiency**: Optimize for macOS compatibility, battery usage, and smooth performance.

### **Design Goals**:

- **Delight and Fun**: Transform routine tasks into engaging activities. Example: Progress bars with animations during
processing.
- **High Aesthetic Quality**: Make the app visually stunning. Include intuitive icons and animations that exceed
Apple's design quality.
- **Keep Users Engaged**: Blend functionality with creative design that brings joy to frequent users. Ensure all UI
elements are consistent, crisp, and follow a logical flow.
- **Accessibility and Inclusivity**: Maintain usability for users with a wide range of preferences and abilities.
- **Innovation and Creativity**: Integrate advanced technologies, like AI-driven object recognition, and provide users
with meaningful insights into their video content.

### **Final Deliverables**:

A fully functional macOS app named **SceneScoutAI**, including all core features and meeting all design standards.
The app should:

- Provide seamless drag-and-drop video processing.
- Include an intuitive and engaging onboarding process.
- Maintain a responsive and visually appealing library for managing video assets.
- Incorporate advanced AI features for transcription, translation, scene detection, and object recognition.
- Reflect a high standard of aesthetic and functional design, ensuring the best possible user experience.

### **Output Requirements**:

Generate code with modular, easily maintainable components, making sure that each feature is well-documented,
follows Swift best practices, and has comprehensive error handling and logging for debugging purposes.

instructions 2.txt[3/10/25, 9:08:50 AM]
