SE Final Documentation
Key Challenges:
Objectives:
1. Purpose
2. Functional Scope
3. Features
Feature | Description
4. Non-Functional Scope
Aspect | Description
Visualisations
Out of Scope:
Assumptions:
Resource Estimation
Resources, costs, and risk planning are addressed in the sections below, beginning with resource estimation.
1. Technical Resources:
○ Development Tools: Streamlit, FAISS, PyTorch, NLTK, Groq
API, Transformers library.
○ Hardware:
■ Development machine: an NVIDIA GPU with 16 GB of VRAM or higher.
■ Storage: At least 1TB SSD for model weights, datasets,
and logs.
○ APIs/Frameworks:
■ Groq API for Llama model integration.
■ PyMuPDF for PDF parsing.
■ YouTube Transcript API for video transcription.
2. Human Resources:
○ Project Manager (1): To oversee timelines, deliverables, and
team coordination.
○ Developers (2-3): Skilled in Python, machine learning, and
web development.
○ QA Engineers (1-2): For software quality and validation.
○ UI/UX Designer (1): To ensure user-friendly interaction.
3. Time Resources:
○ Development Period: 12-16 weeks.
○ Testing and Quality Assurance: Final 2-3 weeks.
○ Project Timeline Breakdown:
1. Phase 1 (Weeks 1-3): Requirements gathering, research, and
prototype design.
2. Phase 2 (Weeks 4-7): Core development—RAG implementation
with multimodal data processing.
3. Phase 3 (Weeks 8-10): Testing, optimization, and integration.
4. Phase 4 (Weeks 11-16): Deployment, quality assurance, and user
feedback incorporation.
Risk Identification
1. Data Integrity:
○ Implement data validation checks for the PDFs, images, and
video files before processing.
○ Use OCR libraries such as Tesseract for image-based text extraction (a sketch follows this list).
○ Preprocess and clean the data sources to avoid corrupt or
incomplete data.
2. System Integration:
○ Break down the integration into smaller modules and test
each data type (PDF, image, video) before full integration.
○ Use modular APIs to scrape and process data in separate
layers.
3. Performance:
○ Reduce computational overhead by tuning model parameters (e.g., in PyTorch) and offloading LLM inference to the Groq API.
○ Use GPU acceleration for processing large files and videos.
4. Model Inaccuracy:
○ Regularly train the AI model with accurate and up-to-date
medical datasets.
○ Implement a feedback loop in which doctors can verify the AI's answers and correct them when necessary.
5. Compliance:
○ Ensure the system stores and processes data according to
relevant medical and data protection regulations.
○ Encrypt sensitive patient data and implement strong access
control mechanisms.
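As an illustration of the OCR-based validation mentioned under Data Integrity, a minimal sketch using Tesseract via the pytesseract wrapper might look like the following; the function name and the minimum-length threshold are illustrative assumptions rather than part of the original design.

```python
from PIL import Image
import pytesseract

def extract_and_validate_image_text(image_path: str, min_chars: int = 10) -> str:
    """Run Tesseract OCR on an image and flag near-empty results so that
    corrupt or unreadable inputs are caught before indexing."""
    text = pytesseract.image_to_string(Image.open(image_path))
    if len(text.strip()) < min_chars:
        # Almost no recoverable text usually signals a corrupt or blank image.
        raise ValueError(f"OCR produced too little text for {image_path}")
    return text
```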
1. Objectives
Schedule Details
Task | Start Week | End Week | Duration (Weeks)
Requirements Analysis | 1 | 3 | 3
Prototype Design | 2 | 3 | 2
Core Development | 4 | 7 | 4
Testing and Optimization | 8 | 10 | 3
Deployment and QA | 11 | 16 | 6
Challenges and Mitigation
Introduction
The SQA plan defines the activities and processes that will ensure the software meets the required quality standards. It outlines the objectives, strategies, and tools to be used to assess and enhance the quality of the multi-modal RAG system.
Objectives of SQA
SQA Activities
Test Strategy
Risk Management
● Risks Identified:
○ Performance degradation when processing large PDFs or
images.
○ Accuracy of data extraction from websites or videos.
● Mitigation Plan:
○ Implement efficient algorithms and caching mechanisms to
handle large data inputs.
○ Use robust scraping and transcription techniques to ensure
data accuracy.
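One way to realize the caching mitigation is to memoize the expensive extraction steps so that repeated queries over the same large input do not re-run them. A minimal sketch, assuming extraction is keyed by file path (the wrapper below is hypothetical):

```python
from functools import lru_cache

import fitz  # PyMuPDF

@lru_cache(maxsize=64)
def cached_pdf_text(pdf_path: str) -> str:
    """Parse a PDF once and serve subsequent requests from the cache."""
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)
```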
● Image Processing:
○ encode_to_64: Converts images to base64 strings to send
them for processing.
○ image_to_text: Uses Groq's API to process an image and
extract textual information (e.g., description).
○ further_query: Allows users to ask further questions based
on the image's description.
○ complete_image_func: Combines the above functions to
process the image and generate responses to further
queries.
● PDF Processing:
○ extract_text_and_images_from_pdf: Extracts both textual
content and images from PDF files using the PyMuPDF
library.
● Web Scraping:
○ scrape_page: Scrapes text and images from a given
webpage. It uses BeautifulSoup to parse the HTML content,
and also saves any images to the local file system.
● YouTube Video Processing:
○ extract_video_id: Extracts the YouTube video ID from a
URL.
○ YouTubeTranscriptApi: Retrieves the transcript of the
YouTube video and extracts the text.
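The two YouTube helpers above might be sketched as follows, assuming the classic get_transcript interface of youtube_transcript_api (newer releases of the library use a fetch-based API) and an illustrative regular expression for the URL formats:

```python
import re

from youtube_transcript_api import YouTubeTranscriptApi

def extract_video_id(url: str) -> str:
    """Pull the 11-character video ID out of common YouTube URL forms."""
    match = re.search(r"(?:v=|youtu\.be/)([A-Za-z0-9_-]{11})", url)
    if not match:
        raise ValueError(f"Could not find a video ID in: {url}")
    return match.group(1)

def get_transcript_text(url: str) -> str:
    """Fetch the transcript segments and flatten them into one text block."""
    segments = YouTubeTranscriptApi.get_transcript(extract_video_id(url))
    return " ".join(segment["text"] for segment in segments)
```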
The embeddings for each chunk are stored in a FAISS index, which is
used to retrieve the most relevant content based on the user's query.
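A minimal sketch of that retrieval step, assuming the chunk embeddings have already been computed as a float32 NumPy array and that a flat L2 index is sufficient (the index type and k are illustrative choices):

```python
import faiss
import numpy as np

def build_index(chunk_embeddings: np.ndarray) -> faiss.IndexFlatL2:
    """Store one embedding per chunk; expects shape (num_chunks, dim)."""
    index = faiss.IndexFlatL2(chunk_embeddings.shape[1])
    index.add(chunk_embeddings)
    return index

def top_k_chunks(index: faiss.IndexFlatL2, query_embedding: np.ndarray,
                 chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query."""
    _, ids = index.search(query_embedding.reshape(1, -1).astype("float32"), k)
    return [chunks[i] for i in ids[0]]
```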
This phase involves integrating all the functions and ensuring smooth interaction between the components. Query handling and integration rely on the following packages:
1. beautifulsoup4
2. faiss_cpu
3. groq
5. nltk
6. numpy
7. opencv_python
8. requests
9. streamlit
10. torch
11. transformers
12. youtube_transcript_api
13. pymupdf
Diagram source
Diagram source
1. Data Design
Data design ensures that the data structures and formats used are optimal for the application's needs. In this system, the key data-design components are the content chunks extracted from each source, their vector embeddings, and the FAISS index used for retrieval.
2. Architecture Design
3. Interface Design
● Authentication:
○ The authentication system (login and signup) ensures that
users can securely access their personalized assistant.
● Text and Image Processing:
○ Image Processing: Uses the Groq API to process images.
Images are first encoded to base64 and then sent to Groq
for text extraction.
○ PDF Text and Image Extraction: Extracts both text and
images from PDFs, storing them locally for further
processing.
● Web Scraping:
○ Scrapes text and images from webpages using BeautifulSoup
and requests, storing the images in a local directory for
later use.
● YouTube Transcript Extraction:
○ Extracts YouTube video transcripts using the
YouTubeTranscriptApi, which is helpful for converting
spoken content into text that can be processed.
● RAG Response Generation:
○ This is the core function, where the system combines
various types of content (PDF, web text, video transcripts,
image descriptions) into a knowledge base, and the query is
matched with the most relevant content using FAISS. The
combined context is then fed into the Groq model to
generate an answer.
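Putting the pieces together, the response-generation step described in the last bullet might look like the following sketch; the model name, prompt wording, and the assumption that retrieval has already produced the top-matching chunks (for example via the FAISS helpers sketched earlier) are all illustrative:

```python
from groq import Groq

client = Groq(api_key="your_api_key")  # placeholder key

def generate_rag_answer(query: str, retrieved_chunks: list[str]) -> str:
    """Feed the retrieved context plus the user query to a Groq-hosted
    Llama model and return its answer."""
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```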
The implementation imports several libraries for NLP, image processing, web scraping, PDF handling, and interaction with external APIs (Groq and YouTube Transcripts), covering all the supported input types. For clarity, the imports can be grouped into separate sections to improve readability.
The Groq API setup is straightforward. Ensure that the API key in use is valid and that the Groq-related functions have been tested.
```python
import os
from groq import Groq

# Allow duplicate OpenMP runtimes (a common workaround when PyTorch and
# FAISS are loaded in the same process).
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

GROQ_API_KEY = "your_api_key"  # placeholder; load from a secure source
client = Groq(api_key=GROQ_API_KEY)
```
Note: Avoid sharing sensitive information such as API keys publicly. Always keep them safe and encrypted in the production environment.
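With the client configured, the encode_to_64 and image_to_text helpers described earlier might be sketched as follows; the vision model name is an assumption, since the document does not pin one down:

```python
import base64

from groq import Groq

client = Groq(api_key="your_api_key")  # placeholder key

def encode_to_64(image_path: str) -> str:
    """Read an image file and return its base64 string representation."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def image_to_text(image_path: str) -> str:
    """Ask a Groq-hosted vision model to describe the image."""
    response = client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",  # illustrative model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64,"
                                      + encode_to_64(image_path)}},
            ],
        }],
    )
    return response.choices[0].message.content
```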
3. Helper Functions
4. NLP Functions
This function is the core of the system. It integrates all data sources (PDFs, images, YouTube videos, and web pages) and processes them to generate a response to the user's query.
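One NLP step this core function depends on is splitting the collected text into chunks before embedding. A minimal sketch using NLTK sentence tokenization (the chunk size is an illustrative choice):

```python
import nltk

nltk.download("punkt", quiet=True)  # sentence-tokenizer model

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group consecutive sentences into chunks of at most max_chars."""
    chunks, current = [], ""
    for sentence in nltk.sent_tokenize(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += " " + sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```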
6. Streamlit Interface
The Streamlit interface is functional but can be improved for a better user experience and greater modularity.
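One possible modular layout, as a minimal sketch; the widget labels are placeholders, and the answer pipeline is left as a stub to be wired to the RAG function:

```python
import streamlit as st

def main() -> None:
    st.title("Multi-Modal RAG Assistant")

    # Group each input type in the sidebar to keep the layout modular.
    st.sidebar.file_uploader("Upload a PDF", type="pdf")
    st.sidebar.file_uploader("Upload an image", type=["png", "jpg"])
    st.sidebar.text_input("Webpage or YouTube URL")

    query = st.text_input("Ask a question about the uploaded content")
    if st.button("Get answer") and query:
        # Replace this stub with a call into the RAG pipeline.
        st.write("Answer placeholder")

if __name__ == "__main__":
    main()
```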
Phase 2: Testing
1.13 Developing test cases for the software
Test_cases.xlsx