Data Science with Generative AI outline

The Data Science with Generative AI program covers essential skills in Python, statistics, machine learning, and deep learning, providing hands-on experience with tools like TensorFlow and PyTorch. Key topics include data manipulation, visualization, and advanced AI techniques such as NLP and generative models. The curriculum is structured into milestones, each focusing on specific areas of data science, culminating in practical projects that apply theoretical knowledge to real-world scenarios.

Uploaded by anubhavsagar021
Copyright © All Rights Reserved

Data Science with Generative AI

Program Outline
The Data Science course equips learners with skills to extract insights, make data-driven
decisions, and solve real-world problems. Starting with Python fundamentals and statistics, it
covers data manipulation (NumPy, Pandas), visualization (Matplotlib, Seaborn), and database
management (SQL, NoSQL).

Key topics include machine learning (regression, decision trees, SVMs, Random Forests), deep
learning (neural networks, CNNs, RNNs, GANs), and NLP (BERT, GPT). Learners gain
hands-on experience with tools like TensorFlow, PyTorch, Hugging Face, and LangChain,
tackling projects such as customer segmentation, image classification, and text summarization.

The course also explores cutting-edge AI topics like LLMs, prompt engineering, and RAG,
ensuring mastery of the end-to-end data science workflow for a successful career in AI.

Learning Objectives
● Develop a strong foundation in Python programming, including data types, loops,
conditionals, functions, and object-oriented programming.
● Understand and work with data structures like lists, tuples, dictionaries, and sets for data
organization and manipulation.
● Gain expertise in advanced Python concepts, including iterators, generators, lambda
functions, and functional programming tools like map, reduce, and filter.
● Learn object-oriented programming principles such as inheritance, polymorphism,
encapsulation, and abstraction, and implement advanced features like decorators and
dunder methods.
● Master file handling, exception handling, debugging, and logging in Python while
exploring multithreading and multiprocessing.
● Use industry-standard libraries such as NumPy, Pandas, Matplotlib, Seaborn, and Plotly
for data analysis and visualization.
● Build RESTful APIs with Flask, including creating, deploying, and securing APIs, along
with user authentication and testing.
● Work with SQL and NoSQL databases, performing CRUD operations, schema design,
optimization, and integration with tools like MongoDB.
● Apply statistical concepts like probability distributions, hypothesis testing, correlation,
and variance using Python to analyze real-world datasets.
● Learn machine learning fundamentals, including supervised and unsupervised learning,
and implement regression, classification, and clustering models.
● Perform exploratory data analysis (EDA) on datasets, identifying trends and insights
through practical case studies and projects.
● Understand and implement linear and logistic regression, regularization techniques, and
evaluation metrics such as R², MSE, and MAE.
● Explore advanced machine learning techniques, including decision trees, SVMs, Naive
Bayes, ensemble methods, and clustering algorithms.
● Dive into anomaly detection and time series analysis, using techniques like ARIMA,
SARIMA, and machine learning models for forecasting and anomaly detection.
● Build and deploy deep learning models for tasks like image classification and sentiment
analysis using frameworks like TensorFlow and PyTorch.
● Gain hands-on experience with NLP techniques, including text preprocessing,
embeddings, and transformer models like BERT and GPT.
● Explore generative AI concepts such as autoencoders, GANs, and large language
models (LLMs) for advanced applications in AI and natural language understanding.
● Learn object detection, image segmentation, and advanced frameworks like YOLO,
Detectron2, and TFOD for building AI-powered solutions.
● Implement and deploy cutting-edge projects like AI assistants, sentiment analysis tools,
and recommendation systems.
● Master fine-tuning, prompt engineering, and Retrieval-Augmented Generation (RAG) for
customizing AI models to specific use cases.

Learning Outcomes
● Develop a strong foundation in Python programming, including object-oriented
principles, data structures, and file handling for building scalable and efficient
applications.
● Master statistical analysis, probability distributions, and hypothesis testing to analyze
data and draw meaningful insights for decision-making.
● Gain expertise in machine learning techniques, including regression, classification,
ensemble methods, and clustering, while working on real-world datasets and projects.
● Build and deploy deep learning models using frameworks like TensorFlow and PyTorch,
implementing advanced architectures such as CNNs, RNNs, and GANs.
● Design and implement RESTful APIs using Flask to enable seamless communication
between applications and systems.
● Leverage databases, both SQL and NoSQL (e.g., MongoDB), for efficient data storage,
querying, and optimization, with best practices for schema design and indexing.
● Explore and apply data visualization tools such as Matplotlib, Seaborn, and Plotly to
create meaningful and interactive dashboards.
● Understand and utilize modern NLP techniques, including transformer-based models like
BERT and GPT, for text processing, sentiment analysis, and summarization.
● Implement generative AI techniques for creative applications such as text generation,
music composition, and machine translation, while addressing ethical concerns.
● Employ Gen AI tools like LangChain, LlamaIndex, and vector databases (VectorDB) for
building cutting-edge AI-driven applications.
● Design and optimize end-to-end pipelines for machine learning and deep learning
projects, integrating concepts such as feature engineering, model training, and
evaluation.
● Develop and fine-tune large language models (LLMs) for domain-specific applications
and enhance them using prompt engineering and retrieval-augmented generation (RAG)
techniques.
● Build and deploy sophisticated object detection, tracking, and segmentation solutions
using frameworks like YOLO, Detectron2, and TFOD.
● Utilize cloud infrastructure and DevOps tools for deployment and monitoring, ensuring
secure, reliable, and scalable applications.
● Complete industry-relevant projects, such as anomaly detection, recommendation
systems, and API development, showcasing the ability to apply theoretical knowledge to
practical use cases.

Tools & Technologies


● Python: Core programming language for development and automation.
● Python IDEs: PyCharm, VS Code, Jupyter Notebook, Google Colab, Deepnote.
● Data Analysis & Visualization: NumPy, Pandas, Matplotlib, Seaborn, Plotly, Bokeh.
● Databases: MySQL, MongoDB.
● API Development & Testing: Flask, Postman.
● Machine Learning & Deep Learning:
○ Frameworks: scikit-learn, TensorFlow, Keras, PyTorch.
○ Libraries: XGBoost, CatBoost, Hugging Face, OpenCV.
● Computer Vision: YOLOv9, Mask R-CNN, Detectron2, OpenCV, Roboflow.
● Natural Language Processing (NLP):
○ Libraries: NLTK, spaCy, TextBlob.
○ Tools: LangChain, LlamaIndex, Hugging Face Hub.
● Generative AI:
○ Models & Tools: GPTs, Autoencoders, LangChain, ChromaDB.
○ Deployment Tools: Streamlit.

Curriculum

Milestone 1: Python | Weeks 1-7

Milestone Overview
This milestone introduces Python programming fundamentals, covering core concepts, data
handling, object-oriented programming (OOP), file management, APIs, and databases. It
provides a solid foundation in Python development, culminating in hands-on projects that
simulate real-world applications.
Comprehensive List of Topics

● Python Basics: Covers Python's features, variables, data types, operators,
expressions, and control flow statements like loops and conditionals.
● Data Types & Structures: Explores strings, lists, tuples, dictionaries, and sets, focusing
on their properties, manipulation techniques, and practical applications.
● Functions: Introduces function definition, calling, arguments, return values, lambda
functions, scope, and recursion for writing reusable and efficient code.
● Object-Oriented Programming (OOP): Covers classes, objects, inheritance,
polymorphism, encapsulation, and abstraction to implement modular and reusable code
structures.
● Files, Exception Handling, Logging, & Memory Management: Focuses on
reading/writing files, handling exceptions, implementing logging for debugging, and
understanding garbage collection and memory optimization.
● Data Toolkit: Introduces NumPy for numerical computations, Pandas for data handling,
and Matplotlib/Seaborn for data visualization techniques.
● RESTful API & Flask: Covers API concepts, building lightweight web applications with
Flask, creating RESTful API endpoints, and implementing authentication mechanisms.
● Databases & MongoDB: Explores SQL vs NoSQL databases, working with
SQLite/MySQL, understanding MongoDB, and performing CRUD operations efficiently.
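As an illustrative sketch (not taken from the course materials), the short Python program below touches several of the Milestone 1 topics at once: a generator with exception handling and logging, lambda functions with map, filter, and reduce, recursion, and a small class hierarchy showing inheritance, polymorphism, and a dunder method. All class and function names are hypothetical examples.

```python
import logging
from functools import reduce

# Basic logging setup; warnings are written to stderr.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def read_numbers(lines):
    """Generator that yields floats, logging and skipping values that fail to parse."""
    for line in lines:
        try:
            yield float(line)
        except ValueError:
            logger.warning("Skipping malformed value: %r", line)


def factorial(n):
    """Recursive function: n! = n * (n - 1)!, with 0! = 1! = 1."""
    return 1 if n <= 1 else n * factorial(n - 1)


class Shape:
    """Abstract-style base class: subclasses are expected to override area()."""

    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def __repr__(self):  # dunder method controlling how the object is displayed
        return f"Rectangle({self.width}, {self.height})"


class Square(Rectangle):
    """Inheritance: a square is a rectangle with equal sides."""

    def __init__(self, side):
        super().__init__(side, side)

    def __repr__(self):
        return f"Square({self.width})"


if __name__ == "__main__":
    numbers = list(read_numbers(["1.5", "abc", "2.5"]))  # [1.5, 2.5]
    squares = list(map(lambda x: x * x, numbers))  # [2.25, 6.25]
    total = reduce(lambda a, b: a + b, filter(lambda x: x > 2, squares))  # 8.5
    for shape in [Rectangle(2, 3), Square(4)]:
        # Polymorphism: the same area() call works on both classes.
        print(shape, shape.area())
    print(numbers, squares, total, factorial(5))
```

The generator processes one value at a time, so the same pattern works for large files read line by line.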

Expected Learning Outcomes


● Gain proficiency in Python programming fundamentals.
● Understand data structures and their applications.
● Develop modular and reusable code using functions.
● Implement object-oriented principles for efficient coding.
● Handle files, exceptions, and logs proficiently.
● Use NumPy and Pandas for data analysis.
● Build and deploy RESTful APIs using Flask.
● Work with SQL and NoSQL databases, including MongoDB.

Project

● Advance Review Scraper: This project automates extracting product reviews, ratings, and
comments from Flipkart to analyze customer sentiment and product performance.

Milestone 2: Statistics | Weeks 8-10

Milestone Description

This milestone covers fundamental and advanced statistical concepts essential for data analysis
and decision-making. It includes probability distributions, hypothesis testing, ANOVA,
correlation, and statistical implementations in Python. By the end of this milestone, learners will
gain the ability to apply statistical methods to real-world datasets, perform hypothesis testing,
and make data-driven decisions.

Comprehensive List of Topics

● Statistics Basics – Covers the role of statistics in data science, differentiates between
descriptive and inferential statistics, explores types of data and sampling techniques,
and explains levels of measurement.
● Measures of Central Tendency & Dispersion – Introduces mean, median, mode,
range, variance, and standard deviation, along with Python implementations to analyze
data spread and variability.
● Measure of Symmetry (Skewness) & Variability – Explains skewness (left-skewed,
right-skewed, and symmetric distributions), revisits standard deviation and variance, and
implements spread calculations in Python.
● Set Theory & Correlation – Covers fundamental set operations, covariance, correlation,
and Python implementations to measure relationships between variables.
● Probability & Random Variables – Explores random variables, probability distributions,
and key concepts like probability mass function (PMF), probability density function
(PDF), and cumulative distribution function (CDF).
● Discrete & Continuous Probability Distributions – Introduces the Bernoulli, binomial,
Poisson, discrete and continuous uniform, and normal (Gaussian) distributions with
real-world applications.
● Central Limit Theorem & Estimation – Explains why the distribution of sample means
approaches a normal distribution as the sample size grows, along with point and interval
estimation, confidence intervals, and margin of error.
● Hypothesis Testing & Statistical Inference – Covers Z-tests, T-tests, hypothesis
testing mechanisms, p-values, Type 1 & Type 2 errors, and when to use T-test vs. Z-test.
● Chi-Square, F-Distribution, & ANOVA – Discusses chi-square tests, Bayes' theorem,
goodness of fit, F-distribution, F-test, ANOVA types, partitioning variance, and their
assumptions.
● Python Implementation for Statistics – Applies statistical methods in Python, including
hypothesis testing, correlation, regression, and ANOVA, to automate data analysis and
decision-making.
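A minimal sketch of several of these ideas using only Python's standard library: descriptive statistics (mean, median, sample variance and standard deviation, range), Pearson correlation computed from its definition, and a two-sided one-sample Z-test. The Z-test assumes the population standard deviation is known; when it is not and the sample is small, a T-test is the usual choice. The sample data, significance level, and function names are illustrative assumptions.

```python
import math
import statistics
from statistics import NormalDist


def describe(data):
    """Measures of central tendency and dispersion (sample variance/stdev divide by n - 1)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),
        "stdev": statistics.stdev(data),
        "range": max(data) - min(data),
    }


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_z_test(data, mu0, sigma):
    """Two-sided Z-test of H0: population mean == mu0, with known population sigma."""
    n = len(data)
    z = (statistics.mean(data) - mu0) / (sigma / math.sqrt(n))
    # p-value: probability of a standard normal value at least as extreme as |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    sample = [12, 15, 11, 14, 13, 15, 12, 16]
    print(describe(sample))
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
    z, p = one_sample_z_test(sample, mu0=12, sigma=2)
    alpha = 0.05
    print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {p < alpha}")
```

With this sample the test statistic is about 2.12 and the p-value is roughly 0.034, so H0 would be rejected at the 5% level; at a 1% level it would not be.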

Expected Learning Outcomes

● Understand fundamental and advanced statistical concepts.
● Perform data sampling, distribution analysis, and statistical inference.
● Implement statistical methods such as hypothesis testing, ANOVA, and correlation.
● Apply probability distributions to model real-world data scenarios.
● Gain proficiency in Python for statistical analysis and visualization.
● Develop a strong foundation in data-driven decision-making using statistics.

Milestone 3: Machine Learning | Weeks 11-20

Milestone Description
This milestone provides a comprehensive journey through Machine Learning (ML), covering
fundamental concepts, exploratory data analysis (EDA), regression and classification models,
ensemble techniques, dimensionality reduction, clustering, anomaly detection, and time series
forecasting. Learners will gain hands-on experience in building, evaluating, and optimizing ML
models using Python.

Comprehensive List of Topics

● Introduction to Machine Learning – Covers the fundamentals of ML, including AI vs
ML vs DL vs DS, types of ML, data partitioning (train, test, validation), and concepts like
overfitting, underfitting, and bias-variance tradeoff.
● Feature Engineering & Data Preprocessing – Focuses on handling missing data,
imbalanced datasets, outliers, feature extraction, scaling, and encoding techniques to
improve model performance, with hands-on implementation.
● Exploratory Data Analysis (EDA) – Covers data visualization, statistical analysis, and
extracting meaningful insights from datasets, including projects like Flight Price
Prediction and Travel Data Analysis.
● Linear Regression & Evaluation Metrics – Introduces simple and multiple linear
regression, polynomial regression, evaluation metrics (R², Adjusted R², MSE, MAE,
RMSE), ML pipelines, and model deployment.
● Logistic Regression & Classification Metrics – Covers logistic regression,
regularization, cross-validation, hyperparameter tuning, classification metrics (Precision,
Recall, F1-score, ROC-AUC), and an Email Spam Classification project.
● Decision Trees & Random Forest – Explains decision tree classification, pruning
techniques, random forest classifier and regressor, and includes a Car Evaluation
project.
● Support Vector Machines (SVM) & Naive Bayes – Covers SVM soft/hard margins,
kernel tricks, Naive Bayes classifiers, and their variants, with an emergency survival
prediction project.
● Ensemble Learning: Bagging & Boosting – Introduces ensemble learning with
bagging (Random Forest) and boosting techniques (AdaBoost, Gradient Boosting,
XGBoost), with a Sentiment Analysis project.
● KNN & Stacking Models – Covers K-Nearest Neighbors for classification and
regression, stacking techniques, meta-learners, and an Energy Consumption
Forecasting project using stacked models.
● Dimensionality Reduction & Clustering – Introduces feature selection, PCA, and
clustering methods (K-Means, Hierarchical, DBSCAN) with a Movie Recommendation
Clustering project.
● Anomaly Detection & Time Series Forecasting – Covers Isolation Forest, Local Outlier
Factor (LOF), DBSCAN for anomalies, time series forecasting (AR, MA, ARIMA, SARIMA,
SARIMAX), and a Stock Market Anomaly Detection project.
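To make the evaluation metrics named above concrete, here is a small, dependency-free sketch. In practice these values are usually computed with scikit-learn (listed under Tools & Technologies); the plain-Python version simply shows the formulas. It fits a simple linear regression by ordinary least squares, computes MSE, RMSE, MAE, R², and adjusted R² for regression, and precision, recall, and F1 for binary classification. Function names and sample values are hypothetical.

```python
import math


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x (closed-form solution)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1


def regression_metrics(y_true, y_pred, n_features=1):
    """MSE, RMSE, MAE, R², and adjusted R² for a regression model."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R² penalizes extra features: 1 - (1 - R²)(n - 1) / (n - p - 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2, "AdjR2": adj_r2}


def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "F1": f1}


if __name__ == "__main__":
    b0, b1 = fit_simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    print(f"intercept = {b0}, slope = {b1}")
    print(regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))
    print(classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```

MAE weights all errors equally, while MSE and RMSE penalize large errors more heavily; RMSE is reported in the same units as the target.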

Expected Learning Outcomes


● Understand the fundamentals of machine learning and its key applications.
● Perform data preprocessing, feature engineering, and EDA for better model training.
● Implement regression, classification, clustering, and ensemble models in Python.
● Optimize models using hyperparameter tuning and cross-validation.
● Work with dimensionality reduction, anomaly detection, and time series forecasting.
● Gain hands-on experience through real-world projects in multiple domains.

Projects

● Sensor Fault Detection: A wafer is a thin semiconductor slice (e.g., c-Si) used in ICs and
solar cells. It undergoes microfabrication (doping, etching, deposition, photolithography)
before dicing and packaging into integrated circuits.
● Shipment Price Prediction: This project uses advanced machine learning to accurately
forecast shipping costs, helping businesses optimize logistics, reduce expenses, and
enhance efficiency.
● Spam Detection: This project builds a machine learning system to classify text as spam or
ham (legitimate messages), enhancing email and SMS filtering. It helps detect unwanted
messages for improved communication security.
● Visibility Distance Prediction: This project develops a machine learning model to predict
maximum visibility based on weather and geographical factors. It enhances safety and
efficiency in aviation, transportation, and outdoor activities.

Milestone 4: Deep Learning | Weeks 21-29

Milestone Description

This milestone provides a comprehensive deep learning curriculum, covering foundational
concepts, neural network architectures, image classification, object detection, segmentation,
tracking, and generative models. Learners will gain hands-on experience with TensorFlow,
PyTorch, CNNs, YOLO, Detectron2, TFOD2, and GANs, developing state-of-the-art AI models
for various real-world applications.

Comprehensive List of Topics

● Neural Network & Perceptron – Covers deep learning fundamentals, neural network
architectures, activation functions, loss functions, optimization techniques, forward &
backward propagation, gradient issues, and visualization, with hands-on implementation.
● Deep Learning Frameworks (TensorFlow & PyTorch) – Focuses on installing and
working with TensorFlow and PyTorch, building neural networks, debugging using
TensorBoard and Netron, cloud training in Colab Pro, and implementing a sentiment
analysis project.
● CNN & Image Classification – Explores convolutional layers, pooling, feature
extraction, training CNNs from scratch, and deploying image classification models as
web apps.
● Advanced CNN Architectures – Covers deep CNN variants (LeNet-5, AlexNet,
GoogLeNet, VGGNet, ResNet, Inception), transfer learning, and visualization.
● Object Detection: RCNN & YOLO – Introduces region-based CNN (Fast & Faster
RCNN) and real-time object detection using YOLOv9, covering data annotation, model
training, and inference.
● Object Detection: Detectron2 & TFOD2 – Focuses on implementing Detectron2 and
TensorFlow Object Detection API (TFOD2) for advanced object detection tasks.
● Image Segmentation & Instance Segmentation – Covers scene understanding, types
of segmentation, Mask R-CNN, and transitioning from bounding boxes to polygon
masks.
● Object Tracking – Introduces dataset annotation, Kalman filters, YOLO-based tracking,
and DeepSORT for real-time object tracking applications.
● Generative Adversarial Networks (GANs) – Covers GAN architecture, discriminator
and generator networks, WGANs, DCGANs, StyleGANs, and synthetic data generation.
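The forward and backward passes listed above can be seen in the smallest possible network: a single neuron with a sigmoid activation, trained by gradient descent on binary cross-entropy loss. The sketch below is plain Python for readability; frameworks such as TensorFlow and PyTorch compute the same gradients automatically. The OR-gate data, learning rate, and epoch count are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, w, b):
    """Forward pass: weighted sum of inputs plus bias, then activation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(samples, labels, w, b):
    """Mean binary cross-entropy loss over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = forward(x, w, b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)


def train_neuron(samples, labels, lr=0.5, epochs=5000):
    """Full-batch gradient descent for one sigmoid neuron."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = forward(x, w, b)
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz simplifies to (p - y).
            dz = p - y
            for i in range(n_features):
                grad_w[i] += dz * x[i]
            grad_b += dz
        # Gradient descent step using the mean gradient over the batch.
        w = [wi - lr * gw / len(samples) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


def predict(x, w, b):
    return 1 if forward(x, w, b) >= 0.5 else 0


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    Y = [0, 1, 1, 1]  # logical OR, which is linearly separable
    w, b = train_neuron(X, Y)
    print("weights:", w, "bias:", b, "loss:", bce_loss(X, Y, w, b))
    print([predict(x, w, b) for x in X])
```

A single neuron can only learn linearly separable patterns such as OR; a problem like XOR needs at least one hidden layer, which is where multi-layer networks and full backpropagation come in.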

Expected Learning Outcomes

● Develop a strong foundation in neural networks and deep learning frameworks like
TensorFlow & PyTorch.
● Implement CNNs, RCNN, YOLO, and Detectron2 for image classification, object
detection, and segmentation.
● Work with Mask R-CNN for instance segmentation and DeepSORT for object tracking.
● Train GANs for synthetic data generation and creative AI applications.
● Gain hands-on experience through real-world projects in computer vision and deep
learning.

Projects

● Audio Classification System: This project builds a binary audio classification model that
labels recordings as cat or dog sounds. Users can input an audio file to receive a
spectrogram image and target label. The dataset includes 6,000 cat and dog audio files.
● Helmet Detection: This project uses computer vision to detect helmets in real time,
enhancing safety compliance in workplaces and traffic monitoring. It ensures accurate
identification to prevent violations and accidents.
● AI-Driven Text Summarizer: This project develops an AI-driven text summarizer that
automatically condenses long texts while preserving key information. It enhances
readability and efficiency for users by generating concise summaries.
● Object Detection: This project develops an object detection system using AI to identify
and locate objects in images or videos. It enhances automation, security, and real-time
monitoring applications.

Milestone 5: NLP | Weeks 30-32

Milestone Description
This milestone introduces learners to Natural Language Processing (NLP) and covers essential
techniques for text processing, useful NLP libraries, and advanced models like attention
mechanisms and transformers. Learners will gain hands-on experience in building models for
text categorization, emotion detection, and text summarization using popular frameworks and
architectures.

Comprehensive List of Topics

● NLP Introduction & Text Processing – Covers the fundamentals of NLP, computational
linguistics, text processing techniques, regex, tokenization, normalization, word
embeddings (Word2Vec, Doc2Vec), and vector-based representations, with hands-on
implementation in news categorization.
● Useful NLP Libraries & Networks – Focuses on NLP libraries (NLTK, spaCy, TextBlob,
Stanford NLP), neural network architectures (RNNs, LSTMs, Bi-LSTMs, GRUs), and
real-world NLP model implementations, with an emotion detection project using
Bi-LSTM.
● Attention-Based Models & Transfer Learning – Introduces sequence-to-sequence
models, attention mechanisms, self-attention, transformers (BERT, GPT-2), and their
applications in advanced NLP tasks like text summarization.
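The attention mechanism behind transformers such as BERT and GPT-2 reduces to one formula, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The following is a dependency-free sketch of scaled dot-product attention for a toy three-token sequence. Real models derive Q, K, and V from learned projections of token embeddings and use multiple attention heads; this sketch omits both, and the numbers are made up for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices represented as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V; returns the outputs and the attention weights."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))  # similarity of every query with every key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights  # each output row is a weighted mix of values


if __name__ == "__main__":
    # Three tokens with 2-dimensional query/key/value vectors (toy numbers).
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    output, weights = scaled_dot_product_attention(Q, K, V)
    for w_row, o_row in zip(weights, output):
        print([round(w, 3) for w in w_row], [round(o, 3) for o in o_row])
```

Each row of weights sums to 1, so every output vector is a weighted average of the value vectors; the division by sqrt(d_k) keeps the scores from growing with vector size and saturating the softmax.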

Expected Learning Outcomes

● Understand key concepts and techniques in Natural Language Processing (NLP).
● Become proficient in using NLP libraries such as NLTK, spaCy, and TextBlob for text
processing and analysis.
● Learn to build and train sequential models (RNN, LSTM, GRU) for text-based tasks.
● Implement transformer-based models like BERT and GPT-2 for advanced NLP
applications.
● Gain hands-on experience in solving real-world NLP problems, including news
categorization, emotion detection, and text summarization.

Milestone 6: Generative AI | Weeks 33-36

Milestone Description
This milestone introduces learners to Generative AI, focusing on text generation, machine
translation, large language models (LLMs), LangChain, vector databases, Hugging Face, prompt
engineering, retrieval-augmented generation (RAG), and fine-tuning. Through hands-on
projects, learners will build AI-powered assistants, content retrieval systems, and creative
writing models while gaining expertise in modern AI frameworks and ethical considerations.

Comprehensive List of Topics

● Generative AI Intro & Text Generation – Covers the fundamentals of generative AI,
including probabilistic modeling, autoencoders, variational autoencoders (VAEs), and
GPT models for text generation. Introduces key challenges like controlling AI-generated
content and ensuring coherent outputs, with hands-on implementation in
autoencoder-based poetry generation.
● Generative AI for Machine Translation – Explores traditional and deep learning-based
translation methods (SMT, NMT), attention mechanisms, and GPTs for machine
translation. Covers AI applications in multilingual text generation, poetry, music
composition, and ethical considerations in creative AI, with a project on ethical bias
detection in machine translation.
● LLM & LangChain Framework – Introduces Large Language Models (LLMs), their
industry applications, and integrations with LangChain and LlamaIndex for structured
data-driven AI assistants.
● Vector Databases (Vector DB) – Covers vector databases, why they matter, common options
(e.g., ChromaDB), and their implementation for AI-powered search and retrieval systems.
● Hugging Face & Ollama – Explores Hugging Face’s pre-trained models, real-world
applications, and Ollama’s role in running and customizing LLMs locally.
● Prompt Engineering & Retrieval-Augmented Generation (RAG) – Focuses on crafting
effective prompts, advanced RAG techniques, and frameworks like LangChain for
AI-powered text summarization, QA, and creative writing.
● Fine-Tuning LLMs – Covers different fine-tuning approaches for LLMs, including
real-world applications, domain-specific adaptations, and optimizing AI-generated
outputs.
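The retrieval step of RAG can be sketched without any external services: documents are turned into vectors, the question is matched against them by cosine similarity, and the best matches are inserted into the prompt sent to an LLM. In the sketch below, a bag-of-words count vector stands in for a real embedding model and an in-memory list stands in for a vector database such as ChromaDB. `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical names, not the API of LangChain, LlamaIndex, or ChromaDB, and the LLM call itself is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory list of (text, vector) pairs searched by brute-force cosine similarity."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Augment the question with the top-k retrieved passages before calling an LLM."""
    context = "\n".join(f"- {passage}" for passage in store.search(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    store = ToyVectorStore()
    for doc in [
        "LangChain is a framework for building applications with large language models.",
        "ChromaDB is a vector database that stores embeddings for similarity search.",
        "ARIMA models are used for time series forecasting.",
    ]:
        store.add(doc)
    print(build_prompt("Which vector database stores embeddings?", store))
```

Swapping the toy pieces for a real embedding model and a vector database changes the quality of retrieval, not the shape of the pipeline: embed, search, then augment the prompt.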

Expected Learning Outcomes

● Gain expertise in generative AI techniques for text generation, translation, and
storytelling.
● Learn advanced generative models, including GPT, VAEs, transformers, and multimodal
models.
● Work with LLMs and LangChain for context-aware AI applications.
● Implement vector databases (VectorDB) for intelligent search and retrieval.
● Utilize Hugging Face & Ollama for deploying generative AI models.
● Master prompt engineering and RAG for efficient AI responses.
● Learn fine-tuning techniques for customizing LLMs for specific applications.

Projects

● Sign Language Detection: This project builds an AI-powered system to detect and interpret
sign language from images or videos. It enhances communication accessibility for the
hearing impaired.
● Named Entity Recognition (NER): This project develops a Named Entity Recognition (NER)
system to identify and classify entities like names, dates, and locations in text. It
enhances information extraction and text analysis for various applications.
● AI-Powered Custom Chatbot: This project develops a tailored chatbot using GPT, designed
to provide domain-specific responses and personalized interactions. It enhances user
experience across various applications.
● GeneAI – Intelligent Voice Assistant: This project develops GeneAI, a smart voice assistant
similar to Alexa, enabling seamless voice interactions and task automation. It enhances
user convenience through AI-driven responses and commands.
● AI-Powered Image and Caption Generator: This project develops a system that generates
images and captions based on user prompts using advanced AI models. It enhances
creativity and automation in content generation.
