Data Science with Generative AI outline
Program Outline
The Data Science course equips learners with skills to extract insights, make data-driven
decisions, and solve real-world problems. Starting with Python fundamentals and statistics, it
covers data manipulation (NumPy, Pandas), visualization (Matplotlib, Seaborn), and database
management (SQL, NoSQL).
Key topics include machine learning (regression, decision trees, SVMs, Random Forests), deep
learning (neural networks, CNNs, RNNs, GANs), and NLP (BERT, GPT). Learners gain
hands-on experience with tools like TensorFlow, PyTorch, Hugging Face, and LangChain,
tackling projects such as customer segmentation, image classification, and text summarization.
The course also covers current AI topics such as LLMs, prompt engineering, and RAG, so
learners practice the end-to-end data science workflow used in AI roles.
Learning Objectives
● Develop a strong foundation in Python programming, including data types, loops,
conditionals, functions, and object-oriented programming.
● Understand and work with data structures like lists, tuples, dictionaries, and sets for data
organization and manipulation.
● Gain expertise in advanced Python concepts, including iterators, generators, lambda
functions, and functional programming tools like map, reduce, and filter.
● Learn object-oriented programming principles such as inheritance, polymorphism,
encapsulation, and abstraction, and implement advanced features like decorators and
dunder methods.
● Master file handling, exception handling, debugging, and logging in Python while
exploring multithreading and multiprocessing.
● Use industry-standard libraries such as NumPy, Pandas, Matplotlib, Seaborn, and Plotly
for data analysis and visualization.
● Build RESTful APIs with Flask, including creating, deploying, and securing APIs, along
with user authentication and testing.
● Work with SQL and NoSQL databases, performing CRUD operations, schema design,
optimization, and integration with tools like MongoDB.
● Apply statistical concepts like probability distributions, hypothesis testing, correlation,
and variance using Python to analyze real-world datasets.
● Learn machine learning fundamentals, including supervised and unsupervised learning,
and implement regression, classification, and clustering models.
● Perform exploratory data analysis (EDA) on datasets, identifying trends and insights
through practical case studies and projects.
● Understand and implement linear and logistic regression, regularization techniques, and
evaluation metrics such as R², MSE, and MAE.
● Explore advanced machine learning techniques, including decision trees, SVMs, Naive
Bayes, ensemble methods, and clustering algorithms.
● Dive into anomaly detection and time series analysis, using techniques like ARIMA,
SARIMA, and machine learning models for forecasting.
● Build and deploy deep learning models for tasks like image classification and sentiment
analysis using frameworks like TensorFlow and PyTorch.
● Gain hands-on experience with NLP techniques, including text preprocessing,
embeddings, and transformer models like BERT and GPT.
● Explore generative AI concepts such as autoencoders, GANs, and large language
models (LLMs) for advanced applications in AI and natural language understanding.
● Learn object detection, image segmentation, and advanced frameworks like YOLO,
Detectron2, and TFOD for building AI-powered solutions.
● Implement and deploy cutting-edge projects like AI assistants, sentiment analysis tools,
and recommendation systems.
● Master fine-tuning, prompt engineering, and Retrieval-Augmented Generation (RAG) for
customizing AI models to specific use cases.
Learning Outcomes:
● Develop a strong foundation in Python programming, including object-oriented
principles, data structures, and file handling for building scalable and efficient
applications.
● Master statistical analysis, probability distributions, and hypothesis testing to analyze
data and draw meaningful insights for decision-making.
● Gain expertise in machine learning techniques, including regression, classification,
ensemble methods, and clustering, while working on real-world datasets and projects.
● Build and deploy deep learning models using frameworks like TensorFlow and PyTorch,
implementing advanced architectures such as CNNs, RNNs, and GANs.
● Design and implement RESTful APIs using Flask to enable seamless communication
between applications and systems.
● Leverage databases, both SQL and NoSQL (e.g., MongoDB), for efficient data storage,
querying, and optimization, with best practices for schema design and indexing.
● Explore and apply data visualization tools such as Matplotlib, Seaborn, and Plotly to
create meaningful and interactive dashboards.
● Understand and utilize modern NLP techniques, including transformer-based models like
BERT and GPT, for text processing, sentiment analysis, and summarization.
● Implement generative AI techniques for creative applications such as text generation,
music composition, and machine translation, while addressing ethical concerns.
● Employ Gen AI tools like LangChain, LlamaIndex, and vector databases for building
AI-driven applications.
● Design and optimize end-to-end pipelines for machine learning and deep learning
projects, integrating concepts such as feature engineering, model training, and
evaluation.
● Develop and fine-tune large language models (LLMs) for domain-specific applications
and enhance them using prompt engineering and retrieval-augmented generation (RAG)
techniques.
● Build and deploy sophisticated object detection, tracking, and segmentation solutions
using frameworks like YOLO, Detectron2, and TFOD.
● Utilize cloud infrastructure and DevOps tools for deployment and monitoring, ensuring
secure, reliable, and scalable applications.
● Complete industry-relevant projects, such as anomaly detection, recommendation
systems, and API development, showcasing the ability to apply theoretical knowledge to
practical use cases.
Curriculum
Milestone Overview
This milestone introduces Python programming fundamentals, covering core concepts, data
handling, object-oriented programming (OOP), file management, APIs, and databases. It
provides a solid foundation in Python development, culminating in hands-on projects that
simulate real-world applications.
Comprehensive List of Topics:
Project
Project Description
This milestone covers fundamental and advanced statistical concepts essential for data analysis
and decision-making. It includes probability distributions, hypothesis testing, ANOVA,
correlation, and statistical implementations in Python. By the end of this milestone, learners will
gain the ability to apply statistical methods to real-world datasets, perform hypothesis testing,
and make data-driven decisions.
● Statistics Basics – Covers the role of statistics in data science, differentiates between
descriptive and inferential statistics, explores types of data and sampling techniques,
and explains levels of measurement.
● Measures of Central Tendency & Dispersion – Introduces mean, median, mode,
range, variance, and standard deviation, along with Python implementations to analyze
data spread and variability.
● Measure of Symmetry (Skewness) & Variability – Explains skewness (left-skewed,
right-skewed, and symmetric distributions), revisits variance and standard deviation, and
implements spread calculations in Python.
● Set Theory & Correlation – Covers fundamental set operations, covariance, correlation,
and Python implementations to measure relationships between variables.
● Probability & Random Variables – Explores random variables, probability distributions,
and key concepts like probability mass function (PMF), probability density function
(PDF), and cumulative distribution function (CDF).
● Discrete & Continuous Probability Distributions – Introduces the binomial, Bernoulli,
Poisson, discrete and continuous uniform, and normal (Gaussian) distributions, with
real-world applications.
● Central Limit Theorem & Estimation – Explains how the distribution of sample means
approaches a normal distribution as sample size grows, and covers point and interval
estimation, confidence intervals, and margin of error.
● Hypothesis Testing & Statistical Inference – Covers z-tests, t-tests, hypothesis
testing mechanisms, p-values, Type I & Type II errors, and when to use a t-test vs. a z-test.
● Chi-Square, F-Distribution, & ANOVA – Discusses chi-square tests, Bayes' theorem,
goodness of fit, F-distribution, F-test, ANOVA types, partitioning variance, and their
assumptions.
● Python Implementation for Statistics – Applies statistical methods in Python, including
hypothesis testing, correlation, regression, and ANOVA, to automate data analysis and
decision-making.
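As an illustration of the kind of exercise these topics lead to, the following is a minimal sketch using only Python's standard library. The sample data, the function names, and the assumed population standard deviation are hypothetical and not taken from the course material. The confidence interval uses a normal approximation; for a sample this small, a t-based interval would usually be the better choice.

```python
import math
import statistics
from statistics import NormalDist


def describe(sample):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "variance": statistics.variance(sample),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(sample),        # sample standard deviation
    }


def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # critical value, about 1.96 for 95%
    margin = z * std_err                            # margin of error
    return mean - margin, mean + margin


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming the population sigma is known."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of ten test scores
scores = [52, 48, 55, 60, 47, 53, 58, 51, 49, 57]

summary = describe(scores)
low, high = confidence_interval(scores)
z_stat, p_value = z_test(scores, mu0=50, sigma=5)  # sigma=5 is an assumed value

print(summary)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest that the sample mean differs from the hypothesized mean `mu0`; whether to reject the null hypothesis depends on the significance level chosen beforehand.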
Projects Description
● Neural Network & Perceptron – Covers deep learning fundamentals, neural network
architectures, activation functions, loss functions, optimization techniques, forward &
backward propagation, gradient issues, and visualization, with hands-on implementation.
● Deep Learning Frameworks (TensorFlow & PyTorch) – Focuses on installing and
working with TensorFlow and PyTorch, building neural networks, debugging using
TensorBoard and Netron, cloud training in Colab Pro, and implementing a sentiment
analysis project.
● CNN & Image Classification – Explores convolutional layers, pooling, feature
extraction, training CNNs from scratch, and deploying image classification models as
web apps.
● Advanced CNN Architectures – Covers deep CNN variants (LeNet-5, AlexNet,
GoogLeNet, VGGNet, ResNet, Inception), transfer learning, and visualization.
● Object Detection: R-CNN & YOLO – Introduces region-based CNNs (Fast R-CNN and Faster
R-CNN) and real-time object detection using YOLOv9, covering data annotation, model
training, and inference.
● Object Detection: Detectron2 & TFOD2 – Focuses on implementing Detectron2 and
TensorFlow Object Detection API (TFOD2) for advanced object detection tasks.
● Image Segmentation & Instance Segmentation – Covers scene understanding, types
of segmentation, Mask R-CNN, and transitioning from bounding boxes to polygon
masks.
● Object Tracking – Introduces dataset annotation, Kalman filters, YOLO-based tracking,
and DeepSORT for real-time object tracking applications.
● Generative Adversarial Networks (GANs) – Covers GAN architecture, discriminator
and generator networks, WGANs, DCGANs, StyleGANs, and synthetic data generation.
● Develop a strong foundation in neural networks and deep learning frameworks like
TensorFlow & PyTorch.
● Implement CNNs, R-CNN, YOLO, and Detectron2 for image classification, object
detection, and segmentation.
● Work with Mask R-CNN for instance segmentation and DeepSORT for object tracking.
● Train GANs for synthetic data generation and creative AI applications.
● Gain hands-on experience through real-world projects in computer vision and deep
learning.
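To make the forward and backward propagation topic concrete, here is a minimal sketch of a single sigmoid neuron trained with batch gradient descent on the logical AND function. It uses plain Python rather than TensorFlow or PyTorch, and the function names, learning rate, and epoch count are illustrative assumptions rather than course code.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(inputs, targets, learning_rate=1.0, epochs=5000):
    """Train one sigmoid neuron with batch gradient descent."""
    n_features = len(inputs[0])
    n_samples = len(inputs)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            # Forward pass: weighted sum followed by the activation
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = sigmoid(z)
            # Backward pass: with sigmoid + binary cross-entropy loss,
            # the gradient of the loss with respect to z is (p - y)
            error = p - y
            for j in range(n_features):
                grad_w[j] += error * x[j] / n_samples
            grad_b += error / n_samples
        # Gradient descent update
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def predict(weights, bias, x):
    """Return the neuron's output probability for one input."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Truth table for logical AND (a linearly separable toy problem)
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_neuron(and_inputs, and_targets)
for x in and_inputs:
    print(x, round(predict(weights, bias, x), 3))
```

A single neuron can only learn linearly separable functions such as AND; problems like XOR need at least one hidden layer, which is where multi-layer networks and frameworks like TensorFlow and PyTorch come in.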
Project Description
Milestone Description
This milestone introduces learners to Natural Language Processing (NLP) and covers essential
techniques for text processing, useful NLP libraries, and advanced models like attention
mechanisms and transformers. Learners will gain hands-on experience in building models for
text categorization, emotion detection, and text summarization using popular frameworks and
architectures.
● NLP Introduction & Text Processing – Covers the fundamentals of NLP, computational
linguistics, text processing techniques, regex, tokenization, normalization, word
embeddings (Word2Vec, Doc2Vec), and vector-based representations, with hands-on
implementation in news categorization.
● Useful NLP Libraries & Networks – Focuses on NLP libraries (NLTK, SpaCy, TextBlob,
Stanford NLP), neural network architectures (RNNs, LSTMs, Bi-LSTMs, GRUs), and
real-world NLP model implementations, with an emotion detection project using
Bi-LSTM.
● Attention-Based Models & Transfer Learning – Introduces sequence-to-sequence
models, attention mechanisms, self-attention, transformers (BERT, GPT-2), and their
applications in advanced NLP tasks like text summarization.
● Generative AI Intro & Text Generation – Covers the fundamentals of generative AI,
including probabilistic modeling, autoencoders, variational autoencoders (VAEs), and
GPT models for text generation. Introduces key challenges like controlling AI-generated
content and ensuring coherent outputs, with hands-on implementation in
autoencoder-based poetry generation.
● Generative AI for Machine Translation – Explores traditional and deep learning-based
translation methods (SMT, NMT), attention mechanisms, and GPTs for machine
translation. Covers AI applications in multilingual text generation, poetry, music
composition, and ethical considerations in creative AI, with a project on ethical bias
detection in machine translation.
● LLMs & the LangChain Framework – Introduces Large Language Models (LLMs), their
industry applications, and integrations with LangChain and LlamaIndex for structured
data-driven AI assistants.
● Vector Databases (Vector DB) – Covers vector databases, their importance, types
(e.g., ChromaDB), and implementation for AI-powered search and retrieval systems.
● Hugging Face & Ollama – Explores Hugging Face’s pre-trained models, real-world
applications, and Ollama’s role in creating and fine-tuning custom LLMs.
● Prompt Engineering & Retrieval-Augmented Generation (RAG) – Focuses on crafting
effective prompts, advanced RAG techniques, and frameworks like LangChain for
AI-powered text summarization, QA, and creative writing.
● Fine-Tuning LLMs – Covers different fine-tuning approaches for LLMs, including
real-world applications, domain-specific adaptations, and optimizing AI-generated
outputs.
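The retrieval step behind RAG can be sketched in a few lines. The example below is a toy illustration in plain Python: it stands in a bag-of-words count vector for a learned embedding and a list scan for a vector database such as ChromaDB, and it stops at building the prompt rather than calling an LLM. All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector (not a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(q, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


# Hypothetical knowledge base
docs = [
    "ChromaDB is a vector database used for similarity search.",
    "Matplotlib and Seaborn are Python plotting libraries.",
    "Retrieval-augmented generation adds retrieved context to a prompt.",
]

question = "What is a vector database?"
print(build_prompt(question, docs, k=1))
```

In a real pipeline, the prompt returned by `build_prompt` would be sent to an LLM, and frameworks like LangChain or LlamaIndex would handle the embedding, storage, and retrieval steps.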
Projects
Project Description