Syllabus_7th_sem_pages_deleted
UNIT – I
Data-Driven Organizations & Elements of Data:
Data-driven decisions, data pipeline infrastructure for data-driven decisions, role of the data engineer in
data-driven organizations, Modern data strategies, Introduction to elements of Data, the five Vs of data –
volume, velocity, variety, veracity, and value, Variety – data types & data sources, Activities to improve
veracity and value.
UNIT – II
Design Principles and Patterns for Data Pipelines:
The evolution of data architectures, Modern data architecture on various cloud platforms, Modern data
architecture pipeline - Ingestion and storage, Modern data architecture pipeline - Processing and
Consumption, Streaming analytics pipeline.
Securing and Scaling the Data Pipeline:
Cloud security, Security of analytics workloads, ML security, Scaling Data Pipeline, creating a scalable
infrastructure, Creating scalable components.
UNIT – III
Ingesting and Preparing Data:
ETL and ELT comparison, Data wrangling, Data Discovery, Data structuring, Data Cleaning, Data
enriching, Data validating, Data publishing.
Ingesting by Batch or by Stream:
Comparing batch and stream ingestion, Batch ingestion processing, Purpose-built data ingestion tools,
Scaling considerations for batch processing, stream processing, Scaling considerations for stream
processing, Ingesting IoT data by stream.
UNIT – IV
Storing and Organizing Data:
Storage in the modern data architecture, Data Lake storage, Data warehouse storage, Purpose-built
databases, Storage in support of the pipeline, Securing storage.
Processing Big Data:
Big data processing concepts, Apache Hadoop, Apache Spark, Amazon EMR.
UNIT – V
Processing Data for ML & Automating the Pipeline:
ML Concepts, ML Lifecycle, Framing the ML problem to meet the business goal, Collecting data, Applying
labels to training data with known targets, Pre-processing data, Feature engineering, Developing a model,
Deploying a model, ML infrastructure on AWS, Amazon SageMaker, Automating the Pipeline, Automating
infrastructure deployment, CI/CD, Automating with Step Functions.
List of Experiments:
● 7 - 10 experiments to be framed as per the syllabus.
Recommended Books:
1. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable
Systems, by Martin Kleppmann
2. T-SQL Querying (Developer Reference) by Itzik Ben-Gan
3. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball and Margy Ross
4. Spark: The Definitive Guide: Big Data Processing Made Simple by Bill Chambers
5. Data Pipelines with Apache Airflow by Bas P. Harenslak
6. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Tyler
Akidau
7. Kubernetes in Action by Marko Luksa
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA, BHOPAL
Unit 1: Data Definitions and Analysis Techniques: Elements, Variables, and Data categorization, Levels of
Measurement, Data management and indexing. Introduction to Statistical Concepts: Sampling Distributions,
Resampling, Statistical Inference and Descriptive Statistics, Measures of central tendency, Measures of
location and dispersion.
Unit 2: Advanced Data analysis techniques: Statistical hypothesis generation and testing, Chi-Square test,
t-Test, Analysis of variance, Correlation analysis, Maximum likelihood test, Regression Modelling,
Multivariate Analysis, Bayesian Modelling, Inference and Bayesian Network, Regression analysis.
Unit 3: Data Wrangling: Intro to Data Wrangling, Gathering Data, Assessing Data, Cleaning Data. Data
Visualization in Data Analysis: Design of Visualizations, Univariate Exploration of Data, Bivariate
Exploration of Data, Multivariate Exploration of Data, Explanatory Visualizations.
Unit 4: Data Ecosystem: Overview of the Data Analyst Ecosystem, Types of Data, Understanding Different
Types of File Formats, Sources of Data, Overview of Data Repositories, NoSQL, Data Marts, Data Lakes,
ETL, and Data Pipelines, Foundations of Big Data, Big Data processing tools such as Hadoop, Hadoop
Distributed File System (HDFS), Hive, and Spark
Unit 5: Data Visualization tools: Python visualization libraries (matplotlib, pandas, seaborn, ggplot, plotly),
Introduction to Power BI tools, Examples of inspiring (industry) projects. Exercise: create your own
visualization of a complex dataset.
Text Books/References:
1. Joel Grus, Data Science from Scratch, Shroff Publishers / O’Reilly Media.
2. Annalyn Ng, Kenneth Soo, Numsense! Data Science for the Layman, Shroff Publishers.
3. Cathy O’Neil and Rachel Schutt, Doing Data Science: Straight Talk from the Frontline, O’Reilly
Media.
4. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman, Mining of Massive Datasets, v2.1,
Cambridge University Press.
5. Jake VanderPlas, Python Data Science Handbook, Shroff Publishers / O’Reilly Media.
6. Philipp Janert, Data Analysis with Open Source Tools, Shroff Publishers / O’Reilly Media.
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA, BHOPAL
Unit 1
Introduction to Computer Vision and Basic Concepts of Image Formation: Introduction and Goals of
Computer Vision and Image Processing, Image Formation Concepts. Fundamental Concepts of Image
Formation: Radiometry, Geometric Transformations, Geometric Camera Models.
Unit 2
Fundamental Concepts of Image Formation: Camera Calibration, Image Formation in a Stereo Vision
Setup, Image Reconstruction from a Series of Projections. Image Processing Concepts: Image Transforms.
Unit 3
Image Processing Concepts: Image Transforms, Image Enhancement. Image Processing Concepts: Image
Filtering, Colour Image Processing, Image Segmentation. Image Descriptors and Features: Texture
Descriptors, Colour Features, Edges/Boundaries.
Unit 4
Image Descriptors and Features: Object Boundary and Shape Representations. Image Descriptors and
Features: Interest or Corner Point Detectors, Histogram of Oriented Gradients, Scale Invariant Feature
Transform, Speeded-Up Robust Features, Saliency.
Unit 5
Applications of Computer Vision: Artificial Neural Network for Pattern Classification, Convolutional
Neural Networks, Autoencoders. Applications of Computer Vision: Gesture Recognition, Motion
Estimation and Object Tracking, Programming Assignments.
Lab experiments
4. Write a Python program for image enhancement.
5. Write a Python program to perform a compression operation on the input image.
6. Write a Python program for colour image processing on the input image.
7. Write a Python program to perform an image segmentation operation.
8. Write a Python program to perform an image morphology operation on the image.
9. Write a Python program for an image restoration operation.
10. Write a Python program to implement Scaling, Rotating, Shifting and Edge Detection operations on
the input image.
11. Write a program for object tracking using OpenCV.
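As a starting point for experiment 10, the following is a minimal sketch of scaling, rotating, shifting and edge detection. It assumes a grayscale image stored as a plain Python list of rows, so that it runs without any library installed. The function names (shift, scale, rotate90, sobel_edges) are hypothetical and chosen only for illustration. In the lab, the same operations would more typically be done with OpenCV routines such as cv2.resize, cv2.warpAffine and cv2.Canny.

```python
# Illustrative sketch only: a grayscale image is a list of rows of integers.
# All function names are hypothetical and not taken from the syllabus.

def shift(img, dx, dy, fill=0):
    """Translate the image by (dx, dy) pixels; vacated pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def scale(img, factor):
    """Resize by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, int(h * factor)), max(1, int(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)]
            for y in range(nh)]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def sobel_edges(img):
    """Approximate gradient magnitude |gx| + |gy| with 3x3 Sobel kernels.

    Border pixels are left at 0 because the kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out

if __name__ == "__main__":
    tiny = [[1, 2], [3, 4]]
    print(shift(tiny, 1, 0))
    print(scale(tiny, 2))
    print(rotate90(tiny))
    print(sobel_edges([[0, 0, 10], [0, 0, 10], [0, 0, 10]]))
```

The rotation here is restricted to 90 degrees to keep the sketch short; arbitrary angles need interpolation, which is one reason a library routine is usually preferred in practice.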