
5-day KVCET Bootcamp - Data Analytics

MODE OF DELIVERY:
Offline (on-campus)

METHODOLOGY:
Hands-on & industry-aligned

TENTATIVE PROGRAM DURATION:
5 days

INDICATIVE CURRICULA:

Course Title: "Data Mastery: From Raw Data to Actionable Insights"

This 32-hour course takes students through the entire process of data gathering,
cleaning, and engineering, covering essential techniques to transform raw data into
structured, high-quality datasets ready for analysis or machine learning. The course
focuses on practical, real-world applications, using hands-on assignments and
industry-standard tools to prepare students for the data challenges they will face on the job.

Unit 1: Data Gathering - Collecting Data from Diverse Sources (6 hours)


Objective: Introduce students to the different methods of gathering structured and
unstructured data, and prepare them for handling data from a variety of real-world
sources.

Sub-unit 1.1: Introduction to Data Sources and Collection Methods (2 hours)



●​ Topics:
○​ Different Types of Data: Structured, semi-structured, and unstructured
○​ APIs: Using REST APIs to collect data (e.g., public datasets, third-party
services)
○​ Web Scraping: Extracting data from websites using tools like BeautifulSoup,
Scrapy, and Selenium
○​ Unstructured Data: Gathering data from non-structured sources like PDFs,
emails, or social media
○​ Cloud and Streaming Data: Collecting real-time data using cloud-based
services (AWS, Google Cloud, and APIs)
●​ Real-World Assignment:
○​ Task: Write a script that scrapes product data (e.g., name, price, and reviews)
from an e-commerce website and stores it in a structured CSV or JSON file.
○​ Deliverables: A Python script that scrapes live product data from a website
using BeautifulSoup and saves it in the required format.
●​ Test:
○​ Practical: Scrape data from a website and ensure that it's structured in the
correct format (CSV or JSON).
○​ Theory: Multiple-choice questions on types of data and methods of collection.
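For illustration, a minimal Python sketch of the Sub-unit 1.1 scraping assignment might look like the following. The URL and the CSS selectors (div.product, h2.name, span.price, span.review-count) are placeholders and would need to be adapted to the actual HTML of the target site.

    # Fetch a product listing page, parse it with BeautifulSoup, and save the
    # extracted fields to a CSV file. URL and selectors are placeholders.
    import csv
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/products"              # placeholder URL
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for card in soup.select("div.product"):           # assumed container class
        name = card.select_one("h2.name")
        price = card.select_one("span.price")
        reviews = card.select_one("span.review-count")
        rows.append({
            "name": name.get_text(strip=True) if name else "",
            "price": price.get_text(strip=True) if price else "",
            "reviews": reviews.get_text(strip=True) if reviews else "",
        })

    with open("products.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price", "reviews"])
        writer.writeheader()
        writer.writerows(rows)

The same records can be written as JSON instead by replacing the csv block with json.dump(rows, f) after importing json.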

Sub-unit 1.2: Real-Time Data Collection and Cloud Integration (4 hours)

●​ Topics:

○​ Real-Time Data Streaming: Introduction to Apache Kafka, AWS Kinesis, and
Google Cloud Pub/Sub
○​ Working with REST APIs: Collecting real-time data using API endpoints for
live updates
○​ Integrating with Cloud Data Platforms: Collecting and storing data in
cloud-based databases (e.g., AWS DynamoDB, Google BigQuery)
●​ Real-World Assignment:
○​ Task: Set up a real-time data pipeline using Kafka or AWS Kinesis to stream
data from an API (e.g., Twitter) and store it in MongoDB or Google
BigQuery.
○​ Deliverables: A script that sets up real-time streaming and stores data in a
cloud database.
●​ Test:
○​ Practical: Create a real-time data pipeline that collects and stores data from
a public API and demonstrates handling real-time data streams.
○​ Theory: Questions on real-time data collection and cloud-based data
integration.
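As a reference point for Sub-unit 1.2, the sketch below wires a simple producer/consumer pair together using the kafka-python and pymongo client libraries. The broker address, topic name, API endpoint, and database names are placeholders; a managed service such as AWS Kinesis or Google Cloud Pub/Sub would use its own client SDK instead of kafka-python.

    # Poll a (placeholder) JSON API, publish each record to a Kafka topic, then
    # consume the topic and insert the records into MongoDB.
    import json

    import requests
    from kafka import KafkaConsumer, KafkaProducer
    from pymongo import MongoClient

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def produce_once():
        # Fetch one batch from the API and publish it; in practice this would
        # run in a loop or on a schedule.
        data = requests.get("https://example.com/api/events", timeout=10).json()
        for record in data:
            producer.send("live-events", record)
        producer.flush()

    def consume_to_mongo():
        # Read messages from the topic and store them in a MongoDB collection.
        consumer = KafkaConsumer(
            "live-events",
            bootstrap_servers="localhost:9092",
            value_deserializer=lambda b: json.loads(b.decode("utf-8")),
            auto_offset_reset="earliest",
        )
        collection = MongoClient("mongodb://localhost:27017")["bootcamp"]["events"]
        for message in consumer:
            collection.insert_one(message.value)

In a real deployment the producer and consumer would run as separate processes, and the consumer could write to Google BigQuery through its own client library instead of MongoDB.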

Unit 2: Data Cleaning - Transforming Raw Data into Structured Format (10 hours)

Objective: Equip students with the skills to clean, preprocess, and prepare raw data
for analysis or machine learning.

Sub-unit 2.1: Handling Missing Data and Outliers (4 hours)

●​ Topics:
○​ Techniques for Handling Missing Data: Imputation methods (mean, median,
mode, KNN imputation)
○​ Dealing with Duplicates: Identifying and removing duplicates in datasets
○​ Outlier Detection and Treatment: Techniques for identifying and removing
outliers (IQR, Z-Score)
○​ Visualizing Missing Data: Using Python tools like Seaborn and Matplotlib to
visualize missing data
●​ Real-World Assignment:
○​ Task: Clean a messy dataset with missing values and outliers (e.g., a real
estate or customer dataset). Apply imputation and remove outliers where
appropriate.
○​ Deliverables: A cleaned dataset with imputation, outlier removal, and
visualizations showing the impact of these changes.

●​ Test:
○​ Practical: Clean a given dataset by handling missing values and outliers
using different techniques.
○​ Theory: Multiple-choice questions on handling missing data and outliers.
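A compact pandas sketch of the Sub-unit 2.1 cleaning steps is shown below. The file name and the "price" column are placeholders standing in for whichever dataset the assignment uses.

    # Visualize missingness, drop duplicates, impute missing values, and remove
    # outliers with the IQR rule. File and column names are placeholders.
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.read_csv("housing.csv")                   # placeholder dataset

    sns.heatmap(df.isna(), cbar=False)                # visualize missing values
    plt.title("Missing values before cleaning")
    plt.show()

    df = df.drop_duplicates()

    # Impute: median for numeric columns, mode for everything else
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        elif not df[col].mode().empty:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Remove outliers in a numeric column using the 1.5 * IQR rule
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

KNN imputation, mentioned in the topics, would typically use sklearn.impute.KNNImputer on the numeric columns instead of the per-column fillna calls.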
Sub-unit 2.2: Data Transformation and Feature Engineering (6 hours)

●​ Topics:
○​ Data Transformation: Scaling, normalization, and standardization techniques
(e.g., Min-Max Scaling, Z-Score Normalization)
○​ Feature Engineering: Creating meaningful features from raw data (e.g.,
extracting date-related features, text analysis)
○​ Feature Selection and Dimensionality Reduction: Using techniques like PCA
and LDA to reduce the number of features
○​ Handling Categorical Data: Encoding techniques like One-Hot Encoding and
Label Encoding
●​ Real-World Assignment:

○​ Task: Transform a dataset using normalization, feature extraction, and feature
selection. Apply one-hot encoding to categorical variables and scale
numerical features.
○​ Deliverables: A transformed dataset with new features and encoded
categorical data, ready for machine learning.
●​ Test:
○​ Practical: Implement feature engineering and data transformation techniques
on a given dataset.
○​ Theory: Short-answer questions on scaling, normalization, and feature
extraction techniques.
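The transformations in Sub-unit 2.2 can be sketched with pandas and scikit-learn as follows; the input file and column names (signup_date, region, age, income) are placeholders.

    # Date-based feature extraction, one-hot encoding, Min-Max scaling, and an
    # optional PCA step. Column names are placeholders.
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import MinMaxScaler

    df = pd.read_csv("customers.csv")                 # placeholder dataset

    # Feature engineering: derive date-related features from a timestamp column
    df["signup_date"] = pd.to_datetime(df["signup_date"])
    df["signup_month"] = df["signup_date"].dt.month
    df["signup_dayofweek"] = df["signup_date"].dt.dayofweek

    # One-hot encode a categorical column
    df = pd.get_dummies(df, columns=["region"], drop_first=True)

    # Scale numeric features into the [0, 1] range
    numeric_cols = ["age", "income", "signup_month", "signup_dayofweek"]
    df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

    # Optional dimensionality reduction on the scaled numeric block
    components = PCA(n_components=2).fit_transform(df[numeric_cols])

Label encoding and LDA follow the same fit/transform pattern via sklearn.preprocessing.LabelEncoder and sklearn.discriminant_analysis.LinearDiscriminantAnalysis.
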
Unit 3: Advanced Data Engineering - Building Data Pipelines and
Automating Workflows (8 hours)

Objective: Teach students how to automate and scale their data processing
workflows, handling large datasets efficiently.

Sub-unit 3.1: Building and Automating ETL Pipelines (4 hours)

●​ Topics:
○​ What is ETL? Understanding Extract, Transform, Load processes
○​ Automating ETL: Using Apache Airflow for scheduling and managing ETL
workflows

○​ Building Data Pipelines: Using Python, Airflow, and AWS Lambda for
automating ETL tasks
○​ Data Warehousing: Introduction to cloud data warehouses (e.g., Google
BigQuery, AWS Redshift)
●​ Real-World Assignment:
○​ Task: Set up an automated ETL pipeline using Airflow to extract data from an
API, transform it (cleaning, feature engineering), and load it into a
cloud-based data warehouse.
○​ Deliverables: A functional ETL pipeline that runs on a schedule, extracts data
from an API, processes it, and stores it in a cloud data warehouse.
●​ Test:
○​ Practical: Build a simple ETL pipeline using Airflow to automate the process
of data extraction, transformation, and loading into a database.
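One possible shape for the Sub-unit 3.1 pipeline is sketched below as an Airflow DAG. The API URL, the temporary file paths, and the load step are placeholders; a production pipeline would load the cleaned data into BigQuery or Redshift through the matching Airflow provider hook rather than printing a message.

    # A three-task ETL DAG: extract from an API, transform with pandas, load
    # into a warehouse (stubbed out here). Paths and URLs are placeholders.
    from datetime import datetime

    import pandas as pd
    import requests
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        data = requests.get("https://example.com/api/sales", timeout=30).json()
        pd.DataFrame(data).to_csv("/tmp/raw_sales.csv", index=False)

    def transform():
        df = pd.read_csv("/tmp/raw_sales.csv")
        df = df.drop_duplicates().dropna()
        df.to_csv("/tmp/clean_sales.csv", index=False)

    def load():
        # Placeholder: replace with a warehouse load (BigQuery, Redshift, etc.)
        print("loading /tmp/clean_sales.csv into the warehouse")

    with DAG(
        dag_id="sales_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> transform_task >> load_task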

Sub-unit 3.2: Big Data Engineering and Real-Time Data Processing (4 hours)

●​ Topics:

○​ Introduction to Big Data Technologies: Using Apache Spark for distributed
data processing
○​ Working with Hadoop and Spark: Writing Spark jobs for large-scale data
transformation
○​ Real-Time Data Processing: Using Apache Kafka and Apache Flink for
streaming data processing
○​ Cloud Data Processing: Using AWS EMR or Google Dataproc for scalable
data processing in the cloud
●​ Real-World Assignment:
○​ Task: Process a large dataset (e.g., logs or sales data) using Apache Spark
on AWS EMR or Google Dataproc, performing distributed transformations.
○​ Deliverables: A data processing job that uses Spark to process and analyze
a large dataset in a distributed environment.
●​ Test:
○​ Practical: Write a Spark job that processes a large dataset and outputs a
transformed dataset.
○​ Theory: Multiple-choice questions on big data technologies, Spark, and
real-time processing.
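A small PySpark job in the spirit of Sub-unit 3.2 is sketched below. The bucket paths and column names are placeholders; on AWS EMR or Google Dataproc the same script would be submitted with spark-submit against s3:// or gs:// paths.

    # Read a large CSV dataset, aggregate daily totals per region, and write the
    # result as Parquet. Paths and columns are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-sales-aggregation").getOrCreate()

    df = spark.read.csv("s3://my-bucket/sales/*.csv", header=True, inferSchema=True)

    daily_totals = (
        df.withColumn("order_date", F.to_date("order_timestamp"))
          .groupBy("order_date", "region")
          .agg(
              F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count"),
          )
    )

    daily_totals.write.mode("overwrite").parquet("s3://my-bucket/output/daily_totals")
    spark.stop()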

Unit 4: Final Project - From Raw Data to Actionable Insights (8 hours)

Objective: Bring together all the concepts and techniques learned in the course to
complete a full data engineering project.

Sub-unit 4.1: Capstone Project Setup and Data Preparation (4 hours)

●​ Topics:
○​ Selecting a Real-World Dataset: Choosing a large dataset from a domain
such as finance, healthcare, or e-commerce
○​ Data Cleaning and Transformation: Applying all learned data cleaning,
transformation, and engineering techniques
○​ Exploratory Data Analysis (EDA): Visualizing the data and summarizing key
patterns and insights
●​ Real-World Assignment:
○​ Task: Choose a real-world dataset and apply cleaning, transformation, and
feature engineering to prepare the data for analysis or machine learning.
○​ Deliverables: A fully cleaned and transformed dataset with visualizations and
insights from exploratory data analysis.
●​ Test:
○​ Practical: Clean and preprocess a real-world dataset and provide EDA
insights.
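A quick EDA pass for the Sub-unit 4.1 capstone preparation could start along these lines; the dataset file and the target_value column are placeholders for whichever domain the student chooses.

    # Basic exploratory data analysis: shape, summary statistics, missing-value
    # shares, a correlation heatmap, and one distribution plot.
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.read_csv("capstone_dataset.csv")          # placeholder dataset

    print(df.shape)
    print(df.describe(include="all"))
    print(df.isna().mean().sort_values(ascending=False))   # share missing per column

    sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
    plt.title("Correlation matrix")
    plt.show()

    sns.histplot(df["target_value"], bins=30)         # placeholder column
    plt.title("Distribution of target_value")
    plt.show()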

Sub-unit 4.2: Data Pipeline Automation and Reporting (4 hours)

●​ Topics:
○​ Automating the Data Pipeline: Building a full end-to-end ETL pipeline to
automate data processing tasks
○​ Reporting: Generating insights from the processed data and creating reports
or dashboards
○​ Final Presentation: Preparing the final project report and presentation for
stakeholders
●​ Real-World Assignment:
○​ Task: Complete the data pipeline project by setting up automation, generating
reports, and visualizing insights in a dashboard (e.g., using Tableau or Power
BI).
○​ Deliverables: An automated ETL pipeline with a final report or dashboard
showcasing the insights derived from the data.
●​ Test:
○​ Practical: Present a fully functional data pipeline project with automated
processing and real-time insights.

Assessment and Certification:

1.​ Hands-On Projects: Real-world assignments like web scraping, data transformation,
and building automated ETL pipelines.
2.​ Final Project: Build a complete data pipeline that automates the collection,
transformation, and reporting of insights from a large dataset.
3.​ Exams:
○​ Practical: Build a full data pipeline, including collecting, cleaning, and
transforming data, and generating reports.
○​ Theory: Multiple-choice and short-answer questions on data gathering,
cleaning, and engineering techniques.
Tools and Technologies Covered:

●​ Languages: Python, SQL


●​ Libraries: Pandas, NumPy, BeautifulSoup, Scrapy, Airflow, Matplotlib, Seaborn
●​ Tools: Apache Kafka, Apache Spark, AWS, Google Cloud, MongoDB Atlas
●​ Other: Jupyter, Tableau, Power BI

This 32-hour course ensures students are equipped to handle the entire data pipeline
process, from data collection and cleaning to advanced engineering and real-time
processing, preparing them for real-world data engineering challenges.

Please note: The proposed curricula are only indicative, and will be modified to fit the requirements of
the students at the institution. Estimated program delivery durations may vary based on client
approvals and the actual length of the course.
