Data Centric Computing

The course 'Data Centric Computing' focuses on data as a primary driver for insights and innovation, providing hands-on experience with real-world data and techniques for data management and analysis. It covers topics such as data storage, preprocessing, big data frameworks, and programming paradigms using Python and R. The course aims to equip students with the necessary skills for effective data handling and analysis across various contexts.

Uploaded by baalaji.cse


Profile of an Individual Course

PART A – INTRODUCTION TO THE COURSE

Data-centric computing treats data as the key driver of insight and innovation. This course develops
hands-on experience with large, real-world datasets and explores the techniques and tools used to derive
meaningful insights from them. Students gain practical experience through data exploration and predictive modeling.

Course Code: U20CSCT33
Course Category: CORE (T)
Course Title: DATA CENTRIC COMPUTING
L T P C: 3 0 0 3
Pre-requisite: U20CSCJ11
Name of the Course Coordinator: Dr. R.J. Aarthi / Associate Professor
Contact Hrs.: 45
Course Offering Department/School: Department of CSE / School of Computing
Total Marks: 100

Course Objective and Summary

The course will enable students to acquire the computational, mathematical, and statistical
knowledge needed for responsible data management and curation, and for ethically cleaning,
interpreting, and using big data across a variety of contexts. It equips students with the skills to
collect, clean, integrate, normalize, and transform data for analysis while addressing the challenges
of real-world data scenarios; to program in Python and R using Jupyter notebooks and RStudio; and to
design and query relational databases using tools such as MySQL via MySQL Workbench.

Course Outcomes (COs)

CO No. | Course Outcome | Bloom's Level
CO1 | Explain the principles and significance of data-centric computing, differentiating it from compute-centric approaches. | 2
CO2 | Apply appropriate data storage techniques to manage and organize data effectively. | 3
CO3 | Demonstrate data collection, cleaning, and preprocessing techniques to prepare datasets for analysis. | 3
CO4 | Apply the functionality of big data frameworks, parallel processing, and distributed computing systems to handle large-scale data. | 3
CO5 | Inspect data-centric programming paradigms and tools, viz. Python, R, and SQL, to manipulate and query data for meaningful insights. | 4
Mapping / Alignment of COs with POs and PSOs
(Tick marks or level of correlation: 3 = High, 2 = Medium, 1 = Low)

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 2 2 2
CO2 2 3 2 2
CO3 3 3 2 2 2
CO4 3 3 2 2 2
2
CO5 3 2 3 1 2


PART B – CONTENT OF THE COURSE

MODULE I Introduction to Data-Centric Computing 9

Definition and significance of data-centric computing – Differences between data-centric and
compute-centric approaches – Impact of data-centric computing – Advantages and challenges – Use cases
in industry: AI, big data analytics, IoT – Limitations.

MODULE II Data Storage Representation 9

Overview of data types and representations – File systems and storage management techniques:
traditional storage architectures, Hard Disk Drive (HDD) and Solid State Drive (SSD) – Distributed
storage systems: Hadoop Distributed File System (HDFS) and its limitations – Ceph distributed
storage: Reliable Autonomic Distributed Object Store (RADOS) in the cloud.

MODULE III Data Preprocessing Techniques 9

Data collection and integration from heterogeneous data sources: methods and challenges – Data
integration tools: ETL tools – Data cleaning and preprocessing for data quality – Data normalization
and transformation techniques – Handling imbalanced data.

MODULE IV Big Data Frameworks, Parallel and Distributed Computing 9

Introduction to big data and distributed computing – Frameworks: Apache Hadoop and the MapReduce
programming model; Apache Spark and in-memory distributed computing – Basics of parallel processing
and concurrency – Distributed system concepts: data partitioning and replication, consensus algorithms
(Paxos, Raft), fault tolerance and scalability.

MODULE V Data-Centric Programming Paradigms 9

Overview of data-centric languages: Python for data manipulation (NumPy, pandas), R for statistical
computing – Query languages: SQL and its extensions for big data (HiveQL, Spark SQL) – Overview of
tools and technologies – Case studies in agriculture, education, and healthcare.

TOTAL HOURS: 45 Hours

Alignment of Course Topics with COs

S.No | Summary of Course Content | Hrs | Alignment to COs
1 | Introduction to Data-Centric Computing: definition and significance of data-centric computing | 1 | CO1
2 | Explanation of compute-centric computing | 1 | CO1
3 | Differences between data-centric and compute-centric approaches | 1 | CO1
4 | Impact of data-centric computing | 1 | CO1
5 | Advantages of data-centric computing for real-time data | 1 | CO1
6 | Challenges in data-centric computing | 1 | CO1
7 | Industry use cases relevant to data-centric and compute-centric computing | 1 | CO1
8 | Use cases in industry: AI, big data analytics, IoT | 1 | CO1
9 | Limitations of data-centric computing | 1 | CO1
10 | Data Storage Representation: discussion of various data types and the data structures that handle them | 1 | CO2
11 | Overview of data types and representations | 1 | CO2
12 | File systems and storage management techniques: traditional storage architectures, Hard Disk Drive (HDD), Solid State Drive (SSD) | 1 | CO2
13 | Distributed storage systems and why they are needed | 1 | CO2
14 | Hadoop Distributed File System (HDFS) | 1 | CO2
15 | Limitations of HDFS | 1 | CO2
16 | Ceph distributed storage | 1 | CO2
17 | Reliable Autonomic Distributed Object Store (RADOS) in the cloud | 1 | CO2
18 | How Ceph distributed storage is implemented using RADOS | 1 | CO2
19 | Data Preprocessing Techniques: discussion and types | 1 | CO3
20 | Data collection and integration from heterogeneous data sources | 1 | CO3
21 | Data collection methods and challenges | 1 | CO3
22 | Data integration tools: ETL tools | 2 | CO3
23 | Data cleaning and preprocessing for data quality | 1 | CO3
24 | Data normalization and transformation techniques | 2 | CO3
25 | Handling imbalanced data: SMOTE techniques | 1 | CO3
26 | Big Data Frameworks, Parallel and Distributed Computing: introduction to big data and distributed computing | 1 | CO4
27 | Frameworks: Apache Hadoop, MapReduce programming model | 2 | CO4
28 | Apache Spark: in-memory distributed computing | 1 | CO4
29 | Basics of parallel processing and concurrency | 1 | CO4
30 | Distributed system concepts: data partitioning and replication | 2 | CO4
31 | Consensus algorithms: Paxos, Raft | 1 | CO4
32 | Fault tolerance and scalability | 1 | CO4
33 | Introduction to Data-Centric Programming Paradigms | 1 | CO5
34 | Overview of data-centric languages: Python for data manipulation (NumPy, pandas) | 1 | CO5
35 | R for statistical computing in data-centric computing | 2 | CO5
36 | Query languages: SQL and its extensions for big data (HiveQL, Spark SQL) | 2 | CO5
37 | Overview of tools and technologies | 1 | CO5
38 | Case studies in agriculture, education, and healthcare | 2 | CO5
TOTAL HOURS | | 45 |

2. Lesson Plan / Topic Assignment with COs and Teaching Tools Used:

Hour | Topic | CO | Text / Reference Book | Technique Tool | Proposed Date | Completed Date | BT Level
1 | Introduction to Data-Centric Computing: definition and significance of data-centric computing | CO1 | TB1 | T1 | | | 2
2 | Explanation of compute-centric computing | CO1 | TB1 | T1 | | | 2
3 | Differences between data-centric and compute-centric approaches | CO1 | TB1 | T1 | | | 2
4 | Impact of data-centric computing | CO1 | TB1 | T1 | | | 2
5 | Advantages of data-centric computing for real-time data | CO1 | TB1 | T3 | | | 2
6 | Challenges in data-centric computing | CO1 | TB1 | T2 | | | 2
7 | Industry use cases relevant to data-centric and compute-centric computing | CO1 | TB1 | T1 | | | 3
8 | Use cases in industry: AI, big data analytics, IoT | CO1 | TB1 | T1 | | | 3
9 | Limitations of data-centric computing | CO1 | TB1 | T1 | | | 2
10 | Data Storage Representation: discussion of various data types and the data structures that handle them | CO2 | TB1 | T1 | | | 2
11 | Overview of data types and representations | CO2 | TB1 | T1 | | | 2
12 | File systems and storage management techniques: traditional storage architectures, Hard Disk Drive (HDD), Solid State Drive (SSD) | CO2 | TB1 | T2 | | | 2
13 | Distributed storage systems and why they are needed | CO2 | TB1 | T1 | | | 2
14 | Hadoop Distributed File System (HDFS) | CO2 | TB1 | T1 | | | 2
15 | Limitations of HDFS | CO2 | TB1 | T1 | | | 2
16 | Ceph distributed storage | CO2 | RB2 | T1 | | | 2
17 | Reliable Autonomic Distributed Object Store (RADOS) in the cloud | CO2 | RB2 | T1 | | | 2
18 | How Ceph distributed storage is implemented using RADOS | CO2 | RB2 | T2 | | | 3
19 | Data Preprocessing Techniques: discussion and types | CO3 | RB2 | T1 | | | 2
20 | Data collection and integration from heterogeneous data sources | CO3 | RB1 | T1 | | | 2
21 | Data collection methods and challenges | CO3 | RB1 | T1 | | | 2
22 | Data integration tools: ETL tools | CO3 | RB2 | T3 | | | 2
23 | Data cleaning and preprocessing for data quality | CO3 | RB2 | T1 | | | 2
24 | Data normalization and transformation techniques | CO3 | TB1 | T1 | | | 2
25 | Handling imbalanced data: SMOTE techniques | CO3 | TB1 | T1 | | | 2
26 | Big Data Frameworks, Parallel and Distributed Computing: introduction to big data and distributed computing | CO4 | WB1 | T2 | | | 2
27 | Frameworks: Apache Hadoop, MapReduce programming model | CO4 | WB1, RB2 | T2 | | | 2
28 | Apache Spark: in-memory distributed computing | CO4 | RB1 | T1 | | | 3
29 | Basics of parallel processing and concurrency | CO4 | TB1 | T1 | | | 3
30 | Distributed system concepts: data partitioning and replication | CO4 | TB1 | T1 | | | 3
31 | Consensus algorithms: Paxos, Raft | CO4 | TB1 | T1 | | | 3
32 | Fault tolerance and scalability | CO4 | TB1 | T1 | | | 3
33 | Introduction to Data-Centric Programming Paradigms | CO5 | WB1 | T1 | | | 3
34 | Overview of data-centric languages: Python for data manipulation (NumPy, pandas) | CO5 | WB1 | T1 | | | 3
35 | R for statistical computing in data-centric computing | CO5 | WB1 | T1 | | | 3
36 | Query languages: SQL and its extensions for big data (HiveQL, Spark SQL) | CO5 | TB1 | T1 | | | 3
37 | Overview of tools and technologies | CO5 | TB1 | T2 | | | 3
38 | Case studies in agriculture, education, and healthcare | CO5 | TB1 | T2 | | | 4

Assignments:

S.No | Topic | CO | Proposed Date | Completed Date | BT Level
1 | Simulate a virtualized cloud environment (e.g., AWS, GCP, Azure, or a local virtual machine setup using VirtualBox or VMware). Install and configure the required components: (1) a Linux OS (preferably Ubuntu or CentOS) and (2) the Ceph software. | CO2 | | | 3
2 | Apply a data-centric approach to processing weather, soil, and crop data to provide actionable insights to farmers using data-centric programming paradigms. | CO5 | | | 3

Technique Tools Planned:

Type Code | Technique Tool Planned
T1 | Blackboard
T2 | PowerPoint Presentation
T3 | Video Lectures

Reference Code | Description
TB1 | Designing Data-Intensive Applications, 2nd Edition, by Martin Kleppmann and Chris Riccomini, O'Reilly Media, Inc.
RB1 | "Data Centric Artificial Intelligence: A Beginner's Guide", by Parikshit N. Mahalle, Gitanjali R. Shinde, Yashwant S. Ingle, and Namrata N. Wasatkar, Springer
RB2 | "A Data-Centric Introduction to Computing", by Kathi Fisler, Shriram Krishnamurthi, Benjamin S. Lerner, and Joe Gibbs Politz, MIT Press
WB1 | Online resources (free/open access): "Introduction to Data Science" on Coursera/edX; introductory courses on Python, R, and big data frameworks

PART C – ASSESSMENT AND EVALUATION

Assessment Pattern:

There are four Continuous Learning Assessments (CLAs) for the subject: CLA 1 (30 marks), CLA 2
(30 marks), CLA 3 (30 marks), and CLA 4 (10 marks), for a total of 100 marks.

CO WEIGHTAGE

CO | Weightage (Theory)
CO1 | 20%
CO2 | 20%
CO3 | 20%
CO4 | 20%
CO5 | 20%

THEORY

CLA 1 (30 marks) covers Module I and the first half of Module II.

CLA 2 (30 marks) covers the second half of Module II and Module III.

CLA 3 (30 marks) covers Module IV and Module V.

CLA 4 (10 marks) consists of the assignments.

Continuous Learning Assessment (CLA) – Weightage 50% (Theory)

CO | CLA 1 Test (marks) | CLA 2 Test (marks) | CLA 3 Test (marks) | CLA 4 Assignments (marks)
CO1 | 20 | | |
CO2 | 10 | 10 | |
CO3 | | 20 | |
CO4 | | | 15 | 05
CO5 | | | 15 | 05

Final Examination – Weightage 50%

CO | Marks (Theory)
CO1 | 20
CO2 | 20
CO3 | 20
CO4 | 20
CO5 | 20

Evaluation Policy

Exams | Total Marks | Split-up Weightage | Final Total Marks
Continuous Internal Assessment, Theory (CLA 1, CLA 2, CLA 3, CLA 4) | 100 | 50% of average | 100
End Semester Exam, Theory | 100 | |
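As a worked illustration of the marks arithmetic above, the Python sketch below combines the two components. It assumes that the four CLA scores are simply summed to a total out of 100 (30 + 30 + 30 + 10) and that the CLA total and the End Semester Exam (ESE) mark each carry 50% of the final mark; the function name `final_mark` and the sample scores are hypothetical and are not part of the official evaluation policy.

```python
# Hypothetical helper illustrating the assessment arithmetic described above.
# Assumption: CLA 1-3 are out of 30, CLA 4 is out of 10, the End Semester
# Exam (ESE) is out of 100, and CLA and ESE each carry 50% of the final mark.

CLA_MAXIMA = (30, 30, 30, 10)  # maximum marks for CLA 1, CLA 2, CLA 3, CLA 4


def final_mark(cla_scores, ese_score):
    """Return the final mark out of 100 (equal-weight average of CLA and ESE)."""
    if len(cla_scores) != len(CLA_MAXIMA):
        raise ValueError("expected exactly four CLA scores")
    for score, maximum in zip(cla_scores, CLA_MAXIMA):
        if not 0 <= score <= maximum:
            raise ValueError(f"CLA score {score} is outside 0..{maximum}")
    if not 0 <= ese_score <= 100:
        raise ValueError("ESE score must be between 0 and 100")
    cla_total = sum(cla_scores)  # out of 100, since 30 + 30 + 30 + 10 = 100
    return 0.5 * cla_total + 0.5 * ese_score


# Example: CLA scores 24, 21, 27, 8 (total 80) and an ESE score of 70.
print(final_mark((24, 21, 27, 8), 70))  # 75.0
```

Because both components are out of 100 and weighted equally under this assumption, the result is simply their average, which is consistent with the "50% of average" wording in the policy.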


PART D – LEARNING RESOURCES

TEXTBOOKS

1. Designing Data-Intensive Applications, 2nd Edition, by Martin Kleppmann and Chris Riccomini, O'Reilly Media, Inc.

REFERENCE BOOKS
1. "Data Centric Artificial Intelligence: A Beginner's Guide", by Parikshit N. Mahalle, Gitanjali R.
Shinde, Yashwant S. Ingle, and Namrata N. Wasatkar, Springer
2. "A Data-Centric Introduction to Computing", by Kathi Fisler, Shriram Krishnamurthi, Benjamin S.
Lerner, and Joe Gibbs Politz, MIT Press

