
DataCentricComputing

The course 'Data Centric Computing' focuses on data as a primary driver for insights and innovation, providing hands-on experience with real-world data and techniques for data management and analysis. It covers topics such as data storage, preprocessing, big data frameworks, and programming paradigms using Python and R. The course aims to equip students with the necessary skills for effective data handling and analysis across various contexts.

Uploaded by

baalaji.cse
Copyright
© All Rights Reserved

Profile of an Individual Course

PART A – INTRODUCTION TO THE COURSE

Data-centric computing prioritizes data as the key driver of insights and innovation. This course
develops hands-on experience with large, real-world datasets and explores techniques and tools for
deriving meaningful insights from them. Students gain practical experience through data exploration
and predictive modeling.

Course Code: U20CSCT33
Course Category: CORE (T)
Course Title: DATA CENTRIC COMPUTING
L T P C: 3 0 0 3
Pre-requisite: U20CSCJ11
Name of the Course Coordinator: Dr. R.J. Aarthi / Associate Professor
Contact Hrs.: 45
Course Offering Department/School: Department of CSE / School of Computing
Total Marks: 100

Course Objective and Summary


The course will enable students to acquire the computational, mathematical and statistical
knowledge needed for responsible data management and data curation, and for ethically cleaning,
interpreting and using big data across a variety of contexts. It equips students with the skills to
collect, clean, integrate, normalize, and transform data for analysis, addressing challenges in
real-world data scenarios, and to program in Python and R using Jupyter notebooks and RStudio.
Students will also design and query relational databases using tools such as MySQL via MySQL Workbench.

Course Outcomes (COs)

CO No. | Course Outcome | Bloom's Level
CO1 | Explain the principles and significance of data-centric computing, differentiating it from compute-centric approaches. | 2
CO2 | Apply appropriate data storage techniques to manage and organize data effectively. | 3
CO3 | Demonstrate data collection, cleaning, and preprocessing techniques to prepare datasets for analysis. | 3
CO4 | Apply the functionality of big data frameworks, parallel processing, and distributed computing systems for handling large-scale data. | 3
CO5 | Inspect data-centric programming paradigms and tools, viz. Python, R and SQL, to manipulate and query data for meaningful insights. | 4
Mapping / Alignment of COs with POs and PSOs
(Tick marks or level of correlation: 3 = High, 2 = Medium, 1 = Low)

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 2 2 2
CO2 2 3 2 2
CO3 3 3 2 2 2
CO4 3 3 2 2 2
2
CO5 3 2 3 1 2


PART B – CONTENT OF THE COURSE

MODULE I Introduction to Data-Centric Computing 9

Definition and significance of data-centric computing - Differences between data-centric and
compute-centric approaches - Impact of data-centric computing - Advantages and challenges - Use cases
in industry: AI, big data analytics, IoT - Limitations.

MODULE II Data Storage Representation 9

Overview of data types and representations - File systems and storage management techniques:
traditional storage architectures, Hard Disk Drive (HDD), Solid-State Drive (SSD) - Distributed
storage systems: Hadoop Distributed File System (HDFS), limitations of HDFS - Ceph distributed
storage: Reliable Autonomic Distributed Object Store (RADOS) in cloud.
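To make the HDFS topics concrete, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. It assumes HDFS's usual defaults (128 MB blocks and a replication factor of 3, configurable through `dfs.blocksize` and `dfs.replication`). The node names, the file size and the round-robin placement are hypothetical simplifications; real HDFS lets the NameNode choose replica locations with a rack-aware policy.

```python
import math

# Assumed HDFS defaults (configurable via dfs.blocksize and dfs.replication).
BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    remainder = file_size_mb % block_size_mb
    if n_blocks and remainder:
        sizes[-1] = remainder
    return sizes


def place_replicas(n_blocks, datanodes, replication=REPLICATION):
    """Put each block on `replication` distinct DataNodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


# Hypothetical 300 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(300)  # [128, 128, 44]
layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in layout.items():
    print(f"block {block_id} ({blocks[block_id]} MB) -> {nodes}")
```

Because every block is stored on three distinct nodes, losing any single DataNode still leaves at least two copies of each block, which is why block replication lets a distributed file system tolerate node failures.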

MODULE III Data Preprocessing Techniques 9

Data collection and integration from heterogeneous data sources: methods and challenges - Data
integration tools: ETL tools - Data cleaning and preprocessing for data quality - Data normalization
and transformation techniques - Handling imbalanced data.
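The normalization and imbalanced-data topics can be illustrated with a short standard-library Python sketch. `min_max` and `z_score` are the two most common rescaling techniques. `smote` is a simplified, hypothetical version of the SMOTE idea (creating a synthetic point by interpolating between a minority-class sample and one of its k nearest minority-class neighbours), not a reference implementation; the sample points are made up. Libraries such as imbalanced-learn provide a full `SMOTE` class for practical work.

```python
import math
import random
import statistics


def min_max(values):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic minority-class points.

    Each new point is x + gap * (neighbour - x), where x is a random minority
    sample, neighbour is one of its k nearest minority-class neighbours and
    gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.5, 2.5)]
print(smote(minority, n_new=4))
```

Each synthetic point lies on the line segment between two real minority samples, so oversampling adds new, plausible points rather than exact duplicates of existing ones.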

MODULE IV Big Data Frameworks, Parallel and Distributed Computing 9

Introduction to big data and distributed computing - Frameworks: Apache Hadoop (MapReduce
programming model), Apache Spark (in-memory distributed computing) - Basics of parallel processing
and concurrency - Distributed system concepts: data partitioning and replication, consensus
algorithms (Paxos, Raft), fault tolerance and scalability.
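The MapReduce programming model, together with the hash partitioning that routes keys to reducers, can be sketched in plain Python with the classic word-count example. This is a single-process simulation: the map, shuffle and reduce functions are hypothetical stand-ins for work that Hadoop distributes across many machines, and `zlib.crc32` plays the role of a hash partitioner because Python's built-in `hash()` for strings is randomized per process.

```python
import zlib
from collections import defaultdict


def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def partition(key, n_reducers):
    """Hash partitioner: a given key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def shuffle(pairs, n_reducers):
    """Route each pair to its reducer and group values by key."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        partitions[partition(key, n_reducers)][key].append(value)
    return partitions


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(splits, n_reducers=2):
    mapped = (pair for split in splits for pair in map_phase(split))
    counts = {}
    for reducer_input in shuffle(mapped, n_reducers):  # each reducer is independent
        for key, values in reducer_input.items():
            word, total = reduce_phase(key, values)
            counts[word] = total
    return counts


# Hypothetical input splits (in Hadoop these would typically be HDFS blocks).
splits = ["Data drives insight", "big data needs distributed data processing"]
print(word_count(splits))
```

Because the partitioner is deterministic, every occurrence of a word lands in the same reducer, which is what allows each reducer to produce final counts independently and in parallel.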

MODULE V Data-Centric Programming Paradigms 9

Overview of data-centric languages: Python for data manipulation (NumPy, pandas), R for statistical
computing - Query languages: SQL and its extensions for big data (HiveQL, Spark SQL) - Tools and
technologies overview - Case studies in agriculture, education and healthcare.
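The SQL part of this module can be tried without a database server by using Python's built-in `sqlite3` module. The table name, columns and yield figures below are hypothetical, chosen only to echo the agriculture case study. A `SELECT ... GROUP BY` aggregation of this kind is written in much the same way in HiveQL and Spark SQL, although each dialect has its own extensions.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE crop_yield (
        region      TEXT    NOT NULL,
        crop        TEXT    NOT NULL,
        season      INTEGER NOT NULL,
        yield_t_ha  REAL    NOT NULL  -- tonnes per hectare
    )
    """
)

# Hypothetical sample rows, for illustration only.
rows = [
    ("North", "rice", 2022, 3.1),
    ("North", "rice", 2023, 3.4),
    ("South", "rice", 2022, 2.8),
    ("South", "wheat", 2023, 2.2),
    ("North", "wheat", 2023, 2.9),
]
conn.executemany("INSERT INTO crop_yield VALUES (?, ?, ?, ?)", rows)

# Average yield per region and crop, highest first.
query = """
    SELECT region, crop, ROUND(AVG(yield_t_ha), 2) AS avg_yield, COUNT(*) AS seasons
    FROM crop_yield
    GROUP BY region, crop
    ORDER BY avg_yield DESC
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```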

TOTAL HOURS: 45 Hours


Alignment of Course Topics with COs

S.No | Summary of Course Content | Hrs | Alignment to COs
1 | Introduction to Data-Centric Computing: definition and significance of data-centric computing | 1 | CO1
2 | Explanation of compute-centric computing | 1 | CO1
3 | Differences between data-centric and compute-centric approaches | 1 | CO1
4 | Impact of data-centric computing | 1 | CO1
5 | Advantages of data-centric computing in the real-time data world | 1 | CO1
6 | Challenges in data-centric computing | 1 | CO1
7 | Use cases in industry relevant to data-centric and compute-centric computing | 1 | CO1
8 | Use cases in industry: AI, big data analytics, IoT | 1 | CO1
9 | Limitations of data-centric computing | 1 | CO1
10 | Data Storage Representation: discussion of various data types and the data structures that handle them | 1 | CO2
11 | Overview of data types and representations | 1 | CO2
12 | File systems and storage management techniques: traditional storage architectures, Hard Disk Drive (HDD), Solid-State Drive (SSD) | 1 | CO2
13 | Distributed storage systems and their necessity | 1 | CO2
14 | Hadoop Distributed File System (HDFS) | 1 | CO2
15 | Limitations of HDFS | 1 | CO2
16 | Ceph distributed storage | 1 | CO2
17 | Reliable Autonomic Distributed Object Store (RADOS) in cloud | 1 | CO2
18 | How Ceph distributed storage can be implemented using RADOS | 1 | CO2
19 | Data Preprocessing Techniques: discussion and types | 1 | CO3
20 | Data collection and integration from heterogeneous data sources | 1 | CO3
21 | Data collection methods and challenges | 1 | CO3
22 | Data integration tools: ETL tools | 2 | CO3
23 | Data cleaning and preprocessing for data quality | 1 | CO3
24 | Data normalization and transformation techniques | 2 | CO3
25 | Handling imbalanced data: SMOTE techniques | 1 | CO3
26 | Big Data Frameworks, Parallel and Distributed Computing: introduction to big data and distributed computing | 1 | CO4
27 | Frameworks: Apache Hadoop, MapReduce programming model | 2 | CO4
28 | Apache Spark: in-memory distributed computing | 1 | CO4
29 | Basics of parallel processing and concurrency | 1 | CO4
30 | Distributed system concepts: data partitioning and replication | 2 | CO4
31 | Consensus algorithms: Paxos, Raft | 1 | CO4
32 | Fault tolerance and scalability | 1 | CO4
33 | Introduction to Data-Centric Programming Paradigms | 1 | CO5
34 | Overview of data-centric languages: Python for data manipulation (NumPy, pandas) | 1 | CO5
35 | R for statistical computing in data-centric computing | 2 | CO5
36 | Query languages: SQL and its extensions for big data (HiveQL, Spark SQL) | 2 | CO5
37 | Tools and technologies overview | 1 | CO5
38 | Case studies in agriculture, education and healthcare | 2 | CO5
TOTAL HOURS | | 45 |

2. Lesson Plan / Topic Assignment with COs and Teaching Tools Used

Hour | Topic | CO | Text/Reference Book | Technique Tool | Proposed Date | Completed Date | BT Level
1 | Introduction to Data-Centric Computing: definition and significance of data-centric computing | CO1 | TB1 | T1 | | | 2
2 | Explanation of compute-centric computing | CO1 | TB1 | T1 | | | 2
3 | Differences between data-centric and compute-centric approaches | CO1 | TB1 | T1 | | | 2
4 | Impact of data-centric computing | CO1 | TB1 | T1 | | | 2
5 | Advantages of data-centric computing in the real-time data world | CO1 | TB1 | T3 | | | 2
6 | Challenges in data-centric computing | CO1 | TB1 | T2 | | | 2
7 | Use cases in industry relevant to data-centric and compute-centric computing | CO1 | TB1 | T1 | | | 3
8 | Use cases in industry: AI, big data analytics, IoT | CO1 | TB1 | T1 | | | 3
9 | Limitations of data-centric computing | CO1 | TB1 | T1 | | | 2
10 | Data Storage Representation: discussion of various data types and the data structures that handle them | CO2 | TB1 | T1 | | | 2
11 | Overview of data types and representations | CO2 | TB1 | T1 | | | 2
12 | File systems and storage management techniques: traditional storage architectures, Hard Disk Drive (HDD), Solid-State Drive (SSD) | CO2 | TB1 | T2 | | | 2
13 | Distributed storage systems and their necessity | CO2 | TB1 | T1 | | | 2
14 | Hadoop Distributed File System (HDFS) | CO2 | TB1 | T1 | | | 2
15 | Limitations of HDFS | CO2 | TB1 | T1 | | | 2
16 | Ceph distributed storage | CO2 | RB2 | T1 | | | 2
17 | Reliable Autonomic Distributed Object Store (RADOS) in cloud | CO2 | RB2 | T1 | | | 2
18 | How Ceph distributed storage can be implemented using RADOS | CO2 | RB2 | T2 | | | 3
19 | Data Preprocessing Techniques: discussion and types | CO3 | RB2 | T1 | | | 2
20 | Data collection and integration from heterogeneous data sources | CO3 | RB1 | T1 | | | 2
21 | Data collection methods and challenges | CO3 | RB1 | T1 | | | 2
22 | Data integration tools: ETL tools | CO3 | RB2 | T3 | | | 2
23 | Data cleaning and preprocessing for data quality | CO3 | RB2 | T1 | | | 2
24 | Data normalization and transformation techniques | CO3 | TB1 | T1 | | | 2
25 | Handling imbalanced data: SMOTE techniques | CO3 | TB1 | T1 | | | 2
26 | Big Data Frameworks, Parallel and Distributed Computing: introduction to big data and distributed computing | CO4 | WB1 | T2 | | | 2
27 | Frameworks: Apache Hadoop, MapReduce programming model | CO4 | WB1, RB2 | T2 | | | 2
28 | Apache Spark: in-memory distributed computing | CO4 | RB1 | T1 | | | 3
29 | Basics of parallel processing and concurrency | CO4 | TB1 | T1 | | | 3
30 | Distributed system concepts: data partitioning and replication | CO4 | TB1 | T1 | | | 3
31 | Consensus algorithms: Paxos, Raft | CO4 | TB1 | T1 | | | 3
32 | Fault tolerance and scalability | CO4 | TB1 | T1 | | | 3
33 | Introduction to Data-Centric Programming Paradigms | CO5 | WB1 | T1 | | | 3
34 | Overview of data-centric languages: Python for data manipulation (NumPy, pandas) | CO5 | WB1 | T1 | | | 3
35 | R for statistical computing in data-centric computing | CO5 | WB1 | T1 | | | 3
36 | Query languages: SQL and its extensions for big data (HiveQL, Spark SQL) | CO5 | TB1 | T1 | | | 3
37 | Tools and technologies overview | CO5 | TB1 | T2 | | | 3
38 | Case studies in agriculture, education and healthcare | CO5 | TB1 | T2 | | | 4

Assignments:

S.No | Topic | CO | Proposed Date | Completed Date | BT Level
1 | Simulate a virtualized cloud environment (e.g., AWS, GCP, Azure, or a local virtual machine setup using VirtualBox or VMware). Install and configure the required components: (1) a Linux OS (preferably Ubuntu/CentOS) and (2) the Ceph software. | CO2 | | | 3
2 | Apply a data-centric approach to processing weather, soil, and crop data to provide actionable insights to farmers using data-centric programming paradigms. | CO5 | | | 3

Technique Tools Planned:

Type Code | Technique Tool Planned
T1 | Blackboard
T2 | PowerPoint presentation
T3 | Video lectures

Reference Code | Description
TB1 | Martin Kleppmann and Chris Riccomini, "Designing Data-Intensive Applications", 2nd Edition, O'Reilly Media, Inc.
RB1 | Parikshit N. Mahalle, Gitanjali R. Shinde, Yashwant S. Ingle and Namrata N. Wasatkar, "Data Centric Artificial Intelligence: A Beginner's Guide", Springer.
RB2 | Kathi Fisler, Shriram Krishnamurthi, Benjamin S. Lerner and Joe Gibbs Politz, "A Data-Centric Introduction to Computing", MIT Press.
WB1 | Online resources (free/open access): "Introduction to Data Science" courses on Coursera/edX, covering introductory Python, R and big data frameworks.
Part C – Assessment and Evaluation

Assessment Pattern:

There are four Continuous Learning Assessments (CLAs) for the subject: CLA 1 for 30 marks,
CLA 2 for 30 marks, CLA 3 for 30 marks and CLA 4 for 10 marks.

CO WEIGHTAGE

CO | Weightage (Theory)
CO1 | 20%
CO2 | 20%
CO3 | 20%
CO4 | 20%
CO5 | 20%

THEORY

CLA 1 covers Module I and the first half of Module II (30 marks).

CLA 2 covers the second half of Module II and Module III (30 marks).

CLA 3 covers Module IV and Module V (30 marks).

CLA 4 is based on the assignments (10 marks).

Continuous Learning Assessment (CLA), Weightage 50% (Theory)

CO | CLA 1 Test (marks) | CLA 2 Test (marks) | CLA 3 Test (marks) | CLA 4 Assignments (marks)
CO1 | 20 | | |
CO2 | 10 | 10 | |
CO3 | | 20 | |
CO4 | | | 15 | 05
CO5 | | | 15 | 05
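As an arithmetic cross-check of the CLA table, assuming the CO2 marks are split 10/10 across CLA 1 and CLA 2 as the CLA portions indicate, the column totals should equal the CLA maxima (30, 30, 30, 10) and the row totals should give each CO the 20% weight listed under CO WEIGHTAGE. The dictionary below is a transcription of that table, not an official data format.

```python
from collections import Counter

# Marks allotted to each CO in each CLA, transcribed from the CLA table.
CLA_MARKS = {
    "CLA 1": {"CO1": 20, "CO2": 10},
    "CLA 2": {"CO2": 10, "CO3": 20},
    "CLA 3": {"CO4": 15, "CO5": 15},
    "CLA 4": {"CO4": 5, "CO5": 5},
}

# Column totals: maximum marks of each CLA.
per_cla = {cla: sum(marks.values()) for cla, marks in CLA_MARKS.items()}

# Row totals: marks carried by each CO across all CLAs.
per_co = Counter()
for marks in CLA_MARKS.values():
    per_co.update(marks)

print(per_cla)                # {'CLA 1': 30, 'CLA 2': 30, 'CLA 3': 30, 'CLA 4': 10}
print(dict(per_co))           # {'CO1': 20, 'CO2': 20, 'CO3': 20, 'CO4': 20, 'CO5': 20}
print(sum(per_cla.values()))  # 100
```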
Final Examination – Weightage 50%

CO | Marks (Theory)
CO1 | 20
CO2 | 20
CO3 | 20
CO4 | 20
CO5 | 20

Evaluation Policy

Exam | Total Marks Split-up | Weightage | Total Marks
Continuous Internal Assessment, Theory (CLA 1, CLA 2, CLA 3, CLA 4) | 100 | 50% of average | 100 marks
End Semester Exam, Theory | 100 | |
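A minimal sketch of how a final theory mark could be computed, assuming (as the 50% weightages for the CLAs and the final examination suggest) that the four CLA marks are added to give an internal total out of 100, and that this internal total and the end-semester mark out of 100 each contribute 50%. The function name and the sample marks are hypothetical.

```python
# Maximum marks per CLA, as stated in the Assessment Pattern.
CLA_MAX = {"CLA 1": 30, "CLA 2": 30, "CLA 3": 30, "CLA 4": 10}
END_SEM_MAX = 100


def final_mark(cla_scores, end_sem_score):
    """Combine internal and end-semester marks, assuming a 50/50 weighting.

    cla_scores    -- marks obtained in each CLA, keyed like CLA_MAX
    end_sem_score -- end-semester theory mark out of END_SEM_MAX
    """
    if set(cla_scores) != set(CLA_MAX):
        raise ValueError("expected a score for each of CLA 1 to CLA 4")
    for name, score in cla_scores.items():
        if not 0 <= score <= CLA_MAX[name]:
            raise ValueError(f"{name} score {score} is outside 0..{CLA_MAX[name]}")
    if not 0 <= end_sem_score <= END_SEM_MAX:
        raise ValueError(f"end-semester score must be within 0..{END_SEM_MAX}")
    internal = sum(cla_scores.values())  # out of 100, since 30 + 30 + 30 + 10 = 100
    return 0.5 * internal + 0.5 * end_sem_score  # final mark out of 100


# Hypothetical student marks.
print(final_mark({"CLA 1": 24, "CLA 2": 21, "CLA 3": 27, "CLA 4": 8}, 70))  # 75.0
```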


Part D – Learning Resources

TEXTBOOKS

1. Martin Kleppmann and Chris Riccomini, "Designing Data-Intensive Applications", 2nd Edition, O'Reilly Media, Inc.

REFERENCE BOOKS
1. Parikshit N. Mahalle, Gitanjali R. Shinde, Yashwant S. Ingle and Namrata N. Wasatkar, "Data Centric Artificial Intelligence: A Beginner's Guide", Springer.
2. Kathi Fisler, Shriram Krishnamurthi, Benjamin S. Lerner and Joe Gibbs Politz, "A Data-Centric Introduction to Computing", MIT Press.

1. https://www.jenkins.io/user-handbook.pdf
