Big Data Syllabus

Uploaded by

Dr. Neetu Sharma

BCA-IOP BADA

Name of The Course: Introduction to Big Data Science

Course Code: BCABA1101

L T P C: 3 0 2 4

Theory: IA 20, MTE 15, ETE 30

Practical: PR 15, ETE 20

Prerequisite:

Co requisite:

Anti requisite:

Course Objectives:

The student should be made to:

Course Outcomes

CO1 Describe what Data Science is and the skill sets needed to be a data scientist.

CO2 Explain in basic terms what Statistical Inference means. Identify probability distributions commonly used as foundations for statistical modeling. Fit a model to data.

CO3 Explain the significance of exploratory data analysis (EDA) in data science. Apply basic tools (plots, graphs, summary statistics) to carry out EDA.

CO4 Describe the Data Science Process and how its components interact. Use APIs and other tools to scrape the Web and collect data.

CO5 Identify and explain fundamental mathematical and algorithmic ingredients that constitute a Recommendation Engine (dimensionality reduction, singular value decomposition, principal component analysis). Build a recommendation system using existing components.

CO6 Describe advances and the latest trends in data science.

Text Book (s)

1. Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O'Reilly. 2014.

Reference Book (s)

1. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1, Cambridge University Press. 2014. (free online)

2. Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.

3. Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-analytic Thinking. ISBN 1449361323. 2013.

4. Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning, Second Edition. ISBN 0387952845. 2009. (free online)

5. Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science. (Note: this is a book currently being written by the three authors. The authors have made the first draft of their notes for the book available online. The material is intended for a modern theoretical course in computer science.)

6. Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. 2014.

7. Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third Edition. ISBN 0123814790. 2011.

Unit-1 Introduction to BI 8 hours

Introduction: What is Data Science? - Big Data and Data Science hype, and getting past the hype - Why now? - Datafication - Current landscape of perspectives - Skill sets needed.

Statistical Inference: Populations and samples - Statistical modelling, probability distributions, fitting a model - Intro to

Unit-2 Exploratory Data Analysis and the Data Science Process 8 hours

Exploratory Data Analysis and the Data Science Process: Basic tools (plots, graphs and summary statistics) of EDA - Philosophy of EDA - The Data Science Process - Case Study: RealDirect (online real estate firm).

Three Basic Machine Learning Algorithms: Linear Regression - k-Nearest Neighbors (k-NN) - k-means.
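As an illustration of the k-NN topic listed in this unit, the following is a minimal sketch rather than prescribed course material. It assumes numeric feature tuples, Euclidean distance, and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, features being numeric tuples.
    Distance is Euclidean (an assumption; other metrics are possible).
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of points.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]
near_a = knn_predict(train, (2, 2))
near_b = knn_predict(train, (8.5, 8.5))
print(near_a, near_b)
```

Sorting the whole training set keeps the sketch short; for large datasets a spatial index or a partial sort would be the more usual choice.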

Unit-3 Machine Learning Algorithms and Usage in Applications 8 hours

Motivating application: Filtering Spam - Why Linear Regression and k-NN are poor choices for Filtering Spam - Naive Bayes and why it works for Filtering Spam - Data Wrangling: APIs and other tools for scraping the Web.

Feature Generation and Feature Selection (Extracting Meaning From Data): Motivating application: user (customer) retention - Feature Generation (brainstorming, role of domain expertise, and place for imagination) - Feature Selection algorithms - Filters; Wrappers; Decision Trees; Random Forests.

Unit-4 Building a User-Facing Data Product 8 hours

Algorithmic ingredients of a Recommendation Engine - Dimensionality Reduction - Singular Value Decomposition - Principal Component Analysis - Exercise: build your own recommendation system.

Mining Social-Network Graphs: Social networks as graphs - Clustering of graphs - Direct discovery of communities in graphs - Partitioning of graphs - Neighborhood properties in graphs.

Unit-5 Data Visualization and Ethical Issues 8 hours

Basic principles, ideas and tools for data visualization - Examples of inspiring (industry) projects - Exercise: create your own visualization of a complex dataset.

Discussions on privacy, security, ethics - A look back at Data Science - Next-generation data scientists.

Unit-6 Research 8 hours

The advances and latest trends in the course, as well as the latest applications of the areas covered in the course.

The latest research conducted in the areas covered in the course.

Discussion of some recent papers published in IEEE and ACM transactions, Web of Science and SCOPUS indexed journals, and high-impact conferences and symposiums.

Discussion of some of the latest products available in the market based on the areas covered in the course, and of patents filed in those areas.

BCA-IOP Big Data

Name of The Course: Foundation of Big Data System

Course Code: BCABI1101

L T P C: 3 0 2 4

Theory: IA 20, MTE 15, ETE 30

Practical: PR 15, ETE 20

Prerequisite:

Co requisite:

Anti requisite:

COURSE OBJECTIVES:

Understanding the Data Science Process and learning the techniques, tools, statistical methodologies and machine learning algorithms used in the process.

COURSE OUTCOMES:

CO1 Students should know about the design issues of Hadoop Architecture.

CO2 Students should learn various techniques for big data analytics.

CO3 Students are able to identify real-time problems and design solutions using various big data analytics techniques.

CO4 Students can apply supervised and unsupervised learning for prediction.

CO5 Students can use classification and clustering algorithms.

CO6 Students can understand current research trends in big data.

COURSE CONTENT: Hours

UNIT I INTRODUCTION TO BIG DATA: 9

Introduction - distributed file system - Big Data and its importance, Four V's in big data, Drivers for Big data, Big data analytics, Big data applications. Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce.
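The matrix-vector multiplication topic in this unit can be sketched as follows. This is a minimal single-process illustration of the map and reduce steps, not Hadoop code. It assumes the matrix is given as sparse (row, column, value) triples and that the vector fits in memory; the function names `map_phase`, `reduce_phase` and `matvec` are hypothetical.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Map: for each entry (i, j, m_ij), emit the key-value pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]

def reduce_phase(pairs):
    """Reduce: sum all values that share the same row key i."""
    sums = defaultdict(float)
    for i, value in pairs:
        sums[i] += value
    return dict(sums)

def matvec(matrix_entries, vector):
    """Compute x = M v, returned as a {row index: value} dictionary."""
    return reduce_phase(map_phase(matrix_entries, vector))

# Example: M = [[1, 2], [3, 4]] and v = [10, 20].
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
result = matvec(entries, [10.0, 20.0])
print(result)  # {0: 50.0, 1: 110.0}
```

In a real cluster the pairs emitted by the map step would be grouped by key across machines before the reduce step runs; here the dictionary plays that role.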

UNIT II INTRODUCTION TO HADOOP: 9

Big Data - Apache Hadoop & Hadoop Ecosystem - Moving Data in and out of Hadoop - Understanding inputs and outputs of MapReduce - Data Serialization.

UNIT III HADOOP ARCHITECTURE: 9

Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands, Anatomy of File Write and Read, NameNode, Secondary NameNode, and DataNode, Hadoop MapReduce paradigm, Map and Reduce tasks, Job, Task trackers - Cluster Setup - SSH & Hadoop Configuration - HDFS Administering - Monitoring & Maintenance.

UNIT IV HADOOP ECOSYSTEM AND YARN: 9

Hadoop ecosystem components - Schedulers - Fair and Capacity, Hadoop 2.0 New Features - NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.

UNIT V HIVE AND HIVEQL, HBASE: 9

Hive Architecture and Installation, Comparison with Traditional Database, HiveQL - Querying Data - Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, HBase concepts - Advanced Usage, Schema Design, Advanced Indexing - PIG, Zookeeper - how it helps in monitoring a cluster, how HBase uses Zookeeper and how to Build Applications with Zookeeper.

Unit VI 5 hours

The advances and latest trends in the course, as well as the latest applications of the areas covered in the course.

The latest research conducted in the areas covered in the course.

Discussion of some recent papers published in IEEE and ACM transactions, Web of Science and SCOPUS indexed journals, and high-impact conferences and symposiums.

Discussion of some of the latest products available in the market based on the areas covered in the course, and of patents filed in those areas.

Reference Books

1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, "Professional Hadoop Solutions", Wiley, ISBN: 9788126551071, 2015.

2. Chris Eaton, Dirk deRoos et al., "Understanding Big Data", McGraw Hill, 2012.

3. Tom White, "Hadoop: The Definitive Guide", O'Reilly, 2012.

4. Vignesh Prajapati, "Big Data Analytics with R and Hadoop", Packt Publishing, 2013.

5. Tom Plunkett, Brian Macdonald et al., "Oracle Big Data Handbook", Oracle Press, 2014.

6. Jay Liebowitz, "Big Data and Business Analytics", CRC Press, 2013.
