Big Data Analytics-Syllabus
3 0 0 40 60 3
Big data overview, BI versus data science, current analytical architecture, emerging big data
ecosystem and new approach to analytics, key roles for new big data ecosystem, big data
analytics examples, analysis vs reporting
Details: In this unit, case studies of Flipkart, Amazon, Twitter, Facebook, etc. should be taken
up from the point of view of big data characteristics. Students should be able to comment on the
volume, data type, density, veracity, and velocity of data for different applications. Discuss
the big data characteristics of different applications with the students. Students can give
presentations on these, and the course instructor can discuss them in class (minimum three case
studies).
Details: In this unit, case studies of Flipkart, Amazon, Twitter, Facebook, etc. should be taken
up. The course instructor should present an analytical plan for any one business problem related
to any application. Groups of students can present analytical plans for different applications.
Discuss the analytical plans given by students in class (a minimum of three plans needs to be
discussed).
Details: Reference 1:
K-means, determining the number of clusters in K-means, reasons to choose and cautions, linear
regression model in detail, logistic regression in detail, customer churn case study, reasons to
choose and cautions for regression models, Chapter 9 of Reference 1 in detail.
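
Illustrative sketch for classroom use: the K-means topic above, including the question of how many clusters to choose, can be demonstrated with a small pure-Python version of Lloyd's algorithm. The function name `kmeans`, the sample points, and the choice of seeding centroids with the first k points are assumptions made for this illustration and are not taken from Reference 1. The returned within-cluster sum of squares (WSS) can be compared across values of k to look for an "elbow".

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster 2-D points with Lloyd's algorithm.

    Assumption for this sketch: the first k points seed the centroids,
    which keeps the run deterministic. Real tools usually seed randomly.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    # Within-cluster sum of squared distances, used for the elbow check.
    wss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wss


if __name__ == "__main__":
    # Hypothetical sample data: two visually separate groups of points.
    sample = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    for k in (1, 2, 3):
        _, labels, wss = kmeans(sample, k)
        print(k, labels, round(wss, 3))
```

A sharp drop in WSS from k = 1 to k = 2, followed by a much smaller drop afterwards, is the kind of pattern students can be asked to interpret when choosing k.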
Unit 4 Hadoop 9 MKS
History of Hadoop – the Hadoop Distributed File System – Components of Hadoop – Analyzing the
Data with Hadoop – Scaling Out – Hadoop Streaming – Design of HDFS – How MapReduce
Works – Anatomy of a MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Apache
Spark, Spark ML libraries
Details: Reference 3
Chapter 1 – brief history of Hadoop; Chapter 2 – weather dataset, analyzing data using Unix
tools, analyzing the data with Hadoop, scaling out; Chapter 3 – design of HDFS, HDFS concepts,
data flow, anatomy of a file read, anatomy of a file write; Chapter 6 – all topics.
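
Illustrative sketch for classroom use: the map, shuffle-and-sort, and reduce phases listed in this unit can be simulated in plain Python before moving to a real cluster. The example finds the maximum temperature per year, in the spirit of the weather dataset exercise. The record format "year,temperature" and the function names `map_record`, `reduce_year`, and `run_job` are assumptions made for this illustration; they are not the actual dataset format or the Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Map phase: emit (year, temperature) for one 'year,temperature' line.

    Malformed lines are skipped, as a mapper typically would do.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return
    year, temp = parts
    try:
        yield year, int(temp)
    except ValueError:
        return


def reduce_year(year, temps):
    """Reduce phase: keep the maximum temperature seen for one year."""
    return year, max(temps)


def run_job(lines):
    """Run map, then shuffle and sort by key, then reduce, all in memory."""
    mapped = [pair for line in lines for pair in map_record(line)]
    # Shuffle and sort: bring all values for the same key together.
    mapped.sort(key=itemgetter(0))
    return [
        reduce_year(year, [temp for _, temp in group])
        for year, group in groupby(mapped, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Hypothetical input records, including one malformed line.
    records = ["1949,111", "1950,0", "1950,22", "1949,78", "1950,-11", "bad line"]
    for year, max_temp in run_job(records):
        print(year, max_temp)
```

On a real cluster the framework performs the shuffle and sort between the map and reduce tasks; here a single in-memory sort stands in for that step so students can see why the reducer receives all values for one key together.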
Unit 5 Advanced Analytics - Tools and technology 9 MKS
Applications on Big Data Using Pig and Hive – Data processing operators in Pig – Hive services
– HiveQL – Querying Data in Hive – fundamentals of HBase and ZooKeeper, Flume, Sqoop
Details: Cover the architecture, functionality, and limitations of each tool, and the differences
between the tools. Reference 3 and Reference 1 can be used. The unit should be taught with a
practical rather than a theoretical approach.
References:
1. EMC Education Services, “Data Science & Big Data Analytics: Discovering, Analyzing,
Visualizing and Presenting Data”, Wiley, January 2015, 432 pages, ISBN: 978-1-118-87613-8.
2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Tom White, “Hadoop: The Definitive Guide”, Third Edition, O'Reilly Media, 2012.
4. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
Wiley, 2015, ISBN: 9788126551071.
5. Hien Luu, “Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL,
Structured Streaming and Spark Machine Learning Library”, Apress.
6. Hari Shreedharan, “Using Flume”, O'Reilly Media, September 2014, ISBN: 9781449368302.
7. Kathleen Ting, Jarek Jarcec Cecho, “Apache Sqoop Cookbook”, O'Reilly Media, July 2013,
ISBN: 9781449364625.
Outcomes:
Students will be able to:
1. Work with a big data platform and explore big data analytics techniques for business
applications.
2. Design efficient algorithms for mining large volumes of data, analyzing the data, and
extracting knowledge.
3. Analyze the Hadoop and MapReduce technologies associated with big data analytics.
4. Explore big data applications using Pig and Hive.
5. Understand the fundamentals of various big data analytics techniques.