2024 25 ODD CE449 BDA Syllabus

CE449: BIG DATA ANALYTICS

Credits and Hours:

Teaching Scheme   Theory   Practical   Tutorial   Total   Credit
Hours/week        3        2           0          5       4
Marks             100      50          0          150

Pre-requisite courses:
● Linux Operating System
● Database Management System
Objectives of the Course:
● To correlate large amounts of data to uncover hidden patterns and other insights.
● To learn about new services and products that can be used for Big Data.
● To apply different techniques to various sectors of Big Data Analytics.

Outline of the Course:


Sr. No.  Title of the unit                                 Minimum number of hours
1.       Big Data and Analytics                            02
2.       Introduction to Hadoop and Hadoop Architecture    07
3.       HDFS, HIVE AND HBASE                              08
4.       Apache Spark                                      10
5.       Spark SQL and Spark Streaming                     10
6.       Graph Analytics and Data Visualization            08
Total hours (Theory) : 45
Total hours (Lab) : 30
Total hours : 75
Detailed Syllabus:
1. Big Data and Analytics 02 Hours 10%
Introduction to Big Data, Big Data Characteristics, Types of Big
Data, Traditional Versus Big Data Approach, Technologies
Available for Big Data, Infrastructure for Big Data, Use of Data
Analytics, Big Data Challenges.
2. Introduction to Hadoop and Hadoop Architecture 07 Hours 15%
Big Data – Apache Hadoop & Hadoop Ecosystem, Moving
Data in and out of Hadoop – Understanding inputs and outputs
in Hadoop, Data Serialization
3. HDFS, HIVE AND HBASE 08 Hours 20%
HDFS-Overview, Installation and Shell, Hive Architecture and
Installation, Comparison with Traditional Database, HiveQL
Querying Data, Sorting and Aggregating, MapReduce Scripts,
Joins & Subqueries, HBase concepts, Advanced Usage, Schema
Design, Advanced Indexing
4. Apache Spark 10 Hours 20%
Introduction to Data Analysis with Spark, Downloading Spark and
Getting Started, Apache Spark components and API stack,
Application and Spark Session, Introduction to RDD, RDD and
Data Frames.
5. Spark SQL and Spark Streaming 10 Hours 20%
Big Data and Spark SQL, Spark-Managed Tables, Reading Tables
into Data Frames, Aggregations, Joins, Creating Views, Spark
Streaming and Challenges of Stream Processing, Spark’s
Streaming APIs, Spark streaming case study.
6. Graph Analytics and Data Visualization 08 Hours 15%
Apache Spark GraphX: Property Graph, Graph Operator,
SubGraph, Triplet, Neo4j: Modeling data with Neo4j, Cypher
Query Language: General clauses, Read and Write clauses.
Big Data Visualization with Power BI, Apache Superset
Course Outcomes (COs):
At the end of the course, the students will be able to:
CO1 Understand the key issues in big data management and its associated applications in intelligent
business and scientific computing.
CO2 Acquire fundamental enabling techniques and scalable algorithms like Hadoop, MapReduce, Hive
and Spark in big data analytics.
CO3 Evaluate and apply appropriate principles, techniques and theories to large-scale data science
problems using various databases with analytics and visualizations.

Sr. No.  Course Outcomes (COs)                                       Employability/Entrepreneurship/
                                                                     Skill development
1.       Understand the key issues in big data management            Skill development
         and its associated applications in intelligent
         business and scientific computing.
2.       Acquire fundamental enabling techniques and scalable        Employability
         algorithms like Hadoop, MapReduce and NoSQL in big
         data analytics.
3.       Interpret business models and scientific computing          Entrepreneurship,
         paradigms and apply software tools for big data analytics.  Employability

Course Articulation Matrix:


     PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12 PSO1 PSO2
CO1  2    2    1    -    -    -    -    -    -    -    -    -    1    1
CO2  1    2    3    1    3    -    -    -    -    -    -    -    2    -
CO3  -    1    3    3    3    -    -    -    -    -    -    -    2    -

Correlation levels: 1: Slight (Low), 2: Moderate (Medium), 3: Substantial (High).
A “-” indicates no correlation.

Recommended Study Material:


❖ Text book:
1. Bart Baesens, Analytics in a Big Data World: The Essential Guide to Data
Science and its Applications, Wiley, 2014.
2. Jules S. Damji, Learning Spark: Lightning-Fast Data Analytics, O’Reilly
Media Inc., 2020.
3. Bill Chambers and Matei Zaharia, Spark: The Definitive Guide, O’Reilly
Media Inc., 2018.

❖ Reference book:
1. Dirk deRoos et al., Hadoop for Dummies, Dreamtech Press, 2014.
2. Chuck Lam, Hadoop in Action, December 2010.
3. Leskovec, Rajaraman, Ullman, Mining of Massive Datasets, Cambridge
University Press.
4. I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools
and Techniques.
❖ Web material:
1. https://cognitiveclass.ai/
2. https://codelabs.developers.google.com/
❖ Software & Platform:
1. Python, Scala, R
2. Hadoop, HBase, Hive, Spark
3. Cassandra, Neo4j, NoSQL
