
KOE097: Big Data

The document provides details on the syllabus for a course on big data. It includes 5 units covering topics like Hadoop, HDFS, MapReduce, YARN, Spark and other big data tools. Each unit lists the topics to be covered and the proposed number of lectures.

Uploaded by Vimal Kumar

KOE097: BIG DATA

DETAILED SYLLABUS 3-1-0


Unit-wise topics and proposed number of lectures:
Unit I (08 lectures)
Introduction to Big Data: Types of digital data, history of Big Data innovation, introduction to Big Data platform, drivers for Big Data, Big Data architecture and characteristics, 5 Vs of Big Data, Big Data technology components, Big Data importance and applications, Big Data features (security, compliance, auditing and protection), Big Data privacy and ethics, Big Data analytics, challenges of conventional systems, intelligent data analysis, nature of data, analytic processes and tools, analysis vs. reporting, modern data analytic tools.
Unit II (08 lectures)
Hadoop: History of Hadoop, Apache Hadoop, the Hadoop Distributed File System, components of Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop Streaming, Hadoop Pipes, Hadoop ecosystem.
MapReduce: MapReduce framework and basics, how MapReduce works, developing a MapReduce application, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, failures, job scheduling, shuffle and sort, task execution, MapReduce types, input formats, output formats, MapReduce features, real-world MapReduce.
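The map, shuffle-and-sort and reduce phases listed for MapReduce can be sketched as a single-process word count. This is an illustrative simulation only, not Hadoop's Java API: the function names `map_phase`, `shuffle_and_sort` and `reduce_phase` are hypothetical, and on a real cluster the grouping step happens across many nodes and partitions.

```python
from collections import defaultdict


def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Group values by key and return the groups in sorted key order,
    mimicking what the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    text = ["big data needs big tools", "Hadoop processes big data"]
    print(word_count(text))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'processes': 1, 'tools': 1}
```

Keeping the mapper and reducer as pure functions mirrors why MRUnit-style unit tests are practical: each phase can be checked on small inputs without a cluster.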
Unit III (08 lectures)
HDFS (Hadoop Distributed File System): Design of HDFS, HDFS concepts, benefits and challenges, file sizes, block sizes and block abstraction in HDFS, data replication, how HDFS stores, reads and writes files, Java interfaces to HDFS, command-line interface, Hadoop file system interfaces, data flow, data ingest with Flume and Sqoop, Hadoop archives.
Hadoop I/O: Compression, serialization, Avro and file-based data structures.
Hadoop Environment: Setting up a Hadoop cluster, cluster specification, cluster setup and installation, Hadoop configuration, security in Hadoop, administering Hadoop, HDFS monitoring and maintenance, Hadoop benchmarks, Hadoop in the cloud.
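The block-size and replication topics come down to simple arithmetic, sketched below. It assumes the Hadoop 2.x and later defaults of `dfs.blocksize` = 128 MB and `dfs.replication` = 3 (both are configurable per cluster and per file); the helper names are hypothetical. Because the last block only holds the leftover bytes, raw disk usage is the file size times the replication factor, not the block count times the block size.

```python
import math

MB = 1024 * 1024


def split_into_blocks(file_size_bytes, block_size=128 * MB):
    """Return the size of each HDFS block for one file (last block may be partial)."""
    full, rest = divmod(file_size_bytes, block_size)
    return [block_size] * full + ([rest] if rest else [])


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes on disk) for one file.

    Assumes default-style settings: 128 MB blocks, replication factor 3.
    """
    blocks = math.ceil(file_size_bytes / block_size)
    return blocks, blocks * replication, file_size_bytes * replication


if __name__ == "__main__":
    print(hdfs_footprint(1024 * MB))                       # (8, 24, 3221225472)
    print([b // MB for b in split_into_blocks(300 * MB)])  # [128, 128, 44]
```

So a 1 GB file becomes 8 blocks and 24 block replicas spread over the DataNodes, using 3 GB of raw storage.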
Unit IV (08 lectures)
Hadoop Ecosystem and YARN: Hadoop ecosystem components, schedulers (Fair and Capacity), Hadoop 2.0 new features (NameNode high availability, HDFS federation, MRv2, YARN, running MRv1 in YARN).
NoSQL Databases: Introduction to NoSQL.
MongoDB: Introduction, data types, creating, updating and deleting documents, querying, introduction to indexing, capped collections.
Spark: Installing Spark, Spark applications, jobs, stages and tasks, Resilient Distributed Datasets (RDDs), anatomy of a Spark job run, Spark on YARN.
Scala: Introduction, classes and objects, basic types and operators, built-in control structures, functions and closures, inheritance.
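How a Spark job breaks into stages and tasks can be illustrated with a simplified model: an action triggers a job, the scheduler cuts the job into stages at shuffle (wide) dependencies, narrow transformations are pipelined within a stage, and each stage runs one task per partition of the last RDD it computes. The sketch below assumes a linear lineage with no caching or skipped stages; `Op` and `plan_stages` are hypothetical names, not part of Spark's API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Op:
    name: str            # transformation name, e.g. "map" or "reduceByKey"
    wide: bool           # True if the transformation needs a shuffle
    num_partitions: int  # partitions of the RDD this transformation produces


def plan_stages(source: str, source_partitions: int,
                ops: List[Op]) -> List[Tuple[List[str], int]]:
    """Split a linear RDD lineage into (transformations, task count) stages."""
    stages = []
    current = [source]
    partitions = source_partitions
    for op in ops:
        if op.wide:
            # Shuffle boundary: close the current stage; its task count is
            # the partition count of the last RDD computed in it.
            stages.append((current, partitions))
            current = []
        current.append(op.name)  # narrow ops are pipelined into the same stage
        partitions = op.num_partitions
    stages.append((current, partitions))
    return stages


if __name__ == "__main__":
    lineage = [
        Op("flatMap", wide=False, num_partitions=4),
        Op("map", wide=False, num_partitions=4),
        Op("reduceByKey", wide=True, num_partitions=2),
    ]
    stages = plan_stages("textFile", 4, lineage)
    for i, (names, tasks) in enumerate(stages):
        print(f"stage {i}: {names} -> {tasks} tasks")
    print("total tasks:", sum(tasks for _, tasks in stages))
```

For a word count read from a 4-partition input and reduced into 2 partitions, the model gives two stages with 4 and 2 tasks. Real Spark also performs map-side combining for `reduceByKey` inside the first stage, which this sketch does not model.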
Unit V (08 lectures)
Hadoop Ecosystem Frameworks: Applications on Big Data using Pig, Hive and HBase.
Pig: Introduction to Pig, execution modes of Pig, comparison of Pig with databases, Grunt, Pig Latin, user-defined functions, data processing operators.
Hive: Apache Hive architecture and installation, Hive shell, Hive services, Hive metastore, comparison with traditional databases, HiveQL, tables, querying data and user-defined functions, sorting and aggregating, MapReduce scripts, joins and subqueries.
HBase: HBase concepts, clients, example, HBase vs. RDBMS, advanced usage, schema design, advanced indexing.
ZooKeeper: How it helps in monitoring a cluster, how to build applications with ZooKeeper.
IBM Big Data strategy, introduction to InfoSphere, BigInsights and BigSheets, introduction to Big SQL.
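The HBase concepts and the HBase vs. RDBMS comparison can be made concrete with a toy, in-memory model of HBase's logical data model: a row key maps to `family:qualifier` columns, each cell keeps timestamped versions, and rows are stored sorted by key so reads are point gets or range scans. `ToyHBaseTable` and its `max_versions` limit are hypothetical illustrations, not the HBase client API.

```python
import bisect
from collections import defaultdict


class ToyHBaseTable:
    """In-memory sketch of HBase's logical model (not the HBase client API).

    row key -> "family:qualifier" -> [(timestamp, value), ...], newest first.
    """

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are declared up front
        self.max_versions = max_versions  # hypothetical per-table version limit
        self.rows = {}
        self.sorted_keys = []             # row keys in lexicographic order

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            self.rows[row] = defaultdict(list)
            bisect.insort(self.sorted_keys, row)
        cell = self.rows[row][column]
        cell.append((ts, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)  # newest version first
        del cell[self.max_versions:]                   # drop versions over the limit

    def get(self, row, column):
        """Return the newest value of a cell, or None if it does not exist."""
        cell = self.rows.get(row, {}).get(column)
        return cell[0][1] if cell else None

    def scan(self, start, stop):
        """Yield (row key, newest values) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        for key in self.sorted_keys[lo:hi]:
            yield key, {col: versions[0][1] for col, versions in self.rows[key].items()}


if __name__ == "__main__":
    t = ToyHBaseTable(families=["info"], max_versions=2)
    t.put("user#002", "info:name", "Asha", ts=1)
    t.put("user#001", "info:name", "Ravi", ts=1)
    t.put("user#001", "info:city", "Pune", ts=2)
    t.put("user#001", "info:city", "Delhi", ts=3)
    print(t.get("user#001", "info:city"))             # Delhi
    print(list(t.scan("user#000", "user#002")))
```

Because rows are ordered by key, row-key choice in schema design decides which queries become cheap range scans, unlike an RDBMS where secondary indexes serve that role.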
Suggested Readings:
1. Michael Minelli, Michelle Chambers and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley.
2. DT Editorial Services, "Big Data Black Book", Wiley.
3. Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos and Tom Deutsch, "Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data", McGraw-Hill.
4. Thomas Erl, Wajid Khattak and Paul Buhler, "Big Data Fundamentals: Concepts, Drivers and Techniques", Prentice Hall.

Open Elective List (VIII Semester) 2021-22 Page 20
