The document provides details on the syllabus for a course on big data. It includes 5 units covering topics like Hadoop, HDFS, MapReduce, YARN, Spark and other big data tools. Each unit lists the topics to be covered and the proposed number of lectures.
KOE097: BIG DATA
DETAILED SYLLABUS 3-1-0
Unit I (Proposed Lectures: 08)
Introduction to Big Data: Types of digital data, history of Big Data innovation, introduction to Big Data platform, drivers for Big Data, Big Data architecture and characteristics, 5 Vs of Big Data, Big Data technology components, Big Data importance and applications, Big Data features (security, compliance, auditing and protection), Big Data privacy and ethics, Big Data analytics, challenges of conventional systems, intelligent data analysis, nature of data, analytic processes and tools, analysis vs reporting, modern data analytic tools.

Unit II (Proposed Lectures: 08)
Hadoop: History of Hadoop, Apache Hadoop, the Hadoop Distributed File System, components of Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop Streaming, Hadoop Pipes, Hadoop ecosystem.
MapReduce: MapReduce framework and basics, how MapReduce works, developing a MapReduce application, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, failures, job scheduling, shuffle and sort, task execution, MapReduce types, input formats, output formats, MapReduce features, real-world MapReduce.

Unit III (Proposed Lectures: 08)
HDFS (Hadoop Distributed File System): Design of HDFS, HDFS concepts, benefits and challenges, file sizes, block sizes and block abstraction in HDFS, data replication, how HDFS stores, reads, and writes files, Java interfaces to HDFS, command line interface, Hadoop file system interfaces, data flow, data ingest with Flume and Sqoop, Hadoop archives.
Hadoop I/O: Compression, serialization, Avro and file-based data structures.
Hadoop Environment: Setting up a Hadoop cluster, cluster specification, cluster setup and installation, Hadoop configuration, security in Hadoop, administering Hadoop, HDFS monitoring and maintenance, Hadoop benchmarks, Hadoop in the cloud.

Unit IV (Proposed Lectures: 08)
Hadoop Ecosystem and YARN: Hadoop ecosystem components, schedulers (fair and capacity), Hadoop 2.0 new features: NameNode high availability, HDFS federation, MRv2, YARN, running MRv1 in YARN.
NoSQL Databases: Introduction to NoSQL.
MongoDB: Introduction, data types, creating, updating and deleting documents, querying, introduction to indexing, capped collections.
Spark: Installing Spark, Spark applications, jobs, stages and tasks, Resilient Distributed Datasets, anatomy of a Spark job run, Spark on YARN.
Scala: Introduction, classes and objects, basic types and operators, built-in control structures, functions and closures, inheritance.

Unit V (Proposed Lectures: 08)
Hadoop Ecosystem Frameworks: Applications on Big Data using Pig, Hive and HBase.
Pig: Introduction to Pig, execution modes of Pig, comparison of Pig with databases, Grunt, Pig Latin, user defined functions, data processing operators.
Hive: Apache Hive architecture and installation, Hive shell, Hive services, Hive metastore, comparison with traditional databases, HiveQL, tables, querying data and user defined functions, sorting and aggregating, MapReduce scripts, joins and subqueries.
HBase: HBase concepts, clients, example, HBase vs RDBMS, advanced usage, schema design, advanced indexing.
ZooKeeper: How it helps in monitoring a cluster, how to build applications with ZooKeeper.
IBM Big Data strategy, introduction to InfoSphere, BigInsights and BigSheets, introduction to Big SQL.

Suggested Readings:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley.
2. Big-Data Black Book, DT Editorial Services, Wiley.
3. Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch, "Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data", McGraw-Hill.
4. Thomas Erl, Wajid Khattak, Paul Buhler, "Big Data Fundamentals: Concepts, Drivers and Techniques", Prentice Hall.
Open Elective List (VIII Semester) 2021-22 Page 20