Big Data – Hadoop & Spark Training Syllabus Tamilboomi

What is Hadoop?

Hadoop is a platform, written in Java, on which we can process large amounts of data. The Hadoop ecosystem has lots of tools that make processing big data easy. Let's learn how to do that end to end..!!

Objective:

Over the past years, Hadoop & Spark have seen enormous industry adoption, and the market is facing a shortage of skills. To help bridge the gap, we have designed this course around industry expectations, with real-time examples. This course will help you understand the variety of big data application development options, and lets you develop your own application and performance tune it.

After this class you will be able to,

• Have in-depth knowledge about Hadoop.
• Have hands-on experience with Hadoop.
• Complete a project on Hadoop independently.
• Know how to switch your career to Hadoop from any other technology.
• Develop your own Spark application.
• Understand the different components of Spark.
• Performance tune a Spark application.
• Prepare for and complete the Hortonworks Spark developer certification (with a minimum of 1 month of practice).
• Build data pipelines using the Spark APIs and DataFrames.
• Analyze Spark jobs using the UIs and logs.
• Create streaming jobs and run them on a YARN cluster.

This course is for,

• Professionals who want to learn & develop Hadoop & Spark applications.
• Professionals who want to do certification (Hortonworks: HDPCD, HDPCD-Spark) (Cloudera: CCA175, CCA159).
• Anyone interested in learning the latest technology for their career improvement.

Course Structure:

• This course is designed with 50% theory and 50% hands-on.
• You will be given real-time POCs to solve and learn from.

Course Overview:

• Introduction to Hadoop
• Hadoop Architecture In-depth
• Map Reduce 1.0 & YARN
• Pig & Hive
• Sqoop & Flume
• HBase, Oozie & Zookeeper
• Welcome to Spark
• Programming with RDD
• SparkSQL & DataFrames
• Spark Job Execution
• Cluster Architecture for Spark
• Introduction to Kafka
• Introduction to Spark Streaming

Hadoop – Project (English) – Click Here
Hadoop – Intro Session (Tamil) – Click Here
SPARK – Intro Session (Tamil) – Click Here

Module 1: Introduction to the Hadoop World:

• Dataaaaaaa.....Bigdata..!
• What is big data? The 3 + 1 V's.
• What is Hadoop, why Hadoop & its history.
• Hadoop ecosystem overview (HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, HBase, etc.).
• Current requirements and future possibilities in Hadoop.
• RDBMS vs Hadoop.
• Wait.. finally, what Hadoop is not?
• Do we need Java to learn Hadoop?
• Hadoop installation.

Module 2: Hadoop Architecture In-depth Travel:

• HDFS – an introduction.
• How data is stored in HDFS (travel of a byte).
• Hadoop daemons:
  o Name node.
  o Data node.
  o Job tracker.
  o Task tracker.
• Fault tolerance in Hadoop.
• HA mode in HDFS.
• How files are handled in projects (sample project scenario execution).

Module 3: Map Reduce 1.0 & YARN:

• MapReduce history.
• How MapReduce is used in projects.
• MapReduce architecture, key-value pairs.
• YARN 2.0 architecture.
• Java implementation of MapReduce (sample POC) – see the sketch after this list.
• Mapper, Reducer, Combiner – different combinations.
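The class POC for this module is written in Java; purely as an illustration of the same map/reduce idea (and to keep every sketch in this syllabus in Python), here is a minimal word-count in the Hadoop Streaming style. The file name, input/output paths and the streaming jar location in the comment are placeholders, so adjust them to your cluster.

    # wordcount_streaming.py
    # Example invocation (paths and jar location are placeholders):
    #   hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
    #     -mapper "python wordcount_streaming.py map" \
    #     -reducer "python wordcount_streaming.py reduce" \
    #     -file wordcount_streaming.py
    import sys
    from itertools import groupby

    def mapper():
        # Emit "word<TAB>1" for every word read from stdin
        for line in sys.stdin:
            for word in line.split():
                print(word + "\t1")

    def reducer():
        # Hadoop sorts mapper output by key, so counts for a word arrive together
        pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(word + "\t" + str(sum(int(count) for _, count in group)))

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()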
Module 4: Pig & Hive:

• Hive introduction.
• Hive data model.
• Hive implementation of a sample project.
• Pig introduction.
• Pig data structures.
• Pig implementation of a sample project.
• How Pig & Hive are used in real-time projects.
• Module 4 assignment.

Module 5: Sqoop & Flume:

• Flume introduction.
• Flume configuration.
• Flume sample project.
• Sqoop introduction.
• Sqoop configuration.
• Sqoop sample project.

Module 6: HBase, Oozie & Zookeeper:

• Oozie introduction.
• Oozie overview and configuration.
• Zookeeper overview.
• HBase introduction.
• HBase overview.
• Spark overview.

SPARK

Intro Session (Tamil) – Click Here

Module 1: Welcome to Spark:

• Welcome to the world of Spark.
• Bye bye Hadoop? (Hadoop vs Spark).
• Spark components:
  o Spark Core
  o Spark SQL
  o GraphX
  o MLlib
• Spark use cases in real time.


Hands on:
• Installing and configuring Spark on your machine.
• Running a sample program in Spark – see the sketch below.
• Executing a Spark use case.
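A minimal "first program" of the kind run in this hands-on, assuming a local PySpark installation (pip-installed or bundled with a Spark download); the application name and the numbers are just examples.

    from pyspark.sql import SparkSession

    # Build a local SparkSession (the entry point since Spark 2.x)
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("HelloSpark")
             .getOrCreate())

    # A tiny job: distribute a list, square each element, collect the results
    nums = spark.sparkContext.parallelize(range(1, 11))
    print(nums.map(lambda x: x * x).collect())

    spark.stop()

Run it with spark-submit hello_spark.py, or directly with python if PySpark was installed with pip.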
Module 2: Programming with RDD:

• What is an RDD?
• Why RDD?
• How an RDD gets executed in a Spark application.
• Transformations in RDD.
• Actions in RDD.
• RDD programming APIs.

Hands On:
• Creating an RDD from a data file – see the sketch after this list.
• Applying transformations & actions on an RDD.
• Interactive queries using RDD.
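A short sketch of the RDD hands-on flow, assuming PySpark; the input path data/sample.txt is a placeholder for whatever data file is used in class.

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "RDDHandsOn")

    # Create an RDD from a data file (placeholder path)
    lines = sc.textFile("data/sample.txt")

    # Transformations are lazy; they only describe the computation
    words = lines.flatMap(lambda line: line.split())
    pairs = words.map(lambda word: (word, 1))
    counts = pairs.reduceByKey(lambda a, b: a + b)

    # Actions trigger the actual job (local threads here)
    print("total lines:", lines.count())
    print("top 5 words:", counts.takeOrdered(5, key=lambda kv: -kv[1]))

    sc.stop()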
Module 3: Spark SQL/DataFrames:

• SparkSQL/DataFrame uses.
• DataFrame / SQL APIs.
• Spark & Hive integration.
• Catalyst query optimization.

Hands on:
• Create a DataFrame from a file.
• Create a DataFrame from a table.
• Caching and reusing DataFrames.
• Query with the DataFrame API and SQL – see the sketch below.
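A sketch of the DataFrame hands-on, assuming PySpark; the CSV path and the department/salary columns are invented for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("DataFrameHandsOn").getOrCreate()

    # Create a DataFrame from a file (placeholder CSV with a header row)
    df = spark.read.csv("data/employees.csv", header=True, inferSchema=True)

    # Cache it if it will be reused across several queries
    df.cache()

    # Query with the DataFrame API ...
    df.groupBy("department").count().show()

    # ... and with SQL on a temporary view
    df.createOrReplaceTempView("employees")
    spark.sql("SELECT department, AVG(salary) AS avg_salary "
              "FROM employees GROUP BY department").show()

    spark.stop()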
Module 4: Spark Execution & Optimization:

• Jobs, stages & tasks.
• Partitions and shuffles.
• Data locality.
• Job performance (tuning).

Hands on:
• Visualizing DAG execution.
• Measuring memory usage.
• Understanding performance.

Module 5: Introduction to Kafka:

• Introduction to Kafka.
• Kafka architecture.
• Producers and consumers in Kafka.
• Working with Kafka.

Hands on:
• Installing & configuring Kafka.
• Producing and consuming messages – see the sketch below.
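A small produce/consume round trip for the Kafka hands-on. It assumes a broker running on localhost:9092 and uses the kafka-python client (pip install kafka-python); the topic name demo-events is made up for the example.

    from kafka import KafkaProducer, KafkaConsumer

    TOPIC = "demo-events"  # placeholder topic name

    # Produce a few messages
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for i in range(5):
        producer.send(TOPIC, value=("event-%d" % i).encode("utf-8"))
    producer.flush()

    # Consume them back from the beginning of the topic
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating when no new messages arrive
    )
    for message in consumer:
        print(message.offset, message.value.decode("utf-8"))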
Module 6: Spark Streaming:

• Introduction to Spark Streaming.
• DStream APIs and stateful streams.
• Reliability and fault recovery.

Hands on:
• Creating a DStream from a source – see the sketch below.
• Integration of Kafka and Spark Streaming.
• Developing a Kafka-Spark application.
• Viewing streaming jobs in the Web UI.
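A minimal DStream job for the "creating a DStream from a source" exercise, assuming PySpark. It reads from a plain TCP socket (start one with nc -lk 9999) to keep the sketch self-contained; the Kafka integration itself is covered separately in class.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "StreamingWordCount")
    ssc = StreamingContext(sc, 5)  # 5-second micro-batches

    # DStream from a socket source on localhost:9999
    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print each batch's counts to the driver console

    ssc.start()
    ssc.awaitTermination()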


For more details:

Mail: [email protected], [email protected]
WhatsApp: +91 9619663272
Visit: www.tamilboomi.com

For the Cloudera VM and the free Big Data startup kit: Startup kit link – Click Here.

Happy Learning..!

