Hadoop

Uploaded by

babjeereddy
Copyright
© All Rights Reserved
Big Data: Module Table of Contents

Module Name: Big Data

Coverage of Each Module

Topic #  Topic Name
    Learning Objective #  Learning Objective for the Topics

1 Introduction to Big Data and Hadoop
    1 Current Market Challenges, Why Big Data?
    2 Hadoop Architecture, NameNode, Secondary NameNode, DataNodes
    3 File system metadata storage, fsimage, edit log
    4 Introduction to storage (HDFS) and processing (MapReduce), responsibilities of JobTracker and TaskTracker
    Estimated Time Duration for this Topic

2 HDFS and MapReduce
    1 File copy from Local to HDFS and HDFS to Local
    2 Different file formats available in HDFS
    3 Custom input format and different input/output format classes
    4 Sequence files processing
    5 Partitioners, Combiners and Distributed Cache
    Estimated Time Duration for this Topic

3 Advanced MapReduce and Design Patterns
    1 Unstructured data processing
    2 Different joining techniques in MapReduce and MapReduce design patterns
    Estimated Time Duration for this Topic

4 Sqoop and Flume
    1 Import external relational database data into Hadoop using Sqoop
    2 Export HDFS data to external relational database using Sqoop
    3 Import only incremental data; how to set the number of mappers in a Sqoop job
    4 Bringing weblog and social media data into HDFS using Flume
    Estimated Time Duration for this Topic

5 Pig
    1 Use of Pig in ETL processing
    2 Different examples of using Pig to analyze big data sets
    Estimated Time Duration for this Topic

6 Data Processing using Pig
    1 UDF and performance optimization techniques available in Pig
    Estimated Time Duration for this Topic

7 Hive
    1 Hive Metastore
    2 Hive Architecture
    3 Hive UDF
    Estimated Time Duration for this Topic

8 Hive Cont.
    1 Partitioning
    2 Bucketing
    3 Indexing
    4 Different Performance Optimization techniques
    Estimated Time Duration for this Topic

9 HBase, YARN, Spark and Storm
    1 NoSQL discussion
    2 Architecture and role of HBase
    3 Other NoSQL databases and their use cases
    4 YARN vs. Hadoop 1.x
    5 Need for real-time data analysis and benefits of using Storm and Spark
    Estimated Time Duration for this Topic

10 Scala: Getting Started
    1 Introduction
    2 Building blocks
    3 Diving for Data
    4 Wrapping up
    5 Need for real-time data analysis and benefits of using Storm and Spark
    6 Doubt Clarification
    Estimated Time Duration for this Topic

11 Spark
    1 Introduction to Spark
    2 Transformations
    3 Key Value Methods and Caching Data
    4 Distribution and Instrumentation
    5 Spark Streaming
    6 Optimization
    7 Data Exploration and Analysis
    8 Transforming and Cleaning Unstructured Data
    9 Summarizing Data Along Dimensions
    10 Modeling Relationships
    11 Doubt Clarification
    Estimated Time Duration for this Topic

12 Data Streaming with Spark
    1 Getting Started with Discretized Streams
    2 Transforming Blocks of Data with DStreams
    3 Applying ML Algorithms on DStreams
    4 Building a Robust Spark Streaming Application
    5 Doubt Clarification
    Estimated Time Duration for this Topic

13 Handling Fast Data with Apache Spark SQL and Streaming
    1 Introduction
    2 Querying Data with DataFrames
    3 Improving Type Safety with Datasets
    4 Processing Data with the Streaming API
    5 Optimizing, Structured Streaming, and Spark 2.x
    6 Doubt Clarification
    Estimated Time Duration for this Topic

14 Applying the Lambda Architecture with Spark
    1 Introduction
    2 Batch Layer with Apache Spark
    3 Speed Layer with Spark Streaming
    4 Advanced Streaming Operations
    5 Streaming Ingest with Spark Streaming
    6 Doubt Clarification
    Estimated Time Duration for this Topic

15 Hadoop Solution discussion
    1 Different ways and aspects of designing a Big Data solution stack
    2 Real time
    Estimated Time Duration for this Topic

16 Case Study
    1 Case Study
    Estimated Time Duration for this Topic

Total Time Duration


Total Time Duration (In Hours)
Estimated Duration In Mins for Theory
Estimated Duration In Mins for Hands-on
Total Estimated Duration In Mins

60 60 120

60 60 120

500 300 400

500 300 400

0 0 0

300 150 450

300 150 450

300 200 400

300 200 400

300 100 400


300 100 400

300 100 400

300 100 400

0
0

0 0 0

300 100 400

300 100 400

120 100 220

180 180
300 100 400

100 80 180

60 80 140

100 100
260 160 420

0
0 0 0

300 360 660

120 120
420 360 780
0

0 0 0

650 650
0 650 650

3040 2280 4820


50.66667 38 80.33333
