Module 2.1
• DataNodes:
• Slaves deployed on each machine that provide the actual storage
• Responsible for serving read and write requests from clients (see the client sketch below)
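A minimal client sketch, assuming the standard HDFS Java API (FileSystem); the class name, path, and file contents here are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsClientSketch {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/hello.txt"); // hypothetical path

    // Write: the client asks the NameNode for target DataNodes,
    // then streams the bytes directly to those DataNodes.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeBytes("hello hdfs\n");
    }

    // Read: the NameNode returns block locations; the bytes
    // themselves are served by the DataNodes.
    try (FSDataInputStream in = fs.open(file)) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
  }
}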
NameNode and DataNode Block Replication
How is a 400 MB file saved on HDFS with an HDFS block size of 100 MB?
Example
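A worked answer, assuming HDFS's default replication factor of 3:
• 400 MB / 100 MB per block = 4 blocks
• Each block is replicated 3 times, so 12 block replicas are distributed across the DataNodes
• Raw storage consumed: 4 × 100 MB × 3 = 1200 MB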
MapReduce
• The input data set is split into independent chunks.
• The Mapper:
Each block is processed in isolation by a map task, called a mapper
The map task runs on the node where the block is stored
• The Reducer:
Consolidates the results from the different mappers
Produces the final output (see the WordCount sketch below)
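A minimal WordCount sketch of the two phases, using the classic org.apache.hadoop.mapreduce API; the class names (WordCount, TokenMapper, SumReducer) are illustrative, not from the slides:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // Mapper: runs once per input split, ideally on the node holding the block.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        ctx.write(word, ONE); // emit (word, 1) key-value pairs
      }
    }
  }

  // Reducer: consolidates the (word, 1) pairs from all mappers.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      ctx.write(key, new IntWritable(sum)); // final output: (word, total count)
    }
  }
}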
MapReduce Programming Phases and Daemons
Phases:
• Map – converts input into key-value pairs
• Reduce – combines the output of the mappers and produces a result set
Daemons:
• JobTracker – master; schedules tasks
• TaskTracker – slave; executes tasks
• JobTracker:
• Takes care of all job scheduling and assigns tasks to TaskTrackers (see the driver sketch below)
• TaskTracker:
• A node in the cluster that accepts tasks (Map, Reduce, and Shuffle operations) from a JobTracker
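A minimal job-submission sketch, reusing the hypothetical WordCount classes above. On classic Hadoop 1.x the submitted tasks are scheduled by the JobTracker onto TaskTrackers; on Hadoop 2.x the same client code is scheduled through YARN instead:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCount.TokenMapper.class);  // map phase
    job.setReducerClass(WordCount.SumReducer.class);  // reduce phase
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submit the job and wait; the cluster's master daemon
    // assigns the map and reduce tasks to worker nodes.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}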
MapReduce Architecture
Versions of Hadoop
YARN
Hadoop Ecosystem
• HDFS -> Hadoop Distributed File System
• YARN -> Yet Another Resource Negotiator
• MapReduce -> Data processing using programming
• Spark -> In-memory Data Processing
• PIG, HIVE -> Data processing using SQL-like queries
• HBase -> NoSQL Database
• Mahout, Spark MLlib -> Machine Learning
• Apache Drill -> SQL on Hadoop
• Zookeeper -> Managing Cluster
• Oozie -> Job Scheduling
• Flume, Sqoop -> Data Ingesting Services
• Solr & Lucene -> Searching & Indexing
• Ambari -> Provision, Monitor and Maintain cluster