A seminar presentation on Hadoop: A distributed framework for Big Data
• What is Big Data?
• What is Hadoop?
• Why Distributed File System?
• Hadoop Distributed File System (HDFS)
• Replication & Rack Awareness
• Major Problems in Distributed File System
• Hadoop Computing Model (MapReduce)
• Advantages of Hadoop
• Disadvantages of Hadoop
• Prominent Users
• Tools
• Big data refers to data volumes in the range of exabytes (10^18 bytes) and beyond, i.e. very large amounts of data.
• We define “Big Data” as the amount of data just beyond technology’s capability to store, manage and process efficiently.
Hadoop: A distributed framework for Big Data
Doug Cutting
2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project.
The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache Software Foundation.
• Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting was working at Yahoo!
• Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
• Hadoop is an open-source implementation of Google's MapReduce.
• Hadoop is based on a simple programming model called MapReduce.
• Hadoop is based on a simple data model: any data will fit.
• Apache Hadoop is an open-source software framework written in Java for distributed storage.
• The Hadoop framework consists of two main layers:
• Distributed file system (HDFS)
• Execution engine (MapReduce)
• Hadoop follows a write-once, read-many access model.
Hadoop processes data in parallel, so less time is required to process huge amounts of data.
Data nodes can be organized into racks.
• A single name node and many data nodes
• The name node maintains the file system metadata
• Files are split into fixed-size blocks and stored on data nodes (default block size: 64 MB in Hadoop 1.x; 128 MB since Hadoop 2.x)
• Data blocks are replicated for fault tolerance and fast access (default replication factor: 3)
• Data nodes periodically send heartbeats to the name node
• HDFS is a master-slave architecture
• Master: name node
• Slaves: data nodes (100s or 1000s of nodes)
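The block and replication figures above can be turned into simple arithmetic. The following is a minimal Python sketch, not HDFS code: the function name `block_layout` and the 200 MB example file are assumptions made for illustration, and the defaults are the 64 MB block size and replication factor of 3 quoted on the slide.

```python
import math

BLOCK_SIZE_MB = 64       # default block size quoted on the slide
REPLICATION_FACTOR = 3   # default replication factor quoted on the slide


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION_FACTOR):
    """Return (blocks, stored replicas, raw storage in MB) for one file.

    Hypothetical helper for illustration only. The last block holds just
    the remaining bytes, so raw storage is file size times replication.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


# A 200 MB file needs 4 blocks (3 full blocks plus one partial block).
print(block_layout(200))
```

Under these assumptions a 200 MB file occupies 4 blocks, 12 block replicas across the data nodes, and 600 MB of raw disk space.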
[Figure: a Client, the JobTracker, and three TaskTrackers, each running Map and Reduce tasks]
• Under-replication: the number of replicas of a block is less than the replication factor
• Over-replication: the number of replicas of a block is greater than the replication factor
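The two conditions above amount to a comparison against the replication factor. A minimal Python sketch follows; the function `replication_status`, the block names, and the replica counts are all hypothetical and only illustrate the comparison, not how the name node tracks replicas.

```python
def replication_status(live_replicas, replication_factor=3):
    """Classify one block by comparing its replica count to the target."""
    if live_replicas < replication_factor:
        return "under-replicated"
    if live_replicas > replication_factor:
        return "over-replicated"
    return "ok"


# Hypothetical replica counts per block, as a name node might see them.
block_replicas = {"blk_1": 2, "blk_2": 3, "blk_3": 4}
report = {block: replication_status(count)
          for block, count in block_replicas.items()}
print(report)
```

An under-replicated block would need additional copies, and an over-replicated block has surplus copies that could be deleted.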
Major problems in a distributed file system:
1) Hardware failure
2) Large data sets
3) Redundancy of data
Two main phases: Map and Reduce
• Any job is converted into map and reduce tasks
• Developers need ONLY to implement the Map and Reduce classes
MapReduce is a master-slave architecture
• Master: JobTracker
• Slaves: TaskTrackers (100s or 1000s of TaskTrackers)
• Every data node runs a TaskTracker
Mappers and Reducers consume and produce (Key, Value) pairs
• Users define the data types of the Key and Value
• Shuffling & Sorting phase:
• Map output is shuffled such that all records with the same key go to the same reducer
• Each reducer may receive multiple key sets
• Each reducer sorts its records to group identical keys, then processes each group
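The shuffle and sort steps above can be sketched in a few lines of Python. This is an in-memory illustration, not Hadoop's implementation: the functions `partition`, `shuffle`, and `sort_and_group` are hypothetical names, and CRC32 is used here only as a stand-in for a stable hash of the key.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    """Pick a reducer for a key; the same key always maps to the same reducer."""
    # CRC32 is an assumption made for this sketch, chosen because it is stable.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_output, num_reducers):
    """Route every (key, value) record to the bucket of its reducer."""
    buckets = defaultdict(list)
    for key, value in map_output:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def sort_and_group(records):
    """Sort one reducer's records by key, then group the values per key."""
    ordered = sorted(records, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


map_output = [("a", 1), ("b", 1), ("a", 2), ("c", 1)]
for reducer_id, records in sorted(shuffle(map_output, 2).items()):
    print(reducer_id, sort_and_group(records))
```

Each reducer may receive several keys, but every record for a given key lands in exactly one bucket, which is what lets the reducer process each key group in one pass.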
Job: Count the occurrences of each word in a data set
[Figure: word-count data flow through Map tasks and Reduce tasks]
The Reduce phase is optional: jobs can be Map-only
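The word-count job can be imitated on a single machine to show what the Map and Reduce functions do. The Python sketch below is only a simulation under simple assumptions (whitespace tokenization, lower-casing); the names `map_words`, `reduce_counts`, and `word_count` are hypothetical and this is not the Hadoop API.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: add up all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, group the pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            grouped[word].append(one)
    return dict(reduce_counts(word, counts)
                for word, counts in sorted(grouped.items()))


print(word_count(["big data", "Big Hadoop data"]))
```

In a real cluster the map calls run in parallel on different blocks of the input and the grouping is done by the shuffle and sort phase; here a dictionary plays that role.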
Advantages of Hadoop:
1) Scalable
2) Cost-effective
3) Flexible
4) Fast
5) Resilient to failure
Disadvantages of Hadoop:
1) Security concerns
2) Vulnerable by nature
3) Not fit for small data
4) Potential stability issues
5) General limitations
Prominent users:
1) Yahoo!
2) Facebook
3) Hadoop hosting in the cloud
4) Hadoop on Microsoft Azure
5) Hadoop on Amazon EC2/S3 services
6) Amazon Elastic MapReduce
Tools:
NoSQL databases:- MongoDB, CouchDB, Cassandra, Redis, BigTable, HBase, Hypertable, ZooKeeper.
MapReduce:- Hadoop, Hive, Pig, Cascading, Caffeine, S4, MapR, Flume, Kafka, Oozie, Greenplum.
Storage:- S3, Hadoop Distributed File System.
Servers:- EC2, Google App Engine, Elastic Beanstalk.
Processing:- R, Yahoo! Pipes, Mechanical Turk, ElasticSearch, BigSheets, Tinkerpop.