HADOOP
Presentation by:
Harshdeep kaur
Roll no: 7704
Submitted To:
Mrs Anu Singla
Things included:
• History of data
• What is Big Data?
• Big data challenges
• Big data solution
• Hadoop
• Hadoop architecture
• HDFS
• MapReduce
• How does Hadoop work
• Environment setup
• Who uses Hadoop
HISTORY OF DATA
• Today we all generate data.
• Data volumes are measured in terabytes (TB) and even petabytes (PB).
• Today, anything we want to know we just look up on the internet.
• Even kindergarten nursery rhymes are on the internet.
• Earlier we needed floppy disks to save our data; now we have moved to the cloud.
• 90% of the data in the world today has been created in the last two years
alone.
Organization   Est. amount of data stored   Est. amount of data processed per day
eBay           200 PB                       100 PB
Google         1500 PB                      100 PB
Facebook       300 PB                       600 TB
Twitter        200 PB                       100 TB
A flood of data is coming from many sources.
What is Big Data?
Big data is a collection of datasets so large that they cannot be processed
using traditional computing techniques. It is not a single technique or tool;
rather, it involves many areas of business and technology.
According to IBM, 80% of the data captured today is unstructured.
It is gathered from:
• Posts to social media sites
• Digital pictures and videos
• Purchase transaction records
• Cell phone GPS signals
• Sensors used to gather climate information
This diagram includes:
• Black box data
• Social media data
• Power grid data
• Search engine data
Big Data Challenges
The major challenges associated with big data are as follows:
• Capturing data
• Curation
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation
To meet these challenges, organizations normally rely on enterprise servers.
BIG DATA SOLUTIONS
Traditional Enterprise Approach
In this approach, an enterprise has a single computer to store and process big
data. For storage, programmers rely on the database vendor of their choice,
such as Oracle or IBM.
Limitation
When it comes to dealing with huge and growing amounts of data, processing
everything through a single database becomes a bottleneck.
Google’s Solution
Google solved this problem using an algorithm called MapReduce, which divides a
job into small tasks, runs them on many computers, and combines their results.
The above diagram shows commodity hardware, which could be
single-CPU machines or servers with higher capacity.
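To make the idea concrete, here is a minimal single-process sketch of the MapReduce pattern in Python, using word count as the example. It is not Google's or Hadoop's implementation: the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and on a real cluster the map and reduce steps would run on many machines in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (here, the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster, each line (or block of lines) would be mapped on a different node.
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "hadoop handles big data"]))
```

Run on the two sample lines, the sketch counts "big" three times and "data" twice. The map step never needs to see more than one line at a time, which is what allows the work to be spread across machines.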
Hadoop
• Doug Cutting and Mike Cafarella developed an open source project
called Hadoop.
• Hadoop is an Apache open source framework written in Java.
• Hadoop allows distributed processing of large datasets across clusters
of computers using simple programming models.
• Hadoop is designed to scale up from a single server to thousands of
machines, each offering local computation and storage.
• Hadoop runs applications using the MapReduce algorithm.
Hadoop Architecture
Hadoop is designed and built on two independent frameworks:
Hadoop = HDFS + MapReduce
Hadoop has a master/slave architecture for both storage and processing.
The Hadoop Distributed File System (HDFS) was developed using a distributed file system design.
It runs on commodity hardware.
Unlike other distributed systems, HDFS is highly fault-tolerant and designed
to run on low-cost hardware.
MapReduce is a parallel programming model for writing distributed
applications that efficiently process large amounts of data on large clusters
(thousands of nodes) of commodity hardware.
COMPONENTS OF HDFS
Namenode
• The namenode runs the GNU/Linux
operating system and the namenode software.
• The system hosting the namenode acts as the
master server and performs the following tasks:
• Manages the file system namespace.
• Regulates clients’ access to files.
• Executes file system operations such
as renaming, closing, and opening files
and directories.
COMPONENTS OF HDFS
Datanode
• The datanode runs the GNU/Linux operating
system and the datanode software.
• For every node in a cluster, there is a
datanode. These nodes manage the data
storage of their system.
• Datanodes perform read-write operations on the
file system, as per client requests.
• They also perform operations such as block
creation, deletion, and replication according to
the instructions of the namenode.
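The division of labour between the namenode and the datanodes can be sketched in a few lines of Python. This is a toy in-memory model, not the HDFS API: the class names ToyNameNode and ToyDataNode and all of their methods are hypothetical, the tiny 4-byte block size is chosen only for illustration, and replication, heartbeats, and persistence are omitted.

```python
class ToyDataNode:
    """Stores raw blocks; knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]

    def delete_block(self, block_id):
        self.blocks.pop(block_id, None)


class ToyNameNode:
    """Holds only metadata: the namespace and where each block lives.

    In real HDFS the client streams data to the datanodes directly and the
    namenode only hands out block locations; this toy folds the client into
    the namenode's methods for brevity.
    """

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.namespace = {}  # path -> list of (block_id, datanode)
        self._next_id = 0

    def create(self, path, data, block_size=4):
        placements = []
        for offset in range(0, len(data), block_size):
            node = self.datanodes[self._next_id % len(self.datanodes)]
            node.write_block(self._next_id, data[offset:offset + block_size])
            placements.append((self._next_id, node))
            self._next_id += 1
        self.namespace[path] = placements

    def rename(self, old_path, new_path):
        # A rename touches metadata only; no block is moved.
        self.namespace[new_path] = self.namespace.pop(old_path)

    def read(self, path):
        return b"".join(node.read_block(bid) for bid, node in self.namespace[path])

    def delete(self, path):
        # The namenode instructs the datanodes to drop the file's blocks.
        for bid, node in self.namespace.pop(path):
            node.delete_block(bid)
```

The point of the sketch is that renaming a file changes only the namenode's table, while creating or deleting a file results in block operations on the datanodes.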
MAP REDUCE
• MapReduce is a parallel programming model, devised at Google, for writing
distributed applications that efficiently process large amounts of data.
• It processes data in a reliable, fault-tolerant manner. MapReduce programs run on
Hadoop.
• It consists of a job tracker and task trackers.
JOB TRACKER TASK TRACKERS
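The relationship between the job tracker and the task trackers can be illustrated with a small Python sketch. It assumes a simplified model in which the job tracker hands each input split to a task tracker and retries on another tracker if one is down; the class names ToyJobTracker and ToyTaskTracker are hypothetical and do not correspond to Hadoop's actual classes or scheduling policy.

```python
class ToyTaskTracker:
    """Runs the tasks handed to it; can be marked unhealthy to simulate a failure."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, task, data):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return task(data)


class ToyJobTracker:
    """Splits a job into tasks and retries a task on another tracker if one fails."""

    def __init__(self, trackers):
        self.trackers = trackers

    def run_job(self, task, splits):
        results = []
        for i, split in enumerate(splits):
            # Try the trackers in turn, starting from a different one for each split.
            for attempt in range(len(self.trackers)):
                tracker = self.trackers[(i + attempt) % len(self.trackers)]
                try:
                    results.append(tracker.run(task, split))
                    break
                except RuntimeError:
                    continue
            else:
                # Reached only if every tracker failed for this split.
                raise RuntimeError("no healthy task tracker available")
        return results


if __name__ == "__main__":
    trackers = [ToyTaskTracker("tt1", healthy=False), ToyTaskTracker("tt2")]
    print(ToyJobTracker(trackers).run_job(sum, [[1, 2], [3, 4], [5]]))
```

Even with one tracker down, every split is still processed, which is the fault-tolerant behaviour the slide describes.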
How Does Hadoop Work?
• Hadoop runs code across a cluster of computers.
• Data is initially divided into directories and files. Files are
divided into uniformly sized blocks of 128 MB or 64 MB
(preferably 128 MB).
• These blocks are then distributed across various cluster
nodes for further processing.
• HDFS, sitting on top of the local file system, supervises
the processing.
• Blocks are replicated to handle hardware failure.
• Hadoop then checks that the code was executed successfully.
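The block arithmetic above can be worked through in Python. The sketch assumes a 128 MB block size and a replication factor of 3 (the usual HDFS defaults); the helper names and the simple round-robin placement are illustrative assumptions, since real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size
REPLICATION = 3      # assumed replication factor


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [block for block, holders in placement.items()
            if any(node != failed_node for node in holders)]


if __name__ == "__main__":
    blocks = block_count(1000)  # a 1000 MB file
    layout = place_replicas(blocks, ["n1", "n2", "n3", "n4"])
    print(blocks, layout[0], len(surviving_blocks(layout, "n1")))
```

A 1000 MB file needs 8 blocks at this block size, and because every block lives on three different nodes, losing any single node leaves all 8 blocks readable.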
ENVIRONMENT SETUP
• Hadoop is supported on the GNU/Linux platform and its flavors.
• If you have an OS other than Linux, you can install VirtualBox
and run Linux inside it.
Pre-installation setup: before installing Hadoop, we need to set up Linux with SSH (Secure Shell).
• Creating a user
• Installing Java
• Downloading Hadoop
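Before starting the installation, it can help to confirm that the prerequisites are present. The following is a minimal sketch, assuming that Java and SSH are exposed as the commands `java` and `ssh` on the PATH; the function name check_prerequisites is hypothetical and is not part of Hadoop.

```python
import shutil


def check_prerequisites(tools=("java", "ssh")):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

The check only confirms that the commands exist; it does not verify the Java version or that passwordless SSH has been configured.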
Installing Hadoop
Hadoop Operation Modes
Once you have downloaded Hadoop, you can
operate your Hadoop cluster in one of the three supported modes:
• Local/Standalone Mode: After Hadoop is downloaded to your system, it is
configured in standalone mode by default and can be run as a single
Java process.
• Pseudo-Distributed Mode: A distributed simulation on a single machine.
Each Hadoop daemon, such as HDFS and MapReduce, runs as a
separate Java process. This mode is useful for development.
• Fully Distributed Mode: This mode is fully distributed, with a cluster of
two or more machines.
Installing Hadoop in standalone mode
• There are no daemons running and everything runs in a single JVM.
• Standalone mode is suitable for running MapReduce programs during
development, since it is easy to test and debug them.
Installing Hadoop in pseudo-distributed mode
Step 1: Setting Up Hadoop
Step 2: Hadoop Configuration
Verifying Hadoop Installation
• Step 1: Name Node Setup
• Step 2: Verifying Hadoop dfs
• Step 3: Verifying YARN Script
• Step 4: Accessing Hadoop on Browser
• Step 5: Verify All Applications for Cluster
Hadoop browser
Who uses Hadoop
Amazon
Facebook
Last.fm
The New York Times
Google
IBM
Yahoo
Twitter
LinkedIn
The full list is too long to include here.
Queries
Thank you