WHAT STARTS HERE CHANGES THE WORLD




      Hadoop and MapReduce
Hemanth Kumar Mantri
  Graduate Student
     UT-Austin



   November 9th 2011

                 Agenda
•   What is Hadoop?
•   Where is MapReduce used?
•   HDFS and MapReduce
•   Amazon Web Services
•   MapReduce Demo on Hadoop

            What is Hadoop?
• Inspired by Google's papers on the Google File
  System (GFS) and MapReduce.
• Supports data-intensive distributed
  applications.
• Scales to thousands of nodes and petabytes of data.
• Apache project – Open Source
• Implemented in Java
• Yahoo! - largest contributor

Typical Hadoop Cluster!

Who Uses Hadoop?

                    Who Uses Hadoop?
•   At Google (its own MapReduce implementation, which inspired Hadoop):
     – Index construction for Google Search
     – Popular Passages in Google Books
     – Article clustering for Google News

•   At Yahoo!:
     – “Web map” powering Yahoo! Search
     – Spam detection for Yahoo! Mail
     – More than 100,000 CPUs in >36,000 computers

•   At Facebook:
     – Reporting/analytics and machine learning
          • Data mining, spam detection
     – Storage engine for logs
     – 1100-machine cluster with 8800 cores and about 12 PB raw storage.

Facebook Lexicon

                           Yelp!
• Uses Amazon S3 to store daily logs and photos,
   – generating around 100GB of logs per day.
• Amazon Elastic MapReduce for:
   –   People Who Viewed this Also Viewed
   –   Review highlights
   –   Autocomplete as you type in search
   –   Search spelling suggestions
   –   Top searches
   –   Ads
• Yelp runs approximately 200 Elastic MapReduce jobs
  processing 3TB of data per day.

          Hadoop Components
• Distributed file system (HDFS)
  – Single namespace for entire cluster
  – Design closely follows GFS
  – Replicates each block 3x by default for fault tolerance

• MapReduce framework
  – Executes user jobs specified as “map” and
    “reduce” functions
  – Manages work distribution & fault-tolerance
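
To make the "map" and "reduce" functions above concrete, here is a minimal single-process sketch of a word-count job in Python. It is not Hadoop code: the names `map_fn`, `shuffle`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory `shuffle` only stands in for the grouping and distribution work that the framework performs across machines.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework in a real job)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a small in-memory input."""
    mapped = chain.from_iterable(map_fn(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["hadoop stores data", "hadoop processes data"]))
```

The user supplies only `map_fn` and `reduce_fn`; everything else corresponds to what the framework manages on the user's behalf.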

Hadoop Architecture

The Big Picture

                         Using the HDFS
• hadoop dfs
   –   [-ls <path>]
   –   [-du <path>]
   –   [-cp <src> <dst>]
   –   [-rm <path>]
   –   [-put <localsrc> <dst>]
   –   [-copyFromLocal <localsrc> <dst>]
   –   [-moveFromLocal <localsrc> <dst>]
   –   [-get [-crc] <src> <localdst>]
   –   [-cat <src>]
   –   [-copyToLocal [-crc] <src> <localdst>]
   –   [-moveToLocal [-crc] <src> <localdst>]
   –   [-mkdir <path>]
   –   [-touchz <path>]
   –   [-test -[ezd] <path>]
   –   [-stat [format] <path>]
   –   [-help [cmd]]
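
As a rough illustration of how the options above combine into a short session, the sketch below builds a few `hadoop dfs` command lines in Python and prints them without running anything. The helper `dfs_command`, the file `access.log`, and the `/user/demo/input` directory are all hypothetical; only options from the list above are used.

```python
import shlex


def dfs_command(option, *args):
    """Build the argument list for `hadoop dfs -<option> <args...>`."""
    return ["hadoop", "dfs", "-" + option, *args]


# Hypothetical session: create a directory, upload a local file, inspect it.
steps = [
    dfs_command("mkdir", "/user/demo/input"),
    dfs_command("put", "access.log", "/user/demo/input"),
    dfs_command("ls", "/user/demo/input"),
    dfs_command("cat", "/user/demo/input/access.log"),
]

for argv in steps:
    # Printed only; nothing is executed here.
    print(shlex.join(argv))
```

On a machine with Hadoop installed, each list could be passed to `subprocess.run(argv, check=True)` instead of being printed.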

AWS and Cloud

           Amazon Web Services
• Collection of services, billed pay-as-you-use
   – S3 (Simple Storage Service)
       Storage in the cloud ($0.140/GB/month)
       Key-value store (a big HashMap!)
   – EC2 (Elastic Compute Cloud)
       Compute in the cloud ($0.085 to $2.60 per compute hour)
   – Elastic MapReduce
       Run Hadoop jobs on EC2 using data stored in S3
   – Email Service (SES)
   – … and many more
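
The pay-as-you-use prices quoted above lend themselves to a quick back-of-the-envelope estimate. The sketch below uses only the 2011 figures from this slide; the workload sizes (100 GB stored, 10 nodes for 2 hours) are made-up inputs, and it assumes flat rates with no pricing tiers, data-transfer fees, or Elastic MapReduce surcharge.

```python
# Prices as quoted on the slide (2011); current AWS pricing differs.
S3_USD_PER_GB_MONTH = 0.140
EC2_USD_PER_HOUR_MIN = 0.085
EC2_USD_PER_HOUR_MAX = 2.60


def s3_monthly_cost(gigabytes):
    """Cost of keeping `gigabytes` in S3 for one month at the flat rate."""
    return gigabytes * S3_USD_PER_GB_MONTH


def ec2_cost_range(nodes, hours):
    """Cheapest and most expensive cost of `nodes` instances for `hours`."""
    node_hours = nodes * hours
    return node_hours * EC2_USD_PER_HOUR_MIN, node_hours * EC2_USD_PER_HOUR_MAX


if __name__ == "__main__":
    low, high = ec2_cost_range(nodes=10, hours=2)
    print(f"S3, 100 GB for a month: ${s3_monthly_cost(100):.2f}")
    print(f"EC2, 10 nodes for 2 hours: ${low:.2f} to ${high:.2f}")
```

The wide EC2 range reflects the spread between the smallest and largest instance types, so the instance choice dominates the cost of a short job.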

       MapReduce on an EC2 Cluster
• Create an AWS account and get the keys for authentication
• Go to src/contrib/ec2 in the Hadoop directory
• Launch a cluster on EC2
   – % bin/hadoop-ec2 launch-cluster <cluster-name> <#nodes>
• Log in to the cluster (named test-cluster in the commands below)
   – % bin/hadoop-ec2 login test-cluster
• Start the computation
   – # cd /usr/local/hadoop-*
   – # bin/hadoop jar hadoop-*-examples.jar pi 10 10000000
• Terminate the cluster after use!
   – % bin/hadoop-ec2 terminate-cluster test-cluster
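
The `pi 10 10000000` job above estimates π by sampling points, with the two arguments read here as the number of map tasks and the number of samples per map. The Python sketch below shows the same split of work in a single process. It is a stand-in under stated assumptions, not the code of the Hadoop example: it uses plain pseudo-random sampling, and `map_task`, `reduce_task`, and `estimate_pi` are hypothetical names.

```python
import random


def map_task(task_id, samples):
    """Map: count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(task_id)  # a separate seed per task keeps runs repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(partials):
    """Reduce: sum the per-task counts and turn the ratio into an estimate of pi."""
    inside = sum(count for count, _ in partials)
    total = sum(n for _, n in partials)
    return 4.0 * inside / total


def estimate_pi(num_maps, samples_per_map):
    """Run every map task in turn, then a single reduce over their results."""
    partials = [map_task(task_id, samples_per_map) for task_id in range(num_maps)]
    return reduce_task(partials)


if __name__ == "__main__":
    # Far fewer samples than the cluster job so that it finishes quickly.
    print(estimate_pi(num_maps=10, samples_per_map=10_000))
```

Each map task is independent of the others, which is why this kind of job spreads well across the nodes of a cluster.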

                References
• Hadoop Project Page:
  – http://hadoop.apache.org/
• Amazon Web Services:
  – http://aws.amazon.com/

Thank You!