Abhishek Mukherjee
Utkarsh Srivastava
13th September
Not everything that can be counted counts, and not
everything that counts can be counted.
WELCOME TO BIG DATA
TRAINING
What are we going to cover today?
• Uses of Big Data
• What is Hadoop?
• A short introduction to the HDFS architecture
• What is Map Reduce?
• The components of the Map Reduce algorithm
• The "hello world" of Map Reduce: the word count algorithm
• Tips and tricks for Map Reduce
• Distribution of Twitter data to test Map Reduce jars
What is Big Data?
• Big data is an evolving term that describes any voluminous
amount of structured, semi-structured, and
unstructured data that has the potential to be mined for
information.
• Lots of data (terabytes, petabytes, or zettabytes)
• Systems and enterprises generate huge amounts of data, from
terabytes to petabytes of information.
• An airline jet collects 10 terabytes of sensor data for every
30 minutes of flying time.
HDFS ARCHITECTURE
HDFS ARCHITECTURE CONTD.
Map Reduce Algorithm
Key points
• Map Phase
• Combiner Phase (Optional)
• Sort Phase
• Shuffle Phase
• Partition Phase (Optional)
• Reducer Phase
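The two optional phases in the list above can be sketched in a few lines of plain Python. This is an illustration only, not Hadoop code: the function names `combine` and `partition`, the sample pairs, and the use of `zlib.crc32` as a stable hash are all assumptions made for this sketch. Hadoop's default partitioner uses the key's own hash code modulo the number of reduce tasks; the idea is the same.

```python
import zlib
from collections import Counter

def combine(map_output):
    """Combiner: pre-sum the counts emitted by one map task before they leave it."""
    totals = Counter()
    for word, count in map_output:
        totals[word] += count
    return sorted(totals.items())

def partition(key, num_reducers):
    """Partitioner: choose a reducer index from a stable hash of the key.

    zlib.crc32 is a stand-in chosen for this sketch so results are repeatable.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Hypothetical output of a single map task (word, 1) pairs.
one_map_task_output = [
    ("Hello", 1), ("my", 1), ("name", 1), ("is", 1), ("abhishek", 1),
    ("Hello", 1), ("my", 1), ("name", 1), ("is", 1), ("utsav", 1),
]

combined = combine(one_map_task_output)
assignments = {word: partition(word, 2) for word, _ in combined}
print(combined)
print(assignments)
```

The combiner shrinks ten pairs to six before anything is sent over the network, and the partitioner guarantees that every occurrence of a given key goes to the same reducer.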
Map Reduce basics
Map Phase
Imagine this as the input file:
Hello my name is abhishek Hello my name is utsav
Hello my passion is cricket
This file has 2 lines. Each line has its own byte offset in the file,
which serves as the key passed to the mapper; the value passed to the
mapper is the text of that line.
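A minimal Python sketch of this step, assuming the two-line input file above. It simulates the idea rather than using the Hadoop API; the names `read_records` and `map_words` are hypothetical helpers introduced for illustration.

```python
INPUT_TEXT = (
    "Hello my name is abhishek Hello my name is utsav\n"
    "Hello my passion is cricket\n"
)

def read_records(text):
    """Yield (byte_offset, line) pairs, one per line of the input."""
    offset = 0
    for raw_line in text.encode("utf-8").splitlines(keepends=True):
        yield offset, raw_line.decode("utf-8").rstrip("\n")
        offset += len(raw_line)  # the next line starts after this line's bytes

def map_words(offset, line):
    """Mapper: the offset key is ignored; emit (word, 1) for every word."""
    for word in line.split():
        yield word, 1

records = list(read_records(INPUT_TEXT))
map_output = [pair for offset, line in records for pair in map_words(offset, line)]

for offset, line in records:
    print(offset, line)
print(map_output)
```

The first record has key 0 and the second record's key is the number of bytes in the first line, including its newline. The mapper emits one pair per word, 15 pairs in total for this input.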
Operation on output of map phase
Map output, Key value:
Hello 1
my 1
name 1
is 1
abhishek 1
Hello 1
my 1
name 1
is 1
utsav 1
Hello 1
my 1
passion 1
is 1
cricket 1
Sorted and shuffled output, Key(tuple of values):
Hello(1,1,1)
my(1,1,1)
name(1,1)
is(1,1,1)
abhishek(1)
utsav(1)
passion(1)
cricket(1)
Explanation of sort and shuffle phase
The key points are as follows:
• Sort the key-value pairs by key.
• Shuffle the mapped output so that values with the same key are
brought together into a single tuple of values for that key.
• This grouped output is fed to the reducer, which reduces each
tuple to a single value computed from the list of values in it.
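The sort and shuffle step can be sketched in plain Python as a sort by key followed by a group-by. This is a single-process illustration, assuming the same two-line input as the word count example; `sort_and_shuffle` is a hypothetical name, and a real cluster performs this work across machines.

```python
from itertools import groupby
from operator import itemgetter

LINES = [
    "Hello my name is abhishek Hello my name is utsav",
    "Hello my passion is cricket",
]

# Map output: one (word, 1) pair per word, in input order.
map_output = [(word, 1) for line in LINES for word in line.split()]

def sort_and_shuffle(pairs):
    """Sort pairs by key, then gather all values that share a key into one tuple."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, tuple(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]

grouped = sort_and_shuffle(map_output)
for key, values in grouped:
    print(key, values)
```

Note that `name` occurs twice in the input, so its tuple holds two values. Python compares strings by code point, so the capitalized key `Hello` sorts ahead of the lowercase keys here.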
Reducer phase
Reducer input, Key(tuple of values):
Hello(1,1,1)
my(1,1,1)
name(1,1)
is(1,1,1)
abhishek(1)
utsav(1)
passion(1)
cricket(1)
Reducer output, Key(single value):
abhishek(1)
cricket(1)
Hello(3)
is(3)
my(3)
name(2)
passion(1)
utsav(1)
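For word count, the reducer's job is simply to add up the values in each tuple. A minimal Python sketch, assuming the grouped pairs produced from the two-line input file; `reduce_counts` is a hypothetical name used for illustration and is not part of any Hadoop API.

```python
def reduce_counts(key, values):
    """Reducer: collapse the tuple of 1s for a key into a single total."""
    return key, sum(values)

# Grouped input as it arrives from the sort and shuffle step.
grouped = [
    ("Hello", (1, 1, 1)),
    ("my", (1, 1, 1)),
    ("name", (1, 1)),
    ("is", (1, 1, 1)),
    ("abhishek", (1,)),
    ("utsav", (1,)),
    ("passion", (1,)),
    ("cricket", (1,)),
]

result = [reduce_counts(key, values) for key, values in grouped]
for key, total in result:
    print(f"{key}({total})")
```

Each key appears exactly once in the result, and the totals across all keys add up to the 15 words in the input.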
ANY QUERIES?