Archana R
Assistant Professor, Dept. of CS, SACWC
I am here because I love to give presentations.
MapReduce
 MapReduce is a programming model for efficient distributed computing
 It works like a Unix pipeline
 cat input | grep | sort | uniq -c | cat > output
 Input | Map | Shuffle & Sort | Reduce | Output
 Efficiency from
 Streaming through data, reducing seeks
 Pipelining
 A good fit for a lot of applications
 Log processing
 Web index building
MapReduce - Dataflow
MapReduce - Features
 Fine-grained Map and Reduce tasks
 Improved load balancing
 Faster recovery from failed tasks
 Automatic re-execution on failure
 In a large cluster, some nodes are always slow or flaky
 Framework re-executes failed tasks
 Locality optimizations
 With large data, bandwidth to data is a problem
 Map-Reduce + HDFS is a very effective solution
 Map-Reduce queries HDFS for locations of input data
 Map tasks are scheduled close to the inputs when possible
Word Count Example
 Mapper
 Input: value: a line of input text
 Output: key: word, value: 1
 Reducer
 Input: key: word, value: set of counts
 Output: key: word, value: sum
 Launching program
 Defines this job
 Submits job to cluster
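The three roles above can be imitated in plain Java without a cluster. This is only a sketch for following the data: the class name WordCountSketch and its methods are made up for illustration, the "shuffle and sort" is just a sorted in-memory map, and none of it is Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the word count job; not Hadoop code.
class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Launching program: map every line, group values by key in sorted
    // order (standing in for shuffle and sort), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(run(Arrays.asList("the quick brown fox", "the lazy dog", "the fox")));
    }
}
```

In a real job the grouping step is done by the framework between the map and reduce tasks, so application code only supplies the map and reduce functions.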
Word Count Dataflow
Word Count Mapper
public static class Map extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable> {
  private static final IntWritable one = new IntWritable(1);
  private Text word = new Text();
  // Emits (word, 1) for every whitespace-separated token in the input line.
  // map() must be an instance method to implement the Mapper interface.
  public void map(LongWritable key, Text value, OutputCollector<Text,IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one);
    }
  }
}
Word Count Example
 Jobs are controlled by configuring JobConfs
 JobConfs are maps from attribute names to string values
 The framework defines attributes to control how the job is executed
 conf.set("mapred.job.name", "MyApp");
 Applications can add arbitrary values to the JobConf
 conf.set("my.string", "foo");
 conf.setInt("my.integer", 12);
 JobConf is available to all tasks
Putting it all together
 Create a launching program for your application
 The launching program configures:
 The Mapper and Reducer to use
 The output key and value types (input types are inferred from the InputFormat)
 The locations for your input and output
 The launching program then submits the job and typically waits for it to complete
Input and Output Formats
 A Map/Reduce job may specify how its input is to be read by specifying an InputFormat to be used
 A Map/Reduce job may specify how its output is to be written by specifying an OutputFormat to be used
 These default to TextInputFormat and TextOutputFormat, which process line-based text data
 Another common choice is SequenceFileInputFormat and SequenceFileOutputFormat for binary data
 These are file-based, but they are not required to be
How many Maps and Reduces
 Maps
 Usually as many as the number of HDFS blocks being processed; this is the default
 Otherwise, the number of maps can be specified as a hint
 The number of maps can also be controlled by specifying the minimum split size
 The actual sizes of the map inputs are computed by:
 max(min(block_size, data/#maps), min_split_size)
 Reduces
 Unless the amount of data being processed is small, a good starting point is:
 0.95 * num_nodes * mapred.tasktracker.tasks.maximum
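The two sizing rules above can be worked through with concrete numbers. The sketch below is plain arithmetic, not Hadoop code: the class TaskSizing, its method names, and the cluster figures (64 MB blocks, 10 GB of input, 20 nodes with 2 task slots each) are all assumptions chosen for illustration.

```java
// Worked arithmetic for the two sizing rules; all cluster numbers are made-up inputs.
class TaskSizing {
    // Size of each map input: max(min(block_size, data / #maps), min_split_size).
    // mapsHint is assumed to be greater than zero.
    static long splitSize(long blockSize, long dataSize, int mapsHint, long minSplitSize) {
        return Math.max(Math.min(blockSize, dataSize / mapsHint), minSplitSize);
    }

    // Rule of thumb for reduces: 0.95 * num_nodes * task slots per node.
    // Integer arithmetic (95 / 100) avoids floating-point rounding surprises.
    static int suggestedReduces(int numNodes, int slotsPerNode) {
        return (95 * numNodes * slotsPerNode) / 100;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 10 GB of input, 64 MB blocks, hint of 100 maps, 1 MB minimum split.
        long size = splitSize(64 * mb, 10240 * mb, 100, 1 * mb);
        System.out.println(size / mb + " MB per map");            // 64 MB per map
        System.out.println(suggestedReduces(20, 2) + " reduces"); // 38 reduces
    }
}
```

Here the hint of 100 maps would give about 102 MB per map, but the block size caps the split at 64 MB, so the job ends up with more maps than the hint asked for.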
Some handy tools
 Partitioners
 Combiners
 Compression
 Counters
 Speculation
 Zero Reduces
 Distributed File Cache
 Tool
Partitioners
 Partitioners are application code that defines how keys are assigned to reduces
 Default partitioning spreads keys evenly, but randomly
 Uses key.hashCode() % num_reduces
 Custom partitioning is often required, for example, to produce a total order in the output
 Should implement the Partitioner interface
 Set by calling conf.setPartitionerClass(MyPart.class)
 To get a total order, sample the map output keys and pick values to divide the keys into roughly equal buckets and use that in your partitioner
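Both strategies can be sketched in plain Java. PartitionSketch and its methods are hypothetical names, and the code does not implement the Hadoop Partitioner interface. The hash version masks the sign bit before taking the remainder, since a bare key.hashCode() % num_reduces is negative for keys with a negative hash code. The range version assumes the sampled keys are distinct.

```java
import java.util.Arrays;

// Plain-Java sketch of the two partitioning strategies; not the Hadoop Partitioner interface.
class PartitionSketch {
    // Hash partitioning: mask the sign bit so a negative hashCode cannot give a negative index.
    static int hashPartition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Pick numReduces - 1 split points from a sample of keys so the buckets are roughly equal.
    static String[] splitPoints(String[] sample, int numReduces) {
        String[] sorted = sample.clone();
        Arrays.sort(sorted);
        String[] points = new String[numReduces - 1];
        for (int i = 1; i < numReduces; i++) {
            points[i - 1] = sorted[i * sorted.length / numReduces];
        }
        return points;
    }

    // Range partitioning: keys below the first split point go to reduce 0,
    // keys from the first split point up to the second go to reduce 1, and so on.
    static int rangePartition(String key, String[] points) {
        int pos = Arrays.binarySearch(points, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] sample = {"pear", "apple", "fig", "kiwi", "plum", "lime"};
        String[] points = splitPoints(sample, 3);
        System.out.println(Arrays.toString(points)); // [kiwi, pear]
        for (String key : new String[] {"banana", "lemon", "quince"}) {
            System.out.println(key + " -> reduce " + rangePartition(key, points));
        }
    }
}
```

Because every key sent to reduce i sorts before every key sent to reduce i+1, concatenating the sorted reduce outputs in order gives a totally ordered result.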
Compression
 Compressing the outputs and intermediate data will often yield huge performance gains
 Can be specified via a configuration file or set programmatically
 Set mapred.output.compress to true to compress job output
 Set mapred.compress.map.output to true to compress map outputs
 Compression Types (mapred(.map)?.output.compression.type)
 “block” - Group of keys and values are compressed together
 “record” - Each value is compressed individually
 Block compression is almost always best
 Compression Codecs (mapred(.map)?.output.compression.codec)
 Default (zlib) - slower, but more compression
 LZO - faster, but less compression
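The difference between "record" and "block" compression can be illustrated with java.util.zip. This sketch is not how Hadoop implements either mode: CompressionSketch, its methods, and the sample log-like values are assumptions for illustration. It only shows why compressing a group of similar values together tends to beat compressing each one separately.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Compares per-record and per-block compression with java.util.zip; not Hadoop code.
class CompressionSketch {
    // Deflate a byte array with zlib and return the compressed length in bytes.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.size();
    }

    // "record" style: every value is compressed on its own.
    static int recordSize(String[] values) {
        int total = 0;
        for (String v : values) {
            total += compressedSize(v.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // "block" style: a group of values is compressed together.
    static int blockSize(String[] values) {
        return compressedSize(String.join("\n", values).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] values = new String[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = "GET /index.html HTTP/1.1 200 user" + (i % 50);
        }
        System.out.println("record: " + recordSize(values) + " bytes");
        System.out.println("block:  " + blockSize(values) + " bytes");
    }
}
```

Short values compressed one at a time cannot share repeated substrings with their neighbours and each pays the fixed stream overhead, which is why the block total comes out much smaller on repetitive data like this.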
Counters
 Often Map/Reduce applications have countable events
 For example, the framework counts records into and out of the Mapper and Reducer
 To define user counters:
static enum Counter {EVENT1, EVENT2};
reporter.incrCounter(Counter.EVENT1, 1);
 Define nice names in a MyClass_Counter.properties file
CounterGroupName=MyCounters
EVENT1.name=Event 1
EVENT2.name=Event 2
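The enum-based counter pattern above can be tried outside Hadoop with a small stand-in. CounterSketch is a hypothetical class that mimics the shape of reporter.incrCounter with an EnumMap; it is not the framework's Reporter, and the rule for which lines count as EVENT1 is invented for the example.

```java
import java.util.EnumMap;
import java.util.Map;

// Stand-in for the framework's counter bookkeeping; Reporter itself is not used here.
class CounterSketch {
    enum Counter { EVENT1, EVENT2 }

    private final Map<Counter, Long> totals = new EnumMap<>(Counter.class);

    // Same shape as reporter.incrCounter(key, amount): add amount to the named counter.
    void incrCounter(Counter key, long amount) {
        totals.merge(key, amount, Long::sum);
    }

    // Current total for a counter, or 0 if it was never incremented.
    long get(Counter key) {
        return totals.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        CounterSketch counters = new CounterSketch();
        // Hypothetical rule: count empty lines as EVENT1 and all other lines as EVENT2.
        for (String line : new String[] {"a b", "", "c", ""}) {
            counters.incrCounter(line.isEmpty() ? Counter.EVENT1 : Counter.EVENT2, 1);
        }
        System.out.println("EVENT1=" + counters.get(Counter.EVENT1)); // EVENT1=2
        System.out.println("EVENT2=" + counters.get(Counter.EVENT2)); // EVENT2=2
    }
}
```

In a real job each task increments its own counters and the framework sums them across tasks, so the application never merges the totals itself.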
Map reduce in Hadoop BIG DATA ANALYTICS