Hadoop Programming
Overview
• MapReduce Types
• Input Formats
• Output Formats
• Serialization
• Job
• http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/package-summary.html
Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
• Maps input key/value pairs to a set of intermediate key/value pairs.
• Maps are the individual tasks which transform input records into intermediate
records. The transformed intermediate records need not be of the same type as the
input records. A given input pair may map to zero or many output pairs.
• The Hadoop Map-Reduce framework spawns one map task for each InputSplit
generated by the InputFormat for the job.
• The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context),
followed by map(Object, Object, Context) for each key/value pair in the InputSplit.
Finally cleanup(Context) is called.
http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emits (token, 1) for every whitespace-separated token in the input line.
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}
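The point that one input pair may map to zero or many output pairs can be tried without a cluster. The following is only a plain-Java sketch: `MapSketch` and its `map` method are hypothetical stand-ins that return a list instead of writing to a Hadoop `Context`, and ordinary `String`/`Integer` values replace `Text`/`IntWritable`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map step; no Hadoop classes are used.
class MapSketch {
    // Returns one (token, 1) pair per whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleImmutableEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // One input record can yield many output pairs...
        System.out.println(map("to be or not to be"));
        // ...or none at all (a blank line has no tokens).
        System.out.println(map("   "));
    }
}
```

In a real job the framework, not the caller, collects these pairs and routes them to the reducers.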
What is Writable?
• Hadoop defines its own “box” classes for
strings (Text), integers (IntWritable), etc.
• All values are instances of Writable
• All keys are instances of WritableComparable
Writable
• A serializable object which implements a simple,
efficient, serialization protocol, based on DataInput
and DataOutput.
• Any key or value type in the Hadoop Map-Reduce
framework implements this interface.
• Implementations typically implement a static
read(DataInput) method which constructs a new
instance, calls readFields(DataInput) and returns the
instance.
• http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/io/Writable.html
Hadoop Programming - MapReduce, Input, Output, Serialization, Job
public class MyWritable implements Writable {
  // Some data
  private int counter;
  private long timestamp;

  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  // Fields must be read in the same order they were written.
  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  public static MyWritable read(DataInput in) throws IOException {
    MyWritable w = new MyWritable();
    w.readFields(in);
    return w;
  }
}
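Because the protocol is built on `DataInput` and `DataOutput`, a write/read round trip can be checked with `java.io` alone. This is a minimal sketch under stated assumptions: `WritableRoundTrip` and its nested `Sample` class are hypothetical, and `Sample` mirrors the two fields above without implementing Hadoop's `Writable`, so that nothing beyond the JDK is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableRoundTrip {
    // Hypothetical record with the same write/readFields shape as MyWritable.
    static class Sample {
        int counter;
        long timestamp;

        void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        void readFields(DataInput in) throws IOException {
            counter = in.readInt();
            timestamp = in.readLong();
        }
    }

    // Serializes the record into a byte array.
    static byte[] serialize(Sample s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            s.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a record from bytes produced by serialize().
    static Sample deserialize(byte[] data) {
        Sample s = new Sample();
        try {
            s.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s;
    }

    public static void main(String[] args) {
        Sample s = new Sample();
        s.counter = 7;
        s.timestamp = 1234567890123L;
        byte[] data = serialize(s);
        Sample copy = deserialize(data);
        System.out.println(data.length + " bytes, counter=" + copy.counter
            + ", timestamp=" + copy.timestamp);
    }
}
```

The encoding carries no field names or type tags, which is why `readFields` has to consume fields in exactly the order `write` produced them.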
public class MyWritableComparable
    implements WritableComparable<MyWritableComparable> {
  // Some data
  private int counter;
  private long timestamp;

  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  // Orders instances by counter, then by timestamp.
  public int compareTo(MyWritableComparable that) {
    int byCounter = Integer.compare(this.counter, that.counter);
    return byCounter != 0 ? byCounter
                          : Long.compare(this.timestamp, that.timestamp);
  }
}
Getting Data To The Mapper
[Diagram: input files → InputFormat → InputSplits → RecordReaders → Mappers → (intermediates); one RecordReader and one Mapper per InputSplit]
public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
  }
  // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
  Job job = Job.getInstance(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Reading Data
• Data sets are specified by InputFormats
– Defines input data (e.g., a directory)
– Identifies partitions of the data that form an
InputSplit
– Factory for RecordReader objects to extract (k, v)
records from the input source
Input Format
• InputFormat describes the input-specification for a Map-
Reduce job
• The Map-Reduce framework relies on the InputFormat of the
job to:
– Validate the input-specification of the job.
– Split-up the input file(s) into logical InputSplits, each of which is then
assigned to an individual Mapper.
– Provide the RecordReader implementation to be used to glean input
records from the logical InputSplit for processing by the Mapper.
http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/InputFormat.html
FileInputFormat and Friends
• TextInputFormat
– Treats each ‘\n’-terminated line of a file as a value
• KeyValueTextInputFormat
– Maps ‘\n’-terminated text lines of “k SEP v”
• SequenceFileInputFormat
– Binary file of (k, v) pairs (passing data between the output
of one MapReduce job to the input of some other
MapReduce job)
• SequenceFileAsTextInputFormat
– Same, but maps (k.toString(), v.toString())
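To make the difference between the two text formats concrete, here is a plain-Java sketch of the records each one would hand to a mapper. `LineRecordsSketch` and both of its methods are hypothetical; the sketch assumes the usual behaviour that TextInputFormat uses the line's starting offset as the key, and that KeyValueTextInputFormat splits at the first separator (a tab by default) and yields an empty value when no separator is present.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; no Hadoop classes are used.
class LineRecordsSketch {
    // TextInputFormat-style records: key = offset of the line start, value = the line.
    static List<Map.Entry<Long, String>> asTextRecords(String data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : data.split("\n")) {
            records.add(new SimpleImmutableEntry<>(offset, line));
            // +1 accounts for the '\n'; assumes one byte per character.
            offset += line.length() + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style record: split the line at the first separator.
    static Map.Entry<String, String> asKeyValue(String line, char sep) {
        int i = line.indexOf(sep);
        if (i < 0) {
            // No separator: the whole line is the key and the value is empty.
            return new SimpleImmutableEntry<>(line, "");
        }
        return new SimpleImmutableEntry<>(line.substring(0, i), line.substring(i + 1));
    }

    public static void main(String[] args) {
        String data = "apple\t3\nbanana\t5\n";
        System.out.println(asTextRecords(data));
        for (String line : data.split("\n")) {
            System.out.println(asKeyValue(line, '\t'));
        }
    }
}
```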
Filtering File Inputs
• FileInputFormat will read all files out of a
specified directory and send them to the
mapper
• Delegates filtering this file list to a method
subclasses may override
– e.g., Create your own “xyzFileInputFormat” to
read *.xyz from directory list
Record Readers
• Each InputFormat provides its own
RecordReader implementation
– Provides a (possibly unused) multiplexing capability
• LineRecordReader
– Reads a line from a text file
• KeyValueLineRecordReader
– Used by KeyValueTextInputFormat
Input Split Size
• FileInputFormat will divide large files into
chunks
– Minimum split size is set by mapred.min.split.size (named mapreduce.input.fileinputformat.split.minsize in Hadoop 2)
• RecordReaders receive file, offset, and length
of chunk
• Custom InputFormat implementations may
override split size
– e.g., “NeverChunkFile”
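How the minimum split size interacts with the block size can be shown with a little arithmetic. The sketch below assumes the rule used by the new-API FileInputFormat, which takes the block size clamped between a configured minimum and maximum; `SplitSizeSketch` is a hypothetical class and the sizes are made-up examples.

```java
// Hypothetical stand-in for the split-size rule; no Hadoop classes are used.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Block size, clamped to the range [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 128 * MB;
        // Defaults (tiny minimum, huge maximum): the split equals one block.
        System.out.println(computeSplitSize(block, 1, Long.MAX_VALUE) / MB + " MB");
        // Raising the minimum above the block size makes splits larger.
        System.out.println(computeSplitSize(block, 256 * MB, Long.MAX_VALUE) / MB + " MB");
        // Lowering the maximum below the block size makes splits smaller.
        System.out.println(computeSplitSize(block, 1, 64 * MB) / MB + " MB");
    }
}
```

Larger splits mean fewer map tasks, each of which may read blocks stored on other nodes; smaller splits mean more tasks and more scheduling overhead.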
// Old (org.apache.hadoop.mapred) API; Point3D is a user-defined Writable.
public class ObjectPositionInputFormat extends
    FileInputFormat<Text, Point3D> {
  public RecordReader<Text, Point3D> getRecordReader(
      InputSplit input, JobConf job, Reporter reporter)
      throws IOException {
    reporter.setStatus(input.toString());
    return new ObjPosRecordReader(job, (FileSplit) input);
  }
  // InputSplit[] getSplits(JobConf job, int numSplits) is inherited
  // from FileInputFormat.
}

class ObjPosRecordReader implements RecordReader<Text, Point3D> {
  public ObjPosRecordReader(JobConf job, FileSplit split) throws IOException {
    // open the file behind the split and seek to its start offset
  }
  public boolean next(Text key, Point3D value) throws IOException {
    // get the next line and fill key and value; false means end of split
    return false;
  }
  public Text createKey() {
    return new Text();
  }
  public Point3D createValue() {
    return new Point3D();
  }
  public long getPos() throws IOException {
    return 0;
  }
  public void close() throws IOException {
  }
  public float getProgress() throws IOException {
    return 0.0f;
  }
}
Sending Data To Reducers
• The map function receives a Mapper.Context object
– context.write() takes (k, v) elements
• Any (WritableComparable, Writable) can be
used
WritableComparator
• Compares WritableComparable data
– Will call WritableComparable.compareTo()
– Can provide fast path for serialized data
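The "fast path" means comparing keys while they are still serialized bytes, without first building objects. The sketch below illustrates the idea for 4-byte integers written with `DataOutput.writeInt`; `RawIntCompareSketch` and its methods are hypothetical, and only mirror the shape of a raw comparator that takes byte arrays and start offsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of raw-byte comparison; no Hadoop classes are used.
class RawIntCompareSketch {
    // Serializes an int the way DataOutput.writeInt does (4 bytes, big-endian).
    static byte[] serialize(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes a big-endian int directly from the buffer, with no object allocation.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
             | ((b[start + 1] & 0xff) << 16)
             | ((b[start + 2] & 0xff) << 8)
             |  (b[start + 3] & 0xff);
    }

    // Compares two serialized ints in place.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        System.out.println(compare(serialize(-5), 0, serialize(3), 0));
        System.out.println(compare(serialize(7), 0, serialize(7), 0));
        System.out.println(compare(serialize(300), 0, serialize(2), 0));
    }
}
```

During the sort phase keys are compared many times, so skipping deserialization on each comparison can save a noticeable amount of work.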
Partition And Shuffle
[Diagram: Mappers → (intermediates) → Partitioners → shuffling → (intermediates) → Reducers; each Mapper's output passes through a Partitioner]
Partitioner
• int getPartition(key, val, numPartitions)
– Outputs the partition number for a given key
– One partition == values sent to one Reduce task
• HashPartitioner used by default
– Uses key.hashCode() to return partition num
• Job sets Partitioner implementation
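The default hash-based scheme amounts to one line of arithmetic. The following plain-Java sketch assumes the commonly described rule of masking off the sign bit of `key.hashCode()` and taking the remainder modulo the number of partitions; `HashPartitionSketch` is a hypothetical class, not Hadoop's own.

```java
// Hypothetical stand-in for hash partitioning; no Hadoop classes are used.
class HashPartitionSketch {
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result is
    // never negative even when hashCode() is.
    static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String word : new String[] {"to", "be", "or", "not", "to"}) {
            System.out.println(word + " -> partition " + getPartition(word, reducers));
        }
    }
}
```

Equal keys always land in the same partition, which is what guarantees that a single Reduce task sees every value for a given key.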
public class MyPartitioner extends Partitioner<IntWritable, Text> {
  @Override
  public int getPartition(IntWritable key, Text value, int numPartitions) {
    /* Pretty ugly hard-coded partitioning function. Don't do that in practice,
       it is just for the sake of understanding. */
    int nbOccurrences = key.get();
    if (nbOccurrences < 3)
      return 0;
    else
      return 1;
  }
}
job.setPartitionerClass(MyPartitioner.class);
Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
• Reduces a set of intermediate values which
share a key to a smaller set of values.
• Reducer has 3 primary phases:
– Shuffle
– Sort
– Reduce
• http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Reducer.html
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable result = new IntWritable();

  // Sums all counts that share a key and emits (key, total).
  public void reduce(Text key, Iterable<IntWritable> values,
                     Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
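The three reducer phases can be imitated in a single process to see what the reduce function actually receives. This is only a sketch: `ReduceSketch` is hypothetical, a `TreeMap` stands in for shuffle and sort by grouping values under sorted keys, and the inner loop plays the role of `reduce`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of shuffle, sort and reduce.
class ReduceSketch {
    static Map<String, Integer> wordCount(List<String> words) {
        // "Shuffle" and "sort": collect every 1 emitted for a word under that
        // word, with the keys kept in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        // "Reduce": called once per key with all of that key's values.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to", "be", "or", "not", "to", "be")));
    }
}
```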
Finally: Writing The Output
[Diagram: Reducers → RecordWriters → output files, all within the OutputFormat; one RecordWriter and one output file per Reducer]
OutputFormat
• Analogous to InputFormat
• TextOutputFormat
– Writes “key val\n” strings to output file (key and value are separated by a tab by default)
• SequenceFileOutputFormat
– Uses a binary format to pack (k, v) pairs
• NullOutputFormat
– Discards output
public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
  }
  // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
  Job job = Job.getInstance(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Job
• The job submitter's view of the Job.
• It allows the user to configure the job, submit it,
control its execution, and query the state. The set
methods only work until the job is submitted,
afterwards they will throw an IllegalStateException.
• Normally the user creates the application, describes
various facets of the job via Job, and then submits the
job and monitors its progress.
http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Job.html
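The rule that setters stop working once the job is submitted is a small state machine. The sketch below is a hypothetical stand-in, not Hadoop's `Job` class: `JobStateSketch`, its two states and its `ensureState` helper are assumptions made for illustration only.

```java
// Hypothetical stand-in for a job object whose setters lock after submission.
class JobStateSketch {
    enum State { DEFINE, RUNNING }

    private State state = State.DEFINE;
    private int numReduceTasks = 1;

    // Setters are only legal while the job is still being defined.
    void setNumReduceTasks(int n) {
        ensureState(State.DEFINE);
        numReduceTasks = n;
    }

    int getNumReduceTasks() {
        return numReduceTasks;
    }

    // Submitting moves the job out of the DEFINE state for good.
    void submit() {
        ensureState(State.DEFINE);
        state = State.RUNNING;
    }

    private void ensureState(State expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }

    public static void main(String[] args) {
        JobStateSketch job = new JobStateSketch();
        job.setNumReduceTasks(4);   // allowed: job not yet submitted
        job.submit();
        try {
            job.setNumReduceTasks(2);   // rejected: job already submitted
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In practice this means all configuration calls belong before `waitForCompletion` or `submit`, as in the `main` method shown earlier.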
Hadoop Programming - MapReduce, Input, Output, Serialization, Job
Ad

More Related Content

Similar to Hadoop Programming - MapReduce, Input, Output, Serialization, Job (20)

Map reduce prashant
Map reduce prashantMap reduce prashant
Map reduce prashant
Prashant Gupta
 
Big-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiBig-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbai
Unmesh Baile
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
IndicThreads
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
BikalAdhikari4
 
JRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop PapyrusJRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop Papyrus
Koichi Fujikawa
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
Subhas Kumar Ghosh
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
Hugo Gävert
 
mapreduce ppt.ppt
mapreduce ppt.pptmapreduce ppt.ppt
mapreduce ppt.ppt
TAGADPALLEWARPARTHVA
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
Jazan University
 
hadoop.ppt
hadoop.ppthadoop.ppt
hadoop.ppt
AnushkaChauhan68
 
MapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory PresentationMapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory Presentation
ssuserb91a20
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 
Lecture 4 Parallel and Distributed Systems Fall 2024.ppt
Lecture 4 Parallel and Distributed Systems Fall 2024.pptLecture 4 Parallel and Distributed Systems Fall 2024.ppt
Lecture 4 Parallel and Distributed Systems Fall 2024.ppt
ssusere82d541
 
L4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.ppt
L4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.pptL4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.ppt
L4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.ppt
abdulbasetalselwi
 
L3.fa14.ppt
L3.fa14.pptL3.fa14.ppt
L3.fa14.ppt
Tushar557668
 
MAPREDUCE ppt big data computing fall 2014 indranil gupta.ppt
MAPREDUCE ppt big data computing fall 2014 indranil gupta.pptMAPREDUCE ppt big data computing fall 2014 indranil gupta.ppt
MAPREDUCE ppt big data computing fall 2014 indranil gupta.ppt
zuhaibmohammed465
 
Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分
sg7879
 
Hadoop 2
Hadoop 2Hadoop 2
Hadoop 2
EasyMedico.com
 
Hadoop 3
Hadoop 3Hadoop 3
Hadoop 3
shams03159691010
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Ran Silberman
 
Big-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiBig-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbai
Unmesh Baile
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
IndicThreads
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
BikalAdhikari4
 
JRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop PapyrusJRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop Papyrus
Koichi Fujikawa
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
Subhas Kumar Ghosh
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
Hugo Gävert
 
MapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory PresentationMapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory Presentation
ssuserb91a20
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 
Lecture 4 Parallel and Distributed Systems Fall 2024.ppt
Lecture 4 Parallel and Distributed Systems Fall 2024.pptLecture 4 Parallel and Distributed Systems Fall 2024.ppt
Lecture 4 Parallel and Distributed Systems Fall 2024.ppt
ssusere82d541
 
L4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.ppt
L4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.pptL4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.ppt
L4.FA16n nm,m,m,,m,m,m,mmnm,n,mnmnmm.ppt
abdulbasetalselwi
 
MAPREDUCE ppt big data computing fall 2014 indranil gupta.ppt
MAPREDUCE ppt big data computing fall 2014 indranil gupta.pptMAPREDUCE ppt big data computing fall 2014 indranil gupta.ppt
MAPREDUCE ppt big data computing fall 2014 indranil gupta.ppt
zuhaibmohammed465
 
Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分
sg7879
 

More from Jason J Pulikkottil (20)

Unix/Linux Command Reference - File Commands and Shortcuts
Unix/Linux Command Reference - File Commands and ShortcutsUnix/Linux Command Reference - File Commands and Shortcuts
Unix/Linux Command Reference - File Commands and Shortcuts
Jason J Pulikkottil
 
Introduction to PERL Programming - Complete Notes
Introduction to PERL Programming - Complete NotesIntroduction to PERL Programming - Complete Notes
Introduction to PERL Programming - Complete Notes
Jason J Pulikkottil
 
VLSI System Verilog Notes with Coding Examples
VLSI System Verilog Notes with Coding ExamplesVLSI System Verilog Notes with Coding Examples
VLSI System Verilog Notes with Coding Examples
Jason J Pulikkottil
 
VLSI Physical Design Physical Design Concepts
VLSI Physical Design Physical Design ConceptsVLSI Physical Design Physical Design Concepts
VLSI Physical Design Physical Design Concepts
Jason J Pulikkottil
 
Verilog Coding examples of Digital Circuits
Verilog Coding examples of Digital CircuitsVerilog Coding examples of Digital Circuits
Verilog Coding examples of Digital Circuits
Jason J Pulikkottil
 
Floor Plan, Placement Questions and Answers
Floor Plan, Placement Questions and AnswersFloor Plan, Placement Questions and Answers
Floor Plan, Placement Questions and Answers
Jason J Pulikkottil
 
Physical Design, ASIC Design, Standard Cells
Physical Design, ASIC Design, Standard CellsPhysical Design, ASIC Design, Standard Cells
Physical Design, ASIC Design, Standard Cells
Jason J Pulikkottil
 
Basic Electronics, Digital Electronics, Static Timing Analysis Notes
Basic Electronics, Digital Electronics, Static Timing Analysis NotesBasic Electronics, Digital Electronics, Static Timing Analysis Notes
Basic Electronics, Digital Electronics, Static Timing Analysis Notes
Jason J Pulikkottil
 
Floorplan, Powerplan and Data Setup, Stages
Floorplan, Powerplan and Data Setup, StagesFloorplan, Powerplan and Data Setup, Stages
Floorplan, Powerplan and Data Setup, Stages
Jason J Pulikkottil
 
Floorplanning Power Planning and Placement
Floorplanning Power Planning and PlacementFloorplanning Power Planning and Placement
Floorplanning Power Planning and Placement
Jason J Pulikkottil
 
Digital Electronics Questions and Answers
Digital Electronics Questions and AnswersDigital Electronics Questions and Answers
Digital Electronics Questions and Answers
Jason J Pulikkottil
 
Different Types Of Cells, Types of Standard Cells
Different Types Of Cells, Types of Standard CellsDifferent Types Of Cells, Types of Standard Cells
Different Types Of Cells, Types of Standard Cells
Jason J Pulikkottil
 
DFT Rules, set of rules with illustration
DFT Rules, set of rules with illustrationDFT Rules, set of rules with illustration
DFT Rules, set of rules with illustration
Jason J Pulikkottil
 
Clock Definitions Static Timing Analysis for VLSI Engineers
Clock Definitions Static Timing Analysis for VLSI EngineersClock Definitions Static Timing Analysis for VLSI Engineers
Clock Definitions Static Timing Analysis for VLSI Engineers
Jason J Pulikkottil
 
Basic Synthesis Flow and Commands, Logic Synthesis
Basic Synthesis Flow and Commands, Logic SynthesisBasic Synthesis Flow and Commands, Logic Synthesis
Basic Synthesis Flow and Commands, Logic Synthesis
Jason J Pulikkottil
 
ASIC Design Types, Logical Libraries, Optimization
ASIC Design Types, Logical Libraries, OptimizationASIC Design Types, Logical Libraries, Optimization
ASIC Design Types, Logical Libraries, Optimization
Jason J Pulikkottil
 
Floorplanning and Powerplanning - Definitions and Notes
Floorplanning and Powerplanning - Definitions and NotesFloorplanning and Powerplanning - Definitions and Notes
Floorplanning and Powerplanning - Definitions and Notes
Jason J Pulikkottil
 
Physical Design Flow - Standard Cells and Special Cells
Physical Design Flow - Standard Cells and Special CellsPhysical Design Flow - Standard Cells and Special Cells
Physical Design Flow - Standard Cells and Special Cells
Jason J Pulikkottil
 
Physical Design - Import Design Flow Floorplan
Physical Design - Import Design Flow FloorplanPhysical Design - Import Design Flow Floorplan
Physical Design - Import Design Flow Floorplan
Jason J Pulikkottil
 
Physical Design-Floor Planning Goals And Placement
Physical Design-Floor Planning Goals And PlacementPhysical Design-Floor Planning Goals And Placement
Physical Design-Floor Planning Goals And Placement
Jason J Pulikkottil
 
Unix/Linux Command Reference - File Commands and Shortcuts
Unix/Linux Command Reference - File Commands and ShortcutsUnix/Linux Command Reference - File Commands and Shortcuts
Unix/Linux Command Reference - File Commands and Shortcuts
Jason J Pulikkottil
 
Introduction to PERL Programming - Complete Notes
Introduction to PERL Programming - Complete NotesIntroduction to PERL Programming - Complete Notes
Introduction to PERL Programming - Complete Notes
Jason J Pulikkottil
 
VLSI System Verilog Notes with Coding Examples
VLSI System Verilog Notes with Coding ExamplesVLSI System Verilog Notes with Coding Examples
VLSI System Verilog Notes with Coding Examples
Jason J Pulikkottil
 
VLSI Physical Design Physical Design Concepts
VLSI Physical Design Physical Design ConceptsVLSI Physical Design Physical Design Concepts
VLSI Physical Design Physical Design Concepts
Jason J Pulikkottil
 
Verilog Coding examples of Digital Circuits
Verilog Coding examples of Digital CircuitsVerilog Coding examples of Digital Circuits
Verilog Coding examples of Digital Circuits
Jason J Pulikkottil
 
Floor Plan, Placement Questions and Answers
Floor Plan, Placement Questions and AnswersFloor Plan, Placement Questions and Answers
Floor Plan, Placement Questions and Answers
Jason J Pulikkottil
 
Physical Design, ASIC Design, Standard Cells
Physical Design, ASIC Design, Standard CellsPhysical Design, ASIC Design, Standard Cells
Physical Design, ASIC Design, Standard Cells
Jason J Pulikkottil
 
Basic Electronics, Digital Electronics, Static Timing Analysis Notes
Basic Electronics, Digital Electronics, Static Timing Analysis NotesBasic Electronics, Digital Electronics, Static Timing Analysis Notes
Basic Electronics, Digital Electronics, Static Timing Analysis Notes
Jason J Pulikkottil
 
Floorplan, Powerplan and Data Setup, Stages
Floorplan, Powerplan and Data Setup, StagesFloorplan, Powerplan and Data Setup, Stages
Floorplan, Powerplan and Data Setup, Stages
Jason J Pulikkottil
 
Floorplanning Power Planning and Placement
Floorplanning Power Planning and PlacementFloorplanning Power Planning and Placement
Floorplanning Power Planning and Placement
Jason J Pulikkottil
 
Digital Electronics Questions and Answers
Digital Electronics Questions and AnswersDigital Electronics Questions and Answers
Digital Electronics Questions and Answers
Jason J Pulikkottil
 
Different Types Of Cells, Types of Standard Cells
Different Types Of Cells, Types of Standard CellsDifferent Types Of Cells, Types of Standard Cells
Different Types Of Cells, Types of Standard Cells
Jason J Pulikkottil
 
DFT Rules, set of rules with illustration
DFT Rules, set of rules with illustrationDFT Rules, set of rules with illustration
DFT Rules, set of rules with illustration
Jason J Pulikkottil
 
Clock Definitions Static Timing Analysis for VLSI Engineers
Clock Definitions Static Timing Analysis for VLSI EngineersClock Definitions Static Timing Analysis for VLSI Engineers
Clock Definitions Static Timing Analysis for VLSI Engineers
Jason J Pulikkottil
 
Basic Synthesis Flow and Commands, Logic Synthesis
Basic Synthesis Flow and Commands, Logic SynthesisBasic Synthesis Flow and Commands, Logic Synthesis
Basic Synthesis Flow and Commands, Logic Synthesis
Jason J Pulikkottil
 
ASIC Design Types, Logical Libraries, Optimization
ASIC Design Types, Logical Libraries, OptimizationASIC Design Types, Logical Libraries, Optimization
ASIC Design Types, Logical Libraries, Optimization
Jason J Pulikkottil
 
Floorplanning and Powerplanning - Definitions and Notes
Floorplanning and Powerplanning - Definitions and NotesFloorplanning and Powerplanning - Definitions and Notes
Floorplanning and Powerplanning - Definitions and Notes
Jason J Pulikkottil
 
Physical Design Flow - Standard Cells and Special Cells
Physical Design Flow - Standard Cells and Special CellsPhysical Design Flow - Standard Cells and Special Cells
Physical Design Flow - Standard Cells and Special Cells
Jason J Pulikkottil
 
Physical Design - Import Design Flow Floorplan
Physical Design - Import Design Flow FloorplanPhysical Design - Import Design Flow Floorplan
Physical Design - Import Design Flow Floorplan
Jason J Pulikkottil
 
Physical Design-Floor Planning Goals And Placement
Physical Design-Floor Planning Goals And PlacementPhysical Design-Floor Planning Goals And Placement
Physical Design-Floor Planning Goals And Placement
Jason J Pulikkottil
 
Ad

Recently uploaded (20)

Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Customer Segmentation using K-Means clustering
Customer Segmentation using K-Means clusteringCustomer Segmentation using K-Means clustering
Customer Segmentation using K-Means clustering
Ingrid Nyakerario
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATAAWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
SnehaBoja
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
...lab.Lab123456789123456789123456789123456789
...lab.Lab123456789123456789123456789123456789...lab.Lab123456789123456789123456789123456789
...lab.Lab123456789123456789123456789123456789
Ghh
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Hadoop Programming - MapReduce, Input, Output, Serialization, Job

  • 2. Overview • MapReduce Types • Input Formats • Output Formats • Serialization • Job • http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/package-summary.html
  • 3. Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> • Maps input key/value pairs to a set of intermediate key/value pairs. • Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs. • The Hadoop Map-Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. • The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, Object, Context) for each key/value pair in the InputSplit. Finally cleanup(Context) is called. http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html
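  • 4.
      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      // Nested inside the WordCount driver class; emits (word, 1) for every token.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {

        // Reused across calls to avoid allocating a new object per record.
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }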
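To see what the map call above emits without a cluster, here is a minimal plain-JDK sketch. `MapStepSketch` and its `map` helper are hypothetical stand-ins: the returned list plays the role of `context.write`, and no Hadoop classes are involved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-JDK stand-in for TokenizerMapper.map(); not Hadoop code.
class MapStepSketch {
    // Hypothetical helper: returns the (word, 1) pairs that map() would write for one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // One input record may produce zero or many output pairs.
        for (Map.Entry<String, Integer> pair : map("to be or not to be")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that repeated words are emitted once per occurrence; summing them is left to the combiner and reducer.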
  • 5. What is Writable? • Hadoop defines its own “box” classes for strings (Text), integers (IntWritable), etc. • All values are instances of Writable • All keys are instances of WritableComparable
  • 6. Writable • A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput. • Any key or value type in the Hadoop Map-Reduce framework implements this interface. • Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance. • http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/io/Writable.html
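  • 8.
      import java.io.DataInput;
      import java.io.DataOutput;
      import java.io.IOException;
      import org.apache.hadoop.io.Writable;

      public class MyWritable implements Writable {
        // Some data
        private int counter;
        private long timestamp;

        // Fields must be written and read back in the same order.
        public void write(DataOutput out) throws IOException {
          out.writeInt(counter);
          out.writeLong(timestamp);
        }

        public void readFields(DataInput in) throws IOException {
          counter = in.readInt();
          timestamp = in.readLong();
        }

        // Conventional static factory: build an instance, then fill it from the stream.
        public static MyWritable read(DataInput in) throws IOException {
          MyWritable w = new MyWritable();
          w.readFields(in);
          return w;
        }
      }
  • 9.
      import java.io.DataInput;
      import java.io.DataOutput;
      import java.io.IOException;
      import org.apache.hadoop.io.WritableComparable;

      public class MyWritableComparable
          implements WritableComparable<MyWritableComparable> {
        // Some data
        private int counter;
        private long timestamp;

        public void write(DataOutput out) throws IOException {
          out.writeInt(counter);
          out.writeLong(timestamp);
        }

        public void readFields(DataInput in) throws IOException {
          counter = in.readInt();
          timestamp = in.readLong();
        }

        // The slide compared an undefined "value" field; ordering by counter
        // is an assumed choice made here so the example compiles.
        public int compareTo(MyWritableComparable w) {
          int thisValue = this.counter;
          int thatValue = w.counter;
          return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
        }
      }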
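The write/readFields contract can be exercised with nothing but `java.io`. The sketch below round-trips the same two fields through a byte array. `WritableSketch` is a local stand-in that mirrors the two methods of Hadoop's `Writable`, and `CounterStamp`, `toBytes` and `fromBytes` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class CounterStamp implements WritableSketch {
    int counter;
    long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Serialize to a byte array: 4 bytes for the int, then 8 for the long.
    static byte[] toBytes(WritableSketch w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Mirrors the static read(DataInput) convention: new instance, then readFields.
    static CounterStamp fromBytes(byte[] bytes) {
        CounterStamp w = new CounterStamp();
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w;
    }

    public static void main(String[] args) {
        CounterStamp a = new CounterStamp();
        a.counter = 7;
        a.timestamp = 1_000L;
        byte[] bytes = toBytes(a);
        CounterStamp b = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, counter=" + b.counter
                + ", timestamp=" + b.timestamp);
    }
}

// Local stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```

The encoding carries no field names or type tags, which is why the read order has to match the write order exactly.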
  • 10. Getting Data To The Mapper (diagram): Input files → InputFormat → InputSplits → RecordReaders → Mappers (intermediates)
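  • 11.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.util.GenericOptionsParser;

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -files, ...) and keep the job's own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Block until the job finishes; exit code reflects success or failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }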
  • 12. Reading Data • Data sets are specified by InputFormats – Defines input data (e.g., a directory) – Identifies partitions of the data that form an InputSplit – Factory for RecordReader objects to extract (k, v) records from the input source
  • 13. Input Format • InputFormat describes the input-specification for a Map-Reduce job • The Map-Reduce framework relies on the InputFormat of the job to: – Validate the input-specification of the job. – Split-up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper. – Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper. http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/InputFormat.html
  • 14. FileInputFormat and Friends • TextInputFormat – Treats each ‘\n’-terminated line of a file as a value • KeyValueTextInputFormat – Maps ‘\n’-terminated text lines of “k SEP v” • SequenceFileInputFormat – Binary file of (k, v) pairs (useful for passing data from the output of one MapReduce job to the input of another) • SequenceFileAsTextInputFormat – Same, but maps (k.toString(), v.toString())
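The “k SEP v” convention of KeyValueTextInputFormat can be illustrated in plain Java. This is a sketch under two assumptions: the line is split at the first occurrence of the separator (tab being the usual default), and a line without a separator becomes a key with an empty value. `KeyValueLineSketch` and `split` are hypothetical names, not Hadoop classes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-JDK illustration of splitting a text line into (key, value).
class KeyValueLineSketch {
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key, the value is empty.
            return new SimpleEntry<>(line, "");
        }
        // Only the first separator splits; later ones stay inside the value.
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("user42\tclicked\thome", '\t');
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
    }
}
```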
  • 15. Filtering File Inputs • FileInputFormat will read all files out of a specified directory and send them to the mapper • Delegates filtering this file list to a method subclasses may override – e.g., Create your own “xyzFileInputFormat” to read *.xyz from directory list
  • 16. Record Readers • Each InputFormat provides its own RecordReader implementation – Provides (unused?) capability multiplexing • LineRecordReader – Reads a line from a text file • KeyValueLineRecordReader – Used by KeyValueTextInputFormat
  • 17. Input Split Size • FileInputFormat will divide large files into chunks – Exact size controlled by mapred.min.split.size (named mapreduce.input.fileinputformat.split.minsize in Hadoop 2) • RecordReaders receive file, offset, and length of chunk • Custom InputFormat implementations may override split size – e.g., “NeverChunkFile”
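How the minimum split size interacts with the block size can be sketched as follows. The formula `max(minSize, min(maxSize, blockSize))` follows the rule FileInputFormat uses to pick a split size; everything else here (`SplitSketch`, the `splits` helper, sizes in arbitrary units) is a simplified illustration that ignores details of the real implementation such as block locations and the handling of a small trailing chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of dividing one file into (offset, length) chunks.
class SplitSketch {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} pairs covering [0, fileLength).
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> out = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            out.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
        }
        return out;
    }

    public static void main(String[] args) {
        // A 300-unit file with 128-unit blocks and no min/max constraint.
        long splitSize = computeSplitSize(128, 1, Long.MAX_VALUE);
        for (long[] s : splits(300, splitSize)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Raising the minimum above the block size yields fewer, larger splits and therefore fewer map tasks.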
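  • 19.
      // Uses the old org.apache.hadoop.mapred API (JobConf, Reporter).
      public class ObjectPositionInputFormat
          extends FileInputFormat<Text, Point3D> {

        // Hands each split to a custom RecordReader.
        public RecordReader<Text, Point3D> getRecordReader(
            InputSplit input, JobConf job, Reporter reporter)
            throws IOException {
          reporter.setStatus(input.toString());
          return new ObjPosRecordReader(job, (FileSplit) input);
        }

        // InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
        // is inherited from FileInputFormat; override it only to change how files are split.
      }
  • 20.
      // Skeleton only: the bodies are placeholders so the class compiles.
      // Point3D is the user's own value type, assumed to have a no-argument constructor.
      class ObjPosRecordReader implements RecordReader<Text, Point3D> {

        public ObjPosRecordReader(JobConf job, FileSplit split)
            throws IOException { }

        public boolean next(Text key, Point3D value) throws IOException {
          // get the next line, fill key and value, return true while records remain
          return false;
        }

        public Text createKey() { return new Text(); }

        public Point3D createValue() { return new Point3D(); }

        public long getPos() throws IOException { return 0; }

        public void close() throws IOException { }

        public float getProgress() throws IOException { return 0.0f; }
      }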
  • 21. Sending Data To Reducers • Map function receives a Mapper.Context object – context.write() takes (k, v) elements • Any (WritableComparable, Writable) can be used
  • 22. WritableComparator • Compares WritableComparable data – Will call WritableComparable.compareTo() – Can provide fast path for serialized data
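The “fast path for serialized data” means comparing keys directly on their bytes instead of deserializing them into objects first. The sketch below shows the idea for an int key written with `DataOutput.writeInt`, which stores the high byte first. `RawIntCompareSketch` and its methods are hypothetical stand-ins written against the JDK only; Hadoop's WritableComparator exposes a byte-level compare of a similar shape, but the signatures here are simplified.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Compares serialized int keys without building key objects.
class RawIntCompareSketch {
    static byte[] serialize(int v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode a big-endian int straight from the buffer at the given offset.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             |  (b[off + 3] & 0xff);
    }

    // Byte-level comparison: s1 and s2 are the start offsets of each key.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        System.out.println(compare(serialize(-5), 0, serialize(3), 0));
        System.out.println(compare(serialize(256), 0, serialize(255), 0));
    }
}
```

Decoding the int rather than comparing bytes lexicographically matters for negative values, whose sign bit would otherwise sort them after positive ones.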
  • 23. Partition And Shuffle (diagram): Mappers (intermediates) → Partitioners → shuffling → Reducers (intermediates)
  • 24. Partitioner • int getPartition(key, val, numPartitions) – Outputs the partition number for a given key – One partition == values sent to one Reduce task • HashPartitioner used by default – Uses key.hashCode() to return partition num • Job sets Partitioner implementation
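The default hash-based assignment can be sketched in a few lines. The expression masks off the sign bit of `hashCode()` before taking the remainder, so the result always falls in `[0, numPartitions)`. `HashPartitionSketch` is a hypothetical plain-Java stand-in that takes only the key, since the value plays no part in the default scheme.

```java
// Plain-JDK illustration of hash partitioning; not a Hadoop class.
class HashPartitionSketch {
    static int getPartition(Object key, int numPartitions) {
        // Clearing the sign bit keeps the remainder non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"to", "be", "or", "not"};
        for (String k : keys) {
            System.out.println(k + " -> partition " + getPartition(k, 3));
        }
    }
}
```

Because the partition depends only on the key, every value for a given key reaches the same reduce task.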
  • 25.
      // Uses the old org.apache.hadoop.mapred API, where Partitioner is an
      // interface that also requires configure(JobConf).
      public class MyPartitioner implements Partitioner<IntWritable, Text> {

        @Override
        public int getPartition(IntWritable key, Text value, int numPartitions) {
          /* Pretty ugly hard coded partitioning function. Don't do that in
             practice, it is just for the sake of understanding.
             It also assumes the job runs with at least two reduce tasks. */
          int nbOccurences = key.get();
          if (nbOccurences < 3)
            return 0;
          else
            return 1;
        }

        @Override
        public void configure(JobConf arg0) {
        }
      }

      // In the job setup:
      job.setPartitionerClass(MyPartitioner.class);
  • 26. Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> • Reduces a set of intermediate values which share a key to a smaller set of values. • Reducer has 3 primary phases: – Shuffle – Sort – Reduce • http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Reducer.html
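  • 27.
      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      // Nested inside the WordCount driver class; sums the counts for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Reused output value, set once per key.
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }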
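The three reducer phases can be imitated in memory to show what the reduce call receives. In this sketch a `TreeMap` stands in for shuffle and sort (values grouped by key, keys in sorted order), and a loop stands in for the reduce call. `ShuffleSortReduceSketch` and its methods are hypothetical names; nothing here is Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of shuffle, sort and reduce for word count.
class ShuffleSortReduceSketch {
    // Shuffle + sort: each word contributes a 1, grouped under its key.
    static Map<String, List<Integer>> shuffleAndSort(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce: called once per key with all of that key's values.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(reduce(shuffleAndSort(words))); // prints {be=2, not=1, or=1, to=2}
    }
}
```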
  • 28. Finally: Writing The Output (diagram): Reducers → RecordWriters → output files, all provided by the OutputFormat
  • 29. OutputFormat • Analogous to InputFormat • TextOutputFormat – Writes “key \t val \n” strings to output file • SequenceFileOutputFormat – Uses a binary format to pack (k, v) pairs • NullOutputFormat – Discards output
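The text output layout is simple enough to reproduce by hand: one record per line, key and value joined by a separator (tab in the usual default). The following is a plain-JDK sketch of that layout only; `TextOutputSketch` and `format` are hypothetical names and no Hadoop RecordWriter is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Formats (key, value) records the way a tab-separated text output would look.
class TextOutputSketch {
    static String format(Map<String, Integer> records, String separator) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : records.entrySet()) {
            sb.append(e.getKey()).append(separator).append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("be", 2);
        counts.put("to", 2);
        System.out.print(format(counts, "\t"));
    }
}
```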
  • 31.
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  • 32. Job • The job submitter's view of the Job. • It allows the user to configure the job, submit it, control its execution, and query the state. The set methods only work until the job is submitted; afterwards they will throw an IllegalStateException. • Normally the user creates the application, describes various facets of the job via Job, and then submits the job and monitors its progress. http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Job.html
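The rule that setters stop working after submission is a small state machine. The sketch below models just that rule with a hypothetical `JobStateSketch` class; it is not Hadoop's Job, and the field and method names are chosen only to echo the description above.

```java
// Models "configure, then submit; setters fail afterwards".
class JobStateSketch {
    private boolean submitted = false;
    private String jobName = "";

    void setJobName(String name) {
        ensureNotSubmitted();
        jobName = name;
    }

    String getJobName() {
        // Queries remain allowed in every state.
        return jobName;
    }

    void submit() {
        ensureNotSubmitted();
        submitted = true;
    }

    private void ensureNotSubmitted() {
        if (submitted) {
            throw new IllegalStateException("Job already submitted");
        }
    }

    public static void main(String[] args) {
        JobStateSketch job = new JobStateSketch();
        job.setJobName("word count");
        job.submit();
        try {
            job.setJobName("renamed");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In practice this means all configuration calls belong before `waitForCompletion` or `submit`, as in the driver shown earlier.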