MapReduce Questions

1. What do you mean by side data? Mention its challenges.

Side data can be defined as extra read-only data needed by a job to process the main dataset. The challenge is to make side data available to all the map or reduce tasks in a convenient and efficient fashion.
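A common way to make small amounts of side data available is to serialize it into the job configuration (larger files are usually distributed via the distributed cache instead). Below is a minimal sketch of the configuration approach; the property name my.side.threshold and the mapper are illustrative, not from a specific library:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class SideDataExample {

    public static class ThresholdMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private int threshold;

        @Override
        protected void setup(Context context) {
            // Read the side data back out of the job configuration.
            threshold = context.getConfiguration().getInt("my.side.threshold", 0);
        }
        // map() would then use 'threshold' while processing the main dataset.
    }

    static void configure(Job job) {
        // Serialize the side data into the job configuration at submit time.
        job.getConfiguration().setInt("my.side.threshold", 42);
    }
}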

2. Write down the uses of counters.

• Counters are a useful channel for gathering statistics about the job, for quality control or for application-level statistics.
• They are also useful for problem diagnosis. If you are tempted to put a log message into your map or reduce task, it is often better to see whether you can use a counter instead to record that a particular condition occurred.
• Counter values are much easier to retrieve than log output for large distributed jobs: you get a single record of the number of times the condition occurred, which is more work to obtain from a set of logfiles.
3. What is FileInputFormat?
FileInputFormat is the base class for all implementations of
InputFormat that use files as their data source. It provides two
things: a place to define which files are included as the input to a
job, and an implementation for generating splits for the input files.
The job of dividing splits into records is performed by subclasses.
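For illustration, here is how input paths are typically specified through FileInputFormat; the paths below are placeholders:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputPathSetup {
    static void setInputs(Job job) throws IOException {
        FileInputFormat.addInputPath(job, new Path("/data/logs"));      // add a directory
        FileInputFormat.addInputPath(job, new Path("/data/extra.txt")); // add a single file
        // setInputPaths replaces anything added so far:
        // FileInputFormat.setInputPaths(job, new Path("/data/a"), new Path("/data/b"));
    }
}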

4. Illustrate with a neat diagram the use of separators in streaming MapReduce jobs.
5. Define partition function.
The partition function operates on the intermediate key and value
types (K2 and V2), and returns the partition index. In practice, the
partition is determined solely by the key (the value is ignored):
partition: (K2, V2) → integer
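In the Java API this corresponds to the abstract Partitioner class; the default HashPartitioner implements exactly this function by hashing the key. A minimal sketch of an equivalent partitioner, with Text/IntWritable chosen purely for illustration:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// partition: (K2, V2) -> integer, here with K2 = Text and V2 = IntWritable.
public class MyHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Only the key is used; masking with Integer.MAX_VALUE avoids a
        // negative index when hashCode() is negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}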
6. What is IdentityMapper?
IdentityMapper is a generic type, which allows it to work with any
key or value types, with the restriction that the map input and
output keys are of the same type, and the map input and output
values are of the same type.
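In other words, it simply writes every input record to its output unchanged. In the newer MapReduce API the base Mapper class provides the same identity behaviour by default; a minimal sketch of what that looks like:

import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;

// Identity mapping: Mapper<K, V, K, V> forces the input and output key types
// (and value types) to match, and each record passes through unchanged.
public class MyIdentityMapper<K, V> extends Mapper<K, V, K, V> {
    @Override
    protected void map(K key, V value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value);
    }
}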
7. Create a MapReduce program for sorting a Sequence File with
IntWritable keys using the default HashPartitioner.

MapReduce program for sorting a SequenceFile

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SortByTemperatureToMapFile extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // JobBuilder is a helper from the book's example code that parses the
    // input and output paths from the command line.
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputFormatClass(MapFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
    // No mapper, reducer, or partitioner is set, so the identity mapper and
    // reducer and the default HashPartitioner are used: the sorted reduce
    // input becomes sorted output, partitioned by key hash.
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new SortByTemperatureToMapFile(), args));
  }
}

8. Write a note on the MapReduce Library Classes.

MapReduce Library Classes

https://www.javatpoint.com/mapreduce-api
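As an illustration of how far the library classes can go, a complete word count can be assembled from them without writing any mapper or reducer of our own. A minimal sketch using Hadoop's TokenCounterMapper and IntSumReducer library classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class LibraryWordCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "library word count");
        job.setJarByClass(LibraryWordCount.class);
        job.setMapperClass(TokenCounterMapper.class); // tokenizes lines, emits (word, 1)
        job.setCombinerClass(IntSumReducer.class);    // sums counts locally
        job.setReducerClass(IntSumReducer.class);     // sums counts per word
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}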

9. Write a note on Map-Side Joins.

10. Explain about Input Splits and Records.

11. Discuss the Counter in MapReduce with proper example.

https://www.dataspoof.info/post/understanding-counter-in-mapreduce-along-with-code/

https://techvidvan.com/tutorials/hadoop-counters-types-and-roles/

User-defined counters
So far we have discussed the built-in counters. But what if we want statistics that are not provided by the existing Hadoop MapReduce counters? Hadoop provides an extra feature for exactly this case: MapReduce user-defined counters, also called custom counters, which we can define according to the client's custom requirements. In Java, we use the enum type to define user-defined counters.

In a Hadoop job we can define as many enums as we require. Each enum is a counter group, and each field of the enum is a counter in that particular group. This is a compile-time approach, so we can't define or change the counters at runtime; we need to specify them before the job runs.
Dynamic Counters in Hadoop
Enum-based user-defined counters are fixed at compile time, which means we cannot change them or add new ones while the job is running. What if we want to add new counters dynamically at runtime? For that, Hadoop offers dynamic counters, which are created at runtime rather than declared at compile time.
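For example, inside map() or reduce() a dynamic counter is created the moment it is first incremented, identified purely by group and counter name strings (the names below are illustrative):

// No enum declaration needed: the group "TemperatureQuality" and the
// counter "MALFORMED" are created on first use, at runtime.
context.getCounter("TemperatureQuality", "MALFORMED").increment(1);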

Implementation of Counter in MapReduce

Now, let's implement a sample program that creates two counters, ODD_COUNT and EVEN_COUNT. Suppose we have an input file with a single number on each line; how do we calculate the count of even and odd numbers?

As sample input, take a file with one number per line, containing three odd and three even numbers.

So the output should be:

ODD_COUNT 3

EVEN_COUNT 3

Now let's walk through the program to see how the counters are implemented:

Mapper Class
1import org.apache.hadoop.io.IntWritable;
2import org.apache.hadoop.io.LongWritable;
3import org.apache.hadoop.io.Text;
4import org.apache.hadoop.mapreduce.Mapper;
5
6import java.io.IOException;
7
8// Mapper
9public class MapperClass extends Mapper<LongWritable, Text, Text, IntWritable>{
10 private Text item = new Text();
11 IntWritable sales = new IntWritable();
12 public void map(LongWritable key, Text value, Context context)
13 throws IOException, InterruptedException {
14 // Splitting the line on tab
15 int number = Integer.parseInt(value.toString());
16
17
18 if(number%2==0) {
19 context.getCounter(Driver.Counter.EVEN_COUNT).increment(1);
20 }else {
21 // incrementing counter
22 context.getCounter(Driver.Counter.ODD_COUNT).increment(1);
23
24 }
25 context.write(item, sales);
26 }
27}
Driver Class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {

    // The enum is the counter group; each field is a counter in that group.
    enum Counter {
        ODD_COUNT,
        EVEN_COUNT
    }

    public static void main(String[] args) throws Exception {
        int exitFlag = ToolRunner.run(new Driver(), args);
        System.exit(exitFlag);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // use the configuration supplied by ToolRunner
        Job job = Job.getInstance(conf, "ODD-EVEN Counter");
        job.setJarByClass(getClass());
        job.setMapperClass(MapperClass.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Here, we have two classes.

One is the Driver class, where we have created one enum to implement the counters. As discussed before, the enum name Counter is the counter group name, and ODD_COUNT and EVEN_COUNT are the two counters we use to count odd and even numbers.

The Driver also sets the various configuration parameters: the mapper class (there is no reducer class, so the number of reduce tasks is set to zero), basic details like the output key and output value class types, the input path from which we read the input, and the output path where we store the output.

The other class is the Mapper class, where we actually update the counters for even and odd numbers.

context.getCounter(Driver.Counter.EVEN_COUNT).increment(1);

This line increments the counter for an even number, and there is a matching line that increments the counter for an odd number.

Here are the counter values we get when we run the job on the input file mentioned above.

Output: the job's counter summary shows ODD_COUNT=3 and EVEN_COUNT=3 in the Counter group, matching the expected output given above.

12. Explain the different ways of sorting datasets in MapReduce

Partial Sort:
The reducer output will be a lot of files, each of which is sorted within itself based on the key.

Total Sort:
The reducer output will be a single file having all the output sorted based on the key.

Secondary Sort:
In this case, we are able to control the ordering of the values along with the keys. That is, sorting can be done on two or more field values.

Secondary Sort

Secondary sort is a technique that can be used for sorting data on multiple fields. It relies on a composite key which contains all the values we want to use for sorting.

In this section we read our "donations" SequenceFile and map each donation record to a (CompositeKey, DonationWritable) pair before shuffling and reducing.

All classes used in this section can be viewed on GitHub in this package: https://github.com/nicomak/[…]/secondarysort.

The MapReduce secondary sort job which is executed to get our query results is in the OrderByCompositeKey.java (view) file from the same package.

Composite Key

Our query wants to sort the results on 3 values, so we create a WritableComparable class called CompositeKey (view), sketched below, with the following 3 attributes:

• state (String) – used as our natural key (or primary key) for partitioning
• city (String) – a secondary key for sorting keys with the same "state" natural key within a partition
• total (float) – another secondary key for further sorting when the "city" is the same
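Here is a minimal sketch of such a CompositeKey, assuming the three fields described above; the full version, along with the state-only partitioner and grouping comparator it is paired with, is in the linked GitHub package:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Composite key: natural key (state) plus secondary sort fields (city, total).
// equals()/hashCode() are omitted here for brevity.
public class CompositeKey implements WritableComparable<CompositeKey> {
    private String state;
    private String city;
    private float total;

    public CompositeKey() {} // no-arg constructor required by Hadoop

    public CompositeKey(String state, String city, float total) {
        this.state = state;
        this.city = city;
        this.total = total;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(state);
        out.writeUTF(city);
        out.writeFloat(total);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        state = in.readUTF();
        city = in.readUTF();
        total = in.readFloat();
    }

    @Override
    public int compareTo(CompositeKey o) {
        // Sort by state, then city, then total (descending here, assuming
        // we want the largest totals first).
        int cmp = state.compareTo(o.state);
        if (cmp != 0) return cmp;
        cmp = city.compareTo(o.city);
        if (cmp != 0) return cmp;
        return Float.compare(o.total, total);
    }
}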

1. Partial Sort

Definition: In a partial sort, each reducer's output file is sorted internally by key, but there is no global ordering across the output files.

Use Case: This is MapReduce's default behaviour, and it is sufficient when consumers only need per-file ordering, for example for efficient key lookups within each output partition.

Example: Sorting a dataset of sales records by key so that each output partition can be searched efficiently.

2. Total Sort

Definition: A total sort involves sorting the entire dataset based on a specified key.

Use Case: This is applicable when you need a complete ordered view of the data.

Example: Sorting a list of student records by their grades.
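In Hadoop, a total sort is typically implemented with TotalOrderPartitioner, which assigns key ranges to reducers so that every key in partition i is smaller than every key in partition i+1; the ranges come from sampling the input. A minimal sketch of the relevant job setup, assuming IntWritable keys and Text values as an illustration:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSortSetup {
    // Sample the input to build split points, then partition by key range so
    // the concatenated reducer outputs are globally sorted.
    static void configureTotalSort(Job job) throws Exception {
        job.setPartitionerClass(TotalOrderPartitioner.class);
        // Sample 10% of keys, up to 10,000 samples from at most 10 splits.
        InputSampler.Sampler<IntWritable, Text> sampler =
                new InputSampler.RandomSampler<>(0.1, 10000, 10);
        InputSampler.writePartitionFile(job, sampler);
    }
}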


3. Secondary Sort

Definition: Secondary sort allows sorting by multiple criteria. After the primary
sort is applied, the secondary sort organizes the data further based on a
secondary key.

Use Case: Useful when you need to sort by one key and then by another, like
sorting by last name and then by first name.

Example:

 Sorting a list of employees by department (primary key) and then by their


hire date (secondary key).

Diagram:

13. Explain Java counters used in MapReduce

• Dynamic counters – counters identified by group and counter name strings, created at runtime rather than declared as an enum at compile time.
• Readable counter names – enum counters can be given human-readable display names instead of the raw enum field names.
• Retrieving counters – counter values can be read back from the Job object during or after execution, as shown in the sketch below.
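As a small illustration of retrieving counters, the values can be read back from the Job object once the job has completed; this sketch reuses the Driver.Counter enum from question 11 and assumes the classes share a package:

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

public class CounterReport {
    // After job.waitForCompletion(...), read the counter values back.
    static void report(Job job) throws Exception {
        Counters counters = job.getCounters();
        long odd = counters.findCounter(Driver.Counter.ODD_COUNT).getValue();
        long even = counters.findCounter(Driver.Counter.EVEN_COUNT).getValue();
        System.out.printf("ODD_COUNT=%d, EVEN_COUNT=%d%n", odd, even);
    }
}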
