exp5bda
Theory:
MapReduce is a style of computing that has been implemented in several systems, including Google's internal
implementation (simply called MapReduce) and the popular open-source implementation Hadoop, which can be
obtained, along with the HDFS file system, from the Apache Foundation. You can use an implementation of
MapReduce to manage many large-scale computations in a way that is tolerant of hardware faults. All you need to
write are two functions, called Map and Reduce, while the system manages the parallel execution, coordination of
tasks that execute Map or Reduce, and also deals with the possibility that one of these tasks will fail to execute. In
brief, a MapReduce computation executes as follows:
1. Some number of Map tasks each are given one or more chunks from a distributed file system. These Map
tasks turn the chunk into a sequence of key-value pairs. The way key-value pairs are produced from the input data is
determined by the code written by the user for the Map function.
2. The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are
divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.
3. The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way.
The manner of combination of values is determined by the code written by the user for the Reduce function.
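As a concrete illustration of the two user-written functions, the classic word-count example is sketched below in Java using Hadoop's org.apache.hadoop.mapreduce API: the Map function emits a (word, 1) pair for every word it sees, and the Reduce function sums all the counts that arrive for one word. Class names here are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: one input line in, a (word, 1) pair out for every token on the line.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);          // emit (word, 1)
        }
    }
}

// Reduce: all counts for one word arrive at the same Reduce task; sum them.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum)); // emit (word, total count)
    }
}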
Matrix Multiplication
Suppose we have an n x n matrix M, whose element in row i and column j will be denoted by M_ij. Suppose we also
have a vector v of length n, whose jth element is v_j. Then the matrix-vector product is the vector x of length n, whose ith
element is

    x_i = Σ_j M_ij * v_j

Let A and B be the two matrices to be multiplied and let the result be matrix C. If matrix A has dimensions p x q and
matrix B has dimensions q x r, then C = A x B has dimensions p x r, and its element in row i and column k is

    C_ik = Σ_j A_ij * B_jk    (j running over 1, ..., q)
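For comparison with the distributed version developed below, a plain single-machine implementation of C_ik = Σ_j A_ij * B_jk looks like this (a sketch in Java):

// Reference (non-distributed) matrix multiplication: C = A x B,
// where A is p x q and B is q x r.
static double[][] multiply(double[][] a, double[][] b) {
    int p = a.length, q = b.length, r = b[0].length;
    double[][] c = new double[p][r];
    for (int i = 0; i < p; i++) {
        for (int k = 0; k < r; k++) {
            double sum = 0;
            for (int j = 0; j < q; j++) {
                sum += a[i][j] * b[j][k];   // C_ik = sum over j of A_ij * B_jk
            }
            c[i][k] = sum;
        }
    }
    return c;
}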
● A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage.
● Map stage − The map or mapper's job is to process the input data. Generally, the input data is in the form of a
file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line
by line. The mapper processes the data and creates several small chunks of data.
● Reduce stage − This stage is the combination of the shuffle stage and the reduce stage. The reducer's job is
to process the data that comes from the mapper. After processing, it produces a new set of output, which is stored
in HDFS.
● During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster;
the job itself is configured and submitted by a driver class, sketched after this list.
● The framework manages all the details of data-passing, such as issuing tasks, verifying task completion, and
copying data around the cluster between the nodes.
● Most of the computing takes place on nodes where the data already resides on local disks, which reduces network traffic.
● After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and
sends it back to the Hadoop server.
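A minimal driver sketch for the matrix multiplication job follows. The class names MatrixDriver, MatrixMapper, and MatrixReducer, and the configuration keys p, q, r, are illustrative choices for this experiment (the mapper and reducer themselves are sketched under Step 2 below), not names fixed by Hadoop:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Dimensions of A (p x q) and B (q x r); illustrative values for a 2 x 2 case.
        conf.set("p", "2");
        conf.set("q", "2");
        conf.set("r", "2");

        Job job = Job.getInstance(conf, "matrix multiplication");
        job.setJarByClass(MatrixDriver.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}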
Steps to follow:
Step 1: Create a folder in C:\ as ‘hadoop_project’ => C:\hadoop_project
Step 2: Inside the folder, right-click -> New -> Text Document and create three Java source files (mapper
code, reducer code, driver code), for example the sketches below and the driver shown earlier.
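A sketch of what the mapper and reducer files might contain, using the classic one-pass MapReduce matrix multiplication scheme. It assumes each input line has the form matrixName,row,column,value (e.g. A,0,1,2.0) and that the dimensions p, q, r are read from the Configuration set by the driver; the class names and input format are assumptions for this example, not requirements of Hadoop.

// --- MatrixMapper.java ---
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map: each element of A is emitted once per column k of the result; each
// element of B is emitted once per row i of the result. The key (i,k)
// routes everything needed to compute C_ik to the same Reduce task.
public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        int p = Integer.parseInt(conf.get("p"));
        int r = Integer.parseInt(conf.get("r"));
        String[] t = value.toString().split(",");   // name,row,col,value
        if (t[0].equals("A")) {                     // t = "A", i, j, A_ij
            for (int k = 0; k < r; k++) {
                // key = (i,k), value = ("A", j, A_ij)
                context.write(new Text(t[1] + "," + k),
                              new Text("A," + t[2] + "," + t[3]));
            }
        } else {                                    // t = "B", j, k, B_jk
            for (int i = 0; i < p; i++) {
                // key = (i,k), value = ("B", j, B_jk)
                context.write(new Text(i + "," + t[2]),
                              new Text("B," + t[1] + "," + t[3]));
            }
        }
    }
}

// --- MatrixReducer.java ---
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce: for one key (i,k), pair up A_ij and B_jk on the shared index j
// and sum the products to obtain C_ik.
public class MatrixReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        int q = Integer.parseInt(conf.get("q"));
        Map<Integer, Double> aRow = new HashMap<>();
        Map<Integer, Double> bCol = new HashMap<>();
        for (Text v : values) {
            String[] t = v.toString().split(",");   // name, j, value
            if (t[0].equals("A")) {
                aRow.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
            } else {
                bCol.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
            }
        }
        double sum = 0;
        for (int j = 0; j < q; j++) {
            sum += aRow.getOrDefault(j, 0.0) * bCol.getOrDefault(j, 0.0);
        }
        context.write(key, new Text(Double.toString(sum))); // "i,k <tab> C_ik"
    }
}

The reducer thus receives, for each output cell (i,k), every A_ij and B_jk it needs, matches them on the shared index j, and sums the products, which is exactly the formula C_ik = Σ_j A_ij * B_jk from above.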
Step 3: Open Command Prompt (cmd) and navigate to the project folder.
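For example, using the folder from Step 1:

cd C:\hadoop_project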
Step 4: Compile the Java files with the Hadoop dependencies and create a JAR file.
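One way to do this, assuming HADOOP_HOME points to the Hadoop installation (jar locations vary between Hadoop versions, so treat this classpath as a sketch):

javac -classpath "%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\mapreduce\*" MatrixMapper.java MatrixReducer.java MatrixDriver.java
jar cf matrixmul.jar *.class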
Step 5: Prepare the input data file as matrix_input.txt.
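A small matrix_input.txt in the matrixName,row,column,value format assumed by the mapper sketch above, encoding two 2 x 2 matrices:

A,0,0,1.0
A,0,1,2.0
A,1,0,3.0
A,1,1,4.0
B,0,0,5.0
B,0,1,6.0
B,1,0,7.0
B,1,1,8.0

The job can then be run by copying the file into HDFS and launching the jar, for example:

hadoop fs -mkdir /input
hadoop fs -put matrix_input.txt /input
hadoop jar matrixmul.jar MatrixDriver /input /output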