ccpractical 7

Uploaded by

Akshay Rathod

Cloud Computing Lab

Practical-7
Aim: Demonstrate the use of map and reduce tasks.
Theory:
MapReduce
MapReduce is a data processing model used to process data in parallel across a distributed
cluster. It is based on the paper "MapReduce: Simplified Data Processing on Large Clusters,"
published by Google in 2004. The paradigm has two phases: the mapper phase and the reducer
phase. The mapper receives its input as key-value pairs, and its output is fed to the reducer
as input. The reducer runs only after the mapper has finished. The reducer also takes its
input in key-value format, and the output of the reducer is the final output.

Steps in MapReduce:

The map function takes data in the form of key-value pairs and returns a list of key-value
pairs. The keys in this list are not necessarily unique. The Hadoop framework then applies
sort and shuffle to the output of the map. Sort and shuffle groups these pairs by key and
emits each unique key together with the list of values associated with it. The output of sort
and shuffle is sent to the reducer phase. The reducer applies a defined function to the list
of values for each unique key, and the final output is stored or displayed.
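
The three steps above can be sketched in plain Python without a Hadoop cluster. This is only an illustration of the data flow, not the Hadoop API: the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) and the sample lines are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Sort and shuffle: group values by key and return keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: apply a function (here, sum) to the values of one unique key."""
    return key, sum(values)


def word_count(lines):
    """Run map, then sort and shuffle, then reduce over a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for the lines of an input file.
    sample = ["deer bear river", "car car river", "deer car bear"]
    for word, count in word_count(sample).items():
        print(word, count)
```

In a real Hadoop job the framework performs the sort and shuffle between the mapper and the reducer; here it is written out as an ordinary function so that each step is visible.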

Step 1: Install Java JDK 8

First, install Java JDK 8 on your system. You can do this with the following command:
sudo apt install openjdk-8-jdk

Matoshri Pratishthan’s Group Of Institutions,Nanded


sudo apt-get install openjdk-8-jdk -y


Verify the Java installation:
java -version

Step 3: Create a Dedicated Hadoop User

sudo useradd hadoop
sudo passwd hadoop

Now add this configuration to the core-site.xml file.


Step 4: Open .bashrc for editing and install the ssh package
sudo nano .bashrc
sudo apt-get install ssh

Step 5: Download the latest Hadoop version from the Apache Hadoop website

Step 6: Add this configuration to hdfs-site.xml


core-site.xml

mapred-site.xml
yarn-site.xml

Step 7: MapReduce

• It can be used for distributed pattern-based searching.
• We can also use MapReduce in machine learning.
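
The pattern-based searching use listed above can be sketched in the same map and reduce style. This is a minimal single-machine illustration, not a Hadoop program: the names map_grep, reduce_grep and distributed_grep, and the in-memory documents dictionary, are assumptions made for this example.

```python
import re
from collections import defaultdict


def map_grep(doc_id, text, pattern):
    """Map: emit (doc_id, line_number) for every line that matches the pattern."""
    regex = re.compile(pattern)
    return [
        (doc_id, number)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def reduce_grep(doc_id, line_numbers):
    """Reduce: collect the matching line numbers of one document in order."""
    return doc_id, sorted(line_numbers)


def distributed_grep(documents, pattern):
    """Map each document, group the pairs by document id, then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_grep(doc_id, text, pattern):
            grouped[key].append(value)
    return dict(reduce_grep(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical documents; on a cluster each one could go to a different mapper.
    documents = {
        "a.txt": "hadoop map\nreduce job\nmap again",
        "b.txt": "nothing here",
    }
    print(distributed_grep(documents, r"\bmap\b"))
```

Because each document is mapped independently of the others, the map calls could run on different nodes, which is what makes the search distributed.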

Step 8: Output of the MapReduce job

Step 9: Open mpgi

Step 10: Open WordCountTutorial

Step 11: Open input and output

Step 12: Open the input file input.text

Step 13: Open the output file part-r-00000

Conclusion:
Thus, we have successfully demonstrated the use of map and reduce tasks.
