Big Data Manual
1. Downloading and installing Hadoop; understanding different Hadoop modes, startup scripts,
and configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and directories,
retrieving files and Deleting files
3. Implementation of Matrix Multiplication with Hadoop Map Reduce
4. Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase and Thrift, along with practice examples
7. Practice importing and exporting data from various databases.
EXP:1
Software Requirements: Java 8 (JDK) and Apache Hadoop 3.2.4, on Windows with command-prompt access.
AIM:
To download and install Hadoop and understand different modes in Hadoop, startup
scripts and configuration files.
ALGORITHM:
Step 1: Download and install Java 8 from:
https://www.oracle.com/java/technologies/downloads/#java8
After downloading and installing Java, open a command prompt and check the Java version.
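For example, the installed version can be checked with:
java -version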
Step 2: Download Hadoop 3.2.4 from:
https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
After the download, copy the archive to the C: drive and extract it there, giving C:\hadoop-3.2.4.
Step 3–1:
Create a user variable HADOOP_HOME set to C:\hadoop-3.2.4, then click Path under User Variables and add %HADOOP_HOME%\bin.
Step 3–2:
Create a user variable JAVA_HOME set to the Java installation directory, then click Path under User Variables and add %JAVA_HOME%\bin.
Step 4: Edit the configuration files under the Hadoop directory.
Open C:\hadoop-3.2.4\etc\hadoop, right-click core-site.xml and click Edit. In Notepad, add this code between the <configuration> and </configuration> tags and save it:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Open C:\hadoop-3.2.4\etc\hadoop, right-click mapred-site.xml and click Edit. In Notepad, add this code between the <configuration> and </configuration> tags and save it:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Open C:\hadoop-3.2.4\etc\hadoop, right-click hdfs-site.xml and click Edit. In Notepad, add this code between the <configuration> and </configuration> tags and save it:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-3.2.4\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-3.2.4\data\datanode</value>
</property>
</configuration>
Open C:\hadoop-3.2.4\etc\hadoop, right-click yarn-site.xml and click Edit. In Notepad, add this code between the <configuration> and </configuration> tags and save it:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step 7–1: Open C:\hadoop-3.2.4\etc\hadoop, right-click hadoop-env.cmd and click Edit. In Notepad, change the JAVA_HOME line as follows and save it:
@rem The java implementation to use. Required.
set JAVA_HOME=C:\Progra~1\Java\jre1.8
Here C:\Progra~1 is the 8.3 short name for C:\Program Files; the space in the full path C:\Program Files\Java\jre1.8 would break the script. After the configuration, verify that Hadoop is on the path:
hadoop version
Step 8–1: Open C:\hadoop-3.2.4\bin; once inside the folder, click the address bar, type cmd and press Enter, then run:
hdfs namenode -format
Step 8–2: Open C:\hadoop-3.2.4\sbin; once inside the folder, click the address bar, type cmd and press Enter, then run:
start-all.cmd
Four daemon windows (NameNode, DataNode, ResourceManager and NodeManager) should open and keep running, confirming this step succeeded.
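The running daemons can also be listed with the JDK's jps tool (assuming the JDK's bin directory is on the Path):
jps
It should show NameNode, DataNode, ResourceManager and NodeManager.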
EXP:2
Hadoop Implementation of File Management Tasks, Such As Adding Files and Directories,
Retrieving Files and Deleting Files
AIM:
To implement file management tasks in HDFS, such as adding files and directories, retrieving files and deleting files.
I) STARTING HDFS:
Format the configured HDFS file system (required only once, before first use), then open the namenode (HDFS server) by executing the following HDFS command:
hdfs namenode -format
Start the distributed file system with the command below; it starts the namenode as well as the data nodes in the cluster:
start-dfs.sh
The steps below are followed to insert the required file into the Hadoop file system.
Step 1: Create an input directory in HDFS if it does not already exist.
Step 2: Use the Hadoop HDFS put command to transfer and store the data file from the local
system into HDFS, using the following commands in the terminal.
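A minimal sketch of these two steps, assuming a local file named file.txt and an HDFS directory /user/input (both names are hypothetical):
hdfs dfs -mkdir -p /user/input
hdfs dfs -put file.txt /user/input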
For instance, if HDFS holds a file named file, retrieve the required file from the Hadoop file system by carrying out the get (or cat) command.
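For example (the HDFS path and local destination are hypothetical):
hdfs dfs -cat /user/input/file
hdfs dfs -get /user/input/file /home/user/
A file that is no longer needed can be deleted in the same way:
hdfs dfs -rm /user/input/file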
Shut down the HDFS file system with the HDFS command below:
stop-dfs.sh
RESULT:
Thus the commands have been implemented successfully.
EXP:3
AIM:
To implement Matrix Multiplication with Hadoop Map Reduce.
PROGRAM:
Step 1: Creating Mapper file for Matrix Multiplication.
import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text>
{
public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException
{
Configuration conf = context.getConfiguration();
// Matrix dimensions: M is m x n, N is n x p (set by the driver).
int m = Integer.parseInt(conf.get("m"));
int p = Integer.parseInt(conf.get("p"));
// Each input line has the form: matrixName,rowIndex,colIndex,value
String line = value.toString();
String[] indicesAndValue = line.split(",");
Text outputKey = new Text();
Text outputValue = new Text();
if (indicesAndValue[0].equals("M"))
{
// Element M[i][j] contributes to every output cell (i,k), k = 0..p-1.
for (int k = 0; k < p; k++)
{
outputKey.set(indicesAndValue[1] + "," + k);
outputValue.set("M," + indicesAndValue[2] + "," + indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
else
{
// Element N[j][k] contributes to every output cell (i,k), i = 0..m-1.
for (int i = 0; i < m; i++)
{
outputKey.set(i + "," + indicesAndValue[2]);
outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
}
}
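For reference, the mapper splits each line on commas and expects the form matrixName,rowIndex,colIndex,value. A hypothetical input file for two 2x2 matrices (m = n = p = 2) would look like:
M,0,0,1
M,0,1,2
M,1,0,3
M,1,1,4
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8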
Step 2: Creating Reducer file for Matrix Multiplication.
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
public class MatrixReducer extends Reducer<Text, Text, Text, Text>
{
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
InterruptedException
{
String[] value;
// hashA holds row elements of M keyed by j; hashB holds column elements of N keyed by j.
HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
for (Text val : values)
{
value = val.toString().split(",");
if (value[0].equals("M"))
{
hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
}
else
{
hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
}
}
// n is the shared inner dimension (M is m x n, N is n x p).
int n = Integer.parseInt(context.getConfiguration().get("n"));
float result = 0.0f;
float a_ij;
float b_jk;
// Compute the dot product: sum over j of M[i][j] * N[j][k] for this (i,k) cell.
for (int j = 0; j < n; j++)
{
a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
result += a_ij * b_jk;
}
// Emit only non-zero cells; with a null key, TextOutputFormat writes the value alone.
if (result != 0.0f)
{
context.write(null, new Text(key.toString() + "," + Float.toString(result)));
}
}
}
Step 3: Creating Multiply file for Matrix Multiplication.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
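The driver class body is not included above; what follows is a minimal sketch of the Multiply driver using the imports already listed, assuming 2x2 matrices (the dimension values, class name and paths are illustrative). It sets the m, n and p values that the mapper and reducer read from the Configuration, wires up the classes, and points the job at the input and output paths:
public class Multiply
{
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
// M is m x n and N is n x p; the mapper reads "m" and "p", the reducer reads "n".
// The values below assume 2 x 2 matrices and must match the actual input file.
conf.set("m", "2");
conf.set("n", "2");
conf.set("p", "2");
Job job = Job.getInstance(conf, "MatrixMultiply");
job.setJarByClass(Multiply.class);
job.setMapperClass(MatrixMapper.class);
job.setReducerClass(MatrixReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
// Input and output HDFS paths are taken from the command line.
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}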
OUTPUT:
start-all.cmd
hadoop fs -mkdir /input_matrix
hadoop fs -put G:\Hadoop_Experiments\Matrix_M.txt /input_matrix
hadoop fs -ls /input_matrix
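The job can then be submitted; a sketch, assuming the three classes are packaged into a jar named MatrixMultiply.jar (the jar name and output path are hypothetical):
hadoop jar MatrixMultiply.jar Multiply /input_matrix /output_matrix
hadoop fs -cat /output_matrix/part-r-00000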
RESULT:
Thus the Matrix Multiplication with Hadoop Map Reduce was implemented successfully.