This mode is useful for debugging and testing a MapReduce application on small data before actually executing it on a cluster with big data.
STEPS:
1. Create a dedicated user for the Hadoop installation. This step is optional, but recommended because it gives you the flexibility to keep the Hadoop installation separate from other software by using a dedicated account.
• sudo adduser hadoop_dev (Upon executing this command, you will be prompted to enter a new password for this user. Enter the password and the other details. Don't forget to save the details at the end.)
• su - hadoop_dev (Switches from the current user to the newly created user, i.e., hadoop_dev.)
common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
mv hadoop-2.7.0 hadoop2
Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh: uncomment the JAVA_HOME line and update it to point to your Java installation. (Please check your Java installation and set this value accordingly; recent versions of Hadoop require JDK 1.7 or later.)
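For reference, the uncommented line typically looks like the sketch below; the JDK path here is only an assumption and must be replaced with the location on your own system:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed path; set to your actual JDK location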
8. Let us run a sample Hadoop program that is provided in the download package:
$ cp etc/hadoop/*.xml input (copy all the xml files to the input folder)
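The grep command itself is not printed on this page; a typical invocation from the Hadoop directory (the examples jar name assumes the 2.7.0 download used above) is:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'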
(grep the input files for all occurrences of the pattern ‘dfs[a-z.]+’ and write the matches to the output directory)
$ cat output/* (look for the output in the output directory that Hadoop creates for
you).
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Note: This change (made in etc/hadoop/core-site.xml) sets the NameNode host and port.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Note: This change (made in etc/hadoop/hdfs-site.xml) sets the default replication count for blocks used by HDFS.
3. We need to set up passwordless login so that the master will be able to do a passwordless ssh to start the daemons on all the slaves.
a. ssh localhost (enter your password; if you are able to log in, then the ssh server is running)
b. If step (a) prompts for a password, set up key-based authentication by appending your public key to ~/.ssh/authorized_keys, as shown below, so that ssh localhost works without a password. (In this mode we can run Hadoop jobs locally or on YARN.)
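A minimal sketch of the key setup, assuming an RSA key and the default ~/.ssh locations:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys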
After formatting the file system (bin/hdfs namenode -format) and starting the daemons (sbin/start-dfs.sh), you can check whether the NameNode has started successfully by using the following web interface: http://0.0.0.0:50070 .
If you are unable to see this, check the logs in the /home/hadoop_dev/hadoop2/logs folder.
7. You can check whether the daemons are running or not by issuing the jps command.
10. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-dfs.sh
1. Edit the file etc/hadoop/mapred-site.xml and add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2. Edit the file etc/hadoop/yarn-site.xml and add:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3. Start the YARN daemons:
sbin/start-yarn.sh
Once this command is run, you can check whether the ResourceManager is running by visiting the following URL in a browser: http://0.0.0.0:8088 . If you are unable to see this, check the logs in the directory /home/hadoop_dev/hadoop2/logs.
5. To check whether the services are running, issue a jps command. The following shows all
the services necessary to run YARN on a single server:
$ jps
15933 Jps
15567 ResourceManager
15785 NodeManager
7. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-yarn.sh
RESULT: The Hadoop software was installed and its different modes were executed successfully.
PROCEDURE :
• Creating a Keyspace
• Altering a Keyspace
• Dropping a Keyspace
• durable_writes
• Replication
1. Replication
The replica placement strategy and the desired number of replicas are specified while the keyspace is being created.
• We are employing NetworkTopologyStrategy, the second replica placement method (the first being SimpleStrategy).
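A minimal sketch of such a keyspace definition, assuming a single data center named datacenter1 and an illustrative keyspace name:
cqlsh> CREATE KEYSPACE demo_ks
   WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1};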
Verification
Using the DESCRIBE command, you can determine whether or not the keyspace has been created. When used with keyspaces, this command lists all the created keyspaces, as seen below.
cqlsh> DESCRIBE keyspaces;
2. durable_writes
A keyspace's durable_writes property can be changed from its default value of true to false. This property cannot be set when the keyspace uses SimpleStrategy.
Example
cqlsh> -- Create a keyspace (similar to a database in relational databases)
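The statement that this comment introduces is not shown on this page; sketched below is a keyspace named example (matching the USE statement in the query section that follows) with durable_writes disabled — the replication settings are assumptions:
cqlsh> CREATE KEYSPACE example
   WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3}
   AND durable_writes = false;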
CASSANDRA QUERY:
-- Use the keyspace
USE example;
-- Create a table
CREATE TABLE users (
 user_id UUID PRIMARY KEY,
 first_name TEXT,
 last_name TEXT,
 email TEXT
);
-- Insert data into the table
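The INSERT statement itself is missing from this copy; a sketch consistent with the table above (the values are purely illustrative):
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'John', 'Doe', '[email protected]');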
OUTPUT:
CREATE:
AIM :
To implement matrix multiplication with Hadoop MapReduce on the Cloudera platform.
OBJECTIVE :
To learn and understand the concept of matrix multiplication with Hadoop MapReduce.
SOFTWARE REQUIRED :
Java, Hadoop.
ALGORITHM:
1. Mapper:
• For every element of the input matrices A and B, emit intermediate key-value pairs keyed by the (row, column) position of the output cell that the element contributes to.
2. Shuffle and Sort:
• Shuffle and sort the key-value pairs so that values with the same key (i.e., the same (row, column)) are grouped together.
3. Reducer:
• For each key, multiply the matching elements of A and B and sum the products.
• The output will be a set of key-value pairs, where the key is (row, column) and the value is the result of the matrix multiplication.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixMultiplication {

    public static class MatrixMapper extends Mapper<Object, Text, Text, Text> {

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the input line into parts: matrixName,row,col,value
            String[] parts = value.toString().split(",");
            String matrixName = parts[0];
            int row = Integer.parseInt(parts[1]);
            int col = Integer.parseInt(parts[2]);
            int value1 = Integer.parseInt(parts[3]);
            if (matrixName.equals("A")) {
                for (int k = 0; k < 3; k++) {
                    // Emit intermediate key-value pairs (key: row-k, value: A-col-value)
                    context.write(new Text(row + "-" + k),
                            new Text(matrixName + "-" + col + "-" + value1));
                }
            } else if (matrixName.equals("B")) {
                for (int i = 0; i < 3; i++) {
        job.setReducerClass(MatrixReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
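The final lines of the driver are not shown in this copy; a minimal sketch of the usual ending, assuming the output path is passed as args[1]:
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);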
OUTPUT:
RESULT: Thus the implementation of matrix multiplication with Hadoop MapReduce was executed successfully.
EX NO: 4
DATE:
IMPLEMENT A WORD COUNT MAPREDUCE PROGRAM TO UNDERSTAND THE MAPREDUCE PARADIGM
AIM:
To run a basic Word Count MapReduce program to understand the MapReduce paradigm.
OBJECTIVE:
To perform a distributed word count on a given input dataset using Hadoop MapReduce, where the input consists of lines of text and the output is a list of words with their respective counts.
SOFTWARE REQUIRED:
Java, Cloudera
ALGORITHM:
1. Mapper: tokenize each input line and emit a (word, 1) pair for every word.
2. Shuffle and Sort: group the emitted pairs by word.
3. Reducer: sum the counts for each word and emit a (word, total count) pair.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
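The body of the program is missing from this copy; sketched below is the canonical Hadoop WordCount that matches the imports above (class and variable names follow the standard Hadoop tutorial and are not necessarily those of the original record):
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Tokenize the line and emit (word, 1) for every token
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts for the word and emit (word, total)
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}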
OUTPUT:
The output of the program will be a list of words and their frequencies, where each line
represents a word-frequency pair.
For example:
apple 5
RESULT: Thus the Word Count MapReduce program was executed successfully.
EX NO: 5
DATE:
IMPLEMENTATION OF HIVE WITH PRACTICE EXAMPLES
AIM:
To install Cloudera Virtualbox and implement the Hive shell commands in the terminal
OBJECTIVE:
To learn and practice basic Hive shell commands in the Cloudera environment.
SOFTWARE REQUIRED:
Cloudera VirtualBox, Hive
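The implementation steps and queries for this experiment are not shown in this copy; a minimal sketch of typical Hive shell commands for such an exercise (database, table, and file names are illustrative) is:
hive> CREATE DATABASE IF NOT EXISTS college;
hive> USE college;
hive> CREATE TABLE students (id INT, name STRING, dept STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/cloudera/students.csv' INTO TABLE students;
hive> SELECT dept, COUNT(*) FROM students GROUP BY dept;
hive> DROP TABLE students;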
OUTPUT:
RESULT: Thus Cloudera VirtualBox was installed and the Hive shell commands were executed successfully in the terminal.
EX NO: 6
DATE:
INSTALLATION OF HBASE AND THRIFT ALONG WITH PRACTICE EXAMPLES
AIM:
To install HBase and Thrift and to practice basic examples using them.
OBJECTIVE:
SOFTWARE REQUIRED:
HBase, Thrift
ALGORITHM:
Step 1: Prerequisites
Step 2: Download and Extract HBase
Step 3: Configure HBase
Step 4: Start HBase
Step 5: Verify HBase Installation
Step 6: Install Thrift
Step 7: Practice Examples
IMPLEMENTATION:
1. Prerequisites:
• Java Development Kit (JDK): Ensure that you have Java installed on your system.
HBase requires Java to run.
• Apache Hadoop: HBase relies on Hadoop for distributed file storage. Install and
configure Hadoop before installing HBase.
3. Configure HBase by editing conf/hbase-site.xml and adding the following properties:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>
4. Start HBase:
• Start HBase by running the following command from the HBase installation directory:
$ ./bin/start-hbase.sh
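The verification step (Step 5 in the outline above) is not detailed in this copy; a typical check is to open the HBase shell and query the cluster status:
$ ./bin/hbase shell
hbase(main):001:0> status
hbase(main):002:0> version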
6. Install Thrift:
• Install the necessary dependencies for Thrift. The package names might differ based on
your operating system.
For example, on Ubuntu:
$ sudo apt-get install libboost-dev libevent-dev automake libtool flex bison pkg-config
• Download the latest version of Thrift from the Apache Thrift website
(https://thrift.apache.org/).
• Extract the downloaded archive and navigate to the Thrift source directory.
• Run the following commands to configure and install Thrift:
$ ./bootstrap.sh
$ ./configure
$ make
$ sudo make install
7. Practice Examples:
$ ./bin/hbase shell
create 'my_table', 'cf'
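A few more shell commands that can be practiced on the same table (row keys and values are illustrative):
put 'my_table', 'row1', 'cf:name', 'value1'
get 'my_table', 'row1'
scan 'my_table'
disable 'my_table'
drop 'my_table'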
RESULT:
Thus the installation of HBase and Thrift, along with the practice examples, was executed successfully.
EX NO: 7
DATE:
IMPORT AND EXPORT DATA FROM VARIOUS DATABASES
AIM:
To import and export data from various databases using Hadoop and HBase.
OBJECTIVE:
SOFTWARE REQUIRED:
Hadoop, HBase
IMPLEMENTATION:
• Install and configure Apache Hadoop on your Ubuntu machine. Refer to the Hadoop
documentation for detailed instructions.
• Install and configure Apache HBase on your Ubuntu machine. Refer to the HBase
documentation for detailed instructions.
• Ensure that your data is in a format compatible with Hadoop and HBase. Common formats
include CSV, TSV, or Avro.
• Use Hadoop's file system commands (hadoop fs) to import your data into Hadoop's
distributed file system (HDFS).
For example:
$hadoop fs -put /path/to/local/file.csv /user/hadoop/input/
• Create the destination table in the HBase shell with the required column families. For example:
create 'my_table', 'cf1', 'cf2'
• Use the importtsv tool to import data from Hadoop to HBase. This tool is bundled with HBase
and allows importing TSV or CSV data.
For example:
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=<separator> -Dimporttsv.columns=<column_mapping> <table_name> <path_to_input>
• Replace <separator> with the separator used in your data (e.g., , for CSV),
<column_mapping> with the mapping of columns to HBase column families and qualifiers,
<table_name> with the name of your HBase table, and <path_to_input> with the Hadoop path
where the data is located.
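For instance, a concrete invocation might look like the sketch below (the column mapping, table name, and input path are assumptions for illustration):
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf1:name,cf2:email my_table /user/hadoop/input/file.csv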
• Use the Export MapReduce tool bundled with HBase to export data from an HBase table to HDFS.
For example:
$ hbase org.apache.hadoop.hbase.mapreduce.Export '<table_name>' /user/hadoop/output/
• Use Hadoop's file system commands (hadoop fs) to retrieve the exported data from Hadoop.
For example:
$hadoop fs -get /user/hadoop/output/part-* /path/to/local/output/
These are general steps to import and export data from various databases using Hadoop and HBase.
The specific commands and configurations might vary depending on your database and data formats.
Make sure to refer to the documentation of the specific tools and databases you're using for more
detailed instructions.
RESULT: Thus importing and exporting data from various databases were executed successfully.