
MEENAKSHI SUNDARARAJAN ENGINEERING COLLEGE

#363, Arcot Road, Kodambakkam, Chennai – 600024, Tamil Nadu, India

Department: Computer Science & Engineering          Register No.: 311521104059

EX NO: 1    DOWNLOADING AND INSTALLING HADOOP, UNDERSTANDING DIFFERENT
            HADOOP MODES, STARTUP SCRIPTS, CONFIGURATION FILES
DATE:

AIM: To install the Hadoop framework and its components, and to understand Hadoop modes,
startup scripts and configuration files.
OBJECTIVE:
To learn and understand the Hadoop framework.
SOFTWARE REQUIRED:
Apache Hadoop
DESCRIPTION:
Hadoop is an open-source framework that allows storing and processing big data in a distributed
environment across clusters of computers using simple programming models. It is designed to scale
up from single servers to thousands of machines, each offering local computation and storage. The
Apache Hadoop framework includes the following four modules:
Hadoop Common: Contains the Java libraries and utilities needed by other Hadoop modules. These
libraries provide file system and OS-level abstractions and comprise the essential Java files and scripts
required to start Hadoop.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput
access to application data on the commodity machines, thus providing very high aggregate bandwidth
across the cluster.
Hadoop YARN: A resource-management framework responsible for job scheduling and cluster
resource management.
Hadoop MapReduce: A YARN-based programming model for parallel processing of large data sets.
PROCEDURE:
Hadoop can be installed in three modes of operation:

• Standalone Mode: Hadoop is distributed software designed to run on a cluster of commodity
machines. However, we can install it on a single node in standalone mode. In this mode, Hadoop
runs as a single monolithic Java process. This mode is extremely useful for debugging: you can
first test-run your MapReduce
application in this mode on small data before actually executing it on a cluster with big data.

• Pseudo-Distributed Mode: In this mode too, Hadoop is installed on a single node, but the
various Hadoop daemons run on the same machine as separate Java processes. Hence all the
daemons, namely NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker,
run on a single machine.

• Fully Distributed Mode: In fully distributed mode, the daemons NameNode, JobTracker and
SecondaryNameNode (optional, and can be run on a separate node) run on the master node,
while the daemons DataNode and TaskTracker run on the slave nodes.

Hadoop Installation: Ubuntu Operating System in standalone mode

STEPS:

1. First, let us set up a new user account for the Hadoop installation. This step is optional, but
recommended because it gives you the flexibility of a separate account for Hadoop, keeping this
installation isolated from other software installations.

• sudo adduser hadoop_dev (Upon executing this command, you will be prompted to enter a
new password for this user. Enter the password and the other details, and don't forget to
save the details at the end.)

• su - hadoop_dev (Switches from the current user to the newly created user, i.e. hadoop_dev.)

2. Download the latest Hadoop distribution.

• Visit the Apache Hadoop releases page and choose one of the mirror sites. You can copy the
download link and use wget to fetch it from the command prompt, for example:

wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz

3. Untar the file:

tar xvzf hadoop-2.7.0.tar.gz

4. Rename the folder to hadoop2

mv hadoop-2.7.0 hadoop2

5. Edit the configuration file /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh and set
JAVA_HOME in that file:

vim /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh

• Uncomment JAVA_HOME and update it with the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

(Please check your Java installation and set this value accordingly. Recent versions of Hadoop
require JDK 1.7 or later.)
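If you are unsure where Java is installed, the path can usually be discovered from the shell. The
command below is only a sketch for a typical Ubuntu/OpenJDK layout; the exact directory on your
machine may differ:

$ readlink -f "$(which java)"

Strip the trailing /bin/java from the result and use the remaining directory as JAVA_HOME, for
example export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 on an assumed OpenJDK 8 install.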

6. Let us verify that the installation is successful (change to the Hadoop directory:
cd /home/hadoop_dev/hadoop2/):

• bin/hadoop (running this command should prompt you with the various usage options)

7. This finishes the Hadoop setup in standalone mode.

8. Let us run a sample Hadoop program that is provided in the download package:

$ mkdir input (create the input directory)

$ cp etc/hadoop/*.xml input (copy all the xml files to the input folder)

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

(extract all strings matching the pattern 'dfs[a-z.]+' from the input files and write the matches, with
their counts, to the output directory)

$ cat output/* (look for the output in the output directory that Hadoop creates for you).

Hadoop Installation: Pseudo-Distributed Mode (Locally)

Steps for Installation:


1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/core-site.xml as below:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Note: This change sets the NameNode host and port.

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/hdfs-site.xml as below:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Note: This change sets the default replication count for blocks stored in HDFS.

3. We need to set up passwordless login so that the master will be able to do a passwordless
ssh to start the daemons on all the slaves.

Check whether an ssh server is running on your host:

a. ssh localhost (enter your password; if you are able to log in, then the ssh server is running)

b. If you are unable to log in in step a, then install ssh as follows:

sudo apt-get install ssh

c. Set up passwordless login as below:

i. ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ii. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

4. We can run Hadoop jobs locally or on YARN in this mode. In this section, we will focus on
running the jobs locally.


5. Format the file system. When we format the NameNode, it formats the metadata related to the
DataNodes. By doing that, all the information on the DataNodes is lost and they can be reused for
new data:

a. bin/hdfs namenode -format

6. Start the daemons:

a. sbin/start-dfs.sh (starts the NameNode and DataNode)

You can check whether the NameNode has started successfully by using the following web
interface: http://0.0.0.0:50070.

If you are unable to see this, check the logs in the /home/hadoop_dev/hadoop2/logs folder.

7. You can check whether the daemons are running or not by issuing the jps command.
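For reference, a jps listing after start-dfs.sh typically shows the HDFS daemons as below; the
process IDs are illustrative only and will differ on your machine:

$ jps
4821 NameNode
4967 DataNode
5135 SecondaryNameNode
5312 Jps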

8. This finishes the installation of Hadoop in pseudo-distributed mode.

9. Let us run the same example we ran in standalone mode:

i) Create a new directory on HDFS:

bin/hdfs dfs -mkdir -p /user/hadoop_dev

Copy the input files for the program to HDFS:

bin/hdfs dfs -put etc/hadoop input

Run the program:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

ii) View the output on hdfs:

bin/hdfs dfs -cat output/*

10. Stop the daemons when you are done executing the jobs, with the below command:

sbin/stop-dfs.sh

Hadoop Installation – Pseudo-Distributed Mode (YARN)

Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/mapred-site.xml as below:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/yarn-site.xml as below:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Note: This particular configuration tells MapReduce how to do its shuffle; in this case it uses the
mapreduce_shuffle handler.

3. Format the NameNode:

bin/hdfs namenode -format

4. Start the daemons using the command:

sbin/start-yarn.sh

This starts the ResourceManager and NodeManager daemons.

Once this command is run, you can check whether the ResourceManager is running by visiting
the following URL in a browser: http://0.0.0.0:8088. If you are unable to see this, check the logs
in the directory /home/hadoop_dev/hadoop2/logs.

5. To check whether the services are running, issue a jps command. The following shows all
the services necessary to run YARN on a single server:

$ jps
15933 Jps
15567 ResourceManager
15785 NodeManager

6. Let us run the same example as we ran before:

i) Create a new directory on HDFS:

bin/hdfs dfs -mkdir -p /user/hadoop_dev

Copy the input files for the program to HDFS:

bin/hdfs dfs -put etc/hadoop input

ii) Run the program:

bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

iii) View the output on hdfs:

bin/hdfs dfs -cat output/*

7. Stop the daemons when you are done executing the jobs, with the below command:

sbin/stop-yarn.sh

RESULT: Thus the Hadoop software was installed and its different modes were configured and run
successfully.

EX NO: 2    HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS SUCH AS ADDING
            FILES AND DIRECTORIES, RETRIEVING FILES AND DELETING FILES
DATE:

AIM: To implement a file management system using Cassandra.

OBJECTIVE: To learn and understand file management tasks.

SOFTWARE REQUIRED: Java (version 1.8), Python (version 2.7), Cassandra (v3.11.15).

PROCEDURE :

The different keyspace operations in Cassandra include:

• Creating a Keyspace
• Altering a Keyspace
• Dropping a Keyspace

1. Cassandra Create Keyspace

• In Cassandra, a namespace that specifies how data is replicated on nodes is known as a keyspace.
A keyspace is defined once for the whole cluster, and its replication settings apply across the nodes.

• Below is the syntax for creating a keyspace in Cassandra.

Syntax of Cassandra CREATE KEYSPACE:

CREATE KEYSPACE <identifier> WITH <properties of keyspace>;

Complete syntax with properties:

CREATE KEYSPACE Examplespace
WITH replication = {'class': '<strategy name>', 'replication_factor' : 2};

Two Properties of Cassandra Create Keyspace

There are two main properties of a Cassandra keyspace:

• durable_writes

• Replication
1. Replication

The replica placement strategy and the desired number of replicas are specified in the replication
option. The replica placement strategies are listed below:

• Old Network Topology Strategy - a legacy replication strategy.

• Network Topology Strategy - lets you configure the replication factor for each data center
independently.

• Simple Strategy - assigns a single replication factor to the whole cluster.
Example:

cqlsh> CREATE KEYSPACE Examplespace
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 2};

Here, a keyspace called Examplespace is being created.

• We are employing SimpleStrategy, the basic replica placement strategy.

• We have set the replication factor to 2 replicas.

Verification
Using the DESCRIBE command, you can determine whether or not the keyspace has been created.
When run over keyspaces, this command displays all created keyspaces, as seen below:

cqlsh> DESCRIBE keyspaces;

examplespace  system  system_traces

2. durable_writes

A keyspace's durable_writes property can be changed from its default value of true to false. This
property should never be set to false for a keyspace that uses SimpleStrategy.

Example:

cqlsh> CREATE KEYSPACE example_keyspace
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
       AND DURABLE_WRITES = false;
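To verify the setting afterwards, the keyspace metadata can be queried from the system schema.
This is a minimal sketch assuming Cassandra 3.x, where keyspace definitions are stored in
system_schema.keyspaces:

cqlsh> SELECT keyspace_name, durable_writes, replication
       FROM system_schema.keyspaces
       WHERE keyspace_name = 'example_keyspace';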

CASSANDRA QUERY:

-- Create a keyspace (similar to a database in relational databases)
cqlsh> CREATE KEYSPACE example_keyspace
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Use the keyspace
USE example_keyspace;

-- Create a table
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    email TEXT
);

-- Insert data into the table
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'John', 'Doe', '[email protected]');

-- Insert more data
INSERT INTO users (user_id, first_name, last_name, email)
VALUES (uuid(), 'Jane', 'Smith', '[email protected]');

-- Query all users
SELECT * FROM users;

OUTPUT:

CREATE:

RESULT: Thus the implementation of a file management system using Cassandra has been executed
successfully.

EX NO: 3    IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAPREDUCE
DATE:

AIM:
To implement matrix multiplication with Hadoop MapReduce on the Cloudera platform.

OBJECTIVE:
To learn and understand the concept of matrix multiplication with Hadoop MapReduce.

SOFTWARE REQUIRED:
Java, Hadoop.

ALGORITHM:

1. Mapper:

• Read input lines from the text file.
• Split each line into matrix name, row index, column index, and value (a sample of the assumed
input format is shown after this algorithm).
• If the matrix name is "A", emit key-value pairs with the key as (row, column) and the value as
(matrixA, column, value).
• If the matrix name is "B", emit key-value pairs with the key as (row, column) and the value as
(matrixB, row, value).

2. Shuffle and Sort:

• Shuffle and sort the key-value pairs so that values with the same key (i.e., the same
(row, column)) are grouped together.

3. Reducer:

• For each key (row, column):
• Gather the values for matrix A and matrix B.
• Multiply the corresponding values from matrix A and matrix B.
• Sum all the products to get the final result for the (row, column) element.
• Emit the key as (row, column) and the result as the value.

4. Output:

• The output is a set of key-value pairs, where the key is (row, column) and the value is the
result of the matrix multiplication for that cell.
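The mapper described above assumes each input line encodes one element of a 3 x 3 matrix in the
form matrixName,row,column,value. A hypothetical input file (the values are illustrative only)
would therefore contain lines such as:

A,0,0,1
A,0,1,2
A,0,2,3
...
B,0,0,7
B,1,0,8
B,2,0,9
...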

PROGRAM:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixMultiplication {

    public static class MatrixMapper extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the input line into parts: matrixName, row, col, value
            String[] parts = value.toString().split(",");
            String matrixName = parts[0];
            int row = Integer.parseInt(parts[1]);
            int col = Integer.parseInt(parts[2]);
            int value1 = Integer.parseInt(parts[3]);
            if (matrixName.equals("A")) {
                for (int k = 0; k < 3; k++) {
                    // Emit intermediate pairs (key: row-k, value: A-col-value)
                    context.write(new Text(row + "-" + k),
                            new Text(matrixName + "-" + col + "-" + value1));
                }
            } else if (matrixName.equals("B")) {
                for (int i = 0; i < 3; i++) {
                    // Emit intermediate pairs (key: i-col, value: B-row-value)
                    context.write(new Text(i + "-" + col),
                            new Text(matrixName + "-" + row + "-" + value1));
                }
            }
        }
    }
    public static class MatrixReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int result = 0;
            int[] aValues = new int[3];
            int[] bValues = new int[3];
            for (Text value : values) {
                String[] parts = value.toString().split("-");
                String matrixName = parts[0];
                int index = Integer.parseInt(parts[1]);
                int val = Integer.parseInt(parts[2]);
                if (matrixName.equals("A")) {
                    aValues[index] = val;
                } else if (matrixName.equals("B")) {
                    bValues[index] = val;
                }
            }
            for (int i = 0; i < 3; i++) {
                // Accumulate the dot product for this (row, column) cell
                result += aValues[i] * bValues[i];
            }
            context.write(key, new Text(Integer.toString(result)));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "MatrixMultiplication");
        job.setJarByClass(MatrixMultiplication.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
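To run the job, the class is packaged into a jar and submitted with the hadoop command. The jar
name and the HDFS input/output paths below are assumptions for illustration, not fixed by the
program:

$ hadoop jar MatrixMultiplication.jar MatrixMultiplication /user/hadoop_dev/matrix_input /user/hadoop_dev/matrix_output
$ bin/hdfs dfs -cat /user/hadoop_dev/matrix_output/*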

OUTPUT:

RESULT: Thus the implementation of matrix multiplication with Hadoop MapReduce has been
executed successfully.

EX NO: 4    IMPLEMENT A WORD COUNT MAPREDUCE PROGRAM TO UNDERSTAND
            THE MAPREDUCE PARADIGM
DATE:

AIM:
To run a basic Word Count MapReduce program to understand the MapReduce paradigm.
OBJECTIVE:
To perform a distributed word count on a given input dataset using Hadoop MapReduce, where the
input consists of lines of text and the output is a list of words with their respective counts.
SOFTWARE REQUIRED:
Java, Cloudera
ALGORITHM:

Step 1: Create a Java project and declare a package within it.
Step 2: Create a class in the package under the name WordCount.
Step 3: Insert the WordCount code in the class file.
Step 4: Export the code to a newly created jar file.
Step 5: Open the terminal, create a text file and add words to it.
Step 6: In HDFS, create a folder and add your text file to it.
Step 7: Run the WordCount program with the text file connected to it as input (see the example
session below):

hadoop jar /home/cloudera/MapReduceWordCount.jar WordCount /input_dir /output_dir
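As a sketch of Steps 5–7, a terminal session on the Cloudera VM might look like the following; the
sample text, file names and HDFS paths are assumptions for illustration:

$ echo "apple banana apple orange banana apple" > words.txt
$ hdfs dfs -mkdir -p /input_dir
$ hdfs dfs -put words.txt /input_dir
$ hadoop jar /home/cloudera/MapReduceWordCount.jar WordCount /input_dir /output_dir
$ hdfs dfs -cat /output_dir/part-r-00000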

PROGRAM:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

OUTPUT:

The output of the program will be a list of words and their frequencies, where each line
represents a word-frequency pair.
For example:
apple 5

RESULT:

The Word Count MapReduce program was executed successfully.

EX NO: 5    IMPLEMENTATION OF HIVE WITH PRACTICE EXAMPLES
DATE:

AIM:

To install Cloudera Virtualbox and implement the Hive shell commands in the terminal

OBJECTIVE:

To understand the usage of Hive and implement commands.

SOFTWARE REQUIRED:

• Need to install Cloudera on VMware Workstation.

• Link to download for Windows – https://www.cloudera.com/downloads/cdh.html
PROCEDURE:
Steps to Open Cloudera after Installation
Step 1: Open VMware Workstation, which is available on your desktop.
Step 2: You will get an interface; click on "Open a Virtual Machine".
Step 3: Select the path and file where you downloaded the Cloudera image.
Step 4: Your virtual environment is now being created.
Step 5: You can view your virtual machine details in this path.
Step 6: Open the terminal to get started with Hive commands.
Step 7: Type "hive" in the terminal. This gives access to work with Hive commands in the
terminal.

DATABASE OPERATIONS IN HIVE:


1. Create a database
Syntax:
create database database_name;
2. Creating a table
Syntax:
CREATE TABLE tablename(column_name datatype);
3. Display Database
Syntax:
show databases;
4. Describe Database
Syntax:

describe database database_name;


5. Insert the data to the table:
Syntax:
INSERT INTO TABLE <table_name> VALUES (<add values as per column entity>);
6. Display all the data of the table:
Syntax:
SELECT * FROM <table_name>;
7. Renaming Table Name
ALTER TABLE with RENAME is used to change the name of an already existing table in the hive.
Syntax:
ALTER TABLE <current_table_name> RENAME TO <new_table_name>;
8. ADD Columns
Syntax:
ALTER TABLE <table_name> ADD COLUMNS (<col_name> <data_type> COMMENT '<comment>',
<col_name> <data_type> COMMENT '<comment>', ...);
9. CHANGE Column
CHANGE in ALTER TABLE is used to change the name or data type of an existing column or
attribute.
Syntax:
ALTER TABLE <table_name> CHANGE <column_name> <new_column_name>
<new_data_type>;
10. DROP TABLE command
DROP TABLE command in the hive is used to drop a table inside the hive. Hive will remove all of
its data and metadata from the hive meta-store. The hive DROP TABLE statement comes with a
PURGE option. In case if the PURGE option is mentioned the data will be completely lost and
cannot be recovered later but if not mentioned then data will move to .Trash/current directory.
Syntax:
DROP TABLE IF EXISTS data PURGE;
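A short example session tying these operations together; the database, table and column names are
illustrative only, and the INSERT ... VALUES form assumes a Hive version that supports it (0.14 or
later, as on recent Cloudera images):

hive> create database studentdb;
hive> use studentdb;
hive> CREATE TABLE student(id INT, name STRING);
hive> INSERT INTO TABLE student VALUES (1, 'Asha');
hive> SELECT * FROM student;
hive> ALTER TABLE student RENAME TO student_info;
hive> ALTER TABLE student_info ADD COLUMNS (dept STRING COMMENT 'department');
hive> DROP TABLE IF EXISTS student_info PURGE;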

OUTPUT:

RESULT:

The program to implement basic commands in Hive is successfully executed.

EX NO: 6    INSTALLATION OF HBASE AND THRIFT ALONG WITH PRACTICE EXAMPLES
DATE:

AIM:

To install HBase and Thrift along with practice examples

OBJECTIVE:

To implement commands in HBase and Thrift

SOFTWARE REQUIRED:

HBase, Thrift

ALGORITHM:

Step 1: Prerequisites
Step 2: Download and Extract HBase
Step 3: Configure HBase
Step 4: Start HBase
Step 5: Verify HBase Installation
Step 6: Install Thrift
Step 7: Practice Examples

IMPLEMENTATION:

1. Prerequisites:

• Java Development Kit (JDK): Ensure that you have Java installed on your system.
HBase requires Java to run.
• Apache Hadoop: HBase relies on Hadoop for distributed file storage. Install and
configure Hadoop before installing HBase.

2. Download and Extract HBase:

• Visit the Apache HBase website (https://hbase.apache.org/) and navigate to the


downloads page.
• Download the latest stable release of HBase, such as "hbase-x.x.x-bin.tar.gz".
• Extract the downloaded archive to a directory of your choice. For example:

$ tar -xvf hbase-x.y.z-bin.tar.gz


3. Configure HBase:

• Navigate to the HBase installation directory:


$ cd hbase-x.y.z
• Open the conf/hbase-site.xml file and configure the necessary properties. For example,
you might need to set the Hadoop configuration and ZooKeeper quorum:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>

4. Start HBase:

• Start HBase by running the following command from the HBase installation directory:
$ ./bin/start-hbase.sh

5. Verify HBase Installation:

• Access the HBase shell by running the following command:


$ ./bin/hbase shell
• If the shell opens without any errors, it indicates a successful installation.

6. Install Thrift:

• Install the necessary dependencies for Thrift. The package names might differ based on
your operating system.
For example, on Ubuntu:

$ sudo apt-get install libboost-dev libevent-dev automake libtool flex bison pkg-config

• Download the latest version of Thrift from the Apache Thrift website
(https://thrift.apache.org/).
• Extract the downloaded archive and navigate to the Thrift source directory.
• Run the following commands to configure and install Thrift:

$ ./bootstrap.sh
$ ./configure

$ make
$ sudo make install

7. Practice Examples:

$ ./bin/hbase shell
create 'my_table', 'cf'
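A few more shell commands that are commonly used for practice; the table name, row key and
values below are illustrative only:

put 'my_table', 'row1', 'cf:name', 'John'
get 'my_table', 'row1'
scan 'my_table'
disable 'my_table'
drop 'my_table'

To expose HBase to Thrift clients, the Thrift server bundled with HBase can be started from the
installation directory with:

$ ./bin/hbase thrift start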

RESULT:

Thus the installation of HBase and Thrift, along with practice examples, was executed successfully.

EX NO: 7    PRACTICING IMPORTING AND EXPORTING DATA FROM VARIOUS DATABASES
DATE:

AIM:

To practice importing and exporting data from various databases

OBJECTIVE:

To understand importing and exporting of data from databases

SOFTWARE REQUIRED:

Hadoop, HBase

IMPLEMENTATION:

Step 1: Set up Hadoop

• Install and configure Apache Hadoop on your Ubuntu machine. Refer to the Hadoop
documentation for detailed instructions.

Step 2: Set up HBase

• Install and configure Apache HBase on your Ubuntu machine. Refer to the HBase
documentation for detailed instructions.

Step 3: Prepare data for import/export

• Ensure that your data is in a format compatible with Hadoop and HBase. Common formats
include CSV, TSV, or Avro.

Step 4: Import data into Hadoop

• Use Hadoop's file system commands (hadoop fs) to import your data into Hadoop's
distributed file system (HDFS).
For example:
$ hadoop fs -put /path/to/local/file.csv /user/hadoop/input/

Step 5: Create an HBase table:


• In the HBase shell, create a table with the appropriate column families to match your data.

For example:
create 'my_table', 'cf1', 'cf2'

Step 6: Import data from Hadoop to HBase

• Use the importtsv tool to import data from Hadoop to HBase. This tool is bundled with HBase
and allows importing TSV or CSV data.

For example:
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=<separator> -Dimporttsv.columns=<column_mapping> <table_name> <path_to_input>

• Replace <separator> with the separator used in your data (e.g., , for CSV),
<column_mapping> with the mapping of columns to HBase column families and qualifiers,
<table_name> with the name of your HBase table, and <path_to_input> with the Hadoop path
where the data is located.
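As a concrete illustration, suppose the CSV file contains a row key followed by name and age
columns that should go into column family cf1 of my_table; the column mapping and paths here are
assumptions:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf1:name,cf1:age my_table /user/hadoop/input/file.csv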

Step 7: Export data from HBase to Hadoop

• Use the Export MapReduce tool bundled with HBase to export data from an HBase table to HDFS.

For example:
$ hbase org.apache.hadoop.hbase.mapreduce.Export <table_name> /user/hadoop/output/

Step 8: Retrieve exported data from Hadoop

• Use Hadoop's file system commands (hadoop fs) to retrieve the exported data from Hadoop.

For example:
$ hadoop fs -get /user/hadoop/output/part-* /path/to/local/output/

These are general steps to import and export data from various databases using Hadoop and HBase.
The specific commands and configurations might vary depending on your database and data formats.
Make sure to refer to the documentation of the specific tools and databases you're using for more
detailed instructions.

RESULT: Thus importing and exporting data from various databases were executed successfully.
