Unit 3 tt1

The document discusses HDFS, Hive, HiveQL and HBase. It provides examples of creating tables in Hive with transactions and partitions, inserting and querying data, updating records and altering tables. It also compares different join types in Hive and provides HBase commands for creating, inserting and querying a table and checking its status.

Uploaded by avm5439
Copyright © All Rights Reserved

Unit 3

HDFS, HIVE AND HIVEQL, HBASE

1) Consider the following database table: (10 Marks)


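+-----+----------+------------+------------+
| Id  | Name     | Branch     | Mobile     |
+-----+----------+------------+------------+
| 101 | Darsheel | Computer   | 9876502314 |
| 102 | Ayush    | Mechanical | 9932416578 |
| 103 | Yash     | Computer   | 9810324576 |
+-----+----------+------------+------------+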
1) Create the following table in Hive with transactional property = true and partitions.
2) Insert the following values in the table.
3) Display the count of students with respect to branch.
4) Update the name to “Yashvi” where ID =103.
5) Alter the table to add new column “CGPA” with float datatype.
i. Create the table in Hive with transactional property = true and partitions:
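-- Hive ACID needs a transaction manager that supports it:
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE student (
  Id     INT,
  Name   STRING,
  Branch STRING,
  Mobile BIGINT
)
PARTITIONED BY (year INT)              -- one HDFS sub-directory per year
CLUSTERED BY (Branch) INTO 4 BUCKETS   -- bucketing required for ACID before Hive 3.0
STORED AS ORC                          -- ACID tables must use ORC
TBLPROPERTIES ('transactional'='true');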
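
To make the PARTITIONED BY and CLUSTERED BY clauses concrete, here is a minimal Python sketch of where each row would land: the partition value selects an HDFS sub-directory (year=2022), and the bucket is chosen as hash(Branch) mod 4. The warehouse path is an assumed default (the real value comes from hive.metastore.warehouse.dir), and zlib.crc32 is a stand-in hash, not Hive's own bucketing hash, so real bucket numbers will differ.

```python
import zlib

# Assumption: Hive's usual default warehouse directory; check the
# hive.metastore.warehouse.dir setting on a real cluster.
WAREHOUSE = "/user/hive/warehouse"
NUM_BUCKETS = 4  # matches CLUSTERED BY (Branch) INTO 4 BUCKETS


def bucket_for(branch: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket as hash(column) mod N.

    zlib.crc32 is a deterministic stand-in; Hive uses its own hash
    function, so actual bucket numbers will differ.
    """
    return zlib.crc32(branch.encode("utf-8")) % num_buckets


def placement(table: str, year: int, branch: str):
    """Return (partition directory, bucket number) for one row."""
    partition_dir = f"{WAREHOUSE}/{table}/year={year}"
    return partition_dir, bucket_for(branch)


rows = [
    (101, "Darsheel", "Computer"),
    (102, "Ayush", "Mechanical"),
    (103, "Yash", "Computer"),
]
for student_id, name, branch in rows:
    # Rows with the same Branch always share a bucket inside the partition.
    print(student_id, *placement("student", 2022, branch))
```

Hashing the clustering column means both Computer rows always land in the same bucket, which is what lets Hive sample or join bucket by bucket.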

ii. Insert the values into the table:


INSERT INTO student PARTITION (year=2022) VALUES
(101, 'Darsheel', 'Computer', 9876502314),
(102, 'Ayush', 'Mechanical', 9932416578),
(103, 'Yash', 'Computer', 9810324576);

iii. Display the count of students with respect to branch:


SELECT Branch, COUNT(*) AS Student_Count FROM student GROUP BY Branch;
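
For intuition, the GROUP BY query does the same thing as this small Python sketch over the three sample rows: it counts how many rows share each Branch value. For this data the result is Computer 2 and Mechanical 1 (Hive does not guarantee the order of the groups unless ORDER BY is added).

```python
from collections import Counter

# The three rows inserted above, as (Id, Name, Branch) tuples.
students = [
    (101, "Darsheel", "Computer"),
    (102, "Ayush", "Mechanical"),
    (103, "Yash", "Computer"),
]

# GROUP BY Branch + COUNT(*): count how many rows share each Branch value.
student_count = Counter(branch for _id, _name, branch in students)

for branch, count in sorted(student_count.items()):
    print(branch, count)
```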

iv. Update the name to “Yashvi” where ID =103:


UPDATE student SET Name = 'Yashvi' WHERE Id = 103;
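
Hive does not rewrite the ORC base files in place when UPDATE runs; the change is recorded in delta files that readers merge with the base until a compaction folds them together. The sketch below is a heavily simplified, hypothetical model of that idea, recording the update as a delete event plus an insert event (similar in spirit to recent Hive versions); the names base, deltas, update and read are illustrative and do not correspond to Hive internals.

```python
# Hypothetical model: an immutable base plus an append-only list of delta
# events. Real Hive stores these as ORC files with richer row identifiers.
base = {
    101: {"Id": 101, "Name": "Darsheel", "Branch": "Computer"},
    102: {"Id": 102, "Name": "Ayush", "Branch": "Mechanical"},
    103: {"Id": 103, "Name": "Yash", "Branch": "Computer"},
}
deltas = []  # list of (operation, row id, row or None)


def read():
    """Merge the base with the deltas, as a reader would before compaction."""
    merged = {k: dict(v) for k, v in base.items()}
    for op, row_id, row in deltas:
        if op == "delete":
            merged.pop(row_id, None)
        else:
            merged[row_id] = row
    return merged


def update(row_id, **changes):
    """Record an UPDATE as a delete event followed by an insert event."""
    current = read()[row_id]
    deltas.append(("delete", row_id, None))
    deltas.append(("insert", row_id, {**current, **changes}))


update(103, Name="Yashvi")
print(read()[103])  # the merged view shows the new name
print(base[103])    # the base is untouched until compaction
```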

v. Alter the table to add a new column “CGPA” with float datatype:
ALTER TABLE student ADD COLUMNS (CGPA FLOAT);
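
Adding a column changes only the table's metadata; the existing data files are not rewritten, so rows written before the ALTER read back with NULL for CGPA. The hypothetical sketch below maps stored values to column names by position, which is a simplification: Hive's exact schema-evolution rules depend on the file format.

```python
# Hypothetical sketch of schema-on-read after ADD COLUMNS.
OLD_SCHEMA = ["Id", "Name", "Branch", "Mobile"]
NEW_SCHEMA = OLD_SCHEMA + ["CGPA"]

# Rows written before the ALTER hold only four values each.
stored_rows = [
    (101, "Darsheel", "Computer", 9876502314),
    (102, "Ayush", "Mechanical", 9932416578),
    (103, "Yashvi", "Computer", 9810324576),
]


def read_with_schema(row, schema):
    """Map stored values to column names by position; absent columns become None."""
    return {col: (row[i] if i < len(row) else None) for i, col in enumerate(schema)}


for row in stored_rows:
    print(read_with_schema(row, NEW_SCHEMA))
```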

These commands create the student table in Hive as a transactional (ACID) table with partitions, insert the
values, display the count of students with respect to branch, update the name where ID = 103, and alter
the table to add a new column "CGPA" with float datatype.

Explain the different joins in Hive with a suitable example. (5 Marks)


• Join queries combine rows from two (or more) tables in Hive based on a join condition.
• Hive supports four main join types (it also offers LEFT SEMI JOIN and CROSS JOIN):
  • Inner Join: Retrieves only the records whose join key matches in both tables.
  • Left Outer Join: Returns all rows from the left table, even when there is no match in the right
  table; the right-side columns are NULL for the unmatched rows.
  • Right Outer Join: Returns all rows from the right table, even when there is no match in the left
  table; the left-side columns are NULL for the unmatched rows.
  • Full Outer Join: Combines the records of both tables based on the join condition given in the
  query. It returns all records from both tables and fills in NULL values for the columns of
  whichever side has no match.
Now, let's illustrate each join type with a suitable example:
Consider two tables:
Table A: Employee
+----+---------+-------------+
| Id | Name    | Department  |
+----+---------+-------------+
| 1  | Alice   | HR          |
| 2  | Bob     | Engineering |
| 3  | Charlie | Sales       |
+----+---------+-------------+

Table B: Salary
+----+--------+
| Id | Salary |
+----+--------+
| 1  | 50000  |
| 2  | 60000  |
| 4  | 70000  |
+----+--------+

Now, let's see how each join works:


1. INNER JOIN:
SELECT * FROM Employee e INNER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| e.Id | Name    | Department  | s.Id | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
+------+---------+-------------+------+--------+

2. LEFT OUTER JOIN:


SELECT * FROM Employee e LEFT OUTER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| e.Id | Name    | Department  | s.Id | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
| 3    | Charlie | Sales       | NULL | NULL   |
+------+---------+-------------+------+--------+

3. RIGHT OUTER JOIN:
SELECT * FROM Employee e RIGHT OUTER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| e.Id | Name    | Department  | s.Id | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
| NULL | NULL    | NULL        | 4    | 70000  |
+------+---------+-------------+------+--------+

4. FULL OUTER JOIN:


SELECT * FROM Employee e FULL OUTER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| e.Id | Name    | Department  | s.Id | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
| 3    | Charlie | Sales       | NULL | NULL   |
| NULL | NULL    | NULL        | 4    | 70000  |
+------+---------+-------------+------+--------+
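
The four join types differ only in which unmatched rows they keep. The minimal Python sketch below (a simple nested-loop equi-join, not how Hive executes joins) reproduces the four outputs above from the Employee and Salary rows, with None standing in for NULL; the function name join and its keep_left/keep_right flags are illustrative.

```python
# Rows from the Employee and Salary tables above.
employee = [(1, "Alice", "HR"), (2, "Bob", "Engineering"), (3, "Charlie", "Sales")]
salary = [(1, 50000), (2, 60000), (4, 70000)]


def join(left, right, keep_left=False, keep_right=False):
    """Nested-loop equi-join on the first column of each row.

    inner = (False, False), left outer = (True, False),
    right outer = (False, True), full outer = (True, True).
    """
    # Both tables are assumed non-empty so the NULL padding can be sized.
    left_pad = (None,) * len(left[0])
    right_pad = (None,) * len(right[0])
    rows, matched = [], set()
    for l in left:
        hit = False
        for i, r in enumerate(right):
            if l[0] == r[0]:  # ON e.Id = s.Id
                rows.append(l + r)
                matched.add(i)
                hit = True
        if keep_left and not hit:
            rows.append(l + right_pad)  # unmatched left row, NULL right side
    if keep_right:
        # Unmatched right rows, NULL left side.
        rows += [left_pad + r for i, r in enumerate(right) if i not in matched]
    return rows


for name, flags in [("INNER", (False, False)), ("LEFT OUTER", (True, False)),
                    ("RIGHT OUTER", (False, True)), ("FULL OUTER", (True, True))]:
    print(name)
    for row in join(employee, salary, *flags):
        print("  ", row)
```

As with Hive, the row order of a real query result is not guaranteed unless ORDER BY is used.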

Compare and contrast HBase with RDBMS. (5 Marks)


Write the HBase Commands for the following: (5 Marks)
a. Create a table with name Product.
b. Add column families ‘Shoe’ and ‘Tshirt’.
c. In column family Shoe, add the columns as below:
Brand = Nike, Price = $50, Size = 9, Description = For Men
d. In column family Tshirt, add the columns as below:
Brand = Polo, Price = $35, Size = XL, Description = Round Neck
e. Display the contents of table ‘Product’ for 10 row keys.
f. Check whether the table ‘Product’ is enabled or disabled.

a. Create a table with name Product:


create 'Product', 'Shoe', 'Tshirt'

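b. Add column families ‘Shoe’ and ‘Tshirt’:

HBase requires at least one column family when a table is created, so the create command in (a)
already defines both families. On an existing table, alter adds (or re-declares) column families:

alter 'Product', {NAME => 'Shoe'}, {NAME => 'Tshirt'}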

c. In column family Shoe, add the columns as below:


put 'Product', 'row_key', 'Shoe:Brand', 'Nike'
put 'Product', 'row_key', 'Shoe:Price', '$50'
put 'Product', 'row_key', 'Shoe:Size', '9'
put 'Product', 'row_key', 'Shoe:Description', 'For Men'

d. In column family Tshirt, add the columns as below:


put 'Product', 'row_key', 'Tshirt:Brand', 'Polo'
put 'Product', 'row_key', 'Tshirt:Price', '$35'
put 'Product', 'row_key', 'Tshirt:Size', 'XL'
put 'Product', 'row_key', 'Tshirt:Description', 'Round Neck'

e. Display the contents of table ‘Product’ for 10 row keys:


scan 'Product', {LIMIT => 10}

f. Check whether the table ‘Product’ is enabled or disabled:


is_enabled 'Product'
is_disabled 'Product'

These are the HBase commands for creating a table, adding column families and columns, displaying
table contents, and checking the table's status. Adjust the row key and column values as needed for your
specific use case.
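
HBase's logical data model can be pictured as nested maps: row key, then column family:qualifier, then value, with rows kept in row-key order. The hypothetical MiniTable class below is a minimal in-memory sketch of put and a scan with LIMIT under that picture; it ignores cell timestamps and versions, regions, and byte-level key ordering.

```python
# Hypothetical in-memory model of HBase's logical layout:
# row key -> "family:qualifier" -> value.
class MiniTable:
    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # declared at create/alter time
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise ValueError(f"Unknown column family {family!r}")
        # Qualifiers inside a family need no declaration; they appear on write.
        self.rows.setdefault(row_key, {})[column] = value

    def scan(self, limit=None):
        """Yield (row key, cells) in sorted row-key order, stopping after `limit` rows."""
        for i, key in enumerate(sorted(self.rows)):
            if limit is not None and i >= limit:
                break
            yield key, dict(sorted(self.rows[key].items()))


product = MiniTable("Product", ["Shoe", "Tshirt"])
for q, v in [("Brand", "Nike"), ("Price", "$50"), ("Size", "9"), ("Description", "For Men")]:
    product.put("row_key", f"Shoe:{q}", v)
for q, v in [("Brand", "Polo"), ("Price", "$35"), ("Size", "XL"), ("Description", "Round Neck")]:
    product.put("row_key", f"Tshirt:{q}", v)

for key, cells in product.scan(limit=10):
    print(key, cells)
```

Because steps c and d both use the placeholder key 'row_key', all eight cells end up in a single row, which is what the scan returns.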

Write the HBase Commands for the following: (5 Marks)


1) Checking the status of the HBase Server with 3 different parameters.
2) Creating a table t1 with column families cf1, cf2, cf3.
3) Inserting value in the column c1 of cf1 as v1, column c2 of cf1 as v2.
4) Displaying the filters available.
5) Deleting the table t1 from HBase Server

Here are the HBase commands for the given tasks:


(i) Checking the status of the HBase Server with 3 different parameters:
status 'simple'
status 'summary'
status 'detailed'
(ii) Creating a table t1 with column families cf1, cf2, cf3:
create 't1', 'cf1', 'cf2', 'cf3'

(iii) Inserting value in the column c1 of cf1 as v1, column c2 of cf1 as v2:
put 't1', 'row_key', 'cf1:c1', 'v1'
put 't1', 'row_key', 'cf1:c2', 'v2'

(iv) Displaying the filters available:


show_filters

(v) Deleting the table t1 from HBase Server:


disable 't1'
drop 't1'

These are the HBase commands for checking the status of the server, creating a table with column
families, inserting values into the table, displaying available filters, and deleting a table from the HBase
server. Adjust the table name, column families, row key, and column values as needed for your specific
use case.
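
Step (v) needs two commands because HBase refuses to drop a table that is still enabled. The hypothetical TableCatalog class below is a minimal sketch of that rule only; the real shell prints an error rather than raising a Python exception, and the message text here is illustrative.

```python
# Hypothetical model of the "disable before drop" rule.
class TableCatalog:
    def __init__(self):
        self.tables = {}  # table name -> True if enabled

    def create(self, name):
        self.tables[name] = True  # new tables start enabled

    def is_enabled(self, name):
        return self.tables[name]

    def disable(self, name):
        self.tables[name] = False

    def drop(self, name):
        if self.tables[name]:
            raise RuntimeError(f"Table {name} is enabled; disable it before dropping")
        del self.tables[name]


catalog = TableCatalog()
catalog.create("t1")
try:
    catalog.drop("t1")  # fails: t1 is still enabled
except RuntimeError as err:
    print(err)
catalog.disable("t1")
catalog.drop("t1")  # succeeds once the table is disabled
print("t1" in catalog.tables)
```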

Compare Apache Pig vs MapReduce. (5 Marks)

Explain the significance of Pig Grunt and Pig Latin. (5 Marks)


Pig Grunt
• Grunt is the interactive shell provided by Apache Pig.
• It is mainly used to write and run Pig Latin statements interactively.
• Pig scripts can also be executed from Grunt (for example with the run or exec commands), since it is
the native shell Apache Pig provides for running Pig queries.
• Shell commands can be invoked from Grunt with sh, and HDFS file system commands with fs.
• Syntax of sh command: grunt> sh ls
• Syntax of fs command: grunt> fs -ls
Pig Latin
• Pig Latin is the data flow language used by Apache Pig to analyze data in Hadoop.
• It is a high-level textual language that abstracts away the Java MapReduce programming idiom; Pig
translates Pig Latin statements into jobs that run on the cluster.
• Pig Latin statements are the basic constructs used to process data.
• Each statement is an operator that accepts a relation as input and produces another relation as
output (LOAD and STORE are the exceptions: they read data from and write data to the file system).
• A statement can span multiple lines.
• Each statement must end with a semicolon.
• Statements may include expressions and schemas.
• By default, statements are processed using multi-query execution.
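
The relation-in, relation-out style of Pig Latin can be mimicked in plain Python. In the sketch below each step takes a relation (a list of tuples) and returns a new one; the comment above each step shows a corresponding Pig Latin statement. The file name students.csv and the sample data are hypothetical, and the Python code only illustrates the data flow, it is not how Pig executes it.

```python
# students = LOAD 'students.csv' USING PigStorage(',') AS (id:int, name:chararray, branch:chararray);
students = [(101, "Darsheel", "Computer"), (102, "Ayush", "Mechanical"), (103, "Yash", "Computer")]

# computer = FILTER students BY branch == 'Computer';
computer = [t for t in students if t[2] == "Computer"]

# grouped = GROUP students BY branch;
groups = {}
for t in students:
    groups.setdefault(t[2], []).append(t)
grouped = list(groups.items())  # each entry: (group key, bag of tuples)

# counts = FOREACH grouped GENERATE group, COUNT(students);
counts = [(group, len(bag)) for group, bag in grouped]

# DUMP computer; DUMP counts;
print(computer)
print(counts)
```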

Explain the working of Zookeeper. Also state the benefits of Zookeeper. (10 Marks)
• Apache ZooKeeper is a software project of the Apache Software Foundation.
• It is an open-source service that maintains configuration information and provides distributed
synchronization and group services; it is deployed alongside Hadoop clusters to help administer
the infrastructure.
• ZooKeeper was originally built at Yahoo! to make it easier for its applications to coordinate.
It was later adopted for organizing the services used by distributed frameworks such as Hadoop
and HBase, and became a widely used standard for coordination.
• It was designed to be a robust service that lets application developers focus mainly on their
application logic rather than on coordination.
• ZooKeeper is a distributed coordination service that also helps to manage a large set of hosts.
• Managing and coordinating services, especially in a distributed environment, is complicated.
ZooKeeper addresses this with a simple architecture and API that let developers implement common
coordination tasks such as electing a master server, managing group membership, and managing
metadata.
• Apache ZooKeeper is used for maintaining centralized configuration information, naming,
providing distributed synchronization, and providing group services in a simple interface so
that we don’t have to write it from scratch.
• Apache Kafka has also traditionally used ZooKeeper to manage its cluster metadata and
configuration (newer Kafka releases can run without it in KRaft mode).
• ZooKeeper allows developers to focus on the core application logic: it implements the
coordination protocols on the cluster so that applications need not implement them on their own.
Features of Apache ZooKeeper
Apache ZooKeeper provides the following features:
• Updating the Node’s Status: ZooKeeper can update the status of every node, so up-to-date
information about each node is stored across the cluster.
• Managing the Cluster: ZooKeeper maintains the status of each node in real time, leaving fewer
chances for errors and ambiguity.
• Naming Service: ZooKeeper attaches a unique identifier to every node (much like a name in DNS),
which helps identify it.
• Automatic Failure Recovery: ZooKeeper locks data while it is being modified, which helps the
cluster recover automatically if a failure occurs.
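
Coordination tasks such as electing a master server are commonly built from ZooKeeper's sequential znodes: each candidate creates an ephemeral sequential znode under an agreed path, and the candidate holding the lowest sequence number leads. When the leader's session ends, its ephemeral znode disappears and the next candidate takes over. The ElectionPath class below is a hypothetical in-memory stand-in for that recipe, not the ZooKeeper client API.

```python
import itertools


# Hypothetical stand-in for a ZooKeeper path such as /election. A real
# client would create EPHEMERAL_SEQUENTIAL znodes through a client library;
# here a counter plays the role of the sequence suffix.
class ElectionPath:
    def __init__(self):
        self._seq = itertools.count()
        self.znodes = {}  # znode name -> owner

    def join(self, owner):
        name = f"n_{next(self._seq):010d}"  # zero-padded sequence suffix
        self.znodes[name] = owner
        return name

    def session_expired(self, owner):
        # Ephemeral znodes vanish when their owner's session ends.
        self.znodes = {n: o for n, o in self.znodes.items() if o != owner}

    def leader(self):
        # The candidate holding the lowest sequence number is the leader.
        return self.znodes[min(self.znodes)] if self.znodes else None


election = ElectionPath()
for server in ["server-a", "server-b", "server-c"]:
    election.join(server)
print(election.leader())  # server-a holds the lowest sequence number
election.session_expired("server-a")
print(election.leader())  # server-b takes over
```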
Working of Apache ZooKeeper
• When the ensemble (a group of ZooKeeper servers) starts, it elects a leader and then waits for
clients to connect.
• Each client connects to one of the nodes in the ensemble; that node can be either the leader or a
follower.
• Once the client is connected, the node assigns a session ID to the client and sends it an
acknowledgement.
• If the client does not receive an acknowledgement, it resends the request to another node in the
ensemble and tries to connect to it.
• After receiving the acknowledgement, the client keeps the connection alive by sending heartbeats
to the node at regular intervals.
• Finally, the client can read and write data as needed. Reads are served by the server the client
is connected to, while writes are forwarded to the leader and committed once a majority of the
servers agree.
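
The connection steps above can be simulated in a few lines. In this hypothetical sketch, servers maps a server name to whether it is up, the client tries servers in random order until one acknowledges, and a random 64-bit number stands in for the session ID the server assigns. The session_alive helper models the server expiring a session that sends no heartbeat within the timeout; a real client library also handles reconnection, timeouts and session expiry events.

```python
import random


def connect(servers, rng=None):
    """Try ensemble members in random order until one acknowledges."""
    rng = rng or random.Random()
    order = list(servers)
    rng.shuffle(order)
    for name in order:
        if servers[name]:                    # this server acknowledged
            session_id = rng.getrandbits(64)  # stand-in for the assigned session ID
            return name, session_id
    raise ConnectionError("no ZooKeeper server acknowledged the client")


def session_alive(last_heartbeat, now, timeout):
    """A session stays alive only if a heartbeat arrived within `timeout`."""
    return now - last_heartbeat < timeout


ensemble = {"zk1": False, "zk2": True, "zk3": True}  # zk1 is down
server, session = connect(ensemble, random.Random(7))
print(server, hex(session))
```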
Benefits of Apache ZooKeeper
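• Simplicity: Coordination is done through a shared hierarchical namespace (a tree of znodes,
similar to a file system).
• Reliability: The service keeps running even if some servers fail, as long as a majority (quorum)
of the ensemble is available; for example, a five-server ensemble tolerates two failures.
• Order: Each update is stamped with a number (the zxid) that reflects the order of all ZooKeeper
transactions.
• Speed: ZooKeeper is especially fast for read-dominant workloads and performs best where reads
outnumber writes, at ratios of around 10:1.
• Scalability: Read performance can be improved by deploying more machines.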
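
ZooKeeper's reliability rests on majority quorums: an ensemble of n servers keeps working while more than half of them are up, so it tolerates (n - 1) // 2 failures. The short sketch below just encodes that arithmetic; it shows why odd ensemble sizes are recommended, since a four-server ensemble tolerates no more failures than a three-server one.

```python
def max_failures(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority is still available."""
    return (ensemble_size - 1) // 2


def has_quorum(alive: int, ensemble_size: int) -> bool:
    """The ensemble keeps serving only while a strict majority of servers is up."""
    return alive > ensemble_size // 2


for n in (3, 4, 5, 7):
    print(n, "servers tolerate", max_failures(n), "failure(s)")
```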
