Unit 3 tt1

The document discusses HDFS, Hive, HiveQL and HBase. It provides examples of creating tables in Hive with transactions and partitions, inserting and querying data, updating records and altering tables. It also compares different join types in Hive and provides HBase commands for creating, inserting and querying a table and checking its status.

Uploaded by

avm5439


Unit 3

HDFS, HIVE AND HIVEQL, HBASE

1) Consider the following database table: (10 Marks)

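+-----+----------+------------+------------+
| Id  | Name     | Branch     | Mobile     |
+-----+----------+------------+------------+
| 101 | Darsheel | Computer   | 9876502314 |
| 102 | Ayush    | Mechanical | 9932416578 |
| 103 | Yash     | Computer   | 9810324576 |
+-----+----------+------------+------------+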
1) Create the following table in Hive with transactional property = true and partitions.
2) Insert the following values in the table.
3) Display the count of students with respect to branch.
4) Update the name to “Yashvi” where ID = 103.
5) Alter the table to add new column “CGPA” with float datatype.
i. Create the table in Hive with transactional property = true and partitions:
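-- ACID operations need the DbTxnManager (often set in hive-site.xml instead)
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Full ACID tables must be stored as ORC (bucketing was also required before Hive 3)
CREATE TABLE student (
  Id     INT,
  Name   STRING,
  Branch STRING,
  Mobile BIGINT)
PARTITIONED BY (year INT)
CLUSTERED BY (Branch) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');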

ii. Insert the values into the table:

INSERT INTO student PARTITION (year=2022) VALUES
(101, 'Darsheel', 'Computer', 9876502314),
(102, 'Ayush', 'Mechanical', 9932416578),
(103, 'Yash', 'Computer', 9810324576);

iii. Display the count of students with respect to branch:

SELECT Branch, COUNT(*) AS Student_Count FROM student GROUP BY Branch;

iv. Update the name to “Yashvi” where ID = 103:

UPDATE student SET Name = 'Yashvi' WHERE Id = 103;

v. Alter the table to add a new column “CGPA” with float datatype:
ALTER TABLE student ADD COLUMNS (CGPA FLOAT);

These commands create a transactional, partitioned table in Hive, insert values into it, display the count of students per branch, update the name where ID = 103, and add a new column "CGPA" of type FLOAT. The UPDATE statement works only because the table was created with 'transactional'='true'.
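As a quick sanity check on steps iii to v, the sketch below mirrors them on plain Python data so the expected results are easy to see. It does not connect to Hive. The list-of-dicts layout and the names `students` and `branch_counts` are assumptions for illustration only. Hive itself returns GROUP BY rows in no guaranteed order unless ORDER BY is added.

```python
from collections import Counter

# Rows of the student table in partition year=2022, as inserted in step ii.
students = [
    {"Id": 101, "Name": "Darsheel", "Branch": "Computer", "Mobile": 9876502314},
    {"Id": 102, "Name": "Ayush", "Branch": "Mechanical", "Mobile": 9932416578},
    {"Id": 103, "Name": "Yash", "Branch": "Computer", "Mobile": 9810324576},
]

# Step iii: SELECT Branch, COUNT(*) ... GROUP BY Branch
branch_counts = Counter(row["Branch"] for row in students)

# Step iv: UPDATE ... SET Name = 'Yashvi' WHERE Id = 103
for row in students:
    if row["Id"] == 103:
        row["Name"] = "Yashvi"

# Step v: a newly added CGPA column reads as NULL (None here) for rows written earlier.
for row in students:
    row.setdefault("CGPA", None)

if __name__ == "__main__":
    for branch, count in sorted(branch_counts.items()):
        print(branch, count)  # Computer 2, then Mechanical 1
```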

Explain different joins in Hive with a suitable example. (5 Marks)

• Join queries combine rows from two tables present in Hive based on a join condition.
• The four main join types are as follows (Hive also supports LEFT SEMI JOIN and CROSS JOIN):
• Inner Join: Retrieves only the records whose join key appears in both tables.
• Left Outer Join: Returns all rows from the left table, even when there is no match in the right table. The right table's columns are NULL for those rows.
• Right Outer Join: Returns all rows from the right table, even when there is no match in the left table. The left table's columns are NULL for those rows.
• Full Outer Join: Combines the records of both tables based on the JOIN condition given in the query. It returns all records from both tables and fills in NULL for the columns of whichever side has no match.
Now, let's illustrate each join type with a suitable example:
Consider two tables:
Table A: Employee
+----+---------+-------------+
| Id | Name    | Department  |
+----+---------+-------------+
| 1  | Alice   | HR          |
| 2  | Bob     | Engineering |
| 3  | Charlie | Sales       |
+----+---------+-------------+

Table B: Salary
+----+--------+
| Id | Salary |
+----+--------+
| 1  | 50000  |
| 2  | 60000  |
| 4  | 70000  |
+----+--------+

Now, let's see how each join works:

1. INNER JOIN:
SELECT * FROM Employee e INNER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| Id   | Name    | Department  | Id   | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
+------+---------+-------------+------+--------+

2. LEFT OUTER JOIN:

SELECT * FROM Employee e LEFT OUTER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| Id   | Name    | Department  | Id   | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
| 3    | Charlie | Sales       | NULL | NULL   |
+------+---------+-------------+------+--------+

3. RIGHT OUTER JOIN:
SELECT * FROM Employee e RIGHT OUTER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| Id   | Name    | Department  | Id   | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
| NULL | NULL    | NULL        | 4    | 70000  |
+------+---------+-------------+------+--------+

4. FULL OUTER JOIN:

SELECT * FROM Employee e FULL OUTER JOIN Salary s ON e.Id = s.Id;
Output:
+------+---------+-------------+------+--------+
| Id   | Name    | Department  | Id   | Salary |
+------+---------+-------------+------+--------+
| 1    | Alice   | HR          | 1    | 50000  |
| 2    | Bob     | Engineering | 2    | 60000  |
| 3    | Charlie | Sales       | NULL | NULL   |
| NULL | NULL    | NULL        | 4    | 70000  |
+------+---------+-------------+------+--------+
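The row-matching rules of the four joins can be checked with a small nested-loop join written in plain Python. This sketch covers only the semantics, not how Hive executes joins (Hive compiles them into distributed jobs). `equi_join` is a hypothetical helper, and `None` stands in for NULL. Without ORDER BY, Hive does not guarantee row order. The sketch simply emits matches in left-table order, followed by unmatched right rows, which happens to match the outputs shown above.

```python
# Employee (Id, Name, Department) and Salary (Id, Salary) rows from the example above.
employee = [(1, "Alice", "HR"), (2, "Bob", "Engineering"), (3, "Charlie", "Sales")]
salary = [(1, 50000), (2, 60000), (4, 70000)]


def equi_join(left, right, how, left_width=3, right_width=2):
    """Join rows whose first column (Id) is equal.

    how is one of "inner", "left", "right", "full". The default widths match
    the Employee/Salary example and are used to pad missing sides with None.
    """
    result = []
    matched_right = set()  # indexes of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            if l_row[0] == r_row[0]:
                result.append(l_row + r_row)
                matched_right.add(i)
                found = True
        if not found and how in ("left", "full"):
            result.append(l_row + (None,) * right_width)  # pad missing right side
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                result.append((None,) * left_width + r_row)  # pad missing left side
    return result


if __name__ == "__main__":
    for how in ("inner", "left", "right", "full"):
        print(how.upper(), equi_join(employee, salary, how))
```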

Compare and contrast HBase with RDBMS. (5 Marks)

Write the HBase Commands for the following: (5 Marks)
a. Create a table with name Product.
b. Add column family ‘Shoe’ and ‘Tshirt’.
c. In column family Shoe, add the columns as below:
Brand = Nike, Price = $50, Size = 9, Description = For Men
d. In column family Tshirt, add the columns as below:
Brand = Polo, Price = $35, Size = XL, Description = Round Neck
e. Display the contents of table ‘Product’ for 10 row keys.
f. Check whether the table ‘Product’ is enabled or disabled.

a. Create a table with name Product:

create 'Product', 'Shoe', 'Tshirt'

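b. Add column families ‘Shoe’ and ‘Tshirt’ (HBase needs at least one column family when a table is created, so the create command in (a) already declares both; alter is how families are added to a table that already exists):

alter 'Product', {NAME => 'Shoe'}, {NAME => 'Tshirt'}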

c. In column family Shoe, add the columns as below:

put 'Product', 'row_key', 'Shoe:Brand', 'Nike'
put 'Product', 'row_key', 'Shoe:Price', '$50'
put 'Product', 'row_key', 'Shoe:Size', '9'
put 'Product', 'row_key', 'Shoe:Description', 'For Men'

d. In column family Tshirt, add the columns as below:

put 'Product', 'row_key', 'Tshirt:Brand', 'Polo'
put 'Product', 'row_key', 'Tshirt:Price', '$35'
put 'Product', 'row_key', 'Tshirt:Size', 'XL'
put 'Product', 'row_key', 'Tshirt:Description', 'Round Neck'

e. Display the contents of table ‘Product’ for 10 row keys:

scan 'Product', {LIMIT => 10}

f. Check whether the table ‘Product’ is enabled or disabled:

is_enabled 'Product'

These are the HBase commands for creating a table, adding column families and columns, displaying
table contents, and checking the table's status. Adjust the row key and column values as needed for your
specific use case.
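The commands above address cells as 'Family:Qualifier'. A toy model of HBase's data layout helps explain why. A table maps row keys to cells. Only column families are declared up front, while qualifiers such as Brand or Price come into existence when a value is put. `ToyHTable` and its methods are hypothetical and only loosely echo the shell commands. Real HBase stores bytes, keeps timestamped versions of each cell, and persists data in HDFS.

```python
class ToyHTable:
    """Hypothetical in-memory model of an HBase table; not the HBase client API."""

    def __init__(self, name, *families):
        if not families:
            raise ValueError("an HBase table needs at least one column family")
        self.name = name
        self.families = set(families)  # only families are declared in advance
        self.rows = {}                  # row key -> {"family:qualifier": value}

    def add_family(self, family):
        # Rough analogue of: alter 'Product', {NAME => 'family'}
        self.families.add(family)

    def put(self, row_key, column, value):
        # Rough analogue of: put 'Product', row_key, 'Family:Qualifier', value
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Qualifiers (Brand, Price, ...) are created on the fly, no declaration needed.
        self.rows.setdefault(row_key, {})[column] = value

    def scan(self, limit=None):
        # Rough analogue of: scan 'Product', {LIMIT => n}; rows come back sorted by key.
        keys = sorted(self.rows)[:limit]
        return [(key, dict(self.rows[key])) for key in keys]


if __name__ == "__main__":
    product = ToyHTable("Product", "Shoe", "Tshirt")
    product.put("row_key", "Shoe:Brand", "Nike")
    product.put("row_key", "Shoe:Price", "$50")
    product.put("row_key", "Tshirt:Brand", "Polo")
    print(product.scan(limit=10))
```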

Write the HBase Commands for the following: (5 Marks)

1) Checking the status of the HBase Server with 3 different parameters.
2) Creating a table t1 with column families cf1, cf2, cf3.
3) Inserting value in the column c1 of cf1 as v1, column c2 of cf1 as v2.
4) Displaying the filters available.
5) Deleting the table t1 from HBase Server

Here are the HBase commands for the given tasks:

(i) Checking the status of the HBase Server with 3 different parameters:
status 'simple'
status 'summary'
status 'detailed'
(ii) Creating a table t1 with column families cf1, cf2, cf3:
create 't1', 'cf1', 'cf2', 'cf3'

(iii) Inserting value in the column c1 of cf1 as v1, column c2 of cf1 as v2:
put 't1', 'row_key', 'cf1:c1', 'v1'
put 't1', 'row_key', 'cf1:c2', 'v2'

(iv) Displaying the filters available:

show_filters

(v) Deleting the table t1 from HBase Server:

disable 't1'
drop 't1'

These are the HBase commands for checking the status of the server, creating a table with column
families, inserting values into the table, displaying available filters, and deleting a table from the HBase
server. Adjust the table name, column families, row key, and column values as needed for your specific
use case.
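Step (v) needs two commands because HBase will not drop a table that is still enabled. The toy state machine below captures only that rule. `ToyCatalog` is a hypothetical illustration and is not part of HBase or any client library.

```python
class ToyCatalog:
    """Hypothetical model of HBase table states; not the HBase admin API."""

    def __init__(self):
        self.enabled = {}  # table name -> True (enabled) / False (disabled)

    def create(self, name, *families):
        if not families:
            raise ValueError("at least one column family is required")
        if name in self.enabled:
            raise ValueError(f"table {name!r} already exists")
        self.enabled[name] = True  # newly created tables are enabled

    def disable(self, name):
        self.enabled[name] = False

    def is_enabled(self, name):
        return self.enabled[name]

    def drop(self, name):
        # Mirrors the shell rule that an enabled table cannot be dropped.
        if self.enabled[name]:
            raise RuntimeError(f"table {name!r} is enabled; disable it first")
        del self.enabled[name]

    def exists(self, name):
        return name in self.enabled


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.create("t1", "cf1", "cf2", "cf3")
    catalog.disable("t1")
    catalog.drop("t1")
    print(catalog.exists("t1"))  # False
```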

Compare Apache Pig vs MapReduce. (5 Marks)

Explain the significance of Pig Grunt and Pig Latin. (5 Marks)

Pig Grunt
• Grunt is the interactive shell of Apache Pig.
• The Grunt shell is mainly used to write and run Pig Latin statements interactively.
• Pig scripts can be executed from Grunt, the native shell that Apache Pig provides for running Pig queries.
• Shell commands can be invoked from Grunt with sh, and HDFS file system commands with fs.
• Syntax of sh command : grunt> sh ls
• Syntax of fs command : grunt> fs -ls
Pig Latin
• Pig Latin is a data flow language used by Apache Pig to analyze data in Hadoop.
• It is a textual language that abstracts programming away from the Java MapReduce idiom into a higher-level notation.
• Pig Latin statements are the basic constructs used to process data.
• Each statement is an operator that accepts a relation as input and generates another relation as output.
• A statement can span multiple lines.
• Each statement must end with a semicolon.
• Statements may include expressions and schemas.
• By default, the statements in a script are processed using multi-query execution.
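The idea that every Pig Latin statement turns one relation into another can be imitated with plain Python functions over lists of tuples. This is only an analogy. `pig_filter`, `pig_group` and `pig_foreach` are hypothetical helpers, the file name students.txt is made up, and the data reuses the student table from question 1. Each comment shows a Pig Latin statement that the step roughly corresponds to.

```python
from itertools import groupby


def pig_filter(relation, predicate):
    """FILTER: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def pig_group(relation, key):
    """GROUP: produce (group, bag) tuples, one per distinct key."""
    ordered = sorted(relation, key=key)  # groupby only merges adjacent keys
    return [(k, list(bag)) for k, bag in groupby(ordered, key=key)]


def pig_foreach(relation, generate):
    """FOREACH ... GENERATE: build one output tuple per input tuple."""
    return [generate(t) for t in relation]


# students = LOAD 'students.txt' USING PigStorage(',') AS (id:int, name:chararray, branch:chararray);
students = [(101, "Darsheel", "Computer"), (102, "Ayush", "Mechanical"), (103, "Yash", "Computer")]

# computer = FILTER students BY branch == 'Computer';
computer = pig_filter(students, lambda t: t[2] == "Computer")

# by_branch = GROUP students BY branch;
by_branch = pig_group(students, key=lambda t: t[2])

# counts = FOREACH by_branch GENERATE group, COUNT(students);
counts = pig_foreach(by_branch, lambda g: (g[0], len(g[1])))

if __name__ == "__main__":
    print(counts)  # [('Computer', 2), ('Mechanical', 1)]
```

Because each helper returns a new list instead of modifying its input, the steps can be chained in the same way Pig Latin statements are.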

Explain the working of Zookeeper. Also state the benefits of Zookeeper. (10 Marks)
• Apache ZooKeeper is a software project of the Apache Software Foundation.
• It is an open-source technology that maintains configuration information and provides distributed synchronization and group services for applications deployed on a Hadoop cluster.
• The ZooKeeper framework was originally built at Yahoo! to make coordination between applications easier. Later it was adopted to organize the services used by distributed frameworks like Hadoop and HBase, and Apache ZooKeeper became a standard.
• It was designed to be a robust service that lets application developers focus on their application logic rather than on coordination.
• ZooKeeper is a distributed coordination service that also helps to manage a large set of hosts.
• Managing and coordinating services in a distributed environment is complicated. ZooKeeper solves this problem with a simple architecture and API that allow developers to implement common coordination tasks such as electing a master server, managing group membership, and managing metadata.
• Apache ZooKeeper maintains centralized configuration information, provides naming and distributed synchronization, and offers group services through a simple interface, so that these do not have to be written from scratch.
• Apache Kafka has also used ZooKeeper to manage its configuration and cluster metadata, although recent Kafka releases can run without it in KRaft mode.
• ZooKeeper lets developers focus on the core application logic. It implements the coordination protocols on the cluster so that applications need not implement them on their own.
Features of Apache ZooKeeper
Apache ZooKeeper provides the following features:
• Updating the Node’s Status: ZooKeeper can update the status of every node, so up-to-date information about each node is stored across the cluster.
• Managing the Cluster: The status of each node in the cluster is maintained in real time, leaving fewer chances for errors and ambiguity.
• Naming Service: ZooKeeper attaches a unique identifier to every node, much as DNS gives names to hosts, which helps identify it.
• Automatic Failure Recovery: ZooKeeper locks data while it is being modified, which helps the cluster recover automatically if a failure occurs.
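ZooKeeper keeps this shared state in a hierarchical namespace of znodes addressed by slash-separated paths. Its naming service can hand out unique, ordered names through sequential znodes. The toy model below illustrates both ideas. `ToyZNodeTree` is hypothetical, is not the ZooKeeper client API, and only approximates the real server's behaviour. For example, the 10-digit zero-padded suffix imitates the format ZooKeeper uses for sequential znode names.

```python
class ToyZNodeTree:
    """Hypothetical in-memory model of ZooKeeper's znode tree (not the real API)."""

    def __init__(self):
        self.data = {"/": b""}  # znode path -> stored bytes
        self.next_seq = {}      # parent path -> next sequence number for its children

    @staticmethod
    def parent_of(path):
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path, data=b"", sequential=False):
        parent = self.parent_of(path)
        if parent not in self.data:
            raise KeyError(f"parent znode {parent!r} does not exist")
        if sequential:
            # Append a zero-padded counter kept per parent: unique, ordered names.
            n = self.next_seq.get(parent, 0)
            self.next_seq[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self.data:
            raise KeyError(f"znode {path!r} already exists")
        self.data[path] = data
        return path

    def set(self, path, data):
        # Update a node's stored status, e.g. b"idle" -> b"busy".
        if path not in self.data:
            raise KeyError(f"znode {path!r} does not exist")
        self.data[path] = data

    def get(self, path):
        return self.data[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


if __name__ == "__main__":
    zk = ToyZNodeTree()
    zk.create("/workers")
    first = zk.create("/workers/worker-", b"idle", sequential=True)
    second = zk.create("/workers/worker-", b"idle", sequential=True)
    zk.set(first, b"busy")
    print(zk.children("/workers"))  # ['worker-0000000000', 'worker-0000000001']
```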
Working of Apache ZooKeeper
• As soon as the ensemble (a group of ZooKeeper servers) starts, it waits for clients to connect.
• A client then connects to one of the nodes in the ensemble, which can be either the leader or a follower.
• Once the client is connected, the node assigns a session ID to the client and sends an acknowledgement to it.
• If the client does not receive an acknowledgement, it resends the message to another node in the ensemble and tries to connect to that node.
• After receiving the acknowledgement, the client keeps the connection alive by sending heartbeats to the node at regular intervals.
• Finally, the client can perform operations such as reading or writing data as needed.
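The connection steps above can be sketched as a small simulation. Everything here is hypothetical. `ToyServer` and `connect_to_ensemble` are illustrative names, servers are tried in list order for simplicity, and heartbeats are just counters. A real ZooKeeper client library handles session setup, timeouts and reconnection itself.

```python
import itertools


class ToyServer:
    """Hypothetical ensemble member; up=False simulates a server that never acknowledges."""

    _session_ids = itertools.count(1)  # shared so session IDs are unique across servers

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.heartbeats = {}  # session ID -> heartbeats received

    def connect(self):
        """Return a new session ID (the acknowledgement), or None if the server is down."""
        if not self.up:
            return None
        session_id = next(ToyServer._session_ids)
        self.heartbeats[session_id] = 0
        return session_id

    def heartbeat(self, session_id):
        self.heartbeats[session_id] += 1


def connect_to_ensemble(servers):
    """Try servers in list order until one acknowledges with a session ID."""
    for server in servers:
        session_id = server.connect()
        if session_id is not None:
            return server, session_id
    raise ConnectionError("no server in the ensemble acknowledged the connection")


if __name__ == "__main__":
    ensemble = [ToyServer("zk1", up=False), ToyServer("zk2"), ToyServer("zk3")]
    server, session_id = connect_to_ensemble(ensemble)
    for _ in range(3):  # keep the session alive with periodic heartbeats
        server.heartbeat(session_id)
    print(server.name, session_id, server.heartbeats[session_id])
```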
Benefits of Apache ZooKeeper
• Simplicity: Coordination is done through a shared hierarchical namespace.
• Reliability: The system keeps working even if one or more nodes fail, as long as a majority of the servers in the ensemble is still running.
• Order: It keeps track of updates by stamping each one with a number that denotes its order.
• Speed: It performs best in read-dominant workloads, where reads outnumber writes by a ratio of around 10:1.
• Scalability: Read performance can be improved by deploying more machines.
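The Reliability and Order points can be made concrete. ZooKeeper needs a majority (quorum) of the ensemble to be up, so the number of failures it survives follows from simple arithmetic. Every update is stamped with an increasing number, which ZooKeeper calls the zxid. The sketch below is only an illustration. `ToyLeader` is hypothetical and ignores the fact that real zxids also encode the leader's epoch.

```python
def quorum_size(n):
    """Smallest majority of an ensemble of n servers."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still running."""
    return n - quorum_size(n)


class ToyLeader:
    """Hypothetical leader that stamps each write with an increasing number."""

    def __init__(self):
        self.counter = 0
        self.log = []  # (stamp, path, data) in commit order

    def write(self, path, data):
        self.counter += 1
        self.log.append((self.counter, path, data))
        return self.counter


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

A 4-server ensemble tolerates no more failures than a 3-server one, which is why ensembles usually have an odd number of servers.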
