Hive-Part-2

Uploaded by Vipul Khandke
Copyright © All Rights Reserved
HIVE

PART-2
HIVE
• Hive is a data warehouse system used to analyse structured data.
• Built on top of Hadoop.
• Developed by Facebook.
• Provides functionality for reading, writing, and managing large datasets residing in distributed storage.
• Runs SQL-like queries called HQL (Hive Query Language), which are internally converted to MapReduce jobs.
• Using Hive, we can skip writing complex MapReduce programs.
• Hive supports Data Definition Language (DDL) and Data Manipulation Language (DML) statements.
HIVE Architecture
Hive - MetaStore

• Hive MetaStore: a central repository that stores the structural information of all the tables and partitions in the warehouse. This includes the metadata of each column and its type (used to read and write the data) and the corresponding HDFS files where the data is stored.
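To make the metastore's role concrete, here is a minimal Python sketch of the kind of record it keeps per table. The classes `ColumnMeta`, `TableMeta` and `ToyMetastore` are hypothetical illustrations, not Hive's actual metastore API (the real metastore is backed by a relational database), and the warehouse path assumes the default `/user/hive/warehouse` layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMeta:
    name: str
    hive_type: str  # e.g. "int", "string", "float"


@dataclass
class TableMeta:
    database: str
    name: str
    columns: List[ColumnMeta]
    location: str  # HDFS directory that holds the table's files
    partitions: List[str] = field(default_factory=list)


class ToyMetastore:
    """Hypothetical in-memory stand-in: maps 'db.table' to its metadata."""

    def __init__(self) -> None:
        self._tables = {}

    def register(self, meta: TableMeta) -> None:
        self._tables[f"{meta.database}.{meta.name}"] = meta

    def lookup(self, qualified_name: str) -> TableMeta:
        # An unknown table raises KeyError, loosely like a query failing to compile.
        return self._tables[qualified_name]


store = ToyMetastore()
store.register(TableMeta(
    database="demo",
    name="employee",
    columns=[ColumnMeta("id", "int"), ColumnMeta("name", "string"),
             ColumnMeta("salary", "float")],
    location="/user/hive/warehouse/demo.db/employee",
))

meta = store.lookup("demo.employee")
print([c.hive_type for c in meta.columns], meta.location)
```

A query against demo.employee would first consult a record like this to learn which HDFS files to read and how to interpret each field.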
A high-level comparison of SQL and HiveQL
Apache Hive Installation
• Check the Java installation: $ java -version
• Check the Hadoop installation: $ hadoop version
• Download the Apache Hive tar file:
• http://mirrors.estointernet.in/apache/hive/hive-1.2.2/
• Unzip the downloaded tar file:
• tar -xvf apache-hive-1.2.2-bin.tar.gz
• Open the bashrc file: $ nano ~/.bashrc
• Provide the following HIVE_HOME path:
• export HIVE_HOME=/home/user/local/apache-hive-1.2.2-bin
• export PATH=$PATH:/home/user/local/apache-hive-1.2.2-bin/bin
• Update the environment variables: $ source ~/.bashrc
• Start Hive: $ hive
Hive
• Data Types
• DDL Commands
• DML Operations
• Data Retrieval Queries
Hive Data Types
• Basic datatypes
• Numbers
• Date / Time
• Strings
• Complex datatypes
HIVE DATA TYPES
Integer Types
Type       Size                     Range
TINYINT    1-byte signed integer    -128 to 127
SMALLINT   2-byte signed integer    -32,768 to 32,767
INT        4-byte signed integer    -2,147,483,648 to 2,147,483,647
BIGINT     8-byte signed integer    -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Floating-Point Types
Type       Size      Description
FLOAT      4-byte    Single precision floating point number
DOUBLE     8-byte    Double precision floating point number
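The ranges in the integer table follow directly from the sizes: an n-byte signed (two's-complement) integer covers -2^(8n-1) to 2^(8n-1) - 1. A short Python check that regenerates the table (the function name `signed_range` is illustrative):

```python
def signed_range(size_bytes: int) -> tuple:
    """Return (min, max) for a two's-complement signed integer of size_bytes bytes."""
    bits = 8 * size_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


HIVE_INT_TYPES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}

for type_name, size in HIVE_INT_TYPES.items():
    lo, hi = signed_range(size)
    # The ',' format spec inserts thousands separators, as in the table.
    print(f"{type_name:<9} {size}-byte  {lo:,} to {hi:,}")
```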


HIVE DATA TYPES …
Date/Time Types
• TIMESTAMP
• Supports UNIX timestamps with optional nanosecond precision.
• Format: "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal places of precision)
• DATE
• A DATE value specifies a particular year, month and day, in the form YYYY-MM-DD.
• It does not store the time of day. The range of the DATE type is 0000-01-01 to 9999-12-31.
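Python's datetime stops at microseconds, so a sketch that reads the nine-digit TIMESTAMP format has to keep the fractional part separately. `parse_hive_timestamp` below is a hypothetical helper, not part of any Hive client library; it only checks the text layout shown above, not every rule Hive itself applies.

```python
from datetime import datetime
from typing import Tuple


def parse_hive_timestamp(text: str) -> Tuple[datetime, int]:
    """Split 'YYYY-MM-DD HH:MM:SS[.fffffffff]' into a datetime (whole seconds)
    and an integer count of nanoseconds."""
    whole, _, fraction = text.partition(".")
    base = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    if len(fraction) > 9 or (fraction and not fraction.isdigit()):
        raise ValueError(f"expected up to 9 fractional digits, got {fraction!r}")
    # Right-pad so ".5" means 500,000,000 ns rather than 5 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return base, nanos


ts, ns = parse_hive_timestamp("2023-09-27 10:15:30.123456789")
print(ts, ns)
```

A shorter fraction is padded on the right, so "2023-09-27 10:15:30.5" yields 500000000 nanoseconds.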
HIVE DATA TYPES…
String Types
• VARCHAR
• A variable-length type whose length specifier lies between 1 and 65535; it specifies the maximum number of characters allowed in the string.
• CHAR
• A fixed-length type whose maximum length is 255.
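The difference between the two string types can be sketched in Python: VARCHAR keeps values up to its limit and silently truncates longer ones, while CHAR pads shorter values with spaces to its fixed length. The helpers `to_varchar` and `to_char` are hypothetical; truncating over-long CHAR values and the lower bound of 1 for CHAR are assumptions made here to mirror VARCHAR.

```python
def to_varchar(value: str, max_length: int) -> str:
    """Model assigning a string to VARCHAR(max_length): longer values are truncated."""
    if not 1 <= max_length <= 65535:
        raise ValueError("VARCHAR length must be between 1 and 65535")
    return value[:max_length]


def to_char(value: str, length: int) -> str:
    """Model CHAR(length): shorter values are right-padded with spaces.
    Truncation of longer values (and the lower bound of 1) is assumed here."""
    if not 1 <= length <= 255:
        raise ValueError("CHAR length must be between 1 and 255")
    return value[:length].ljust(length)


print(repr(to_varchar("IIITK-Batch2020", 5)))  # 'IIITK'
print(repr(to_char("Ram", 6)))                 # 'Ram   '
```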
HIVE DATA TYPES…
Complex Types
Type     Description                                                                      Example
STRUCT   Similar to a C struct or an object; fields are accessed using "dot" notation.   struct('James','Roy')
MAP      Contains key-value tuples; fields are accessed using array notation.             map('first','James','last','Roy')
ARRAY    A collection of values of the same type, indexable using zero-based integers.   array('James','Roy')
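The access notations in the table map onto familiar Python structures. This sketch is only an analogy (Python objects, not Hive values); it assumes Hive's struct() labels its unnamed fields col1, col2, and so on.

```python
from collections import namedtuple

# STRUCT -> dot notation. Hive's struct('James','Roy') names its fields col1, col2.
Struct = namedtuple("Struct", ["col1", "col2"])
s = Struct("James", "Roy")

# MAP -> key lookup. map('first','James','last','Roy') alternates keys and values.
map_args = ["first", "James", "last", "Roy"]
m = dict(zip(map_args[0::2], map_args[1::2]))

# ARRAY -> zero-based index. array('James','Roy')
a = ["James", "Roy"]

print(s.col1)     # like s.col1 in HiveQL     -> James
print(m["last"])  # like m['last'] in HiveQL  -> Roy
print(a[0])       # like a[0] in HiveQL       -> James
```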
Hive DDL

Hive Data Definition Language (DDL)

DDL
• Used to describe the data and data structures of a database.
• Like SQL DDL, Hive DDL is used for creating, altering, dropping and otherwise managing databases, tables and other objects in a database.

DDL Commands in Hive
• Create
• Alter
• Drop
• Show
• Truncate
• Describe
Hive - Create Database
hive> show databases;
hive> create database demo;
Hive - Drop Database
hive> show databases;
hive> drop database demo;
hive> show databases;
Hive - Alter Database
• Used to add database properties or modify existing ones.
ALTER Database Command 1
Syntax:
DATABASE and SCHEMA mean the same thing, so either keyword can be used. SCHEMA in ALTER was added in Hive 0.14.0.

ALTER (DATABASE|SCHEMA) <database_name> SET DBPROPERTIES ('<property_name>'='<property_value>', ...);

Step 1: Create a database with the name student.

hive> CREATE DATABASE student;
Step 2: Use ALTER to add properties to the database.

hive> ALTER DATABASE student SET DBPROPERTIES ('owner' = 'IIITK-Batch2020', 'Date' = '2023-09-27');

Step 3: Describe the database to see the effect.

hive> DESCRIBE DATABASE EXTENDED student;

Step 4: Change an existing property to see the effect. In our example, we change the owner from 'IIITK-Batch2020' to 'IIITK-Batch2020-Set1'.

hive> ALTER DATABASE student SET DBPROPERTIES ('owner' = 'IIITK-Batch2020-Set1', 'Date' = '2023-09-27');
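Steps 2 to 4 can be modelled as a dictionary merge. This Python sketch assumes that SET DBPROPERTIES merges the given pairs into the existing ones (adding new keys and overwriting existing ones) rather than replacing the whole set; `set_dbproperties` is a hypothetical name.

```python
from typing import Dict


def set_dbproperties(current: Dict[str, str], updates: Dict[str, str]) -> Dict[str, str]:
    """Model ALTER DATABASE ... SET DBPROPERTIES as a merge: keys in `updates`
    are added or overwritten; keys not mentioned keep their old values."""
    merged = dict(current)  # copy so the caller's dict is not mutated
    merged.update(updates)
    return merged


props: Dict[str, str] = {}
# Step 2: add two properties.
props = set_dbproperties(props, {"owner": "IIITK-Batch2020", "Date": "2023-09-27"})
# Step 4 (abbreviated): change only the owner; 'Date' is left as it was.
props = set_dbproperties(props, {"owner": "IIITK-Batch2020-Set1"})
print(props)
```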
Hive - Alter Database
ALTER Database Command 2

• The following command changes the database directory on HDFS.
• LOCATION with ALTER is only available in Hive 2.2.1, 2.4.0, and later. Keep in mind that changing the database location does not move existing data to the newly specified location.
• It only changes the default parent-directory location: tables created afterwards are placed under the new HDFS location, while existing tables keep their current locations.
Syntax:

ALTER (DATABASE|SCHEMA) <database_name> SET LOCATION 'Path_on_HDFS';
Step 1: Describe the database student to see its parent directory. By default, Hive stores its data at /user/hive/warehouse on HDFS.

hive> DESCRIBE DATABASE EXTENDED student;

Step 2: Use ALTER to change the parent-directory location (NOTE: /hive_db is an existing directory on my HDFS).

hive> ALTER DATABASE student SET LOCATION 'hdfs://localhost:9000/hive_db';

Step 3: Describe the database student to check whether the location has been overridden.

hive> DESCRIBE DATABASE EXTENDED student;
The location of the student database has now been changed. Any new table added to this database will be created under /hive_db.
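The effect of SET LOCATION can be sketched as a change to a default that only new tables read. `ToyDatabase` and the table names marks_2022 / marks_2023 are hypothetical; the sketch assumes a new table's directory is the database location plus the table name, as with Hive's default layout.

```python
from typing import Dict


class ToyDatabase:
    """Hypothetical model of a Hive database's default location and its tables."""

    def __init__(self, name: str, location: str) -> None:
        self.name = name
        self.location = location  # default parent directory for new tables
        self.table_locations: Dict[str, str] = {}

    def create_table(self, table: str) -> str:
        # A new table gets a subdirectory of the current default location.
        path = f"{self.location}/{table}"
        self.table_locations[table] = path
        return path

    def set_location(self, new_location: str) -> None:
        # Only the default changes: no files move and existing tables keep their paths.
        self.location = new_location


db = ToyDatabase("student", "hdfs://localhost:9000/user/hive/warehouse/student.db")
db.create_table("marks_2022")
db.set_location("hdfs://localhost:9000/hive_db")
db.create_table("marks_2023")
print(db.table_locations)
```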
Hive - Alter Database
ALTER Database Command 3
• The following command sets or changes the owner of a database, which can be a user or a role.
• SET OWNER transfers ownership of the database to a new user or a new role.
• By default, the user who creates the database is set as its owner.
Syntax:

ALTER DATABASE <database_name> SET OWNER [USER|ROLE] <user_or_role_name>;
Step 1: Change the user associated with the student database. First describe the database to see the current owner, then change the owner from dikshant to Ram.

hive> DESCRIBE DATABASE EXTENDED student;
hive> ALTER DATABASE student SET OWNER USER Ram;
Step 2: Now, transfer ownership of the database to the admin role.

hive> ALTER DATABASE student SET OWNER ROLE admin;
Hive DDL – Table (Now)
• Create
• Alter
• Drop
• Show
• Truncate
• Describe
Hive DDL – Table - Create
Hive provides two types of table:
• Internal table
• External table

Step 1: Create a database first so that we can create tables inside it.
hive> CREATE DATABASE database_name;
hive> SHOW DATABASES;

Step 2: To work in this database, we have to select it with USE.

hive> USE database_name;
Step 3: Now, start creating tables in this database.
Hive DDL – Table - Create
Internal Table
• Internal tables are also called managed tables.
• The lifecycle of their data is controlled by Hive.
• By default, these tables are stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (default: /user/hive/warehouse).
• Internal tables are not flexible enough to share with other tools like Pig.
• If we drop an internal table, Hive deletes both the table schema and the data.
Hive DDL – Table - Create
Create an internal table
hive> create table demo.employee (Id int, Name string, Salary float)
row format delimited
fields terminated by ',';

View the metadata of the created table:

hive> describe demo.employee;
Hive DDL – Table - Create
Creating a table using the Existing Schema
• Hive allows creating a new table by using the schema of an existing table.

hive> create table if not exists demo.copy_employee like demo.employee;

The new table has the same schema as the existing table; the data itself is not copied, so the new table starts out empty.
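The LIKE behaviour can be sketched in Python as copying a table's schema while starting with no rows. The `tables` catalog, its sample row and `create_table_like` are hypothetical illustrations, not Hive internals.

```python
import copy

# Hypothetical catalog: each table has a schema and some rows.
tables = {
    "demo.employee": {
        "schema": [("id", "int"), ("name", "string"), ("salary", "float")],
        "rows": [(1, "Ram", 30000.0)],  # made-up sample row
    }
}


def create_table_like(new_name: str, source_name: str) -> None:
    """Model CREATE TABLE IF NOT EXISTS new_name LIKE source_name."""
    if new_name in tables:  # IF NOT EXISTS: leave an existing table untouched
        return
    tables[new_name] = {
        "schema": copy.deepcopy(tables[source_name]["schema"]),  # schema is copied
        "rows": [],                                                # data is not
    }


create_table_like("demo.copy_employee", "demo.employee")
print(tables["demo.copy_employee"])
```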
Hive DDL – Table - Create
External Table
• An external table lets us create a table over data that is stored, and can be accessed, outside Hive's warehouse directory.
• Two keywords are involved:
EXTERNAL keyword: specifies that the table is external.
LOCATION keyword: specifies the HDFS directory where the table's data is located.
• As the table is external, its data is not stored in the Hive directory.
• Therefore, if we drop the table, its metadata is deleted, but the data still exists.
• In contrast, dropping an internal table deletes both the table schema and the data.
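The difference in DROP behaviour between internal and external tables can be sketched in Python. The `metastore` and `hdfs` dictionaries, their sample rows, and `drop_table` are hypothetical stand-ins for Hive's metadata and file storage; emplist is keyed without a database because the slides create it in the current database.

```python
from typing import Dict, List

# Hypothetical state: table metadata in `metastore`, file contents in `hdfs`.
metastore: Dict[str, dict] = {
    "demo.employee": {"external": False,
                      "location": "/user/hive/warehouse/demo.db/employee"},
    "emplist": {"external": True, "location": "/HiveDirectory"},
}
hdfs: Dict[str, List[str]] = {
    "/user/hive/warehouse/demo.db/employee": ["1,Ram,30000"],  # made-up rows
    "/HiveDirectory": ["1,Ram,30000"],
}


def drop_table(name: str) -> None:
    meta = metastore.pop(name)  # the metadata is always removed
    if not meta["external"]:
        # Internal (managed) table: Hive owns the data, so it is deleted too.
        hdfs.pop(meta["location"], None)


drop_table("demo.employee")
drop_table("emplist")
print(sorted(hdfs))  # only '/HiveDirectory' survives
```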
Hive DDL – Table - Create
External Table
To create an external table, follow the steps below:

Step 1: Create a directory on HDFS with the following command:
> hdfs dfs -mkdir /HiveDirectory

Step 2: Now, store the file in the created directory.

> hdfs dfs -put hive/emp_details /HiveDirectory

Step 3: Create an external table with the following command:

hive> create external table emplist (Id int, Name string, Salary float)
row format delimited
fields terminated by ','
location '/HiveDirectory';
Hive - Load data into Table

• Once the internal table has been created, the next step is to load data into it.
• In Hive, we can easily load data from any file into a table.
• Load the data of the file into the table:
> load data local inpath '/home/codegyani/hive/emp_details' into table demo.employee;
Check the data loaded from the local file system:

> select * from demo.employee;
• To add more data to the table, execute the same query again with the new file name. LOAD DATA ... INTO TABLE appends to the existing data (adding OVERWRITE would replace it instead).

> load data local inpath '/home/codegyani/hive/emp_details1' into table demo.employee;

> select * from demo.employee;
Load unmatched data
• If one or more column values don't match the data type of the specified table columns, Hive does not throw an exception while loading.
• Instead, the file is stored as-is, and when the table is queried, Hive returns NULL in place of each value that cannot be converted to its column's type.
• To see this, add one more file to the current table. This file contains unmatched data: its third column contains string data, while the table's third column (Salary) expects float data.

Now, load the data into the table.

> load data local inpath '/home/codegyani/hive/emp_details2' into table demo.employee;
> select * from demo.employee;
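The NULL-on-mismatch behaviour can be sketched as a per-field conversion that falls back to None. This is a simplified Python model, not Hive's SerDe: `parse_field` and `parse_row` are hypothetical names, the row "4,Sita,abc" is made up, and Python's int()/float() accept some inputs (such as surrounding spaces or "nan") that Hive may treat differently.

```python
from typing import List, Optional, Tuple, Union

Value = Optional[Union[int, float, str]]


def parse_field(raw: str, hive_type: str) -> Value:
    """Convert one text field; return None (NULL) if it does not fit the column type."""
    try:
        if hive_type == "int":
            return int(raw)
        if hive_type == "float":
            return float(raw)
        return raw  # string columns accept any text
    except ValueError:
        return None


def parse_row(line: str, schema: List[str], delimiter: str = ",") -> Tuple[Value, ...]:
    fields = line.split(delimiter)
    return tuple(parse_field(raw, col_type) for raw, col_type in zip(fields, schema))


schema = ["int", "string", "float"]  # Id, Name, Salary as in demo.employee
print(parse_row("1,Ram,30000", schema))  # (1, 'Ram', 30000.0)
print(parse_row("4,Sita,abc", schema))   # (4, 'Sita', None)
```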
Hive - Alter Table
• In Hive, we can modify an existing table, for example by changing the table name, column names, comments, and table properties.
• Hive provides SQL-like commands to alter a table:
• Rename a table
• Add columns
• Change a column
• Delete or replace columns
Hive - Alter Table
• Rename a Table
Change the name of an existing table.

Syntax: ALTER TABLE old_table_name RENAME TO new_table_name;

For example, rename the existing table emp in the current database to employee_data:

hive> ALTER TABLE emp RENAME TO employee_data;
Hive - Alter Table
• Adding columns
Add one or more columns to an existing table.
Syntax: ALTER TABLE table_name ADD COLUMNS (column_name datatype, ...);

• Add a new column to the table with the following command:

hive> ALTER TABLE employee_data ADD COLUMNS (age int);
• Since no data has been added to the new column, Hive shows NULL as its value for the existing rows.
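ADD COLUMNS changes only the table's metadata; the existing data files are not rewritten, so old rows simply have no field for the new column. A minimal Python sketch of this schema-on-read behaviour (values are kept as strings for brevity; `read_row` and the sample line are hypothetical):

```python
from typing import Dict, List, Optional


def read_row(line: str, schema: List[str], delimiter: str = ",") -> Dict[str, Optional[str]]:
    """Apply the table's *current* column list to a stored line.
    Columns with no matching field in the file come back as None (NULL)."""
    fields = line.split(delimiter)
    return {col: (fields[i] if i < len(fields) else None)
            for i, col in enumerate(schema)}


old_line = "1,Ram,30000"           # written before the column was added
schema = ["id", "name", "salary"]
schema = schema + ["age"]          # effect of ADD COLUMNS (age int)
print(read_row(old_line, schema))
# {'id': '1', 'name': 'Ram', 'salary': '30000', 'age': None}
```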
