BDA Unit-5-PPT
What is Hive?
Hive is a data warehouse infrastructure tool to process structured data in
Hadoop.
It is used to summarize big data, and it makes querying and analysis easy.
Hive was initially developed by Facebook. Later the Apache Software
Foundation took it up and developed it further as open source under
the name Apache Hive.
Hive is not
A relational database
A design for Online Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
It stores the schema in a database (the metastore) and the processed data in HDFS.
It is designed for OLAP (Online Analytical Processing).
It provides an SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Architecture of Hive
Unit Name        Operation
User Interface   Creates the interaction between the user and HDFS.
hive> ALTER TABLE employee ADD COLUMNS (
    > dept STRING COMMENT 'Department name');
Replace Statement
The following query deletes all the columns from the employee table and
replaces them with the emp and name columns:
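A minimal sketch of such a query, assuming emp is an INT and name is a STRING (the column types are not given here and are only illustrative):

```sql
-- REPLACE COLUMNS drops the existing column list and installs the new one.
-- Column types are assumptions made for this example.
ALTER TABLE employee REPLACE COLUMNS (
  emp  INT    COMMENT 'Employee id',
  name STRING COMMENT 'Employee name'
);
```

REPLACE COLUMNS changes only the table's metadata. The underlying data files are not rewritten.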
Drop Table Statement
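As an illustration, the statement below removes the employee table used in the earlier examples. For a managed (internal) table this also deletes its data. IF EXISTS is optional and simply avoids an error when the table is absent.

```sql
-- Drop the table; do nothing if it does not exist.
DROP TABLE IF EXISTS employee;
```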
PARTITIONING
Partitioning is a way of dividing a table into related parts based on the
values of partition columns such as date, city, and department.
With partitions, it is easy to query only a portion of the data.
Tables or partitions are sub-divided into buckets, to provide extra structure
to the data that may be used for more efficient querying.
Bucketing works on the value of a hash function applied to a column of the
table.
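The two ideas can be combined in one table definition. The sketch below uses a hypothetical table employee_part. Its column names, the choice of dept as the partition column, and the bucket count of 4 are all assumptions made for illustration.

```sql
-- Hypothetical table: names, types, and bucket count are illustrative only.
CREATE TABLE IF NOT EXISTS employee_part (
  eid    INT,
  name   STRING,
  salary FLOAT
)
PARTITIONED BY (dept STRING)        -- one HDFS subdirectory per dept value
CLUSTERED BY (eid) INTO 4 BUCKETS   -- rows are assigned to a bucket by hashing eid
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- A filter on the partition column lets Hive read only the matching partition.
SELECT name, salary FROM employee_part WHERE dept = 'HR';
```

Note that the partition column (dept) is declared in PARTITIONED BY and not in the regular column list.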
We can add partitions to a table by altering the table.
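For example, assuming employee is a partitioned table with a hypothetical partition column named year:

```sql
-- Add a new partition; the column name and value are assumptions for this example.
ALTER TABLE employee ADD IF NOT EXISTS PARTITION (year = '2013');
```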
Renaming a Partition
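A sketch of the syntax, again assuming a hypothetical partition column named year on the employee table:

```sql
-- Rename the partition year='2012' to year='2013' (values are illustrative).
ALTER TABLE employee PARTITION (year = '2012')
  RENAME TO PARTITION (year = '2013');
```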
Dropping a Partition
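A sketch of the syntax, assuming the same hypothetical partition column year on the employee table:

```sql
-- Drop one partition; IF EXISTS avoids an error when it is absent.
ALTER TABLE employee DROP IF EXISTS PARTITION (year = '2013');
```

For a managed table, dropping a partition also removes that partition's data.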
BUILT-IN OPERATORS
There are four types of operators in Hive:
1. Relational Operators
2. Arithmetic Operators
3. Logical Operators
4. Complex Operators
Relational Operators
A = B, A != B, A < B, A > B, A <= B, A >= B – all primitive types
A IS NULL, A IS NOT NULL – all types
A LIKE B, A RLIKE B, A REGEXP B – strings
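The queries below show each group of relational operators in a WHERE clause. They assume a hypothetical employee table with columns name, salary, and dept. These columns are not defined in this section.

```sql
-- Comparison operators on a numeric column
SELECT * FROM employee WHERE salary >= 40000;

-- NULL checks work on any type
SELECT * FROM employee WHERE dept IS NOT NULL;

-- LIKE uses SQL wildcards: % matches any sequence, _ matches one character
SELECT * FROM employee WHERE name LIKE 'A%';

-- RLIKE (synonym: REGEXP) uses Java regular expressions
SELECT * FROM employee WHERE name RLIKE '^A.*n$';
```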
Arithmetic operators:
A + B, A - B, A * B, A / B, A % B – binary; all numeric types
A & B, A | B, A ^ B – binary; bitwise AND, OR, XOR
~A – unary; bitwise NOT
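A short illustration with integer literals. A SELECT without a FROM clause is assumed to be supported, which holds for Hive 0.13 and later.

```sql
SELECT 20 + 30,   -- 50
       7 % 3,     -- 1  (remainder)
       6 & 3,     -- 2  (bitwise AND: 110 & 011 = 010)
       6 | 3,     -- 7  (bitwise OR:  110 | 011 = 111)
       6 ^ 3,     -- 5  (bitwise XOR: 110 ^ 011 = 101)
       ~6;        -- -7 (bitwise NOT)
```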
CUSTOMERS TABLE