BDA Unit-5-PPT

Apache Hive is a data warehouse infrastructure tool designed for processing structured data in Hadoop, facilitating easy querying and analysis of big data. It uses HiveQL for querying and is structured for OLAP rather than OLTP, with features like schema storage in a database and integration with HDFS. Hive supports various data types, table operations, partitioning, and built-in operators, making it a comprehensive solution for managing large datasets.


APACHE HIVE

What is Hive?
 Hive is a data warehouse infrastructure tool to process structured data in
Hadoop.
 It is used to summarize big data, and it makes querying and analysis easy.
 Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as an open-source project under the name Apache Hive.
 Hive is not
 A relational database
 A design for OnLine Transaction Processing (OLTP)
 A language for real-time queries and row-level updates
Features of Hive
 It stores the schema in a database and the processed data in HDFS.
 It is designed for OLAP.
 It provides SQL type language for querying called HiveQL or HQL.
 It is familiar, fast, scalable, and extensible.
Architecture of Hive
Unit Name              Operation
User Interface         Creates the interaction between the user and HDFS.
Meta Store             Hive chooses the respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.
HiveQL Process Engine  Used for querying on the schema information in the Metastore. It avoids writing a MapReduce program in Java; instead, we write a query for the MapReduce job and process it.
Execution Engine       Processes the query and generates results the same as MapReduce results.
HDFS or HBASE          The data storage techniques used to store data into the file system.
Working of Hive - workflow between Hive and Hadoop
Step No.  Name           Operation
1         Execute Query  The Hive interface sends the query to the Driver (any database driver such as JDBC, ODBC, etc.) to execute.
2         Get Plan       The driver takes the help of the query compiler, which parses the query to check the syntax and the query plan or the requirement of the query.
3         Get Metadata   The compiler sends a metadata request to the Metastore (any database).
4         Send Metadata  The Metastore sends the metadata as a response to the compiler.
5         Send Plan      The compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
6         Execute Plan   The driver sends the execute plan to the execution engine.
7         Execute Job    Internally, the process of executing the job is a MapReduce job. The execution engine sends the job to the JobTracker, which assigns it to a TaskTracker. Here, the query executes the MapReduce job.
7.1       Metadata Ops   Meanwhile, during execution, the execution engine can execute metadata operations with the Metastore.
8         Fetch Result   The execution engine receives the results from the data nodes.
9         Send Results   The execution engine sends those resultant values to the driver.
10        Send Results   The driver sends the results to the Hive interfaces.
HIVE DATA TYPES
 All of these data types are used in table creation.
 They are classified into four groups: Column types, Literals, Null values, and Complex types.
 Column Types: Integral types, Strings, Timestamp, Dates, Decimals.
Type        Postfix / Format                        Example
TINYINT     Y                                       10Y
SMALLINT    S                                       10S
INT         –                                       10
BIGINT      L                                       10L
VARCHAR     Variable length: 1 to 65535             'rama' or "rama"
CHAR        Fixed length: up to 255                 –
TIMESTAMP   Format: yyyy-mm-dd hh:mm:ss.fffffffff   –
DATE        Format: YYYY-MM-DD                      –
DECIMAL     DECIMAL(precision, scale)               decimal(10,0)
 Union Types
 Union is a collection of heterogeneous data types.
 UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
 Ex:
 {0:1}
 {1:2.0}
 {2:["three","four"]}
 {3:{"a":5,"b":"five"}}
 {2:["six","seven"]}
 {3:{"a":8,"b":"eight"}}
 {0:9}
 {1:10.0}
 FLOAT , DOUBLE – for storing floating point numbers.
 Missing values are represented by the special value NULL.
 Complex Types
 Arrays
 Arrays in Hive are used the same way they are used in Java.
 Syntax: ARRAY<data_type>
 Maps
 Maps in Hive are similar to Java Maps.
 Syntax: MAP<primitive_type, data_type>
 Structs
 Structs in Hive group related fields into a single column; each field can carry an optional comment.
 Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
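 As an illustrative sketch (not taken from the slides), a table combining these complex types could be declared as follows; the table name complex_demo and its columns are hypothetical, and full SerDe support for UNIONTYPE columns is limited:
hive> CREATE TABLE complex_demo (
        tags    ARRAY<STRING>,
        scores  MAP<STRING, INT>,
        address STRUCT<street:STRING, city:STRING, zip:INT>,
        misc    UNIONTYPE<INT, DOUBLE, ARRAY<STRING>>
      )
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      COLLECTION ITEMS TERMINATED BY '|'
      MAP KEYS TERMINATED BY ':';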
CREATE DATABASE and DROP
 Can define databases and tables to analyse structured data.
 Hive contains a default database named default.
 A database in Hive is a namespace or a collection of tables.
 IF NOT EXISTS is an optional clause; when it is used, Hive does not raise an error if a database with the same name already exists.
 Drop Database is a statement that drops all the tables and deletes the
database.
 DROP (DATABASE|SCHEMA) [IF EXISTS] database_name
[RESTRICT|CASCADE];
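 A minimal sketch of both statements, using a hypothetical database named userdb:
hive> CREATE DATABASE IF NOT EXISTS userdb;
hive> SHOW DATABASES;
hive> DROP DATABASE IF EXISTS userdb CASCADE;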
Table Operations
 Create Table is a statement used to create a table in Hive.
 Let us assume you need to create a table named employee using the CREATE TABLE statement.
 Each field of the employee table is declared along with its data type.
 The definition also includes a comment and row-format details such as the field terminator, the line terminator, and the stored file type; a hedged sketch follows.
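 A hedged sketch of such a statement; the field names and types (eid, name, salary, designation) are assumptions, not taken from the slides:
hive> CREATE TABLE IF NOT EXISTS employee (
        eid INT,
        name STRING,
        salary STRING,
        designation STRING
      )
      COMMENT 'Employee details'
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE;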
 Load Data Statement
 We can insert data into a table using the LOAD DATA statement.
 There are two ways to load data: from the local file system or from the Hadoop file system (HDFS).
 We will insert the following data into the table: a text file named sample.txt in the /home/user directory.
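 A sketch of loading that file from the local file system (drop the LOCAL keyword to load from HDFS instead):
hive> LOAD DATA LOCAL INPATH '/home/user/sample.txt' OVERWRITE INTO TABLE employee;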
 ALTER TABLE :
 Here we can alter the attributes of a table such as changing its table name,
changing column names, adding columns, and deleting or replacing columns.
 This statement takes any of the following syntaxes based on what attributes we
wish to modify in a table.
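 The common ALTER TABLE forms, sketched with placeholder names:
hive> ALTER TABLE table_name RENAME TO new_table_name;
hive> ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
hive> ALTER TABLE table_name CHANGE old_col_name new_col_name new_data_type;
hive> ALTER TABLE table_name REPLACE COLUMNS (col_name data_type, ...);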
 Change Statement:
 The CHANGE clause is used to rename a column of the employee table or to change its data type (or both), as in the sketch below.
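 For example, assuming the employee table sketched earlier, a CHANGE statement can rename a column or change its type (the specific columns here are assumptions):
hive> ALTER TABLE employee CHANGE name ename STRING;
hive> ALTER TABLE employee CHANGE salary salary DOUBLE;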
 Add Columns Statement

 hive> ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');

 Replace Statement
 The following query deletes all the columns from the employee table and replaces them with emp and name columns:
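 A sketch of such a statement, with the new column types assumed:
hive> ALTER TABLE employee REPLACE COLUMNS (emp INT, name STRING);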
 Drop Table Statement
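 A minimal example, assuming the employee table used throughout:
hive> DROP TABLE IF EXISTS employee;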
PARTITIONING
 It is a way of dividing a table into related parts based on the values of
partitioned columns such as date, city, and department.
 Using partition, it is easy to query a portion of the data.
 Tables or partitions are sub-divided into buckets to provide extra structure to the data, which may be used for more efficient querying.
 Bucketing works based on the value of a hash function applied to some column of the table.
 We can add partitions to a table by altering the table.
 Renaming a Partition

 Dropping a Partition
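 Hedged examples of adding, renaming, and dropping a partition, assuming employee was created with PARTITIONED BY (year STRING); the values and location are placeholders:
hive> ALTER TABLE employee ADD PARTITION (year='2012') LOCATION '/user/hive/warehouse/employee/year=2012';
hive> ALTER TABLE employee PARTITION (year='2012') RENAME TO PARTITION (year='2013');
hive> ALTER TABLE employee DROP IF EXISTS PARTITION (year='2013');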
BUILT-IN OPERATORS
 There are four types of operators in Hive:
 1. Relational Operators
 2. Arithmetic Operators
 3. Logical Operators
 4. Complex Operators
 Relational Operators
 A = B, A != B, A < B, A > B, A <= B, A >= B – all primitive types
 A IS NULL, A IS NOT NULL – all types
 A LIKE B, A RLIKE B, A REGEXP B -- Strings
 Arithmetic Operators:
 A + B, A - B, A * B, A / B, A % B – binary operators on all number types
 A & B, A | B, A ^ B – bitwise AND, OR, and XOR on all number types
 ~A – unary bitwise NOT

 Logical Operators: (BOOLEAN OPERANDS)


 AND - &&
 OR - ||
 NOT - !
 Complex Operators
 These provide expressions to access the elements of complex types: A[n] for array elements, M[key] for map values, and S.x for struct fields, as in the sketch below.
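 An illustrative query combining these operator groups; the employee columns and the complex_demo table are the hypothetical ones sketched earlier:
hive> SELECT name, salary, dept
      FROM employee
      WHERE salary >= 40000 AND dept IS NOT NULL;
hive> SELECT tags[0], scores['math'], address.city FROM complex_demo;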
VIEWS AND INDEXES
 Views are generated based on user requirements.
 You can save any result set data as a view.
 You can create a view at the time of executing a SELECT statement.
 Creating an Index
 An Index is nothing but a pointer on a particular column of a table.
 Creating an index means creating a pointer on a particular column of a table.
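 Sketches of both statements against the assumed employee table; the index handler shown is Hive's built-in compact handler (note that indexes were removed in Hive 3.x):
hive> CREATE VIEW emp_30000 AS
      SELECT * FROM employee WHERE salary > 30000;
hive> CREATE INDEX index_salary ON TABLE employee(salary)
      AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
      WITH DEFERRED REBUILD;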
HIVEQL SELECT…WHERE
 The Hive Query Language (HiveQL) is a query language for Hive to process
and analyze structured data in a Metastore.

 SELECT statement is used to retrieve the data from a table.


 The WHERE clause works like a condition: it filters the data using the condition and returns only the matching rows.
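 A typical example, assuming the employee table and its salary column:
hive> SELECT * FROM employee WHERE salary > 30000;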
 JOINS:
 It is a clause that is used for combining specific fields from two tables by using
values common to each one.
 It is used to combine records from two or more tables in the database.
Example tables: CUSTOMERS and ORDERS.

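 A hedged join over the two example tables; the column names (id, name, amount, order_date, customer_id) are assumptions:
hive> SELECT c.id, c.name, o.amount, o.order_date
      FROM customers c JOIN orders o
      ON (c.id = o.customer_id);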