

Hive - Introduction

The term ‘Big Data’ refers to collections of large datasets characterized by huge volume, high velocity, and a wide variety of data, all of which keep growing day by day. Such data is difficult to process using traditional data management systems. Therefore, the Apache Software Foundation introduced a framework called Hadoop to solve Big Data management and processing challenges.

Hadoop

Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It contains two modules: MapReduce and the Hadoop Distributed File System (HDFS).

MapReduce: It is a parallel programming model for processing large amounts of structured, semi-structured, and unstructured data on large clusters of commodity hardware.

HDFS: The Hadoop Distributed File System is part of the Hadoop framework and is used to store and process datasets. It provides a fault-tolerant file system that runs on commodity hardware.

The Hadoop ecosystem contains different sub-projects (tools) such as Sqoop, Pig, and Hive that support the core Hadoop modules.


Sqoop: It is used to import and export data between HDFS and an RDBMS.

Pig: It is a procedural language platform used to develop scripts for MapReduce operations.

Hive: It is a platform used to develop SQL-type scripts to perform MapReduce operations.

Note: There are various ways to execute MapReduce operations:

The traditional approach, using a Java MapReduce program, for structured, semi-structured, and unstructured data.

The scripting approach for MapReduce, processing structured and semi-structured data using Pig.

The Hive Query Language (HiveQL or HQL) for MapReduce, processing structured data using Hive (a small example follows this list).
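As a minimal sketch of the third approach, assuming a hypothetical employees table has already been defined in Hive, a single HiveQL statement stands in for what would otherwise be a hand-written Java MapReduce job:

-- Hypothetical table; Hive compiles this aggregation into a MapReduce job behind the scenes.
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;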

What is Hive

Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.

Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies; for example, Amazon uses it in Amazon Elastic MapReduce.

Hive is not

A relational database


A design for OnLine Transaction Processing (OLTP)

A language for real-time queries and row-level updates

Features of Hive

It stores schema in a database and processed data in HDFS.

It is designed for OLAP.

It provides an SQL-type query language called HiveQL or HQL (a short example follows this list).

It is familiar, fast, scalable, and extensible.
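As a small, hedged illustration of the first and third features, assuming a hypothetical employees dataset in a local CSV file, the table definition below is recorded in the metastore database while the loaded data itself is stored in HDFS:

-- Hypothetical schema; the definition goes to the metastore, the rows go to HDFS.
CREATE TABLE IF NOT EXISTS employees (
  emp_id     INT,
  name       STRING,
  department STRING,
  salary     FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical local path; LOAD DATA copies the file into Hive's warehouse directory on HDFS.
LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees;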

Architecture of Hive

The following component diagram depicts the architecture of Hive:

This component diagram contains different units. The following list describes each unit:

User Interface: Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HD Insight (on Windows Server).

Meta Store: Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping (a small sketch follows this list).

HiveQL Process Engine: HiveQL is similar to SQL for querying on the schema information in the Metastore. It is one of the replacements for the traditional approach of writing a MapReduce program. Instead of writing a MapReduce program in Java, we can write a HiveQL query for the MapReduce job and let Hive process it.

Execution Engine: The conjunction part of the HiveQL Process Engine and MapReduce is the Hive Execution Engine. The execution engine processes the query and generates the same results as MapReduce. It uses the flavor of MapReduce.

HDFS or HBASE: The Hadoop Distributed File System or HBASE are the data storage techniques used to store the data in the file system.
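As a rough sketch of the Meta Store unit, and reusing the hypothetical employees table from above, the following statements are answered from the schema stored in the metastore rather than by scanning data files in HDFS:

-- Both statements read table metadata from the metastore database, not the data in HDFS.
SHOW TABLES;
DESCRIBE employees;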

Working of Hive

The following diagram depicts the workflow between Hive and Hadoop.


The following steps define how Hive interacts with the Hadoop framework:

Step 1 - Execute Query: The Hive interface, such as the Command Line or Web UI, sends the query to the Driver (any database driver such as JDBC, ODBC, etc.) to execute.

Step 2 - Get Plan: The driver takes the help of the query compiler, which parses the query to check the syntax and build the query plan, or the requirement of the query (a small EXPLAIN sketch follows these steps).

Step 3 - Get Metadata: The compiler sends a metadata request to the Metastore (any database).

Step 4 - Send Metadata: The Metastore sends the metadata as a response to the compiler.

Step 5 - Send Plan: The compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.

Step 6 - Execute Plan: The driver sends the execute plan to the execution engine.

Step 7 - Execute Job: Internally, the process of executing the job is a MapReduce job. The execution engine sends the job to the JobTracker, which is in the Name node, and it assigns this job to the TaskTracker, which is in the Data node. Here, the query executes the MapReduce job.

Step 7.1 - Metadata Ops: Meanwhile, during execution, the execution engine can perform metadata operations with the Metastore.

Step 8 - Fetch Result: The execution engine receives the results from the Data nodes.

Step 9 - Send Results: The execution engine sends those resultant values to the driver.

Step 10 - Send Results: The driver sends the results to the Hive interfaces.
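As a hedged sketch of steps 2 through 5, Hive's EXPLAIN statement shows the plan the compiler produces for a query before the execution engine runs it (the employees table is the same hypothetical one used above):

-- EXPLAIN returns the compiled stage plan without actually submitting the MapReduce job.
EXPLAIN
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

The output lists the stages that would be submitted as MapReduce jobs in step 7.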

