Hive2

Hive is a data warehousing tool that converts queries into MapReduce, Tez, or Spark jobs and utilizes a Metastore for schema storage. It features a Hive Shell for executing HiveQL commands and supports both internal and external tables. While it offers a familiar SQL-like interface and scalability for large datasets, it is not optimized for real-time queries.

Uploaded by

focsit.navneet

Hive

Architecture:


Components: Hive Driver, Compiler, Execution Engine, Metastore, HDFS storage.

Hive converts queries to MapReduce, Tez, or Spark jobs.
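As a minimal sketch of how the engine is chosen, the session property hive.execution.engine selects the backend. Which engines are actually available depends on the Hive version and on what is installed on the cluster, so treat the value below as an assumption.

```sql
-- Select the execution engine for the current session.
-- Documented values are mr (MapReduce), tez, and spark;
-- tez is used here only as an example and must be installed on the cluster.
SET hive.execution.engine=tez;

-- Print the current value to confirm the setting.
SET hive.execution.engine;
```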


Installation:

Install Hadoop → Install Hive → Configure Metastore.

Hive Shell: CLI for executing HiveQL commands.
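A short session in the shell might look like the sketch below. It assumes the default database exists (as it does on a standard install); every statement ends with a semicolon.

```sql
-- List the databases known to Hive.
SHOW DATABASES;

-- Switch to the default database and list its tables.
USE default;
SHOW TABLES;
```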


Hive Services:

Metastore (stores metadata), Driver (manages sessions and the query lifecycle), Compiler (parses queries and builds the execution plan), Execution Engine (runs the plan on the cluster).

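Hive Metastore: Critical for schema storage. Keeps table definitions and other metadata in a relational database; supports embedded Derby (the default, suited to single-user testing) and external databases such as MySQL.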
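One way to see what the Metastore holds is to ask Hive to describe a table. The sketch below assumes the students table from the HiveQL example in these notes has already been created.

```sql
-- Show the metadata recorded for a table: columns and types,
-- storage location, table type (managed or external), and input/output formats.
DESCRIBE FORMATTED students;
```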



Hive vs Traditional Databases:

Hive: Schema-on-read, optimized for batch processing.


DB: Schema-on-write, optimized for transactions.

HiveQL Example:


```sql
-- Define a table with two columns.
CREATE TABLE students (name STRING, age INT);

-- Return every student older than 20.
SELECT * FROM students WHERE age > 20;
```



Tables:

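Internal (managed by Hive) and External (managed outside Hive). Dropping an internal table deletes both its metadata and its data; dropping an external table removes only the metadata and leaves the files in place.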
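The difference can be sketched as follows. The table names, the comma delimiter, and the HDFS path /data/students are assumptions made for illustration.

```sql
-- Internal (managed) table: Hive owns the metadata and the data files.
CREATE TABLE students_managed (name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: Hive records only the schema; the files stay at LOCATION.
-- The path below is hypothetical.
CREATE EXTERNAL TABLE students_ext (name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/students';

-- Removes the metadata and the underlying data files.
DROP TABLE students_managed;

-- Removes the metadata only; the files under /data/students remain.
DROP TABLE students_ext;
```

The external table also illustrates schema-on-read: the schema is applied to existing files at query time, and nothing is validated when the files are written.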


UDFs:

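User-defined functions (UDFs) extend HiveQL with custom processing. They are typically written in Java, packaged as a JAR, and registered before use.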
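Registering and calling a UDF could look like the sketch below. The JAR path, the class name com.example.hive.ToUpperName, and the function name to_upper_name are all hypothetical; only the statements themselves are standard HiveQL.

```sql
-- Make the compiled UDF available to the session (hypothetical path).
ADD JAR /path/to/my-udfs.jar;

-- Bind a function name to the Java class that implements it (hypothetical class).
CREATE TEMPORARY FUNCTION to_upper_name AS 'com.example.hive.ToUpperName';

-- Use the function like any built-in.
SELECT to_upper_name(name) FROM students;
```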

Sorting & Aggregating:

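ORDER BY (total ordering of the result), GROUP BY (aggregation per group), CLUSTER BY (distributes rows by the given columns and sorts them within each reducer).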
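The three clauses can be compared on the students table from the HiveQL example; the column alias num_students is an assumption added for readability.

```sql
-- GROUP BY: one output row per age, with a count of students.
SELECT age, COUNT(*) AS num_students
FROM students
GROUP BY age;

-- ORDER BY: a single, globally sorted result (oldest first).
SELECT name, age
FROM students
ORDER BY age DESC;

-- CLUSTER BY: rows with the same age go to the same reducer and are
-- sorted by age within each reducer; the overall output is not globally sorted.
SELECT name, age
FROM students
CLUSTER BY age;
```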

MapReduce Integration:

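When MapReduce is the execution engine, Hive compiles a query into one or more MapReduce jobs internally; no hand-written mapper or reducer code is needed.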
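To inspect what Hive generates for a query, prefix it with EXPLAIN. This is only a sketch; the stages printed depend on the Hive version and the configured execution engine.

```sql
-- Print the execution plan (stage dependencies and per-stage operators)
-- without running the query.
EXPLAIN
SELECT age, COUNT(*) AS num_students
FROM students
GROUP BY age;
```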

Joins and Subqueries:

Supports complex joins (INNER, OUTER) and nested queries.
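A brief sketch of both features follows. The enrollments table and its columns are hypothetical and exist only to give students something to join against.

```sql
-- Hypothetical second table for the join examples.
CREATE TABLE enrollments (student_name STRING, course STRING);

-- INNER JOIN: only students that have at least one enrollment.
SELECT s.name, e.course
FROM students s
JOIN enrollments e ON s.name = e.student_name;

-- LEFT OUTER JOIN: every student; course is NULL when there is no enrollment.
SELECT s.name, e.course
FROM students s
LEFT OUTER JOIN enrollments e ON s.name = e.student_name;

-- Nested query: a subquery in FROM must be given an alias (t here).
SELECT t.age, t.num_students
FROM (
  SELECT age, COUNT(*) AS num_students
  FROM students
  GROUP BY age
) t
WHERE t.num_students > 1;
```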

Advantages:

Familiar SQL-like interface, scalable for large datasets.

Limitations:

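Not designed for real-time queries; each query pays job-startup latency, so Hive suits batch analytics rather than interactive lookups or transactional workloads.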
