2 SQL Hadoop Analyzing Big Data Hive m2 Intro Slides

Hive provides a SQL-like interface for querying large datasets stored in Hadoop. These slides introduce schema on read, the Hive warehouse (a directory in HDFS that holds table data), and the metastore, which holds the metadata. HiveQL is used to create databases and tables, load data, and query data stored in HDFS. The demo creates a database and a table, loads sample data, and runs queries against the Hive table.

Uploaded by

गोपाल शर्मा

Introduction to Hive

Ahmad Alkilani
www.pluralsight.com

Outline

• Why Hive? Motivation
• Hive's Architecture
• Hive Principles – Schema on Read
• Hive Principles – The Hive Warehouse
• HiveQL – SELECT, subqueries, UNION ALL, CREATE DATABASE, CREATE TABLE
• Demo – Working with Hive and loading data

Hive Motivation

• Opens up Big Data to the masses
• Provides a SQL-like query language and interfaces
• Builds on Hadoop core, using MapReduce for execution
• Originally started at Facebook
• MapReduce development is time consuming
  - Requires intimate knowledge of the framework
  - Limited resources with the required expertise
  - No schema to help understand data in HDFS

Hive Architecture

[Diagram: Hive architecture components]
• Interfaces: Hive CLI, Hive Web UI, HDInsight, and JDBC/ODBC clients through the Thrift Server
• HiveQL
• Driver: query processing, compiling, optimizing, execution
• Metastore
• MapReduce
• HDFS

Hive Principles – Schema on Read

• Imposes no Hive-specific format
• Uses Serializers/Deserializers (SerDes) to read and write data

[Diagram: XML, JSON, log files, and text files stored in HDFS; Hive reads each as a defined structure]
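
A minimal sketch of schema on read, assuming tab-delimited text files already sit in HDFS. The table name, column names, and path are hypothetical. The table definition only describes how to interpret the files; nothing is converted or validated when the table is created.

```sql
-- Hypothetical table and path names, for illustration only.
-- The files under /data/weblogs are plain tab-delimited text and are left as they are.
CREATE EXTERNAL TABLE weblogs (
  ip     STRING,
  ts     STRING,
  status INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/weblogs';

-- The schema is applied only now, when rows are read.
-- A field that cannot be parsed as its declared type comes back as NULL.
SELECT ip, status FROM weblogs LIMIT 10;
```

Because the schema is checked at read time, pointing a table at files with the wrong layout does not raise an error; the mismatch shows up as NULLs in query results.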

Hive Principles – The Hive Warehouse

Hive warehouse
• Metadata about all the objects known to Hive, persisted in the metastore
• Consists of
  - Databases (e.g. Database_A, Database_B)
  - Tables
  - Partitions (e.g. 2012)
  - Buckets/Clusters

• Local Hive warehouse
  - Managed by Hive
  - Typically under /hive/warehouse
  - Dropping a table drops the data as well as the metadata
• External tables
  - Hive manages the metadata only
  - Data can be anywhere on the Hadoop file system
  - Dropping a table in Hive removes only the table's definition; the data remains untouched
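
The difference between managed and external tables can be sketched as follows. All table names, columns, and the path are hypothetical.

```sql
-- Hypothetical names throughout.
-- Managed table: Hive owns the metadata and the data files in the warehouse directory.
CREATE TABLE managed_sales (id INT, amount DOUBLE);

-- External table: Hive records only the definition; the files stay at LOCATION.
CREATE EXTERNAL TABLE ext_sales (id INT, amount DOUBLE)
LOCATION '/somewhere/on/hdfs/sales';

DROP TABLE managed_sales;  -- removes the definition and the data files
DROP TABLE ext_sales;      -- removes the definition only; the files remain in HDFS
```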

Basic commands using HiveQL

Hive Basics

The SELECT statement

SELECT
  exp1, exp2, exp3
FROM
  some_table
WHERE
  where_condition
LIMIT
  number_of_records;

• DISTINCT clause
  SELECT DISTINCT col1, col2, col3 FROM some_table;
• Aliasing
  SELECT col1 + col2 AS col3 FROM some_table;
• REGEX column specification
  SELECT `(ID|Name)?+.+` FROM some_table;
• Interchangeable constructs: FROM may come before SELECT
  FROM
    some_table
  SELECT
    exp1, exp2, exp3
  WHERE
    where_condition;
• HiveQL keywords and identifiers are not case sensitive
• Semicolon to terminate statements
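
A short sketch that puts these constructs together, assuming a hypothetical table employees(id INT, name STRING, dept STRING, salary DOUBLE). In Hive 0.13.0 and later, a backquoted regex column specification is treated as a regular expression only when hive.support.quoted.identifiers is set to none.

```sql
-- Hypothetical table: employees(id INT, name STRING, dept STRING, salary DOUBLE)

-- Needed in Hive 0.13.0 and later for regex column specifications.
SET hive.support.quoted.identifiers=none;

-- Every column except id and name.
SELECT `(id|name)?+.+` FROM employees;

-- DISTINCT, an alias, a filter, and a row limit combined.
SELECT DISTINCT dept, salary * 12 AS annual_salary
FROM employees
WHERE salary > 1000
LIMIT 5;
```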

Subqueries & Union

Subquery in the FROM clause:

SELECT subq.mycol
FROM (
  SELECT col_a + col_b AS mycol
  FROM some_table
) subq;

UNION ALL:

SELECT col_a + col_b AS mycol
FROM some_table
UNION ALL
SELECT col_y AS mycol
FROM another_table;

UNION ALL inside a subquery, joined to another table:

SELECT t3.mycol
FROM (
  SELECT col_a + col_b AS mycol
  FROM some_table
  UNION ALL
  SELECT col_y AS mycol
  FROM another_table
) t3
JOIN t4 ON (t4.col_x = t3.mycol);

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT some_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
USE db_name;
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name;

[Diagram: in HDFS, /hive/warehouse holds marketing.db and finance.db; humanresources.db sits at a custom location, /somewhere/on/hdfs]
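
A concrete instance of the syntax above, as a sketch. The database names follow the diagram; the comment text, the exact LOCATION path, and the property name and value are assumptions.

```sql
-- Default location: Hive creates finance.db under the warehouse directory.
CREATE DATABASE IF NOT EXISTS finance
COMMENT 'Finance reporting data';

-- Custom location, similar to the humanresources.db case in the diagram.
CREATE DATABASE IF NOT EXISTS humanresources
LOCATION '/somewhere/on/hdfs/humanresources.db'
WITH DBPROPERTIES ('owner' = 'hr_team');  -- property name and value are made up

USE finance;

-- By default, DROP fails on a database that still contains tables.
DROP DATABASE IF EXISTS humanresources;
```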

Create Table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [ROW FORMAT row_format] [STORED AS file_format]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)];

[Diagram: HDFS layout. /hive/warehouse holds advertising and finance.db; finance.db contains the tables sales (partitions ctry=USA and ctry=UAE) and deals. /somewhere/on/hdfs holds humanresources.db with the tables employees and my_ext_table. Also shown: /mydata/2013/07/21, /mydata/2013/07/26, and /mydata/2012/03/19.]
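
A sketch of a partitioned table along the lines of the sales table in the diagram. The diagram names only the table and its ctry partitions, so the columns and the delimiter here are assumptions.

```sql
-- Column names, types, and the delimiter are hypothetical.
CREATE TABLE IF NOT EXISTS finance.sales (
  sale_id INT    COMMENT 'hypothetical column',
  amount  DOUBLE COMMENT 'hypothetical column'
)
PARTITIONED BY (ctry STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Each partition value gets its own subdirectory, e.g. .../sales/ctry=USA
ALTER TABLE finance.sales ADD IF NOT EXISTS PARTITION (ctry = 'USA');
ALTER TABLE finance.sales ADD IF NOT EXISTS PARTITION (ctry = 'UAE');

USE finance;
SHOW PARTITIONS sales;
```

The partition column ctry is not stored in the data files; Hive derives its value from the directory name.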

Working with Hive

Demo

Demo Recap

• Pluralsight database
  - Hive creates pluralsight.db directory
• Created movies Hive-managed table
  - Placed u.info in movies table; Hive doesn't complain, but queries return NULLs
  - Placed correct u.item data in movies table
• LOAD DATA INPATH [path]
  - Moves data if source is HDFS
  - Copies data if source is LOCAL
  - Syntax: LOAD DATA LOCAL INPATH [path]
• Consider using EXTERNAL tables if data is already in HDFS
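
The two loading modes from the recap, written out in full as a sketch. The movies table and u.item file come from the demo; the paths are hypothetical.

```sql
-- Paths are hypothetical.

-- Source on the local file system: the file is copied into the table's directory.
LOAD DATA LOCAL INPATH '/tmp/ml-100k/u.item' INTO TABLE movies;

-- Source already in HDFS: the file is moved, so it disappears from the source path.
LOAD DATA INPATH '/staging/u.item' INTO TABLE movies;

-- OVERWRITE replaces the table's existing files instead of adding to them.
LOAD DATA INPATH '/staging/u.item.new' OVERWRITE INTO TABLE movies;
```

Because loading from HDFS moves the file, an EXTERNAL table pointing at the existing directory avoids disturbing data that other jobs still read.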

Summary

• Hive as an important player in the Big Data community
• Hive warehouse and schema on read concepts
• HiveQL
  - SELECT, UNION ALL, subqueries, DISTINCT, aliasing
  - Create database
  - External and Hive-managed tables
  - Loading data into the Hive warehouse
    - Truncate or overwrite
  - Different methods for creating tables
    - CTAS
    - LIKE
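
The last few items can be sketched as follows, assuming the movies table from the demo exists. The new table names are hypothetical.

```sql
-- CTAS (CREATE TABLE AS SELECT): the new table takes its schema and rows from the query.
CREATE TABLE movies_copy AS
SELECT * FROM movies;

-- LIKE: copies the table definition only; the new table starts empty.
CREATE TABLE movies_empty LIKE movies;

-- Truncate: removes all rows from a managed table but keeps its definition.
TRUNCATE TABLE movies_copy;

-- Overwrite: replaces the current contents with the query result.
INSERT OVERWRITE TABLE movies_empty
SELECT * FROM movies;
```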
