Hive Data Manipulation

The document provides an overview of common HiveQL commands and functions for data manipulation in Hive, including loading, inserting, partitioning, and exporting data, as well as selecting, filtering, joining, aggregating, and sampling data. Key HiveQL commands and functions covered include LOAD DATA, INSERT, SELECT, WHERE, GROUP BY, JOIN, TABLESAMPLE, and CREATE VIEW.


HiveQL
Loading Data
Managed Tables
LOAD DATA LOCAL INPATH '${env:HOME}/employees_data'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'TX');
${env:HOME} can be replaced by /home/cloudera/ in Cloudera
The INPATH directory cannot contain any subdirectories
LOAD DATA LOCAL copies local data to HDFS
LOAD DATA (without LOCAL) moves data within HDFS
Both source and destination must be in HDFS
In the above, the destination HDFS directory is:
/user/cloudera/hive/warehouse/mydb.db/employees/country=US/state=TX
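As a sketch of the non-LOCAL form, the statement below moves files that are already in HDFS into the same partition. The staging path /user/cloudera/staging/employees_data is a hypothetical location chosen for illustration; it does not come from the slides.

```sql
-- Hypothetical HDFS staging path. The files are moved, not copied,
-- so they are no longer at the source path afterwards.
LOAD DATA INPATH '/user/cloudera/staging/employees_data'
INTO TABLE employees
PARTITION (country = 'US', state = 'TX');
```

Without OVERWRITE, the new files are added alongside whatever the partition already contains.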
Inserting Data
INSERT statement
INSERT OVERWRITE TABLE employees
SELECT * FROM employees1 e1
WHERE e1.cnty = 'US' AND e1.st = 'CA';
(Assumes the data is already in another table called employees1)
The employees1 table is scanned for EACH INSERT statement
Partitioning
FROM employees1 e1
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'CA')
SELECT * WHERE e1.cnty = 'US' AND e1.st = 'CA'
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'NY')
SELECT * WHERE e1.cnty = 'US' AND e1.st = 'NY'
INSERT INTO TABLE employees
PARTITION (country = 'US', state = 'NV')
SELECT * WHERE e1.cnty = 'US' AND e1.st = 'NV';
With this form, employees1 is scanned only once for all three inserts
You can mix OVERWRITE and INTO clauses
INSERT (3)
INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT ..., e1.cnty, e1.st
FROM employees1 e1;
Hive determines the values of the partition keys (country, state) from the last two columns in the SELECT clause
Source column values and output column values are matched by POSITION, not by matching names
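A fuller sketch of a dynamic-partition insert might look like the following. It assumes that employees has exactly the data columns name, salary, subordinates, deductions and address, in that order, and that employees1 has columns with the same names; both are assumptions for illustration. Depending on the Hive version and its configuration, the two SET statements may or may not be needed.

```sql
-- Allow partitions to be created from query output, including the case
-- where no partition key is given a static value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT e1.name, e1.salary, e1.subordinates, e1.deductions, e1.address,  -- data columns (assumed layout)
       e1.cnty, e1.st   -- last two columns feed country and state, by position
FROM employees1 e1;
```

Because matching is positional, swapping e1.cnty and e1.st would silently write the state values into the country key.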
Create and Load Table in One Query

CREATE TABLE ca_employees
AS SELECT name, salary, address
FROM employees
WHERE state = 'CA';
- Hive takes the schema from the SELECT clause
- Loads the data with three fields: name, salary and address
- This can also be used to extract subsets from large tables
Exporting Data
How do we get data out of the tables?
hadoop fs -cp source_path target_path
Or
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT ...;
This will create data files in the /tmp/ca_employees directory
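A concrete version of the export could look like this sketch. It reads from the ca_employees table created earlier and selects only the two simple columns. The ROW FORMAT clause on a directory insert assumes Hive 0.11.0 or later; on older versions it would have to be dropped, and the default field delimiter would apply.

```sql
-- Writes one or more comma-delimited text files under /tmp/ca_employees
-- on the local filesystem, replacing anything already in that directory.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT name, salary
FROM ca_employees;
```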
Hive - SELECT
SELECT name, salary FROM employees;
SELECT e.name, e.salary FROM employees e;
SELECT name, subordinates[0] FROM employees; -- data from ARRAY
SELECT name, deductions["State Taxes"] FROM employees; -- data from MAP
SELECT name, address FROM employees; -- data from STRUCT (address)
SELECT name, deductions FROM employees; -- data from MAP (deductions)
- both of the above (address and deductions) will output in JSON format
- Use dot notation for struct: address.city
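The three access styles can be combined in one query, as in this sketch. The map key "State Taxes" and the city value 'Chicago' are assumed sample values, and the column aliases are made up for readability.

```sql
SELECT name,
       subordinates[0]           AS first_subordinate,  -- ARRAY index; NULL if there is no such element
       deductions["State Taxes"] AS state_tax,          -- MAP key; NULL if the key is absent
       address.city              AS city                -- STRUCT field via dot notation
FROM employees
WHERE address.city = 'Chicago';
```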
Columns
SELECT symbol, `price.*` FROM stocks;
- gets all columns whose names start with price (the backticks hold a regular expression)
SELECT upper(name), salary * 1.1 FROM employees;
- does column calculations
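On newer Hive versions (0.13.0 and later, as far as the Hive configuration documentation describes it), backticks quote ordinary identifiers by default, so selecting columns by regular expression needs one extra setting. A minimal sketch, assuming such a version:

```sql
-- Treat backticked names as regular expressions rather than quoted identifiers.
SET hive.support.quoted.identifiers=none;

SELECT symbol, `price.*` FROM stocks;
```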
Arithmetic Operators
Operator  Description
+         Add
-         Subtract
*         Multiply
/         Divide
%         Modulo
&         Bitwise AND
|         Bitwise OR
^         Bitwise XOR
~         Bitwise NOT
Built-in Functions

round(d)            round(d, N)          floor(d)
ceil(d)             ceiling(d)           rand()
rand(seed)          exp(d)               ln(d)
log10(d)            log2(d)              log(base, d)
pow(d, p)           power(d, p)          sqrt(d)
abs(d)              e()                  pi()
count(*)            count(expr)          count(DISTINCT expr)
(count(expr) counts the rows where expr is not NULL)
sum(col)            sum(DISTINCT col)    avg(col)
avg(DISTINCT col)   min(col)             max(col)
There are others; please see the Hive documentation
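A short sketch of several of these functions used together on the employees table; the column aliases are made up for readability.

```sql
SELECT count(*)              AS num_rows,        -- all rows
       count(DISTINCT name)  AS distinct_names,  -- distinct non-NULL names
       round(avg(salary), 2) AS avg_salary,      -- average rounded to 2 decimals
       min(salary)           AS min_salary,
       max(salary)           AS max_salary
FROM employees;
```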
Table Generating Functions
explode(array): returns one row for each element in the array
explode(map): (v0.8.0 or later) returns one row for each map key-value pair
json_tuple(jsonstr, p1, p2, ..., pn): returns a tuple; jsonstr is the input and p1, p2, ..., pn are the output columns
stack(n, col1, col2, ..., colM): converts M columns into n rows of size M/n
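Table generating functions are commonly paired with LATERAL VIEW so that other columns can be selected alongside the generated rows. The sketch below assumes the employees table with its subordinates array and deductions map; the aliases subs, sub, d, deduction_name and deduction_value are hypothetical.

```sql
-- One output row per (employee, subordinate) pair.
SELECT name, sub
FROM employees
LATERAL VIEW explode(subordinates) subs AS sub;

-- One output row per (employee, deduction) pair; explode on a MAP
-- yields two columns, the key and the value.
SELECT name, deduction_name, deduction_value
FROM employees
LATERAL VIEW explode(deductions) d AS deduction_name, deduction_value;
```

Employees whose array or map is empty produce no rows in these queries.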
Other Built-in-Functions
test IN (v1, v2, ..., vn): returns true if test is in the list of values
length(s): length of string
reverse(s): reverse of string
concat(s1, s2, ..., sn)
concat_ws(separator, s1, s2, ..., sn)
substr(s, start_index)
substr(s, start, length)
upper(s)
lower(s)
trim(s), ltrim(s), rtrim(s)
regexp_replace(s, regex, repl_str)
to_date(timestamp), year(ts), month(ts), day(ts)
split(s, pattern)
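A few of the string functions applied to the employees table, as a sketch. The city values in the IN list and the column aliases are illustrative assumptions.

```sql
SELECT upper(name)                         AS name_upper,
       length(name)                        AS name_length,
       concat_ws(', ', name, address.city) AS name_and_city,
       substr(name, 1, 3)                  AS first_three_chars,   -- positions start at 1
       regexp_replace(name, ' +', ' ')     AS name_single_spaced,  -- collapse runs of spaces
       split(name, ' ')[0]                 AS first_word           -- split returns an ARRAY
FROM employees
WHERE address.city IN ('Chicago', 'Ontario');
```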
Others
LIMIT
Nested SELECT
CASE ... WHEN ... THEN
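The three constructs above can appear in a single query, as in this sketch. The salary thresholds and the salary_bracket alias are made-up values for illustration.

```sql
SELECT e.name, e.salary_bracket
FROM (
  -- Nested SELECT: label each employee with a salary bracket.
  SELECT name,
         CASE
           WHEN salary <  50000.0 THEN 'low'
           WHEN salary < 100000.0 THEN 'middle'
           ELSE 'high'
         END AS salary_bracket
  FROM employees
) e
WHERE e.salary_bracket <> 'low'
LIMIT 10;   -- return at most 10 rows
```

The nested query is needed here because a column alias defined in the SELECT list cannot be referenced in the WHERE clause of the same query.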
WHERE
In WHERE, JOIN ... ON and HAVING clauses, predicates such as A = B, A <= B, A >= B, etc. can be used
LIKE: address.street LIKE '%AVE'
RLIKE can use Java regular expressions:
address.street RLIKE '.*(Chicago|Ontario).*'
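Put together in a complete query, these predicates might be used as follows; the salary threshold is an arbitrary illustrative value.

```sql
SELECT name, address.street
FROM employees
WHERE salary >= 50000.0
  AND (address.street LIKE '%AVE'                        -- SQL wildcard: street ends with AVE
       OR address.street RLIKE '.*(Chicago|Ontario).*'); -- Java regex: contains either word
```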
GROUP BY, HAVING, JOIN
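A minimal GROUP BY and HAVING sketch on the employees table, using its country and state partition columns; the threshold and the aliases are illustrative.

```sql
SELECT state,
       count(*)    AS num_employees,
       avg(salary) AS avg_salary
FROM employees
WHERE country = 'US'            -- filters rows before grouping
GROUP BY state
HAVING avg(salary) > 50000.0;   -- filters groups after aggregation
```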
JOINS
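JOIN (inner join): all non-matching records are discarded.
Matching records must be found in every joined table
The ON clause specifies the condition for joining
LEFT OUTER JOIN: all records from the left table; NULLs fill the right-table columns where there is no match
OUTER JOIN: the JOIN is evaluated first and then the WHERE clause is applied!
RIGHT OUTER JOIN: all records from the right table; NULLs fill the left-table columns where there is no match
FULL OUTER JOIN: all records from all tables, with NULLs where there is no match
LEFT SEMI JOIN: returns records from the left table if matching records are found in the right table
Cartesian JOIN (cross product): use JOIN without ON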
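The following sketch illustrates the outer-join and semi-join behaviour using the people and cart tables from the view example in these slides. It assumes people has the columns id, firstname and lastname and cart has the column people_id.

```sql
-- Keeps people with no cart row; c.people_id is NULL for them.
SELECT p.id, p.lastname, c.people_id
FROM people p
LEFT OUTER JOIN cart c ON (c.people_id = p.id)
WHERE p.firstname = 'john';

-- WHERE runs after the join, so filtering on the right table removes the
-- NULL-extended rows; this returns the same rows as an inner join.
SELECT p.id, p.lastname, c.people_id
FROM people p
LEFT OUTER JOIN cart c ON (c.people_id = p.id)
WHERE c.people_id IS NOT NULL;

-- LEFT SEMI JOIN: each person appears at most once, and only
-- left-table columns may be selected.
SELECT p.id, p.lastname
FROM people p
LEFT SEMI JOIN cart c ON (c.people_id = p.id);
```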
ORDER BY, SORT BY (ASC and DESC)
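The difference between the two clauses can be sketched as follows, using the employees table.

```sql
-- ORDER BY: total ordering of the whole result; all rows pass through
-- a single reducer, which can be slow for large outputs.
SELECT name, salary FROM employees ORDER BY salary DESC, name ASC;

-- SORT BY: each reducer sorts only its own output, so the overall result
-- is fully ordered only when there is a single reducer.
SELECT name, salary FROM employees SORT BY salary DESC, name ASC;
```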
SAMPLING
SELECT * FROM numbers
TABLESAMPLE(BUCKET 3 OUT OF 10 ON rand()) s;
SELECT * FROM numbers
TABLESAMPLE(BUCKET 3 OUT OF 10 ON number) s;
SELECT * FROM numbers
TABLESAMPLE(0.1 PERCENT) s;
VIEWS
Allows a query to be saved and treated like a table
Logical construct, not materialized
View
CREATE VIEW short AS
SELECT * FROM people JOIN cart
ON (cart.people_id = people.id)
WHERE firstname = 'john';
SELECT lastname FROM short WHERE id = 3;
