Hive Data Manipulation
HiveQL
Loading Data
Managed Tables
LOAD DATA LOCAL INPATH '${env:HOME}/employees_data'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'TX');
${env:HOME} can be replaced by /home/cloudera in Cloudera
INPATH may name a file or a directory, but it cannot contain subdirectories
LOAD DATA LOCAL copies local data to HDFS
LOAD DATA (without LOCAL) moves data that is already in HDFS into the table's location
Both source and destination must be in HDFS
In the example above, the data ends up in the HDFS directory:
/user/cloudera/hive/warehouse/mydb.db/employees/country=US/state=TX
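For comparison, a minimal sketch of the non-LOCAL form, assuming the files were already uploaded to a hypothetical HDFS staging directory /user/cloudera/staging/employees_tx. Because the source is already in HDFS, Hive moves the files instead of copying them, so they no longer appear under the staging directory afterwards. SHOW PARTITIONS then lists the partitions the table has.

```sql
-- HYPOTHETICAL HDFS staging directory; it must not contain subdirectories.
-- Without LOCAL, the files are MOVED (not copied) into the partition directory.
LOAD DATA INPATH '/user/cloudera/staging/employees_tx'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'TX');

-- List the table's partitions, e.g. to confirm that country=US/state=TX exists.
SHOW PARTITIONS employees;
```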
Inserting Data
INSERT statement
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'CA')
SELECT * FROM employees1 e1
WHERE e1.cnty = 'US' AND e1.st = 'CA';
(Assumes the data is already in another table called employees1)
The employees1 table is scanned once for EACH INSERT statement
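To make the cost concrete: loading two states with plain INSERT statements takes two separate queries, and each one reads all of employees1. In this sketch NY is simply a second state chosen for illustration; the columns are the same as in the example above.

```sql
-- First full scan of employees1: only CA rows are kept.
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'CA')
SELECT * FROM employees1 e1
WHERE e1.cnty = 'US' AND e1.st = 'CA';

-- Second, independent full scan of employees1: only NY rows are kept.
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'NY')
SELECT * FROM employees1 e1
WHERE e1.cnty = 'US' AND e1.st = 'NY';
```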
Partitioning
FROM employees1 e1
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'CA')
SELECT * WHERE e1.cnty = 'US' AND e1.st = 'CA'
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'NY')
SELECT * WHERE e1.cnty = 'US' AND e1.st = 'NY'
INSERT INTO TABLE employees
PARTITION (country = 'US', state = 'NV')
SELECT * WHERE e1.cnty = 'US' AND e1.st = 'NV';
The employees1 table is scanned only once for all three INSERT clauses
You can mix OVERWRITE and INTO clauses
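One way to see the difference between the two clauses, sketched as a quick check and assuming employees1 does not change between runs: run the multi-insert above twice, then count rows per state. The CA and NY counts should be the same after the second run, because OVERWRITE replaces those partitions, while the NV count doubles, because INTO appends to the existing partition.

```sql
-- Row counts per partition for the three states loaded by the multi-insert.
SELECT state, count(*) AS num_rows
FROM employees
WHERE country = 'US' AND state IN ('CA', 'NY', 'NV')
GROUP BY state;
```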
INSERT (3)
INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT ..., e1.cnty, e1.st
FROM
Hive determines the values of the partition keys (country, state) from the last two columns in the SELECT clause
Source columns are mapped to the partition keys by POSITION, not by matching names
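The positional rule above can be seen in a fuller sketch of a dynamic-partition insert. Two assumptions are labeled in the comments: name and salary are hypothetical stand-ins for the non-partition columns of employees, and the source is the employees1 table from the earlier slides. Depending on the Hive version, dynamic partitioning may need to be switched on, and the default strict mode rejects an insert in which every partition key is dynamic, hence the two SET commands. The second statement shows the form that works in strict mode: a static country followed by a dynamic state.

```sql
-- Enable dynamic partitioning for this session (the default differs by Hive version).
SET hive.exec.dynamic.partition = true;
-- nonstrict allows ALL partition keys to be dynamic; strict (the default)
-- requires at least one static partition key.
SET hive.exec.dynamic.partition.mode = nonstrict;

-- name and salary are HYPOTHETICAL non-partition columns.
-- The last two SELECT expressions fill country and state, by position only.
INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT e1.name, e1.salary, e1.cnty, e1.st
FROM employees1 e1;

-- Mixed form, accepted in strict mode: country is static, state is dynamic,
-- so only the last SELECT expression (e1.st) feeds a partition key.
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state)
SELECT e1.name, e1.salary, e1.st
FROM employees1 e1
WHERE e1.cnty = 'US';
```

Static partition keys must come before dynamic ones in the PARTITION clause, which is why country is the one fixed here.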
Create and Load Table in One Query