2 SQL Hadoop Analyzing Big Data Hive m2 Intro Slides
Ahmad Alkilani
www.pluralsight.com
Outline
Hive’s Architecture
JDBC
Thrift Server
ODBC
Metastore
HiveQL
Hive CLI
Hive Web UI
HDInsight
Driver
Query processing, compiling, optimizing
Execution
MapReduce
HDFS
Hive Principles – Schema on Read
Data in HDFS: XML, JSON, Log Files, Text
Hive: Read as the following structure
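A minimal sketch of schema on read, assuming tab-delimited log files already sit in an HDFS directory. The table name `web_logs`, its columns, and the path `/data/logs` are hypothetical and do not come from the slides.

```sql
-- Hypothetical table, columns, and path, chosen only for illustration.
-- The files under /data/logs are not moved, parsed, or validated here;
-- Hive only records the structure in the metastore.
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  log_time  STRING,
  log_level STRING,
  message   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/logs';

-- The structure is applied only now, when the files are read.
SELECT log_level, message
FROM web_logs
LIMIT 10;
```

Because the schema is applied at query time, a second table with a different column layout could be declared over the same files without rewriting them.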
Hive Principles – The Hive Warehouse
Hive warehouse
Metadata about all the objects known to Hive, persisted in the metastore
Consists of
Databases (e.g. Database_A, Database_B)
Tables
Partitions (e.g. 2012)
Buckets/Clusters
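The four levels of the warehouse can be sketched in one table definition. The table `page_views`, its columns, the partition column `view_year`, and the bucket count are assumptions made for illustration; only the database name echoes the slide.

```sql
-- Database: a namespace, and by default a directory in the warehouse.
CREATE DATABASE IF NOT EXISTS database_a;

-- Table: a subdirectory of the database directory.
-- Partition: one subdirectory per value of view_year (e.g. view_year=2012).
-- Buckets: rows hashed on user_id into a fixed number of files per partition.
CREATE TABLE IF NOT EXISTS database_a.page_views (
  user_id  BIGINT,
  page_url STRING
)
PARTITIONED BY (view_year INT)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS TEXTFILE;

-- Filtering on the partition column lets Hive read only that partition.
SELECT COUNT(*)
FROM database_a.page_views
WHERE view_year = 2012;
```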
Hive Basics
The SELECT statement
SELECT
exp1, exp2, exp3
FROM
some_table
WHERE
where_condition
LIMIT
number_of_records;
• DISTINCT Clause
SELECT DISTINCT col1, col2, col3 FROM some_table;
• Aliasing
SELECT col1 + col2 AS col3 FROM some_table;
• REGEX Column Specification
SELECT `(ID|Name)?+.+` FROM some_table;
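The clauses on this slide can be combined in a single query. The sketch below reuses the slide's placeholder names `some_table`, `col1`, and `col2`; the filter value and row limit are arbitrary.

```sql
-- DISTINCT, aliasing, WHERE, and LIMIT in one statement.
SELECT DISTINCT col1, col1 + col2 AS col3
FROM some_table
WHERE col2 > 0
LIMIT 10;

-- Regex column specification is written in backticks. From Hive 0.13 on,
-- backticked names are treated as literal identifiers unless this
-- property is set to none.
SET hive.support.quoted.identifiers=none;

-- Selects every column except ID and Name.
SELECT `(ID|Name)?+.+` FROM some_table;
```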
Create Database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
USE db_name;
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name;
/somewhere/on/hdfs
humanresources.db
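A short sketch of the database statements, using the `humanresources` name and the `/somewhere/on/hdfs` path from the slide. The comment text and the `owner` property are hypothetical, and pointing LOCATION at a `humanresources.db` directory under that path is an assumption about what the diagram depicts.

```sql
-- Create the database in a custom HDFS directory instead of the
-- default warehouse location.
CREATE DATABASE IF NOT EXISTS humanresources
COMMENT 'Human resources data'
LOCATION '/somewhere/on/hdfs/humanresources.db'
WITH DBPROPERTIES ('owner' = 'hr_team');

-- Make it the current database for subsequent statements.
USE humanresources;

-- Dropping fails while the database still contains tables,
-- unless CASCADE is appended.
DROP DATABASE IF EXISTS humanresources;
```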
Create Table
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[ROW FORMAT row_format] [STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)];
HDFS
/hive/warehouse
advertising finance.db
sales deals
ctry=USA ctry=UAE
/somewhere/on/hdfs
humanresources.db
employees my_ext_table
/mydata/2013/07/21
/mydata/2013/07/26
/mydata/2012/03/19
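The difference between a Hive-managed table and an external table can be sketched as follows. The column lists and delimiters are hypothetical, and placing `sales` in the `finance` database and `my_ext_table` in `humanresources` is a reading of the directory diagram rather than something the slide states. Both databases are assumed to exist already.

```sql
-- Managed table: Hive owns the data. Each partition becomes a
-- subdirectory such as ctry=USA or ctry=UAE under the table directory.
CREATE TABLE IF NOT EXISTS finance.sales (
  sale_id BIGINT,
  amount  DOUBLE
)
PARTITIONED BY (ctry STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- External table: Hive stores only the metadata and reads the files
-- where they already live. Dropping the table leaves the files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS humanresources.my_ext_table (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/mydata/2013/07/21';
```

Dropping `finance.sales` would delete its data from the warehouse, which is the main reason to prefer EXTERNAL for data shared with other tools.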
Working with Hive
Demo
Demo Recap
Pluralsight database
Hive creates pluralsight.db directory
HiveQL
SELECT, UNION ALL, Sub Queries, DISTINCT, Aliasing
Create database
External and Hive managed tables
Loading data into the Hive warehouse
Truncate or overwrite
Different methods for creating tables
CTAS
LIKE
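The recap items can be sketched end to end. Only the `pluralsight` database name comes from the slides; the `courses` table, its columns, and the input path `/tmp/courses.tsv` are hypothetical stand-ins for whatever the demo used.

```sql
-- Hive creates a pluralsight.db directory in the warehouse.
CREATE DATABASE IF NOT EXISTS pluralsight;
USE pluralsight;

CREATE TABLE IF NOT EXISTS courses (
  id    INT,
  title STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Load: moves the HDFS file into the table's directory.
-- OVERWRITE replaces existing data; without it the file is added.
LOAD DATA INPATH '/tmp/courses.tsv' OVERWRITE INTO TABLE courses;

-- CTAS: the new table takes its schema and data from the query result.
CREATE TABLE courses_copy AS
SELECT id, title FROM courses;

-- LIKE: copies the table definition only, with no data.
CREATE TABLE courses_empty LIKE courses;

-- Truncate: removes all rows from a managed table but keeps its definition.
TRUNCATE TABLE courses_copy;
```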