© 2014 MapR Technologies
Self Service Data Exploration with
Apache Drill
{
Author: { "name" : "Aditya Kishore", "github" : "adityakishore", "twitter" : "@adiore" }
Presenter: { "name" : "Ted Dunning", "github" : "tdunning", "twitter" : "@ted_dunning" }
}
Data is doubling in
size every two years
By 2020, the world is estimated to hold 44 zettabytes of data
2011: 1.8 zettabytes
2013: 4.4 zettabytes
2020: 44 zettabytes*
Source: IDC Digital Universe
* Equivalent of roughly 700 billion 64GB iPhones
Unstructured data will account for more than 80%
of the data collected by organizations
[Figure: total data stored, 1980 to 2020, structured data vs. unstructured data]
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
Evolving distance to data
Existing approaches require a middleman (IT)
[Figure: three ways for the business to reach its data]
Map/Reduce: Business (analysts, developers) -> "Plumbing" development -> Data
Traditional SQL-on-Hadoop: Business (analysts, developers) -> Modeling and transformations -> Data
New SQL-on-Hadoop: Business (analysts, developers) -> Data
SQL in a NoSchema World
WANT
• SQL
• BI (Tableau, MicroStrategy, etc.)
• Low latency
• Scalability
DON'T WANT
• Create and maintain schemas on:
  – HDFS (Parquet, JSON, etc.)
  – HBase
  – MongoDB
• Transform or copy data
APACHE DRILL
• Schema-free scale-out query engine for Hadoop and NoSQL
• Low latency
• Extreme ease of use
• Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
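As an illustration of the RESTful API bullet, here is a minimal Python sketch that builds (but does not send) an HTTP request carrying a SQL query. The `/query.json` path and the `queryType`/`query` payload follow the REST interface documented for later Drill releases and are an assumption for the version used in this deck; the port 8047 is the Web UI port mentioned later in the deck, and `build_query_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for a Drill REST query endpoint.

    Assumption: the endpoint is <base_url>/query.json and accepts a JSON body
    with "queryType" and "query" keys.
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "SELECT full_name, position_title, salary FROM cp.`employee.json` LIMIT 1"
)

# To run the query against a live drillbit, send the request:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Building the request separately from sending it keeps the sketch usable without a running cluster.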
Drill's Data Model is Flexible
[Figure: HBase, JSON, BSON, CSV, TSV, Parquet, and Avro placed on two flexibility axes: fixed schema to schema-less, and flat to complex]
RDBMS/SQL-on-Hadoop table
Name      Gender  Age
Michael   M       6
Jennifer  F       3
Apache Drill table
{
  name: {
    first: Michael,
    last: Smith
  },
  hobbies: [ski, soccer],
  district: Los Altos
}
{
  name: {
    first: Jennifer,
    last: Gates
  },
  hobbies: [sing],
  preschool: CCLC
}
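To make the contrast concrete, the following Python sketch treats the two Apache Drill records above as plain nested dictionaries and projects a few dotted paths from them. It only illustrates the idea that records in one table may carry different fields; `get_path` and `project` are hypothetical helpers, not Drill functions, and returning `None` for a missing field is an assumption of this sketch.

```python
records = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]


def get_path(record, path):
    """Follow a dotted path such as 'name.first'; return None if a step is missing."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def project(rows, *paths):
    """Return one tuple per row containing the value found at each path."""
    return [tuple(get_path(row, path) for path in paths) for row in rows]


result = project(records, "name.first", "district", "preschool")
```

Each record contributes a row even though only one of them has `district` and only the other has `preschool`.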
Running Drill takes 10 minutes
DOWNLOAD & step by step: https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
EXTRACT
$ tar xf apache-drill-0.7.0.tar.gz
$ cd apache-drill-0.7.0
RUN
$ bin/sqlline -u jdbc:drill:zk=local
QUERY (in SQL format)
> SELECT full_name, position_title, salary
  FROM cp.`employee.json`
  LIMIT 1;
+--------------+----------------+---------+
| full_name    | position_title | salary  |
+--------------+----------------+---------+
| Sheri Nowmer | President      | 80000.0 |
+--------------+----------------+---------+
1 row selected (0.417 seconds)
Introduce external data sources to Drill
Coordinates: Storage Plugin . Workspace . Table
Currently Supported Providers:
Provider   Workspace   Table
files      Path        Path relative to workspace
mongo      Database    Collection
hive       Database    Table
hbase      Namespace   Table
Example:
> SELECT * FROM dfs.root.`/E:/drill/data/yelp/review.json`;
> SELECT * FROM dfs.yelp.`review.json` LIMIT 1;
> USE dfs.yelp;
> SELECT * FROM `review.json` LIMIT 1;
> SELECT * FROM hbase.users LIMIT 1;
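The coordinate scheme above (storage plugin, optional workspace, table) can be sketched as a small Python helper that assembles the dotted table reference and backtick-quotes any part that is not a plain identifier. `table_ref` and `quote_part` are hypothetical names for illustration; the sketch does not handle reserved words such as `date` or names that themselves contain backticks.

```python
import re

_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_part(part):
    """Wrap a name in backticks unless it is a plain identifier."""
    return part if _PLAIN_IDENTIFIER.match(part) else "`" + part + "`"


def table_ref(plugin, table, workspace=None):
    """Join storage plugin, optional workspace, and table with dots."""
    parts = [plugin]
    if workspace is not None:
        parts.append(workspace)
    parts.append(table)
    return ".".join(quote_part(part) for part in parts)


yelp_reviews = table_ref("dfs", "review.json", workspace="yelp")
hbase_users = table_ref("hbase", "users")
```

File names need quoting because of the dot and slashes they contain, which is why the examples above wrap `review.json` in backticks while `hbase.users` needs none.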
Inventory: DFS Files
{
  "votes": {"funny": 0, "useful": 2, "cool": 1},
  "user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
  "review_id": "15SdjuK7DmYqUAj6rjGowg",
  "stars": 5,
  "date": "2007-05-17",
  "text": "dr. goldberg offers everything ...",
  "type": "review",
  "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
business.json
{
  "business_id": "4bEjOyTaDG24SY5TxsaUNQ",
  "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109",
  "hours": {
    "Monday": {"close": "23:00", "open": "07:00"},
    "Tuesday": {"close": "23:00", "open": "07:00"},
    "Friday": {"close": "00:00", "open": "07:00"},
    "Wednesday": {"close": "23:00", "open": "07:00"},
    "Thursday": {"close": "23:00", "open": "07:00"},
    "Sunday": {"close": "23:00", "open": "07:00"},
    "Saturday": {"close": "00:00", "open": "07:00"}
  },
  "open": true,
  "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
  "city": "Las Vegas",
  "review_count": 4084,
  "name": "Mon Ami Gabi",
  "neighborhoods": ["The Strip"],
  "longitude": -115.172588519464,
  "state": "NV",
  "stars": 4.0,
  "attributes": {
    "Alcohol": "full_bar",
    "Noise Level": "average",
    "Has TV": false,
    "Attire": "casual",
    "Ambience": {
      "romantic": true,
      "intimate": false,
      "touristy": false,
      "hipster": false,
      "classy": true,
      "trendy": false,
      "casual": false
    },
    "Good For": {"dessert": false, "latenight": false, "lunch": false,
                 "dinner": true, "breakfast": false, "brunch": false}
  }
}
Use cases
LAS VEGAS
NEW
RESTAURANT
NEW RESTAURANT: Customers for opening party
> SELECT name, review_count
  FROM dfs.yelp.`user.json`
  ORDER BY review_count DESC
  LIMIT 50;

+----------+--------------+
| name     | review_count |
+----------+--------------+
| Victor   | 8062         |
| Jennifer | 4244         |
| Anita    | 3829         |
| ......   | ....         |
| Eileen   | 1947         |
| J        | 1946         |
| Matt     | 1942         |
+----------+--------------+
50 rows selected (1.16 seconds)
NEW RESTAURANT: Cities with most businesses
> SELECT state, city, COUNT(*) AS businesses
  FROM dfs.yelp.`business.json`
  GROUP BY state, city
  ORDER BY businesses DESC LIMIT 10;
+-------+------------+------------+
| state | city       | businesses |
+-------+------------+------------+
| NV    | Las Vegas  | 12021      |
| AZ    | Phoenix    | 7499       |
| AZ    | Scottsdale | 3605       |
| EDH   | Edinburgh  | 2804       |
| AZ    | Mesa       | 2041       |
| AZ    | Tempe      | 2025       |
| NV    | Henderson  | 1914       |
| AZ    | Chandler   | 1637       |
| WI    | Madison    | 1630       |
| AZ    | Glendale   | 1196       |
+-------+------------+------------+
Use cases
LAS VEGAS
LAS VEGAS
RESTAURANT
LAS VEGAS RESTAURANT: Open restaurants at 22:00
> SELECT name, b.hours
  FROM dfs.yelp.`business.json` b
  WHERE b.hours.Saturday.`open` < '22:00' AND
        b.hours.Saturday.`close` > '22:00'
  LIMIT 1;

+-----------------------------+-------+
| name                        | hours |
+-----------------------------+-------+
| Chang Jiang Chinese Kitchen | {"Tuesday":{"close":"22:00","open":"11:00"},"Friday":{"close":"22:30","open":"11:00"},"Monday":{"close":"22:00","open":"11:00"},"Wednesday":{"close":"22:00","open":"11:00"},"Thursday":{"close":"22:00","open":"11:00"},"Sunday":{"close":"21:00","open":"16:00"},"Saturday":{"close":"22:30","open":"11:00"}} |
+-----------------------------+-------+
1 row selected (0.013 seconds)
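The filter above compares zero-padded "HH:MM" strings, which sort the same way as the times they represent. The Python sketch below mirrors that comparison and shows one consequence: a business that closes at "00:00" (as Mon Ami Gabi does on Saturdays in the business.json sample) is excluded by the simple filter. The `open_at` variant, which treats a closing time at or before the opening time as "closes after midnight", is an assumption of this sketch and not something the deck's query does.

```python
def open_at_naive(hours, when):
    """Mirror the query's filter: open < when AND close > when, compared as strings."""
    return hours["open"] < when and hours["close"] > when


def open_at(hours, when):
    """Variant that treats close <= open as closing after midnight."""
    opens, closes = hours["open"], hours["close"]
    if closes <= opens:  # e.g. open 07:00, close 00:00
        return when > opens or when < closes
    return opens < when < closes


chang_jiang_saturday = {"close": "22:30", "open": "11:00"}
mon_ami_gabi_saturday = {"close": "00:00", "open": "07:00"}

naive_results = (
    open_at_naive(chang_jiang_saturday, "22:00"),
    open_at_naive(mon_ami_gabi_saturday, "22:00"),
)
wraparound_results = (
    open_at(chang_jiang_saturday, "22:00"),
    open_at(mon_ami_gabi_saturday, "22:00"),
)
```

Both functions agree for businesses that close before midnight; they differ only when the closing time wraps past it.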
LAS VEGAS RESTAURANT: Finding hummus at 22:00
> SELECT name, stars, b.hours.Wednesday, categories
  FROM dfs.yelp.`business.json` b
  WHERE b.hours.Wednesday.`open` < '22:00' AND
        b.hours.Wednesday.`close` > '22:00' AND
        REPEATED_CONTAINS(categories, 'Mediterranean') AND
        city = 'Las Vegas'
  ORDER BY stars DESC
  LIMIT 1;

+-------------------------------+-------+----------------------------------+-------------------------------------------------------------+
| name                          | stars | EXPR$2                           | categories                                                  |
+-------------------------------+-------+----------------------------------+-------------------------------------------------------------+
| Marrakech Moroccan Restaurant | 4.0   | {"close":"23:00","open":"17:30"} | ["Mediterranean","Middle Eastern","Moroccan","Restaurants"] |
+-------------------------------+-------+----------------------------------+-------------------------------------------------------------+
1 row selected (2.185 seconds)
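A rough Python analogue of the REPEATED_CONTAINS filter in the query above: keep a record when any element of its array field equals the keyword, then sort by stars and take the first row. The sketch models exact matches only, and `repeated_contains` here is a plain Python stand-in rather than Drill's implementation. The two sample records reuse values shown in this deck.

```python
def repeated_contains(values, keyword):
    """True if any element of the repeated (array) field equals keyword."""
    return any(value == keyword for value in (values or []))


businesses = [
    {"name": "Mon Ami Gabi", "stars": 4.0, "city": "Las Vegas",
     "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"]},
    {"name": "Marrakech Moroccan Restaurant", "stars": 4.0, "city": "Las Vegas",
     "categories": ["Mediterranean", "Middle Eastern", "Moroccan", "Restaurants"]},
]

# WHERE city = 'Las Vegas' AND REPEATED_CONTAINS(...) ORDER BY stars DESC LIMIT 1
matches = sorted(
    (b for b in businesses
     if b["city"] == "Las Vegas"
     and repeated_contains(b["categories"], "Mediterranean")),
    key=lambda b: b["stars"],
    reverse=True,
)[:1]
```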
APACHE DRILL: Unique benefits
• Working with repeated values
Flatten Repeated Values
> SELECT name, categories
  FROM dfs.yelp.`business.json` LIMIT 3;
+----------------------------+------------------------------------------+
| name                       | categories                               |
+----------------------------+------------------------------------------+
| Eric Goldberg, MD          | ["Doctors","Health & Medical"]           |
| Pine Cone Restaurant       | ["Restaurants"]                          |
| Deforest Family Restaurant | ["American (Traditional)","Restaurants"] |
+----------------------------+------------------------------------------+

> SELECT name, FLATTEN(categories) AS categories
  FROM dfs.yelp.`business.json` LIMIT 3;
+----------------------+------------------+
| name                 | categories       |
+----------------------+------------------+
| Eric Goldberg, MD    | Doctors          |
| Eric Goldberg, MD    | Health & Medical |
| Pine Cone Restaurant | Restaurants      |
+----------------------+------------------+
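What FLATTEN does to a repeated field can be sketched in a few lines of Python: each input row is emitted once per array element, with the array replaced by that element. This is an illustration of the semantics shown above, not Drill's implementation; in this sketch, rows whose array is empty or missing produce no output.

```python
def flatten(rows, field):
    """Yield one copy of each row per element of row[field]."""
    for row in rows:
        for element in row.get(field) or []:
            yield {**row, field: element}


businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
    {"name": "Deforest Family Restaurant",
     "categories": ["American (Traditional)", "Restaurants"]},
]

# Equivalent of SELECT name, FLATTEN(categories) ... LIMIT 3
flattened = list(flatten(businesses, "categories"))[:3]
```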
Most and Least Common Business Categories
> SELECT category, COUNT(*) AS businesses
  FROM (SELECT name, FLATTEN(categories) AS category
        FROM dfs.yelp.`business.json`)
  GROUP BY category ORDER BY businesses DESC;
+-------------+------------+
| category    | businesses |
+-------------+------------+
| Restaurants | 14303      |
| ...         | ...        |
| Firewood    | 1          |
+-------------+------------+
715 rows selected (3.439 seconds)

> SELECT name, categories FROM dfs.yelp.`business.json`
  WHERE true AND REPEATED_CONTAINS(categories, 'Australian');
+-------------------+-------------------------------------------------------------------------+
| name              | categories                                                              |
+-------------------+-------------------------------------------------------------------------+
| The Australian AZ | ["Bars","Burgers","Nightlife","Australian","Sports Bars","Restaurants"] |
+-------------------+-------------------------------------------------------------------------+
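The flatten-then-group pattern of the first query can be mirrored in Python with `collections.Counter`: iterate over every category of every business and count occurrences. The four sample records reuse names and categories shown in this deck, so the counts below describe this tiny sample only, not the full dataset.

```python
from collections import Counter

businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
    {"name": "Deforest Family Restaurant",
     "categories": ["American (Traditional)", "Restaurants"]},
    {"name": "The Australian AZ",
     "categories": ["Bars", "Burgers", "Nightlife", "Australian",
                    "Sports Bars", "Restaurants"]},
]

# FLATTEN(categories) followed by GROUP BY category, COUNT(*)
category_counts = Counter(
    category
    for business in businesses
    for category in business["categories"]
)

# ORDER BY businesses DESC
ranked = category_counts.most_common()
```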
APACHE DRILL
• Views - Dynamic and Materialized
Create a view combining business and reviews datasets.
> CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
  SELECT b.name, b.stars, r.votes.funny,
         r.votes.useful, r.votes.cool, r.`date`
  FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
  WHERE r.business_id = b.business_id;

+------+-----------------------------------------------------------------+
| ok   | summary                                                         |
+------+-----------------------------------------------------------------+
| true | View 'BusinessReviews' created successfully in 'dfs.tmp' schema |
+------+-----------------------------------------------------------------+

> SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;

+---------+
| Total   |
+---------+
| 1125458 |
+---------+
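The view above is an inner equi-join on business_id. As a rough Python analogue, the generator below indexes businesses by id and pairs each review with its business; because it is a function, the join is recomputed every time it is called, loosely mirroring how a (non-materialized) view is evaluated at query time. The business ids, names, and most values are made-up toy data, and `business_reviews` is a hypothetical name.

```python
from collections import defaultdict

# Toy data: ids and names are invented for illustration.
businesses = [
    {"business_id": "b1", "name": "Cafe One", "stars": 4.0},
    {"business_id": "b2", "name": "Diner Two", "stars": 3.5},
]
reviews = [
    {"business_id": "b1", "votes": {"funny": 0, "useful": 2, "cool": 1},
     "date": "2007-05-17"},
    {"business_id": "b2", "votes": {"funny": 1, "useful": 0, "cool": 0},
     "date": "2010-01-02"},
    {"business_id": "b3", "votes": {"funny": 0, "useful": 0, "cool": 0},
     "date": "2011-03-04"},  # no matching business: dropped by the inner join
]


def business_reviews(businesses, reviews):
    """Yield one joined row per (business, review) pair with equal business_id."""
    by_id = defaultdict(list)
    for business in businesses:
        by_id[business["business_id"]].append(business)
    for review in reviews:
        for business in by_id.get(review["business_id"], []):
            yield {
                "name": business["name"],
                "stars": business["stars"],
                "funny": review["votes"]["funny"],
                "useful": review["votes"]["useful"],
                "cool": review["votes"]["cool"],
                "date": review["date"],
            }


rows = list(business_reviews(businesses, reviews))
```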
Materialized Views AKA Tables
> ALTER SESSION SET `store.format` = 'parquet';

> CREATE TABLE dfs.tmp.BusinessReviewsTbl AS
  SELECT b.name, b.stars, r.votes.funny funny,
         r.votes.useful useful, r.votes.cool cool, r.`date`
  FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
  WHERE r.business_id = b.business_id;

+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_0      | 176448                    |
| 1_1      | 192439                    |
| 1_2      | 198625                    |
| 1_3      | 200863                    |
| 1_4      | 181420                    |
| 1_5      | 175663                    |
+----------+---------------------------+
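A quick sanity check on the CTAS output: the per-fragment record counts should add up to the row count reported earlier for the BusinessReviews view (1125458). The short Python sketch below does that arithmetic with the numbers shown on these slides.

```python
# Records written by each fragment, as reported by the CTAS statement.
records_written = {
    "1_0": 176448,
    "1_1": 192439,
    "1_2": 198625,
    "1_3": 200863,
    "1_4": 181420,
    "1_5": 175663,
}

total_written = sum(records_written.values())

# COUNT(*) reported for the dfs.tmp.BusinessReviews view.
view_row_count = 1125458

counts_match = total_written == view_row_count
```

The six fragments each wrote a share of the result in parallel, and together they account for every row of the join.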
DRILL ARCHITECTURE
Under the hood
High Level Architecture
Cluster of commodity servers
– Daemon (drillbit) on each node
ZooKeeper maintains ephemeral cluster membership information
– Drillbit uses ZooKeeper to find other drillbits in the cluster
– Client uses ZooKeeper to find drillbits
Built-in, optimistic query execution engine. Doesn't require a
particular storage or execution system (MapReduce, Spark, Tez)
– Better performance and manageability
Data processing unit is columnar record batches
– Enables schema flexibility with negligible performance impact
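To give a feel for the last point, here is a deliberately simplified Python sketch of columnar record batches: rows are split into batches, each batch stores its values column by column, and each batch carries its own schema, so a new field can appear in a later batch without a global schema change. This is an illustration under those assumptions only; Drill's actual in-memory representation is more involved, and `to_record_batches` is a hypothetical name.

```python
def to_record_batches(rows, batch_size):
    """Split rows into batches that store values column by column.

    Each batch records its own schema: the columns seen in that batch,
    in order of first appearance. Missing values are stored as None.
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        columns = []
        for row in chunk:
            for key in row:
                if key not in columns:
                    columns.append(key)
        yield {
            "schema": columns,
            "columns": {c: [row.get(c) for row in chunk] for c in columns},
        }


rows = [
    {"name": "Michael", "district": "Los Altos"},
    {"name": "Jennifer", "preschool": "CCLC"},
    {"name": "Adele", "fans": 10},
]
batches = list(to_record_batches(rows, batch_size=2))
```

The first batch has the schema name, district, preschool, while the second has name, fans; downstream operators in this sketch would only need to adapt when the schema changes between batches.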
Drill Maximizes Data Locality
Data Source        Best Practice
HDFS or MapR-FS    drillbit on each DataNode
HBase or MapR-DB   drillbit on each RegionServer
MongoDB            drillbit on each mongod node (when using replicas, run it on the replica node)
[Figure: each node runs a drillbit alongside its DataNode/RegionServer/mongod; a ZooKeeper ensemble sits next to the cluster]
Core Modules within drillbit
[Figure: RPC Endpoint, SQL Parser, Logical Plan, Optimizer, Physical Plan, Execution, Distributed Cache, and Storage Plugins (DFS, HBase, Hive, MongoDB)]
SELECT * Query Execution
[Figure: a client (JDBC, ODBC, REST), a ZooKeeper ensemble, and a cluster of drillbits]
1. Find drillbits (once per session)
2. Submit query to drillbit
3. Create logical and physical execution plans
4. Farm out execution of fragments to cluster (completely distributed execution)
5. Return results to client
* CTAS (CREATE TABLE AS SELECT) queries include steps 1-4
Participate
• Learn: http://drill.apache.org/
• Download: http://drill.apache.org/download/
• Ask Questions: user@drill.apache.org
• Engage on Twitter: @ApacheDrill
Thank You
@mapr maprtech
aditya@mapr.com
Aditya Kishore
MapRTechnologies
maprtech
mapr-technologies
adi@apache.org
Or Run Drill in Distributed Mode…
• Make sure ZooKeeper (zkServer) is running:
  $ zkServer start
  • Not sure if ZooKeeper is running? Run telnet localhost 2181 and make sure it connects
• Define the Drill cluster name and ZooKeeper nodes in conf/drill-override.conf
• Start drillbit:
  $ bin/drillbit.sh start
• Access the Web UI: http://localhost:8047
• Connect a client to the cluster (e.g., sqlline):
  $ bin/sqlline -u jdbc:drill:zk=localhost:2181
  • Clients (like sqlline) connect to ZooKeeper to discover the cluster nodes
  • If you have multiple Drill clusters registered in one ZooKeeper ensemble, specify the desired cluster in the JDBC connection string: jdbc:drill:zk=localhost:2181/drill/<clustername>
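The connection-string rules above can be captured in a small Python sketch, together with a socket-based alternative to the telnet check. `drill_jdbc_url` and `port_is_open` are hypothetical helper names; joining several ZooKeeper hosts with commas is an assumption of this sketch, and the `/drill/<clustername>` suffix follows the form shown on the slide.

```python
import socket


def drill_jdbc_url(zk_hosts, cluster_name=None, zk_root="drill"):
    """Build a jdbc:drill:zk=... URL, optionally naming a specific cluster."""
    url = "jdbc:drill:zk=" + ",".join(zk_hosts)
    if cluster_name:
        url += "/" + zk_root + "/" + cluster_name
    return url


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (like the telnet check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


default_url = drill_jdbc_url(["localhost:2181"])
named_url = drill_jdbc_url(["localhost:2181"], cluster_name="mycluster")

# Example check before connecting (requires a running ZooKeeper):
#   if port_is_open("localhost", 2181): ...
```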
user.json
{
  "yelping_since": "2007-08",
  "votes": {
    "funny": 198,
    "useful": 415,
    "cool": 206
  },
  "review_count": 283,
  "name": "Adele",
  "user_id": "9NJdKpRNwwaL4cvKq0cN6g",
  "friends": ["DrKQzBFAvxhyjLgbPSW2Qw", "ebXx-G5eFqWkfDuk22f81w", "qWLezzHxOXN-GQdInixZzw"],
  "fans": 10,
  "average_stars": 3.6499999999999999,
  "compliments": {
    "funny": 4,
    "hot": 17,
    "cool": 20
  },
  "elite": [2008, 2009, 2010, 2011, 2012, 2013, 2014]
}

More Related Content

What's hot (20)

PDF
Data Exploration with Apache Drill: Day 1
Charles Givre
ย 
PDF
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
The Hive
ย 
PPTX
M7 and Apache Drill, Micheal Hausenblas
Modern Data Stack France
ย 
DOCX
Apache Drill with Oracle, Hive and HBase
Nag Arvind Gudiseva
ย 
PDF
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Duyhai Doan
ย 
PPTX
Hive and Apache Tez: Benchmarked at Yahoo! Scale
DataWorks Summit
ย 
PDF
Spark Cassandra Connector: Past, Present, and Future
Russell Spitzer
ย 
PDF
Overview of running R in the Oracle Database
Brendan Tierney
ย 
PPTX
Working with Delimited Data in Apache Drill 1.6.0
Vince Gonzalez
ย 
PDF
Spark Cassandra Connector Dataframes
Russell Spitzer
ย 
PPTX
Introduction to Apache Drill - interactive query and analysis at scale
MapR Technologies
ย 
PDF
Lightning fast analytics with Spark and Cassandra
nickmbailey
ย 
PDF
Spark cassandra connector.API, Best Practices and Use-Cases
Duyhai Doan
ย 
PDF
Lightning fast analytics with Spark and Cassandra
Rustam Aliyev
ย 
PPTX
Hadoop and Spark for the SAS Developer
DataWorks Summit
ย 
PPTX
OrientDB vs Neo4j - Comparison of query/speed/functionality
Curtis Mosters
ย 
PDF
Big data analytics with Spark & Cassandra
Matthias Niehoff
ย 
PDF
OCF.tw's talk about "Introduction to spark"
Giivee The
ย 
PDF
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Edureka!
ย 
PPT
Hive ICDE 2010
ragho
ย 
Data Exploration with Apache Drill: Day 1
Charles Givre
ย 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
The Hive
ย 
M7 and Apache Drill, Micheal Hausenblas
Modern Data Stack France
ย 
Apache Drill with Oracle, Hive and HBase
Nag Arvind Gudiseva
ย 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Duyhai Doan
ย 
Hive and Apache Tez: Benchmarked at Yahoo! Scale
DataWorks Summit
ย 
Spark Cassandra Connector: Past, Present, and Future
Russell Spitzer
ย 
Overview of running R in the Oracle Database
Brendan Tierney
ย 
Working with Delimited Data in Apache Drill 1.6.0
Vince Gonzalez
ย 
Spark Cassandra Connector Dataframes
Russell Spitzer
ย 
Introduction to Apache Drill - interactive query and analysis at scale
MapR Technologies
ย 
Lightning fast analytics with Spark and Cassandra
nickmbailey
ย 
Spark cassandra connector.API, Best Practices and Use-Cases
Duyhai Doan
ย 
Lightning fast analytics with Spark and Cassandra
Rustam Aliyev
ย 
Hadoop and Spark for the SAS Developer
DataWorks Summit
ย 
OrientDB vs Neo4j - Comparison of query/speed/functionality
Curtis Mosters
ย 
Big data analytics with Spark & Cassandra
Matthias Niehoff
ย 
OCF.tw's talk about "Introduction to spark"
Giivee The
ย 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Edureka!
ย 
Hive ICDE 2010
ragho
ย 

Similar to Introduction to Apache Drill - NYC Apache Drill Meetup (20)

PDF
Self service data exploration with apache drill
MapR Technologies
ย 
PDF
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)
Spark Summit
ย 
PDF
Apache drill self service data exploration (113)
MapR Technologies
ย 
PDF
What and Why and How: Apache Drill ! - Tugdual Grall
distributed matters
ย 
PDF
Adding Complex Data to Spark Stack by Tug Grall
Spark Summit
ย 
PPTX
Apache Drill โ€“ Hands-On SQL References
MapR Technologies
ย 
PPTX
Free Code Friday: Drill 101 - Basics of Apache Drill
MapR Technologies
ย 
PDF
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
SpagoWorld
ย 
PPTX
Drilling into Data with Apache Drill
DataWorks Summit
ย 
PDF
Analyzing Real-World Data with Apache Drill
Tomer Shiran
ย 
PPTX
Introduction to Apache Drill - NYC Apache Drill Meetup

  • 1. ® © 2014 MapR Technologies. Self Service Data Exploration with Apache Drill
      {
        Author: { "name": "Aditya Kishore", "github": "adityakishore", "twitter": "@adiore" }
        Presenter: { "name": "Ted Dunning", "github": "tdunning", "twitter": "@ted_dunning" }
      }
  • 2. Data is doubling in size every two years
  • 3. By 2020, the world is estimated to hold 44 zettabytes of data
      2011: 1.8 zettabytes
      2013: 4.4 zettabytes
      2020: 44 zettabytes*
      * Equivalent to roughly 700 billion 64 GB iPhones (44 × 10^21 bytes / 64 × 10^9 bytes ≈ 6.9 × 10^11)
      Source: IDC Digital Universe
  • 4. Unstructured data will account for more than 80% of the data collected by organizations
      [Chart: total data stored, 1980 to 2020, structured data vs. unstructured data]
      Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
  • 5. Evolving distance to data. Existing approaches require a middleman (IT).
      Map/Reduce: Data -> "Plumbing" development -> Business (analysts, developers)
      Traditional SQL-on-Hadoop: Data -> Modeling and transformations -> Business (analysts, developers)
      New SQL-on-Hadoop: Data -> Business (analysts, developers)
  • 6. SQL in a NoSchema World
      WANT
        - SQL
        - BI (Tableau, MicroStrategy, etc.)
        - Low latency
        - Scalability
      DON'T WANT
        - Create and maintain schemas on:
            - HDFS (Parquet, JSON, etc.)
            - HBase
            - MongoDB
        - Transform or copy data
  • 7. APACHE DRILL
      - Schema-free scale-out query engine for Hadoop and NoSQL
      - Low latency
      - Extreme ease of use
      - Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
  • 8. Drill's Data Model is Flexible
      [Chart: data formats placed along two axes of flexibility, fixed schema to schema-less and flat to complex: HBase, JSON, BSON, CSV, TSV, Parquet, Avro]
      RDBMS/SQL-on-Hadoop table:
        Name     | Gender | Age
        Michael  | M      | 6
        Jennifer | F      | 3
      Apache Drill table:
        {
          name: { first: Michael, last: Smith },
          hobbies: [ski, soccer],
          district: Los Altos
        }
        {
          name: { first: Jennifer, last: Gates },
          hobbies: [sing],
          preschool: CCLC
        }
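To make the contrast on this slide concrete, here is a minimal Python sketch (plain Python, not Drill code) that treats the two "Apache Drill table" records as dictionaries. The `get_path` helper is hypothetical and exists only for illustration: it shows how a schema-free reader can yield an empty value for a field that one record has and another lacks, rather than failing on a fixed schema.

```python
from typing import Any, Optional

# The two "Apache Drill table" records from the slide, as plain dicts.
records = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]


def get_path(record: dict, path: str) -> Optional[Any]:
    """Hypothetical helper: follow a dotted path, returning None if any step is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Project three columns; fields absent from a record simply come back as None.
rows = [
    (get_path(r, "name.first"), get_path(r, "district"), get_path(r, "preschool"))
    for r in records
]
for row in rows:
    print(row)
```

Neither record needs to declare `district` or `preschool` up front, which is the point the slide makes about schema-less, complex data.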
  • 9. Running Drill takes 10 minutes
      DOWNLOAD
        https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
      EXTRACT
        $ tar xf apache-drill-0.7.0.tar.gz
        $ cd apache-drill-0.7.0
      RUN
        $ bin/sqlline -u jdbc:drill:zk=local
      QUERY (step by step, in SQL format)
        > SELECT full_name, position_title, salary
          FROM cp.`employee.json`
          LIMIT 1;
        +--------------+----------------+---------+
        | full_name    | position_title | salary  |
        +--------------+----------------+---------+
        | Sheri Nowmer | President      | 80000.0 |
        +--------------+----------------+---------+
        1 row selected (0.417 seconds)
  • 10. Introduce external data sources to Drill
      Coordinates: Storage Plugin . Workspace . Table
      Currently supported providers:
        Provider | Workspace | Table
        files    | Path      | Path relative to workspace
        mongo    | Database  | Collection
        hive     | Database  | Table
        hbase    | Namespace | Table
      > SELECT * FROM dfs.root.`/E:/drill/data/yelp/review.json`;
      > SELECT * FROM dfs.yelp.`review.json` LIMIT 1;
      > USE dfs.yelp;
      > SELECT * FROM `review.json` LIMIT 1;
      > SELECT * FROM hbase.users LIMIT 1;
  • 11. Introduce external data sources to Drill
      Coordinates: Storage Plugin . Workspace . Table
      Currently supported providers:
        Provider | Workspace | Table
        files    | Path      | Path relative to workspace
        mongo    | Database  | Collection
        hive     | Database  | Table
        hbase    | Namespace | Table
      Example:
      > SELECT * FROM dfs.root.`/E:/drill/data/yelp/review.json`;
      > SELECT * FROM dfs.yelp.`review.json` LIMIT 1;
      > USE dfs.yelp;
      > SELECT * FROM `review.json` LIMIT 1;
      > SELECT * FROM hbase.users LIMIT 1;
  • 12. Inventory: DFS Files
      {
        "votes": {"funny": 0, "useful": 2, "cool": 1},
        "user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
        "review_id": "15SdjuK7DmYqUAj6rjGowg",
        "stars": 5,
        "date": "2007-05-17",
        "text": "dr. goldberg offers everything ...",
        "type": "review",
        "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
      }
  • 13. business.json (1)
      {
        "business_id": "4bEjOyTaDG24SY5TxsaUNQ",
        "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109",
        "hours": {
          "Monday": {"close": "23:00", "open": "07:00"},
          "Tuesday": {"close": "23:00", "open": "07:00"},
          "Friday": {"close": "00:00", "open": "07:00"},
          "Wednesday": {"close": "23:00", "open": "07:00"},
          "Thursday": {"close": "23:00", "open": "07:00"},
          "Sunday": {"close": "23:00", "open": "07:00"},
          "Saturday": {"close": "00:00", "open": "07:00"}
        },
        "open": true,
        "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
        "city": "Las Vegas",
        "review_count": 4084,
        "name": "Mon Ami Gabi",
        "neighborhoods": ["The Strip"],
        "longitude": -115.172588519464,
  • 14. business.json (2)
        "state": "NV",
        "stars": 4.0,
        "attributes": {
          "Alcohol": "full_bar",
          "Noise Level": "average",
          "Has TV": false,
          "Attire": "casual",
          "Ambience": {
            "romantic": true,
            "intimate": false,
            "touristy": false,
            "hipster": false,
            "classy": true,
            "trendy": false,
            "casual": false
          },
          "Good For": {"dessert": false, "latenight": false, "lunch": false,
                       "dinner": true, "breakfast": false, "brunch": false}
        }
      }
  • 15. Use cases: LAS VEGAS, NEW RESTAURANT
  • 16. NEW RESTAURANT: Customers for opening party
      > SELECT name, review_count
        FROM dfs.yelp.`user.json`
        ORDER BY review_count DESC
        LIMIT 50;
      +----------+--------------+
      | name     | review_count |
      +----------+--------------+
      | Victor   | 8062         |
      | Jennifer | 4244         |
      | Anita    | 3829         |
      | ......   | ....         |
      | Eileen   | 1947         |
      | J        | 1946         |
      | Matt     | 1942         |
      +----------+--------------+
      50 rows selected (1.16 seconds)
  • 17. NEW RESTAURANT: Cities with most businesses
      > SELECT state, city, COUNT(*) AS businesses
        FROM dfs.yelp.`business.json`
        GROUP BY state, city
        ORDER BY businesses DESC LIMIT 10;
      +-------+------------+------------+
      | state | city       | businesses |
      +-------+------------+------------+
      | NV    | Las Vegas  | 12021      |
      | AZ    | Phoenix    | 7499       |
      | AZ    | Scottsdale | 3605       |
      | EDH   | Edinburgh  | 2804       |
      | AZ    | Mesa       | 2041       |
      | AZ    | Tempe      | 2025       |
      | NV    | Henderson  | 1914       |
      | AZ    | Chandler   | 1637       |
      | WI    | Madison    | 1630       |
      | AZ    | Glendale   | 1196       |
      +-------+------------+------------+
  • 18. Use cases: LAS VEGAS, LAS VEGAS RESTAURANT
  • 19. LAS VEGAS RESTAURANT: Open restaurants at 22:00
      > SELECT name, b.hours
        FROM dfs.yelp.`business.json` b
        WHERE b.hours.Saturday.`open` < '22:00' AND
              b.hours.Saturday.`close` > '22:00'
        LIMIT 1;
      +------------+------------+
      | name | hours |
      +------------+------------+
      | Chang Jiang Chinese Kitchen | {"Tuesday":{"close":"22:00","open":"11:00"},"Friday":{"close":"22:30","open":"11:00"},"Monday":{"close":"22:00","open":"11:00"},"Wednesday":{"close":"22:00","open":"11:00"},"Thursday":{"close":"22:00","open":"11:00"},"Sunday":{"close":"21:00","open":"16:00"},"Saturday":{"close":"22:30","open":"11:00"}} |
      +------------+------------+
      1 row selected (0.013 seconds)
  • 20. LAS VEGAS RESTAURANT: Finding hummus at 22:00
      > SELECT name, stars, b.hours.Wednesday, categories
        FROM dfs.yelp.`business.json` b
        WHERE b.hours.Wednesday.`open` < '22:00' AND
              b.hours.Wednesday.`close` > '22:00' AND
              REPEATED_CONTAINS(categories, 'Mediterranean') AND
              city = 'Las Vegas'
        ORDER BY stars DESC
        LIMIT 1;
      +------------+------------+------------+------------+
      | name | stars | EXPR$2 | categories |
      +------------+------------+------------+------------+
      | Marrakech Moroccan Restaurant | 4.0 | {"close":"23:00","open":"17:30"} | ["Mediterranean","Middle Eastern","Moroccan","Restaurants"] |
      +------------+------------+------------+------------+
      1 row selected (2.185 seconds)
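The WHERE clause on this slide compares times as strings, which works because zero-padded "HH:MM" values sort in the same order as the times they represent. The Python sketch below mirrors that predicate on two records taken from the slides. The function names are hypothetical, and `repeated_contains` is a simplified stand-in (exact element match) rather than Drill's implementation. It also exposes a caveat: a business whose closing time is recorded as "00:00" (midnight) does not satisfy `close > '22:00'`.

```python
def open_at(hours: dict, day: str, time: str) -> bool:
    """Mirror the slide's predicate: open < time AND close > time, compared as strings."""
    slot = hours.get(day)
    if slot is None:
        return False
    return slot["open"] < time and slot["close"] > time


def repeated_contains(values: list, keyword: str) -> bool:
    """Simplified stand-in for REPEATED_CONTAINS: exact match on any array element."""
    return keyword in values


# Result row from this slide (city taken from the WHERE clause).
marrakech = {
    "name": "Marrakech Moroccan Restaurant",
    "city": "Las Vegas",
    "hours": {"Wednesday": {"close": "23:00", "open": "17:30"}},
    "categories": ["Mediterranean", "Middle Eastern", "Moroccan", "Restaurants"],
}

# Saturday hours of "Mon Ami Gabi" from the business.json sample slide.
mon_ami_gabi_hours = {"Saturday": {"close": "00:00", "open": "07:00"}}

matches = (
    open_at(marrakech["hours"], "Wednesday", "22:00")
    and repeated_contains(marrakech["categories"], "Mediterranean")
    and marrakech["city"] == "Las Vegas"
)
print(matches)

# Closes at midnight, yet the string comparison "00:00" > "22:00" is False.
print(open_at(mon_ami_gabi_hours, "Saturday", "22:00"))
```

If late-closing businesses matter, closing times that wrap past midnight would need separate handling; the slide's query does not attempt that.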
  • 21. APACHE DRILL: Unique benefits
      - Working with repeated values
  • 22. Flatten Repeated Values
      > SELECT name, categories
        FROM dfs.yelp.`business.json` LIMIT 3;
      +------------+------------+
      | name | categories |
      +------------+------------+
      | Eric Goldberg, MD | ["Doctors","Health & Medical"] |
      | Pine Cone Restaurant | ["Restaurants"] |
      | Deforest Family Restaurant | ["American (Traditional)","Restaurants"] |
      +------------+------------+

      > SELECT name, FLATTEN(categories) AS categories
        FROM dfs.yelp.`business.json` LIMIT 3;
      +------------+------------+
      | name | categories |
      +------------+------------+
      | Eric Goldberg, MD | Doctors |
      | Eric Goldberg, MD | Health & Medical |
      | Pine Cone Restaurant | Restaurants |
      +------------+------------+
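The effect of FLATTEN in the second query can be sketched in a few lines of Python: each input row is repeated once per element of the array column, with the array replaced by that single element. This is an illustration of the semantics visible in the output above, using the three rows from the slide; the `flatten` function is a hypothetical helper, not Drill's implementation.

```python
# The three rows returned by the first query on the slide.
businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
    {"name": "Deforest Family Restaurant",
     "categories": ["American (Traditional)", "Restaurants"]},
]


def flatten(rows, column):
    """Yield one output row per element of the repeated column, copying the other fields."""
    for row in rows:
        for value in row[column]:
            out = dict(row)      # shallow copy keeps the non-repeated fields
            out[column] = value  # replace the array with a single element
            yield out


flat = list(flatten(businesses, "categories"))
first_three = flat[:3]  # plays the role of LIMIT 3

for row in first_three:
    print(row["name"], "|", row["categories"])
```

Three input rows become five output rows here, which is why the flattened query reaches its LIMIT before it gets to the third business.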
  • 23. Most and Least Common Business Categories
      > SELECT category, COUNT(*) AS businesses
        FROM (SELECT name, FLATTEN(categories) AS category
              FROM dfs.yelp.`business.json`)
        GROUP BY category ORDER BY businesses DESC;
      +-------------+------------+
      | category    | businesses |
      +-------------+------------+
      | Restaurants | 14303      |
      | ...         | ...        |
      | Firewood    | 1          |
      +-------------+------------+
      715 rows selected (3.439 seconds)

      > SELECT name, categories FROM dfs.yelp.`business.json`
        WHERE true AND REPEATED_CONTAINS(categories, 'Australian');
      +------------+------------+
      | name | categories |
      +------------+------------+
      | The Australian AZ | ["Bars","Burgers","Nightlife","Australian","Sports Bars","Restaurants"] |
      +------------+------------+
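The first query on this slide is a flatten followed by a group-and-count. A rough Python equivalent, run over four sample rows copied from the slides rather than the full Yelp dataset (so the counts below are not the slide's counts), might look like this:

```python
from collections import Counter

# Four sample rows from the slides; the real query runs over all of business.json.
businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
    {"name": "Deforest Family Restaurant",
     "categories": ["American (Traditional)", "Restaurants"]},
    {"name": "The Australian AZ",
     "categories": ["Bars", "Burgers", "Nightlife", "Australian",
                    "Sports Bars", "Restaurants"]},
]

# Inner query: one (name, category) pair per array element.
# Outer query: count the pairs per category.
category_counts = Counter(
    category for business in businesses for category in business["categories"]
)

# ORDER BY businesses DESC
ranked = category_counts.most_common()
for category, count in ranked:
    print(category, count)
```

Because each business contributes one row per category after flattening, a business with six categories is counted six times, once under each.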
  • 24. APACHE DRILL
      - Views: Dynamic and Materialized
  • 25. Create a view combining business and reviews datasets.
      > CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
        SELECT b.name, b.stars, r.votes.funny,
               r.votes.useful, r.votes.cool, r.`date`
        FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
        WHERE r.business_id = b.business_id;
      +------------+------------+
      | ok | summary |
      +------------+------------+
      | true | View 'BusinessReviews' created successfully in 'dfs.tmp' schema |
      +------------+------------+

      > SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;
      +------------+
      | Total      |
      +------------+
      | 1125458    |
      +------------+
  • 26. Materialized Views AKA Tables
      > ALTER SESSION SET `store.format` = 'parquet';

      > CREATE TABLE dfs.tmp.BusinessReviewsTbl AS
        SELECT b.name, b.stars, r.votes.funny funny,
               r.votes.useful useful, r.votes.cool cool, r.`date`
        FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
        WHERE r.business_id = b.business_id;
      +----------+---------------------------+
      | Fragment | Number of records written |
      +----------+---------------------------+
      | 1_0      | 176448                    |
      | 1_1      | 192439                    |
      | 1_2      | 198625                    |
      | 1_3      | 200863                    |
      | 1_4      | 181420                    |
      | 1_5      | 175663                    |
      +----------+---------------------------+
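A quick consistency check ties this slide to the previous one: the per-fragment record counts written by the CREATE TABLE statement should add up to the COUNT(*) reported for the `dfs.tmp.BusinessReviews` view, since both run the same join. The arithmetic, using only the numbers shown on the two slides:

```python
# "Number of records written" per fragment, copied from the CTAS output.
records_written = {
    "1_0": 176448,
    "1_1": 192439,
    "1_2": 198625,
    "1_3": 200863,
    "1_4": 181420,
    "1_5": 175663,
}

# COUNT(*) reported for the view on the previous slide.
view_count = 1125458

total = sum(records_written.values())
print(total, total == view_count)
```

The six fragments reflect the write being split across parallel workers, each producing its own share of the table.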
  • 27. DRILL ARCHITECTURE: Under the hood
  • 28. High Level Architecture
      - Cluster of commodity servers
          - Daemon (drillbit) on each node
      - ZooKeeper maintains ephemeral cluster membership information
          - Drillbit uses ZooKeeper to find other drillbits in the cluster
          - Client uses ZooKeeper to find drillbits
      - Built-in, optimistic query execution engine. Doesn't require a particular storage or execution system (MapReduce, Spark, Tez)
          - Better performance and manageability
      - Data processing unit is columnar record batches
          - Enables schema flexibility with negligible performance impact
  • 29. Drill Maximizes Data Locality
      Data Source      | Best Practice
      HDFS or MapR-FS  | drillbit on each DataNode
      HBase or MapR-DB | drillbit on each RegionServer
      MongoDB          | drillbit on each mongod node (when using replicas, run it on the replica node)
      [Diagram: several nodes, each running a drillbit alongside a DataNode/RegionServer/mongod, coordinated by a ZooKeeper ensemble]
  • 30. Core Modules within drillbit
      - RPC Endpoint
      - SQL Parser
      - Optimizer
      - Logical Plan
      - Physical Plan
      - Execution
      - Distributed Cache
      - Storage Plugins: DFS, HBase, Hive, MongoDB
  • 31. SELECT* Query Execution
      1. Find drillbits (once per session)
      2. Submit query to drillbit
      3. Create logical and physical execution plans
      4. Farm out execution of fragments to cluster (completely distributed execution)
      5. Return results to client
      * CTAS (CREATE TABLE AS SELECT) queries include steps 1-4
      [Diagram: a client (JDBC, ODBC, REST), a ZooKeeper ensemble, and several drillbits]
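Step 2 (submitting a query to a drillbit) can also be done over REST, which the slide lists among the client types. The Python sketch below only builds the HTTP request and does not send it. It assumes the `/query.json` endpoint and the `queryType`/`query` payload fields documented for later Drill releases, served on the Web UI port 8047 mentioned elsewhere in this deck; whether the 0.7.0 build shown here exposes the same endpoint is an assumption, and `build_drill_query_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_drill_query_request(sql: str, host: str = "localhost", port: int = 8047):
    """Build (but do not send) a POST request that submits one SQL statement to a drillbit.

    Assumption: the drillbit serves POST /query.json and accepts a JSON body
    with "queryType" and "query" fields.
    """
    url = f"http://{host}:{port}/query.json"
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_drill_query_request("SELECT * FROM dfs.yelp.`review.json` LIMIT 1")
print(request.get_method(), request.full_url)

# Sending is left to the caller and needs a running drillbit, for example:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Unlike the JDBC path in step 1, this sketch addresses one drillbit directly and skips ZooKeeper discovery.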
  • 32. Participate
      - Learn: http://drill.apache.org/
      - Download: http://drill.apache.org/download/
      - Ask Questions: [email protected]
      - Engage on Twitter: @ApacheDrill
  • 33. Thank You @mapr maprtech [email protected] Aditya Kishore MapRTechnologies maprtech mapr-technologies [email protected]
  • 34. Or Run Drill in Distributed Mode…
      - Make sure ZooKeeper (zkServer) is running:
          $ zkServer start
      - Not sure if ZooKeeper is running? Run telnet localhost 2181 and make sure it connects
      - Define the Drill cluster name and ZooKeeper nodes in conf/drill-override.conf
      - Start drillbit:
          $ bin/drillbit.sh start
      - Access the Web UI: http://localhost:8047
      - Connect a client to the cluster (e.g., sqlline):
          $ bin/sqlline -u jdbc:drill:zk=localhost:2181
      - Clients (like sqlline) connect to ZooKeeper to discover the cluster nodes
      - If you have multiple Drill clusters registered in one ZooKeeper ensemble, specify the desired cluster in the JDBC connection string: jdbc:drill:zk=localhost:2181/drill/<clustername>
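The connection strings on this slide follow a simple pattern, sketched here as a small Python helper. `drill_jdbc_url` is a hypothetical name; the `/drill/<clustername>` suffix and the `zk=local` embedded form come from the slides, while joining several ZooKeeper hosts with commas is an assumption about the URL format rather than something the deck states.

```python
from typing import Optional, Sequence


def drill_jdbc_url(zk_hosts: Sequence[str], cluster: Optional[str] = None) -> str:
    """Assemble a Drill JDBC URL in the shape shown on the slides.

    zk_hosts: "host:port" entries for the ZooKeeper ensemble; an empty list
              falls back to the embedded form used on the quick-start slide.
    cluster:  optional Drill cluster name when several clusters share one ensemble.
    """
    if not zk_hosts:
        return "jdbc:drill:zk=local"
    url = "jdbc:drill:zk=" + ",".join(zk_hosts)  # comma-joining is an assumption
    if cluster:
        url += "/drill/" + cluster
    return url


print(drill_jdbc_url(["localhost:2181"]))
print(drill_jdbc_url(["localhost:2181"], cluster="mycluster"))  # "mycluster" is a placeholder
print(drill_jdbc_url([]))
```

The resulting string is what would be passed to `bin/sqlline -u`, as in the command on the slide.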
  • 35. user.json
      {
        "yelping_since": "2007-08",
        "votes": {
          "funny": 198,
          "useful": 415,
          "cool": 206
        },
        "review_count": 283,
        "name": "Adele",
        "user_id": "9NJdKpRNwwaL4cvKq0cN6g",
        "friends": ["DrKQzBFAvxhyjLgbPSW2Qw", "ebXx-G5eFqWkfDuk22f81w", "qWLezzHxOXN-GQdInixZzw"],
        "fans": 10,
        "average_stars": 3.6499999999999999,
        "compliments": {
          "funny": 4,
          "hot": 17,
          "cool": 20
        },
        "elite": [2008, 2009, 2010, 2011, 2012, 2013, 2014]
      }