Gruter / Jaehwa Jung
Expanding Your Data
Warehouse with Tajo
2015.10.27
About me
• Gruter Corp / Big Data Engineer (jhjung@gruter.com)
• Committer and PMC member of Apache Tajo
• Author of a Hadoop book
• Home Page: http://blrunner.com
• Twitter: @blrunner78
Agenda
1. What is a data warehouse?
2. Problems with data warehouses
3. How to expand the storage engine
4. How to expand the SQL engine
5. Use cases
What is a data warehouse?
Source: http://www.internetguncatalog.com/Portals/0/Warehouse%2006-08-07%20(26).jpg
What is Data Warehousing?
Information (data)
+ storehouse (warehouse)
What is Data Warehousing?
A data warehouse is a system used for
reporting and data analysis. DWs are
central repositories of integrated data
from one or more disparate sources.
- Wikipedia -
What is Data Warehousing?
= data warehouse
= DW or DWH
= Enterprise data warehouse
= EDW
What is Data Warehousing?
How is a DW architecture structured?
[Figure: Traditional DW architecture. Source data (OLTP, CRM, ERP, ecommerce, other) flows through ETL into an ODS (Operational Data Store), the data warehouse, and data marts, which serve front-end analytics: OLAP, visualization, reports, and data mining.]
Problems with existing data warehouses
Data volume and complexity are growing rapidly.
- Web logs / Click Stream
- User Generated Contents
- Sensors / RFID / Devices
- Spatial & GPS
- Speech to Text
- Etc. …
We would like to just throw it away, but
analysis requirements call for more,
and more diverse, data.
- Service/product quality improvement
- Customer retention / recommendations
- Business decision-making
- Etc. …
So we try to expand the DW, but…
Source: https://thinkbannedthoughts.files.wordpress.com/2012/05/banging-head-on-wall.jpg
License costs that grow per unit of capacity
- 1 terabyte: 20,000 to 50,000 USD (as of 2013)
→ 20 million to 50 million KRW
- 1 petabyte: 20 million to 50 million USD
→ 20 billion to 50 billion KRW
Source: http://www.slideshare.net/cloudera/hadoop-extending-your-data-warehouse
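The petabyte figures follow from scaling the per-terabyte range linearly. A minimal sketch of that arithmetic, assuming 1 PB = 1,000 TB and an exchange rate of roughly 1,000 KRW per USD (the rate implied by the slide's own won figures, not a quoted one):

```python
# Per-terabyte license cost range quoted on the slide (USD, as of 2013).
COST_PER_TB_USD = (20_000, 50_000)
# Assumption: the slide's won figures imply roughly 1,000 KRW per USD.
KRW_PER_USD = 1_000


def license_cost_usd(capacity_tb):
    """Return the (low, high) license cost in USD for a capacity in TB."""
    low, high = COST_PER_TB_USD
    return capacity_tb * low, capacity_tb * high


def to_krw(usd_range):
    """Convert a (low, high) USD range to KRW at the assumed rate."""
    return tuple(amount * KRW_PER_USD for amount in usd_range)


one_tb = license_cost_usd(1)
one_pb = license_cost_usd(1_000)  # 1 PB taken as 1,000 TB
print(one_tb, to_krw(one_tb))
print(one_pb, to_krw(one_pb))
```

Because the cost is linear in capacity, a thousandfold growth in data means a thousandfold growth in license fees under this pricing model.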
Even if you increase data capacity…
processing speed does not increase
in step with the growing data volume.
[Chart: processing times and infrastructure costs versus data volumes, assuming constant SLAs]
Source: http://www.slideshare.net/cloudera/hadoop-extending-your-data-warehouse
In addition, diverse data formats must be
converted into a structured format.
How can we build a DW like this?
- Guaranteed low TCO
- Linear scaling of capacity and performance
- Accommodates data in diverse forms
- Fast processing and easy data access
How to expand the storage engine
Which engine is suitable as the storage engine?
Hadoop is a Java-based open-source framework
for distributed processing of large-scale data.
Distributed file system (HDFS)
+
Distributed processing system (MapReduce)
What are Hadoop's main features?
Data Locality
- Logic runs where the data resides.
Fault Tolerant
- Hadoop can be installed on x86 servers (vs. Unix servers).
- It is designed on the assumption that hardware failures are unavoidable.
Scalable
- Adding servers (nodes) scales capacity and computing performance linearly.
How should we set up the storage engine with Hadoop?
Collect all the data into Hadoop!
- DB data: Sqoop, SQL-on-Hadoop
- Raw files: a single `cp` command is all it takes!
- Log collection: Flume, Scribe, Chukwa, …
How should we collect it?
Case by case
In the end, we get a picture like this.
[Figure: Hadoop-based DW architecture. Labels: source data (OLTP, CRM, ERP, ecommerce, other), ETL, ODS (Operational Data Store), data warehouse, data mart, and front-end analytics (OLAP, visualization, reports, data mining).]
However…
[Figure: the same DW architecture diagram. Labels: source data (OLTP, CRM, ERP, ecommerce, other), ETL, ODS (Operational Data Store), data warehouse, data mart, and front-end analytics (OLAP, visualization, reports, data mining).]
How should ETL and analytic queries be handled?
How to expand the SQL engine
Which engine is suitable as the SQL engine?
Tajo is a Hadoop-based
big data warehouse system.
Tajo is an Apache top-level project.
It supports ANSI SQL and provides
its own distributed processing engine.
Tajo architecture
[Figure: Tajo architecture. Clients (JDBC client, TSQL, WebUI, REST API) submit a query to the Tajo Master, whose Catalog Server manages metadata in a DBMS or HiveMetaStore. The master allocates the query to a Tajo Worker. Each worker contains a Query Master, a Query Executor, and a Storage Service on top of the underlying storage; the Query Master sends tasks to the other workers and monitors them.]
What are Tajo's comparative advantages?
ANSI SQL support = minimizes the cost of learning
a new engine and makes it easy to migrate
existing systems.
Cluster scalability = capacity and performance
scale linearly as nodes are added.
High-performance distributed processing engine =
supports everything from ETL queries that take hours
or more to interactive queries that finish
within a few hundred milliseconds.
What are the concrete query processing speeds?
• Scan speed: 100 MB/sec per physical disk (SATA)
• Processing 1 TB with about 10 nodes
- Assumes about 10 disks installed per node
- Simple aggregation query: around 30 seconds to 1 minute
- Simple join query: around 1 to 2 minutes
- Complex join and distinct aggregation: several minutes to around 10 minutes
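As a rough cross-check of the hardware assumptions above, the aggregate scan bandwidth gives a floor on the time needed to read the full data set once. This is only a sketch: it takes "about 10" nodes and disks as exactly 10, uses decimal units, and models nothing beyond raw disk reads, so it is not expected to match the per-query times listed above, which depend on how much data each query actually reads.

```python
def full_scan_seconds(data_bytes, nodes, disks_per_node, mb_per_sec_per_disk):
    """Lower bound on the time to read data_bytes once across all disks."""
    aggregate_bytes_per_sec = nodes * disks_per_node * mb_per_sec_per_disk * 10**6
    return data_bytes / aggregate_bytes_per_sec


# 1 TB (decimal), 10 nodes, 10 disks per node, 100 MB/sec per disk.
seconds = full_scan_seconds(10**12, nodes=10, disks_per_node=10, mb_per_sec_per_disk=100)
print(seconds)  # 100.0
```

Doubling the node count halves this bound, which is the linear scaling the earlier slides refer to.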
What are Tajo's main features?
Rich SQL support
• Distributed query processing
- Inner join, and left/right/full outer join
- Group by, sort, multiple distinct aggregation, window function
• SQL data types
- CHAR, BOOL, INT, DOUBLE, TEXT, DATE, etc.
• Various file formats
- Text file, SequenceFile, RCFile, ORC, Parquet, Avro
Query optimization
• Cost-based join optimization (greedy heuristic)
- Removes the burden on users of guessing the best join order
• Extensible rewrite rule engine
- Provides a rewrite rule interface and various utilities
• Progressive query optimization
- Collects runtime statistics
- Adjusts at runtime the partition ranges, partition count, etc. of the
range partitioning used for distributed sort
- Adjusts at runtime the number of partitions for distributed join and group-by
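The runtime adjustments listed above can be illustrated with a small sketch. This is not Tajo's implementation: the function names, the target partition size, and the sampling approach are all hypothetical, chosen only to show how statistics collected during execution could drive the partition count and the range boundaries of a distributed sort.

```python
import bisect
import math


def choose_partition_count(observed_bytes, target_bytes_per_partition, max_partitions):
    """Pick a partition count from the data volume observed at runtime."""
    if observed_bytes <= 0:
        return 1
    wanted = math.ceil(observed_bytes / target_bytes_per_partition)
    return max(1, min(wanted, max_partitions))


def range_boundaries(sorted_sample, partitions):
    """Split a sorted key sample into roughly equal ranges.

    Returns partitions - 1 boundary keys taken from the sample.
    """
    if partitions <= 1 or not sorted_sample:
        return []
    n = len(sorted_sample)
    return [sorted_sample[(i * n) // partitions] for i in range(1, partitions)]


def partition_for(key, boundaries):
    """Index of the range partition that a key falls into."""
    return bisect.bisect_right(boundaries, key)


# Hypothetical runtime statistics: 10 GiB of intermediate data, 256 MiB target.
count = choose_partition_count(10 * 2**30, 256 * 2**20, max_partitions=64)
bounds = range_boundaries(list(range(1, 101)), 4)
print(count, bounds, partition_for(25, bounds), partition_for(76, bounds))
```

Deriving boundaries from a sample of the actual keys keeps the sort partitions balanced even when the key distribution is not known before the query starts.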
Query federation and TableSpace support
• Supports join and union queries across diverse data sources.
• Advantages
- Data migration: RDBMS → Hadoop
- Join queries between existing RDBMS data and Hadoop data
- SQL access to NoSQL and various storage systems (S3, Swift, HBase,
Elasticsearch, Kafka)
- Interface standardization through SQL tools
[Figure: Tajo on top of HDFS, NoSQL, S3, and Swift]
Nested and JSON format support
Nested and JSON-format files can be processed with SQL
without any separate preprocessing.
[Figure: example showing input data, a table definition, and a SQL statement]
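The slide's example images did not survive extraction. To illustrate the idea of querying nested fields directly, here is a minimal sketch that resolves a dotted column path against JSON records. The sample records, the field names, and the `get_path` helper are hypothetical and are not Tajo's API; they only mimic what a query selecting `id` and `user.city` would return.

```python
import json


def get_path(record, dotted_path):
    """Resolve a dotted column path such as 'user.city' in a nested record."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing fields read as NULL in this sketch
        value = value[key]
    return value


# Hypothetical input: one JSON document per line.
lines = [
    '{"id": 1, "user": {"name": "kim", "city": "SEOUL"}}',
    '{"id": 2, "user": {"name": "lee"}}',
]
rows = [json.loads(line) for line in lines]
result = [(get_path(row, "id"), get_path(row, "user.city")) for row in rows]
print(result)  # [(1, 'SEOUL'), (2, None)]
```

The point of the feature is that no flattening step is needed before the data can be queried.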
Partitioned tables
Column-value table partitioning is supported for Hive compatibility.
• Table creation statement
CREATE TABLE student (
id INT,
name TEXT,
grade TEXT
) USING PARQUET
PARTITION BY COLUMN (country TEXT, city TEXT);
• Partition directory layout
/tajo/warehouse/student/country=KOREA/city=SEOUL/
/tajo/warehouse/student/country=KOREA/city=PUSAN/
/tajo/warehouse/student/country=KOREA/city=INCHEON/
/tajo/warehouse/student/country=USA/city=NEWYORK/
/tajo/warehouse/student/country=USA/city=BOSTON/
. . .
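The directory layout above encodes each partition column as a `column=value` path segment, in the order the columns are declared. A minimal sketch of that mapping, with hypothetical helper names (this is an illustration of the naming scheme, not code from Tajo or Hive):

```python
def partition_path(table_root, partition_values):
    """Build the directory for one partition from ordered (column, value) pairs."""
    parts = [f"{column}={value}" for column, value in partition_values]
    return table_root.rstrip("/") + "/" + "/".join(parts) + "/"


def parse_partition_path(table_root, path):
    """Recover the (column, value) pairs encoded in a partition directory."""
    relative = path[len(table_root.rstrip("/")):].strip("/")
    return [tuple(segment.split("=", 1)) for segment in relative.split("/") if segment]


root = "/tajo/warehouse/student"
path = partition_path(root, [("country", "KOREA"), ("city", "SEOUL")])
print(path)  # /tajo/warehouse/student/country=KOREA/city=SEOUL/
print(parse_partition_path(root, path))
```

Because the partition values live in the path, a query that filters on `country` or `city` can skip whole directories without opening any files.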
In version 0.11.0
• Released October 2015
• Key features
- Concurrent execution of multiple queries
- Tablespace and JDBC storage support
- Nested record type support
- Schemaless support for self-describing data such as JSON
- ORC file support
- Faster ResultSet fetch for JDBC and the client
- Python UDF/UDAF support
- Improved join optimization and query processing performance
- Improved responsiveness and bug fixes
In version 0.12.0
• Key features
- YARN support
- User authentication support
- Scalar and EXISTS subquery support
- ALTER TABLE ADD/DROP PARTITION support
- Hive UDF compatibility
- WITH clause support
How should we set up the SQL engine with Tajo?
- Set the Hadoop path,
- set the DW root path,
- decide which DB to use as the catalog,
- and decide how many tasks to run concurrently…
In the end, we get a picture like this.
[Figure: Tajo-based DW architecture. Labels: source data (OLTP, CRM, ERP, ecommerce, other), ETL, ODS (Operational Data Store), data warehouse, data mart, and front-end analytics (OLAP, visualization, reports, data mining).]
Let's look at some real-world use cases.
Replacing a commercial DW
• Korea's No. 1 mobile carrier by market share
- ETL replacement: 4 TB per day, more than 120 queries
- OLAP analysis replacement: more than 500 queries
• Benefits
- Simplified architecture for data analysis
→ Consolidated the systems for DW ETL, OLAP, and Hadoop ETL
- Commercial-grade SLA and data volume expansion at low cost
→ Reduced commercial DW license costs
Data discovery
• Korea's No. 1 digital music service by market share
• Analysis of the consumption history and activity records of 28 million customers
• Benefits
- Migrated analysis jobs from Hive to Tajo
→ Performance improved by at least 1.5x and up to tens of times
- Interactive queries over large-scale data
Cohort analysis
• Smartphone lock-screen reward advertising service
• Cohort analysis on raw logs stored in S3
• Analysis results are stored in RDS
• Benefits
- EC2 instance spec: c3.2xlarge
→ vCPU: 8, memory: 15 GB, HDD: 2 x 80 GB
- 10 EC2 instances process tens of GB of logs in about 40 seconds
→ Total cost: 0.420 * 10 = 4.20 USD (4,756.08 KRW)
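The cost figure can be reproduced with a few lines of arithmetic. In this sketch the 0.420 USD figure is assumed to be the per-instance hourly price (the slide does not state the billing unit), and the exchange rate of 1,132.4 KRW per USD is not quoted anywhere on the slide; it is simply the rate implied by 4.20 USD = 4,756.08 KRW.

```python
PRICE_PER_INSTANCE_USD = 0.420  # per c3.2xlarge; assumed to be an hourly rate
INSTANCES = 10
KRW_PER_USD = 1132.4            # assumption: rate implied by the slide's own figures

total_usd = round(PRICE_PER_INSTANCE_USD * INSTANCES, 2)
total_krw = round(total_usd * KRW_PER_USD, 2)
print(total_usd, total_krw)
```

Under that assumption, the whole 10-node cluster costs about 4.20 USD for each hour it is kept running.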
Welcome to Tajo
• Homepage: http://tajo.apache.org
• Korea Tajo User Group
• Google Group: https://groups.google.com/forum/#!forum/tajo-user-kr
• Facebook: https://www.facebook.com/groups/tajokorea
• Tajo Korean documentation project: http://bit.ly/1Ir417T
• Other reference sites
• http://www.gruter.com/blog/tag/apache-tajo
• http://teamblog.gruter.com/tag/apache-tajo
tajo> select question from you;
THANK YOU!
