Efficient In-situ Processing of Various Storage Types on Apache Tajo
Hadoop Summit 2015 San Jose
Hyunsik Choi, Gruter Inc.
Agenda
• Tajo Overview
• Various Storage Support
• Motivation
• Design Consideration
• What we did/are doing
An overview of Apache Tajo
Tajo: A Data Warehouse System
• Data Warehouse System
  • Apache top-level project
  • Low-latency and long-running batch queries in a single system
    • From ~100 ms up to several hours
  • Fault tolerance
• Features
  • ANSI SQL compliance
  • Mature SQL features: joins, group by, sort, multiple distinct aggregations, and window functions
  • Partitioned table support
  • Java/Python UDF support
  • JDBC driver and Java-based asynchronous API
  • SQL data types and nested type support
  • Direct JSON support
[Figure: Tajo architecture diagram. Components: Client (JDBC, SQL Shell, Web UI); Master Server (TajoMaster); CatalogStore (DBMS such as MySQL, or Hive MetaStore); three Slave Servers, each running a TajoWorker with a QueryMaster, a Local Query Engine, and a StorageManager over the local file system and HDFS. Arrow labels: "Submit a query", "Manage metadata", "Allocate a ...".]
