SlideShare a Scribd company logo
The Convergence of Reporting
and Interactive BI on Hadoop
Gustavo Arocena
June 19, 2018
Db2 Big SQL
IBM Canada Lab
Toronto
SQL-based Interactive BI on Hadoops
3
This Photo by Unknown Author is licensed under CC BY
SQL-based Interactive BI on Hadoop
4
Time2008 2011 2011 2012
The good old times
EDW
Analytic DB
Not there yet
SQL on Hadoop
EDW HDFS
Analytic DB
“Big Data”
It works, but …
SQL on Hadoop
EDW HDFS
Analytic DB
“Big Data”
Everyone Happy?
BI Accelerator
EDW HDFS
Analytic DB SQL on Hadoop
“Big Data”
• Enable offloading of Interactive BI
to Hadoop
• Interactive BI on Big Data
• Varying degree of autonomics
(auto-creation, auto-refresh)
• Fast response for analytic queries
SELECT p.category, max(s.amount)
FROM products p, sales s
WHERE p.id = s.pid
GROUP BY p.category
BI Accelerators – The Value Prop
5
This Photo by Unknown Author is licensed under CC BY-SA
BI Accelerators – The Small Print
6
• Duplication
• Licensing
• Skills
• Vendors for service/support
• Complexity
• Data architecture
• Data copying & refreshing
• Narrow scope
• Only repetitive, tool-generated queries
• Low integration with Hadoop platform
BI Accelerator
EDW HDFS
Analytic DB SQL on Hadoop
“Big Data”
BI Acceleration Techniques
7
CREATE HADOOP TABLE sales
(id integer,
city string,
amount double)
SELECT sum(amount)
FROM sales
WHERE amount < 500
AND city = ‘Toronto’
Columnar Storage
Cubing Indexing
Cache
1st use
2nd use
Caching
• Data
• Columnar stats
• Query results
Why Not in SQL on Hadoop ?
8
Interactive
BI
Concurrent
workloads
Enterprise features
Core SQL processing
SQL on Hadoop Maturity Levels
6-7 years
Reducing IO and Computation
9
2020? Cost-based optimization
Partitioning
Columnar storage
Cubing Caching Indexing
BI Accelerators
SQL on Hadoop
2018
Partitioning
Columnar storage
Cubing Caching Indexing
BI AcceleratorsSQL on Hadoop
Cost-based optimization
Partitioning
Columnar storage
Cubing
2014 Caching Indexing
SQL on Hadoop
BI Accelerators
Cost-based optimization
The Convergence
10
In memory
caching
Time
SQL on Hadoop
Partitioning
Cost-based
optimization
Columnar
storage
BI performance
BI Accelerators
Cubing
Indexing
~ 2012
On disk
caching
~ 2009 ~ 2020
11
Jethro AtScale Kylin
Engine Multiple instances of single node SMP engine Not an engine MOLAP engine, storing cube cells in HBase
Acceleration
techniques
Indexing
Cubing
Caching
Cubing
Caching
Approximate answers (e.g. count distinct)
Cubing
Cost-based Optimizer
Unique
features
• Computes cubes “bottom up” on demand
• Creates inverted indexes for all columns
• Re-ingests all the data into proprietary fmt
• Imposes star schema on all data
• Automatic and manual cubes
• Uses another engine to execute queries
• Brute force cube building
• Routes query to Hive when not in cube
• Uses Spark to speed up cube building
EDW HDFS
Analytic DB
EDW HDFS
Analytic DB Spark Hive HBase
EDW HDFS
Analytic DB Hive/SparkSQL/…
Using MPP Engines for Interactive BI
12
“MPP is a parallel architecture. Full scans are the
worst-case scenario, not the norm”
“You need scans for queries that can’t be answered
using cube/cache/index”
“MPP engines scale better than non-MPP ones”
This Photo by Unknown Author is licensed under CC BY-SA
“MPP is a full scan architecture”
“You can do BI on Hadoop without table scans”
“MPP SQL engines do not scale to many users”
This Photo by Unknown Author is licensed under CC BY-SA
IBM Db2 Big SQL
13
Top performance on complex workloads
No Lock-In
Reporting AND Interactive BI
Built-in federation to Oracle, Netezza, Db2
Workload management
Big SQL Head
Big SQL WorkerBig SQL WorkerBig SQL WorkerBig SQL WorkerBig SQL Worker
HDFS
Hive MS
Hadoop NN
Deep Platform Integration with no Lock-In
Big SQL Performance and Resource Utilization on Complex Workloads
14
Hadoop DS @ 100TB, 4 concurrent streams
13.7
43.2
BIG SQL SPARK SQL
Hours
Elapsed Time
76.4
88.2
BIG SQL SPARK SQL
%
CPU Utilization
107
388
BIG SQL SPARK SQL
MB/Sec
Disk Reads
25
237
BIG SQL SPARK SQL
MB/Sec
Disk Writes
- 15%
1/3
1/3 1/9
https://developer.ibm.com/hadoop/2017/02/07/experiences-comparing-big-sql-and-spark-sql-at-100tb/
Accelerating BI with MQTs
15
CREATE HADOOP TABLE joinMQT AS (
SELECT p_type, p_color, lo_quantity
FROM lineorder, part, dwdate
WHERE p_partkey = lo_partkey
AND lo_orderdate = d_datekey
AND d_year BETWEEN 2000 AND 2010)
PARTITION BY d_year
SORT BY p_type
STORED AS ORC
DATA INITIALLY DEFERRED…;
SELECT AVG(lo_quantity), p_color
FROM lineorder, part, dwdate
WHERE year = 2007
AND p_type = ‘outdoor’
SELECT sum(lo_revenue)
FROM lineorder, customer
WHERE c_custokey = lo_custkey
AND c_city = ‘Toronto’
CREATE TABLE aggMQT AS (
SELECT sum(lo_revenue), c_city
FROM lineorder, customer
WHERE c_custokey = lo_custkey
GROUP BY c_city
DATA INITIALLY DEFERRED…;
Hadoop MQT for join (denormalization) Native MQT for aggregation (cubing)
• High Cardinality
• Stored on HDFS as ORC
• Partitioned
• Sorted for PPD
• Low Cardinality
• Stored on head node in “native” format
• Can be indexed (turns MQT into true cube)
CREATE UNIQUE INDEX aggMQTidx ON
aggMQT(c_city);
• Answered using joinMQT • Answered using aggMQT and aggMQTidx
Speeding Up Dashboards in Big SQL
16
1
• Tune Big SQL, partition data, use ORC format
2
• Copy a fraction/sample of the data to BI tool (e.g. using
Tableau extracts)
3
• Prototype Dashboard using sample data, to get interactive
response during design/prototyping
4
• Once Dashboard is stable, export SQL queries from BI tool
5
• Create necessary MQTs and indexes to speed up the
dashboard queries, to get interactive response in production
6
• Point BI tool to Big SQL (ODBC)
7
• Run Dashboard in Production
• Interactive during design ≠ interactive in production
• Speed up dashboard design by
using just a fraction of the
data
• Speed up production version
using MQTs and indexes
STEPS
SQL on Hadoop in 2018
17
Big SQL
EDW HDFS
Analytic DB
“Big Data”
2018
VS.
SQL on Hadoop
EDW HDFS
Analytic DB
“Big Data”
2012
Big SQL vs BI Accelerators
18
Big SQL BI Accelerators
Reporting queries
 
Predictable
interactive queries
 
Complex hand-
written queries
 
One-off queries
 
Heavy workloads
 
Integration with
Hadoop ecosystem
 
Auto-cubes
 
Full indexing
 
Options for Interactive BI on Hadoop in 2018
19
• “Upload” to Analytic DB
• Expensive
• Painful
• BI Accelerator
• Duplication
• Complexities
• Narrow scope
• Lack of platform integration
• SQL on Hadoop
• Autonomics
The picture above by Unknown Author is licensed under CC BY-NC
Big SQL Roadmap
20
• Caching
• Autonomics
• Security & Governance (Ranger/Atlas)
• Star schema joins
• Interoperability with Hive ACID
• Integration with HDP 3.0 (Ambari/YARN)
Conclusions
21
• Understand BI Acceleration techniques and trade-offs
• SQL on Hadoop 2018
• Cubing, indexing, caching and more
• Reporting AND Interactive BI in a single engine
• SQL on Hadoop still evolving fast!
The convergence of reporting and interactive BI on Hadoop
23
Information in these presentations (including information relating to products that have not yet been
announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no responsibility to update this information.
This document is distributed “as is” without any warranty, either express or implied. In no event,
shall IBM be liable for any damage arising from the use of this information, including but not
limited to, loss of data, business interruption, loss of profit or loss of opportunity.
IBM products and services are warranted per the terms and conditions of the agreements under which
they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously installed. Regardless, our
warranty terms apply.”
Any statements regarding IBM's future direction, intent or product plans are subject to change or
withdrawal without notice.
Performance data contained herein was generally obtained in a controlled,
isolated environments. Customer examples are presented as illustrations of how those
customers have used IBM products and the results they may have achieved. Actual performance, cost,
savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to
make such products, programs or services available in all countries in which IBM operates or does
business.
Workshops, sessions and associated materials may have been prepared by independent session
speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for
informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or
advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain
advice of competent legal counsel as to the identification and interpretation of any relevant laws and
regulatory requirements that may affect the customer’s business and any actions the customer may need
to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its
services or products will ensure that the customer follows any law.
Notices and Disclaimers
Information concerning non-IBM products was obtained from the suppliers of those products,
their published announcements or other publicly available sources. IBM has not tested
those products about this publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products. Questions on the capabilities of
non-IBM products should be addressed to the suppliers of those products. IBM does not
warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or
implied, including but not limited to, the implied warranties of merchantability and
fitness for a purpose.
The provision of the information contained herein is not intended to, and does not, grant any
right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com and Big SQL are trademarks of International Business Machines
Corporation, registered in many jurisdictions worldwide. Other product and service names
might be trademarks of IBM or other companies. A current list of IBM trademarks is available
on the Web at "Copyright and trademark information"
at: www.ibm.com/legal/copytrade.shtml
24
Ad

More Related Content

What's hot (20)

Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
DataWorks Summit/Hadoop Summit
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
DataWorks Summit
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
DataWorks Summit/Hadoop Summit
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
DataWorks Summit
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
DataWorks Summit
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
DataWorks Summit/Hadoop Summit
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Hortonworks
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OS
Cuneyt Goksu
 
Breakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreBreakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data Store
Cloudera, Inc.
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
DataWorks Summit
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 
Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...
DataWorks Summit
 
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
DataWorks Summit
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
DataWorks Summit
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
DataWorks Summit
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at Walgreens
DataWorks Summit
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
John Archer
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
DataWorks Summit
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
DataWorks Summit/Hadoop Summit
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
DataWorks Summit
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
DataWorks Summit
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
DataWorks Summit
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
DataWorks Summit/Hadoop Summit
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Hortonworks
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OS
Cuneyt Goksu
 
Breakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data StoreBreakout: Hadoop and the Operational Data Store
Breakout: Hadoop and the Operational Data Store
Cloudera, Inc.
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
DataWorks Summit
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 
Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...
DataWorks Summit
 
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
Driving Network and Marketing Investments at O2 by Focusing on Improving the ...
DataWorks Summit
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
DataWorks Summit
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
DataWorks Summit
 
Hadoop Journey at Walgreens
Hadoop Journey at WalgreensHadoop Journey at Walgreens
Hadoop Journey at Walgreens
DataWorks Summit
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
John Archer
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
DataWorks Summit
 

Similar to The convergence of reporting and interactive BI on Hadoop (20)

The Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopThe Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on Hadoop
DataWorks Summit
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
Seeling Cheung
 
Benchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herdBenchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herd
Gord Sissons
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
ModusOptimum
 
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data:  InterConnect 2016 Session on Getting Started with Big Data AnalyticsBig Data:  InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Cynthia Saracco
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Rizaldy Ignacio
 
Ibm db2update2019 icp4 data
Ibm db2update2019   icp4 dataIbm db2update2019   icp4 data
Ibm db2update2019 icp4 data
Gustav Lundström
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
Indrajit Poddar
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
DataWorks Summit
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for you
ModusOptimum
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Visual_BI
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
IBM Cloud Data Services
 
How Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT AnalyticsHow Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT Analytics
Arcadia Data
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
DataWorks Summit/Hadoop Summit
 
10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16
Kangaroot
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
Seeling Cheung
 
Informix REST API Tutorial
Informix REST API TutorialInformix REST API Tutorial
Informix REST API Tutorial
Brian Hughes
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data Cluster
Maximiliano Accotto
 
Analyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGIS
Analyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGISAnalyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGIS
Analyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGIS
IBM Cloud Data Services
 
esri2015cloudantdashdbpresentation-150731203041-lva1-app6892
esri2015cloudantdashdbpresentation-150731203041-lva1-app6892esri2015cloudantdashdbpresentation-150731203041-lva1-app6892
esri2015cloudantdashdbpresentation-150731203041-lva1-app6892
Torsten Steinbach
 
The Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopThe Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on Hadoop
DataWorks Summit
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
Seeling Cheung
 
Benchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herdBenchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herd
Gord Sissons
 
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data:  InterConnect 2016 Session on Getting Started with Big Data AnalyticsBig Data:  InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Cynthia Saracco
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Rizaldy Ignacio
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
Indrajit Poddar
 
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
DataWorks Summit
 
Still on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for youStill on IBM BigInsights? We have the right path for you
Still on IBM BigInsights? We have the right path for you
ModusOptimum
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Visual_BI
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
IBM Cloud Data Services
 
How Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT AnalyticsHow Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT Analytics
Arcadia Data
 
10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16
Kangaroot
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
Seeling Cheung
 
Informix REST API Tutorial
Informix REST API TutorialInformix REST API Tutorial
Informix REST API Tutorial
Brian Hughes
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data Cluster
Maximiliano Accotto
 
Analyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGIS
Analyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGISAnalyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGIS
Analyzing GeoSpatial data with IBM Cloud Data Services & Esri ArcGIS
IBM Cloud Data Services
 
esri2015cloudantdashdbpresentation-150731203041-lva1-app6892
esri2015cloudantdashdbpresentation-150731203041-lva1-app6892esri2015cloudantdashdbpresentation-150731203041-lva1-app6892
esri2015cloudantdashdbpresentation-150731203041-lva1-app6892
Torsten Steinbach
 
Ad

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 

The convergence of reporting and interactive BI on Hadoop

  • 1. The Convergence of Reporting and Interactive BI on Hadoop Gustavo Arocena June 19, 2018 Db2 Big SQL
  • 3. SQL-based Interactive BI on Hadoops 3 This Photo by Unknown Author is licensed under CC BY
  • 4. SQL-based Interactive BI on Hadoop 4 Time2008 2011 2011 2012 The good old times EDW Analytic DB Not there yet SQL on Hadoop EDW HDFS Analytic DB “Big Data” It works, but … SQL on Hadoop EDW HDFS Analytic DB “Big Data” Everyone Happy? BI Accelerator EDW HDFS Analytic DB SQL on Hadoop “Big Data”
  • 5. • Enable offloading of Interactive BI to Hadoop • Interactive BI on Big Data • Varying degree of autonomics (auto-creation, auto-refresh) • Fast response for analytic queries SELECT p.category, max(s.amount) FROM products p, sales s WHERE p.id = s.pid GROUP BY p.category BI Accelerators – The Value Prop 5 This Photo by Unknown Author is licensed under CC BY-SA
  • 6. BI Accelerators – The Small Print 6 • Duplication • Licensing • Skills • Vendors for service/support • Complexity • Data architecture • Data copying & refreshing • Narrow scope • Only repetitive, tool-generated queries • Low integration with Hadoop platform BI Accelerator EDW HDFS Analytic DB SQL on Hadoop “Big Data”
  • 7. BI Acceleration Techniques 7 CREATE HADOOP TABLE sales (id integer, city string, amount double) SELECT sum(amount) FROM sales WHERE amount < 500 AND city = ‘Toronto’ Columnar Storage Cubing Indexing Cache 1st use 2nd use Caching • Data • Columnar stats • Query results
  • 8. Why Not in SQL on Hadoop ? 8 Interactive BI Concurrent workloads Enterprise features Core SQL processing SQL on Hadoop Maturity Levels 6-7 years
  • 9. Reducing IO and Computation 9 2020? Cost-based optimization Partitioning Columnar storage Cubing Caching Indexing BI Accelerators SQL on Hadoop 2018 Partitioning Columnar storage Cubing Caching Indexing BI AcceleratorsSQL on Hadoop Cost-based optimization Partitioning Columnar storage Cubing 2014 Caching Indexing SQL on Hadoop BI Accelerators Cost-based optimization
  • 10. The Convergence 10 In memory caching Time SQL on Hadoop Partitioning Cost-based optimization Columnar storage BI performance BI Accelerators Cubing Indexing ~ 2012 On disk caching ~ 2009 ~ 2020
  • 11. 11 Jethro AtScale Kylin Engine Multiple instances of single node SMP engine Not an engine MOLAP engine, storing cube cells in HBase Acceleration techniques Indexing Cubing Caching Cubing Caching Approximate answers (e.g. count distinct) Cubing Cost-based Optimizer Unique features • Computes cubes “bottom up” on demand • Creates inverted indexes for all columns • Re-ingests all the data into proprietary fmt • Imposes star schema on all data • Automatic and manual cubes • Uses another engine to execute queries • Brute force cube building • Routes query to Hive when not in cube • Uses Spark to speed up cube building EDW HDFS Analytic DB EDW HDFS Analytic DB Spark Hive HBase EDW HDFS Analytic DB Hive/SparkSQL/…
  • 12. Using MPP Engines for Interactive BI 12 “MPP is a parallel architecture. Full scans are the worst-case scenario, not the norm” “You need scans for queries that can’t be answered using cube/cache/index” “MPP engines scale better than non-MPP ones” This Photo by Unknown Author is licensed under CC BY-SA “MPP is a full scan architecture” “You can do BI on Hadoop without table scans” “MPP SQL engines do not scale to many users” This Photo by Unknown Author is licensed under CC BY-SA
  • 13. IBM Db2 Big SQL 13 Top performance on complex workloads No Lock-In Reporting AND Interactive BI Built-in federation to Oracle, Netezza, Db2 Workload management Big SQL Head Big SQL WorkerBig SQL WorkerBig SQL WorkerBig SQL WorkerBig SQL Worker HDFS Hive MS Hadoop NN Deep Platform Integration with no Lock-In
  • 14. Big SQL Performance and Resource Utilization on Complex Workloads 14 Hadoop DS @ 100TB, 4 concurrent streams 13.7 43.2 BIG SQL SPARK SQL Hours Elapsed Time 76.4 88.2 BIG SQL SPARK SQL % CPU Utilization 107 388 BIG SQL SPARK SQL MB/Sec Disk Reads 25 237 BIG SQL SPARK SQL MB/Sec Disk Writes - 15% 1/3 1/3 1/9 https://developer.ibm.com/hadoop/2017/02/07/experiences-comparing-big-sql-and-spark-sql-at-100tb/
  • 15. Accelerating BI with MQTs 15 CREATE HADOOP TABLE joinMQT AS ( SELECT p_type, p_color, lo_quantity FROM lineorder, part, dwdate WHERE p_partkey = lo_partkey AND lo_orderdate = d_datekey AND d_year BETWEEN 2000 AND 2010) PARTITION BY d_year SORT BY p_type STORED AS ORC DATA INITIALLY DEFERRED…; SELECT AVG(lo_quantity), p_color FROM lineorder, part, dwdate WHERE year = 2007 AND p_type = ‘outdoor’ SELECT sum(lo_revenue) FROM lineorder, customer WHERE c_custokey = lo_custkey AND c_city = ‘Toronto’ CREATE TABLE aggMQT AS ( SELECT sum(lo_revenue), c_city FROM lineorder, customer WHERE c_custokey = lo_custkey GROUP BY c_city DATA INITIALLY DEFERRED…; Hadoop MQT for join (denormalization) Native MQT for aggregation (cubing) • High Cardinality • Stored on HDFS as ORC • Partitioned • Sorted for PPD • Low Cardinality • Stored on head node in “native” format • Can be indexed (turns MQT into true cube) CREATE UNIQUE INDEX aggMQTidx ON aggMQT(c_city); • Answered using joinMQT • Answered using aggMQT and aggMQTidx
  • 16. Speeding Up Dashboards in Big SQL 16 1 • Tune Big SQL, partition data, use ORC format 2 • Copy a fraction/sample of the data to BI tool (e.g. using Tableau extracts) 3 • Prototype Dashboard using sample data, to get interactive response during design/prototyping 4 • Once Dashboard is stable, export SQL queries from BI tool 5 • Create necessary MQTs and indexes to speed up the dashboard queries, to get interactive response in production 6 • Point BI tool to Big SQL (ODBC) 7 • Run Dashboard in Production • Interactive during design ≠ interactive in production • Speed up dashboard design by using just a fraction of the data • Speed up production version using MQTs and indexes STEPS
  • 17. SQL on Hadoop in 2018 17 Big SQL EDW HDFS Analytic DB “Big Data” 2018 VS. SQL on Hadoop EDW HDFS Analytic DB “Big Data” 2012
  • 18. Big SQL vs BI Accelerators 18 Big SQL BI Accelerators Reporting queries   Predictable interactive queries   Complex hand- written queries   One-off queries   Heavy workloads   Integration with Hadoop ecosystem   Auto-cubes   Full indexing  
  • 19. Options for Interactive BI on Hadoop in 2018 19 • “Upload” to Analytic DB • Expensive • Painful • BI Accelerator • Duplication • Complexities • Narrow scope • Lack of platform integration • SQL on Hadoop • Autonomics The picture above by Unknown Author is licensed under CC BY-NC
  • 20. Big SQL Roadmap 20 • Caching • Autonomics • Security & Governance (Ranger/Atlas) • Star schema joins • Interoperability with Hive ACID • Integration with HDP 3.0 (Ambari/YARN)
  • 21. Conclusions 21 • Understand BI Acceleration techniques and trade-offs • SQL on Hadoop 2018 • Cubing, indexing, caching and more • Reporting AND Interactive BI in a single engine • SQL on Hadoop still evolving fast!
  • 23. 23 Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event, shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.” Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law. Notices and Disclaimers Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products about this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and Big SQL are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml
  • 24. 24

Editor's Notes