SlideShare a Scribd company logo
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
CarpeDatum
Building Big Data Analytical Applications with HP Haven
Steve Sarsfield (HP Big Data Software)
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.2
Well-known Innovations at HP
Infrastructure Enterprise
Services
Reference
Architectures
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.3
Ideal for Hadoop and scale-out parallel processing applications
ProLiant SL4540 Server
Maximize Storage
Capacity and Compute
Performance
• Latest Xeon processors
• Each chassis: 270 TB
• Each rack 2.43 PB
• Software for monitoring
health and utilization of
nodes and other
maintenance.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.4
HP Haven
Outcomes
Infrastructure
Engines
100%
Harness
of your data
Scale
Extreme
of your data
Speed
Extreme
In performing
analytics
Deploy
Seamlessly
next-gen
applications
anywhere
Cloud
Supports
with Haven
On-Demand
Advanced
Delivers
analytics leveraging
power of the cluster
Turbocharges
Analytics with
cluster-friendly
R language
Predictive
Composite Analytical Applications
HP Haven Platform
HadoopOn-premise Private cloud Public cloud
Human data
Machine data
Business data
• Understand human information
• Voice, video, facial recognition
and more
HP IDOL
• Big Data analytics
• Full ANSI SQL engine on MPP
architecture
HP Vertica
• Predictive analytics
• Leverage your cluster for
R-based predictive analytics
Distributed R
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.5
Vertica Core Capabilities – Built for Speed
We boost performance
Use to take Now takes
1 hour 3.6 Seconds
8 hours (overnight) Under 30 seconds
What 1000% means:
"When we did the first queries, they were done so
fast, we thought they were broken.“
- Michael Relich, Guess?
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.6
Vertica Optimizations
Columnar
Storage
Compression MPP Scale-Out Distributed
Query
Projections
Speeds Query Time
by Reading Only
Necessary Data
Lowers costly I/O to
boost overall
performance
Provides high
scalability on
clusters with no
name node or other
single point of
failure
Any node can initiate
the queries and use
other nodes for
work. No single point
of failure
Combine high
availability with
special
optimizations for
query performance
CPU
Memory
Disk
CPU
Memory
Disk
CPU
Memory
Disk
A B D C E A
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.7
• HP Vertica for SQL on Hadoop offers the only
full-featured query engine on Hadoop
• Same Core Engine
• Hadoop Distribution Agnostic
• Enterprise-ready Solution
• World-class Enterprise Support and Services
• Open platform
• Ready for Haven
• Competitive price point
HP Vertica for SQL on Hadoop
Hadoop Storage
Vertica ANSI SQL
Data
Exploration
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.8
One Query Engine to Serve it all
• Query data in place in Hadoop Formats
• Leverage existing Hadoop infrastructure
• Avoid unnecessary data movement
• Single query engine across diverse formats and infrastructure
Query Engine
Format
File System Vertica (EXT4)
Vertica Optimized (ROS, Flex Tables)
HP Vertica
ANSI SQL
Hadoop (HDP, CDH, MapR NFS)
Hadoop (ORC, Parquet, et al)
Query data without moving the data
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.9
Hortonworks/Vertica Beats the Competition
Based on TCP-DS Benchmarks
ACID Protection,
Data Integrity
Full-function
ANSI SQL
Completeness of
Query
Concurrency of
Workload
HP Vertica
ANSI SQL
Hadoop (HDP)
ORC file reader
HP Vertica
SQL on Hadoop
Faster than Parquet
About 10% faster
Faster than our old connector strategy
More complete than Impala
98% VS 30% query completion
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.10
More HW Partner Integration
Ambari Integration
• Integration with our own management
console
• Ability to monitor Vertica health and
resources, and those Hadoop services that
are enabled on the Vertica nodes.
• Ability to monitor metrics: hostname, IP,
memory usage, CPU usage, network
usage, disk usage.
Vertica Management Console (MC) can now monitor Vertica clusters and databases that are
running on Hadoop nodes
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.11
More HW Partner Integration
Kerberos Integration
• Provides security and encryption
• Single Authentication to both
Vertica and Hadoop services via
Kerberos
• Eliminates need for dual
authentication steps for Hadoop
services
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.12
HP Haven
Outcomes
Infrastructure
Engines
100%
Harness
of your data
Scale
Extreme
of your data
Speed
Extreme
In performing
analytics
Deploy
Seamlessly
next-gen
applications
anywhere
Cloud
Supports
with Haven
On-Demand
Advanced
Delivers
analytics leveraging
power of the cluster
Turbocharges
Analytics with
cluster-friendly
R language
Predictive
Composite Analytical Applications
HP Haven Platform
HadoopOn-premise Private cloud Public cloud
Human data
Machine data
Business data
• Understand human information
• Voice, video, facial recognition
and more
HP IDOL
• Big Data analytics
• Full ANSI SQL engine on MPP
architecture
HP Vertica
• Predictive analytics
• Leverage your cluster for
R-based predictive analytics
Distributed R
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.13
This is your brain on…
R
Distributed R
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.14
A scalable and high-performance platform for the R language
Use familiar GUIs
and packages like
R Studio
Analyze data too
large for vanilla R
Leverage multiple
nodes for
distributed
processing
Vastly improved
performance
Distributed R
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.15
HP Vertica Distributed R
Build Models
Evaluate Models
Deploy Models
(In-database scoring)BI Integration
1 2
3
Build and evaluate
predictive models on
large datasets using
Distributed R
2
1 Ingest and prepare
data by leveraging HP
Vertica Analytics
Platform
3
Deploy models to
Vertica and use in-
database scoring to
produce prediction
results for BI and
applications.
Operationalize big data predictive analytics
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.16
Distributed R
Out-of-the-box Distributed Algorithms
Algorithm Use cases
Linear Regression (GLM) Risk Analysis, Trend Analysis, etc.
Logistic Regression (GLM)
Customer Response modeling, Healthcare analytics
(Disease analysis)
Random Forest Customer churn, Market campaign analysis
K-Means Clustering
Customer segmentation, Fraud detection, Anomaly
detection
Page Rank Identify influencers
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.17
#Records (billion) #Servers Time (sec)
0.3 1 145
1.0 3 165
1.7 5 173
2.9 8 270
Algorithm: Logistic regression
Setup: DL 380 servers, 16*2 HT cores/server, 120GB RAM
Distributed R in a single node is ~30x faster than R’s glm
Dataset: 100Mx7 (~6GB), 1xDL 380
Distributed RPredictive Models on billions of observations
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.18
HP Haven
Outcomes
Infrastructure
Engines
100%
Harness
of your data
Scale
Extreme
of your data
Speed
Extreme
In performing
analytics
Deploy
Seamlessly
next-gen
applications
anywhere
Cloud
Supports
with Haven
On-Demand
Advanced
Delivers
analytics leveraging
power of the cluster
Turbocharges
Analytics with
cluster-friendly
R language
Predictive
Composite Analytical Applications
HP Haven Platform
HadoopOn-premise Private cloud Public cloud
Human data
Machine data
Business data
• Understand human information
• Voice, video, facial recognition
and more
HP IDOL
• Big Data analytics
• Full ANSI SQL engine on MPP
architecture
HP Vertica
• Predictive analytics
• Leverage your cluster for
R-based predictive analytics
Distributed R
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.19
IDOL
Audio, video and picture analysis Find related concepts and search
Optical Character Recognition
HUMAN INFORMATION
Much more at
IdolOnDemand.COM
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
MoreHPPlugs
2
© Copyright 2015 Hewlett-Packard Development Company,L.P. The information contained herein is subject to change without notice.
HP Services – a path to Hadoop value through consulting and SI
Challenges – ROI and
business value inhibitors
• Business led drivers
• Use cases with business
value
• Maturing technology
stack
• Hadoop skills in the
market
Success
Factors
• Expert guidance
• Relevant business cases
• Discovering value-driven
uses
• Execution plan to extract
value
• Leveraging the right
deployment option
Consulting
and SI
• Demonstrated Hadoop
expertise
• Ability to deploy
around existing
infrastructure
• Roadmap to accelerate
time value
• Attain top
management relevance
and impact
HP Analytics & Data
Management Services
• Discovery
• Integration
• Workload
optimization
• Platform
– As a service
– On premises
– Managed
• Analytics
Merge traditional Business Intelligence with new Big Data technologies and transition to a more
data-driven and agile enterprise.
© Copyright 2012 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice.
22
HPBigDataConference2015
August10-13
WestinWaterfrontHotel,Boston
REGISTER TODAY!
For general inquiries: Info.BDC@hp.com
For inquires on becoming a sponsor: Sponsorinfo.BDC@hp.com
#HPBigData2015
40+ technical sessions
Network with hundreds of peers
Hands-on hackathon
Just Announced!
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.23
Community Edition
• Free Download 1TB, 3 nodes
my.vertica.com/evaluate
Learn More About – and Try! - HP Vertica
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
ThankYou
Ad

More Related Content

What's hot (20)

Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
DataWorks Summit
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
DataWorks Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
DataWorks Summit
 
Trafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoopTrafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoop
Krishna-Kumar
 
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
Big Data Montreal
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
DataWorks Summit
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
DataWorks Summit
 
Admiral Group
Admiral GroupAdmiral Group
Admiral Group
DataWorks Summit/Hadoop Summit
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
DataWorks Summit
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
Brock Noland
 
1 - The Case for Trafodion
1 - The Case for Trafodion1 - The Case for Trafodion
1 - The Case for Trafodion
Rohit Jain
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
DataWorks Summit/Hadoop Summit
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
DataWorks Summit/Hadoop Summit
 
MPP vs Hadoop
MPP vs HadoopMPP vs Hadoop
MPP vs Hadoop
Alexey Grishchenko
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
Coexistence and Migration of Vendor HPC based infrastructure to Hadoop Ecosys...
DataWorks Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
DataWorks Summit
 
Trafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoopTrafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoop
Krishna-Kumar
 
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers...
Big Data Montreal
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
DataWorks Summit
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
DataWorks Summit
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
DataWorks Summit
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
Brock Noland
 
1 - The Case for Trafodion
1 - The Case for Trafodion1 - The Case for Trafodion
1 - The Case for Trafodion
Rohit Jain
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
DataWorks Summit
 
Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]
DataWorks Summit
 
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
DataWorks Summit
 
Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets
DataWorks Summit
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in Production
DataWorks Summit
 
Hadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance InitiativeHadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance Initiative
DataWorks Summit
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
DataWorks Summit
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
DataWorks Summit
 
Hadoop for Genomics__HadoopSummit2010
Hadoop for Genomics__HadoopSummit2010Hadoop for Genomics__HadoopSummit2010
Hadoop for Genomics__HadoopSummit2010
Yahoo Developer Network
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
DataWorks Summit
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
DataWorks Summit
 
Spark Application Development Made Easy
Spark Application Development Made EasySpark Application Development Made Easy
Spark Application Development Made Easy
DataWorks Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
DataWorks Summit
 
Big Data Challenges in the Energy Sector
Big Data Challenges in the Energy SectorBig Data Challenges in the Energy Sector
Big Data Challenges in the Energy Sector
DataWorks Summit
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
DataWorks Summit
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
DataWorks Summit
 
Online Approximate OLAP in SparkSQL
Online Approximate OLAP in SparkSQLOnline Approximate OLAP in SparkSQL
Online Approximate OLAP in SparkSQL
DataWorks Summit
 
Tailored for Spark
Tailored for SparkTailored for Spark
Tailored for Spark
DataWorks Summit/Hadoop Summit
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
DataWorks Summit
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
DataWorks Summit
 
Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]Inspiring Travel at Airbnb [WIP]
Inspiring Travel at Airbnb [WIP]
DataWorks Summit
 
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracel's B...
DataWorks Summit
 
Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets Karta an ETL Framework to process high volume datasets
Karta an ETL Framework to process high volume datasets
DataWorks Summit
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in Production
DataWorks Summit
 
Hadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance InitiativeHadoop in Validated Environment - Data Governance Initiative
Hadoop in Validated Environment - Data Governance Initiative
DataWorks Summit
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
DataWorks Summit
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
DataWorks Summit
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
DataWorks Summit
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
DataWorks Summit
 
Spark Application Development Made Easy
Spark Application Development Made EasySpark Application Development Made Easy
Spark Application Development Made Easy
DataWorks Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
Big Data Challenges in the Energy Sector
Big Data Challenges in the Energy SectorBig Data Challenges in the Energy Sector
Big Data Challenges in the Energy Sector
DataWorks Summit
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
DataWorks Summit
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
DataWorks Summit
 
Online Approximate OLAP in SparkSQL
Online Approximate OLAP in SparkSQLOnline Approximate OLAP in SparkSQL
Online Approximate OLAP in SparkSQL
DataWorks Summit
 
Ad

Similar to Carpe Datum: Building Big Data Analytical Applications with HP Haven (20)

A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
DataWorks Summit
 
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-HadoopHP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
MapR Technologies
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
Hortonworks
 
Haven 2 0
Haven 2 0 Haven 2 0
Haven 2 0
Data Science Warsaw
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
Inside Analysis
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
Inside Analysis
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Pentaho
 
LEG Keynote: Linda Knippers - HP
LEG Keynote: Linda Knippers - HPLEG Keynote: Linda Knippers - HP
LEG Keynote: Linda Knippers - HP
Linaro
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
Emil Andreas Siemes
 
Get started with hadoop hive hive ql languages
Get started with hadoop hive hive ql languagesGet started with hadoop hive hive ql languages
Get started with hadoop hive hive ql languages
JanBask Training
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
EMC
 
HP Moonshot system
HP Moonshot systemHP Moonshot system
HP Moonshot system
HP Enterprise Italia
 
Comparing the TCO of HP NonStop with Oracle RAC
Comparing the TCO of HP NonStop with Oracle RACComparing the TCO of HP NonStop with Oracle RAC
Comparing the TCO of HP NonStop with Oracle RAC
Thomas Burg
 
Actian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL EditionActian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL Edition
Alessandro Salvatico
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
Big Data Spain
 
Big Data & SQL: The On-Ramp to Hadoop
Big Data & SQL: The On-Ramp to Hadoop Big Data & SQL: The On-Ramp to Hadoop
Big Data & SQL: The On-Ramp to Hadoop
Inside Analysis
 
2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview
jdijcks
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Stefan Lipp
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
DataWorks Summit
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
DataWorks Summit
 
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-HadoopHP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
MapR Technologies
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
Hortonworks
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
Inside Analysis
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
Inside Analysis
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Pentaho
 
LEG Keynote: Linda Knippers - HP
LEG Keynote: Linda Knippers - HPLEG Keynote: Linda Knippers - HP
LEG Keynote: Linda Knippers - HP
Linaro
 
Get started with hadoop hive hive ql languages
Get started with hadoop hive hive ql languagesGet started with hadoop hive hive ql languages
Get started with hadoop hive hive ql languages
JanBask Training
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
EMC
 
Comparing the TCO of HP NonStop with Oracle RAC
Comparing the TCO of HP NonStop with Oracle RACComparing the TCO of HP NonStop with Oracle RAC
Comparing the TCO of HP NonStop with Oracle RAC
Thomas Burg
 
Actian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL EditionActian Analytics Platform - Hadoop SQL Edition
Actian Analytics Platform - Hadoop SQL Edition
Alessandro Salvatico
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
Big Data Spain
 
Big Data & SQL: The On-Ramp to Hadoop
Big Data & SQL: The On-Ramp to Hadoop Big Data & SQL: The On-Ramp to Hadoop
Big Data & SQL: The On-Ramp to Hadoop
Inside Analysis
 
2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview
jdijcks
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Stefan Lipp
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
DataWorks Summit
 
Ad

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Recently uploaded (20)

Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 

Carpe Datum: Building Big Data Analytical Applications with HP Haven

  • 1. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. CarpeDatum Building Big Data Analytical Applications with HP Haven Steve Sarsfield (HP Big Data Software)
  • 2. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.2 Well-known Innovations at HP Infrastructure Enterprise Services Reference Architectures
  • 3. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.3 Ideal for Hadoop and scale-out parallel processing applications ProLiant SL4540 Server Maximize Storage Capacity and Compute Performance • Latest Xeon processors • Each chassis: 270 TB • Each rack 2.43 PB • Software for monitoring health and utilization of nodes and other maintenance.
  • 4. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.4 HP Haven Outcomes Infrastructure Engines 100% Harness of your data Scale Extreme of your data Speed Extreme In performing analytics Deploy Seamlessly next-gen applications anywhere Cloud Supports with Haven On-Demand Advanced Delivers analytics leveraging power of the cluster Turbocharges Analytics with cluster-friendly R language Predictive Composite Analytical Applications HP Haven Platform HadoopOn-premise Private cloud Public cloud Human data Machine data Business data • Understand human information • Voice, video, facial recognition and more HP IDOL • Big Data analytics • Full ANSI SQL engine on MPP architecture HP Vertica • Predictive analytics • Leverage your cluster for R-based predictive analytics Distributed R
  • 5. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.5 Vertica Core Capabilities – Built for Speed We boost performance Use to take Now takes 1 hour 3.6 Seconds 8 hours (overnight) Under 30 seconds What 1000% means: "When we did the first queries, they were done so fast, we thought they were broken.“ - Michael Relich, Guess?
  • 6. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.6 Vertica Optimizations Columnar Storage Compression MPP Scale-Out Distributed Query Projections Speeds Query Time by Reading Only Necessary Data Lowers costly I/O to boost overall performance Provides high scalability on clusters with no name node or other single point of failure Any node can initiate the queries and use other nodes for work. No single point of failure Combine high availability with special optimizations for query performance CPU Memory Disk CPU Memory Disk CPU Memory Disk A B D C E A
  • 7. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.7 • HP Vertica for SQL on Hadoop offers the only full-featured query engine on Hadoop • Same Core Engine • Hadoop Distribution Agnostic • Enterprise-ready Solution • World-class Enterprise Support and Services • Open platform • Ready for Haven • Competitive price point HP Vertica for SQL on Hadoop Hadoop Storage Vertica ANSI SQL Data Exploration
  • 8. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.8 One Query Engine to Serve it all • Query data in place in Hadoop Formats • Leverage existing Hadoop infrastructure • Avoid unnecessary data movement • Single query engine across diverse formats and infrastructure Query Engine Format File System Vertica (EXT4) Vertica Optimized (ROS, Flex Tables) HP Vertica ANSI SQL Hadoop (HDP, CDH, MapR NFS) Hadoop (ORC, Parquet, et al) Query data without moving the data
  • 9. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.9 Hortonworks/Vertica Beats the Competition Based on TCP-DS Benchmarks ACID Protection, Data Integrity Full-function ANSI SQL Completeness of Query Concurrency of Workload HP Vertica ANSI SQL Hadoop (HDP) ORC file reader HP Vertica SQL on Hadoop Faster than Parquet About 10% faster Faster than our old connector strategy More complete than Impala 98% VS 30% query completion
  • 10. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.10 More HW Partner Integration Ambari Integration • Integration with our own management console • Ability to monitor Vertica health and resources, and those Hadoop services that are enabled on the Vertica nodes. • Ability to monitor metrics: hostname, IP, memory usage, CPU usage, network usage, disk usage. Vertica Management Console (MC) can now monitor Vertica clusters and databases that are running on Hadoop nodes
  • 11. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.11 More HW Partner Integration Kerberos Integration • Provides security and encryption • Single Authentication to both Vertica and Hadoop services via Kerberos • Eliminates need for dual authentication steps for Hadoop services
  • 12. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.12 HP Haven Outcomes Infrastructure Engines 100% Harness of your data Scale Extreme of your data Speed Extreme In performing analytics Deploy Seamlessly next-gen applications anywhere Cloud Supports with Haven On-Demand Advanced Delivers analytics leveraging power of the cluster Turbocharges Analytics with cluster-friendly R language Predictive Composite Analytical Applications HP Haven Platform HadoopOn-premise Private cloud Public cloud Human data Machine data Business data • Understand human information • Voice, video, facial recognition and more HP IDOL • Big Data analytics • Full ANSI SQL engine on MPP architecture HP Vertica • Predictive analytics • Leverage your cluster for R-based predictive analytics Distributed R
  • 13. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.13 This is your brain on… R Distributed R
  • 14. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.14 A scalable and high-performance platform for the R language Use familiar GUIs and packages like R Studio Analyze data too large for vanilla R Leverage multiple nodes for distributed processing Vastly improved performance Distributed R
  • 15. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.15 HP Vertica Distributed R Build Models Evaluate Models Deploy Models (In-database scoring)BI Integration 1 2 3 Build and evaluate predictive models on large datasets using Distributed R 2 1 Ingest and prepare data by leveraging HP Vertica Analytics Platform 3 Deploy models to Vertica and use in- database scoring to produce prediction results for BI and applications. Operationalize big data predictive analytics
  • 16. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.16 Distributed R Out-of-the-box Distributed Algorithms Algorithm Use cases Linear Regression (GLM) Risk Analysis, Trend Analysis, etc. Logistic Regression (GLM) Customer Response modeling, Healthcare analytics (Disease analysis) Random Forest Customer churn, Market campaign analysis K-Means Clustering Customer segmentation, Fraud detection, Anomaly detection Page Rank Identify influencers
  • 17. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.17 #Records (billion) #Servers Time (sec) 0.3 1 145 1.0 3 165 1.7 5 173 2.9 8 270 Algorithm: Logistic regression Setup: DL 380 servers, 16*2 HT cores/server, 120GB RAM Distributed R in a single node is ~30x faster than R’s glm Dataset: 100Mx7 (~6GB), 1xDL 380 Distributed RPredictive Models on billions of observations
  • 18. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.18 HP Haven Outcomes Infrastructure Engines 100% Harness of your data Scale Extreme of your data Speed Extreme In performing analytics Deploy Seamlessly next-gen applications anywhere Cloud Supports with Haven On-Demand Advanced Delivers analytics leveraging power of the cluster Turbocharges Analytics with cluster-friendly R language Predictive Composite Analytical Applications HP Haven Platform HadoopOn-premise Private cloud Public cloud Human data Machine data Business data • Understand human information • Voice, video, facial recognition and more HP IDOL • Big Data analytics • Full ANSI SQL engine on MPP architecture HP Vertica • Predictive analytics • Leverage your cluster for R-based predictive analytics Distributed R
  • 19. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.19 IDOL Audio, video and picture analysis Find related concepts and search Optical Character Recognition HUMAN INFORMATION Much more at IdolOnDemand.COM
  • 20. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. MoreHPPlugs 2
  • 21. © Copyright 2015 Hewlett-Packard Development Company,L.P. The information contained herein is subject to change without notice. HP Services – a path to Hadoop value through consulting and SI Challenges – ROI and business value inhibitors • Business led drivers • Use cases with business value • Maturing technology stack • Hadoop skills in the market Success Factors • Expert guidance • Relevant business cases • Discovering value-driven uses • Execution plan to extract value • Leveraging the right deployment option Consulting and SI • Demonstrated Hadoop expertise • Ability to deploy around existing infrastructure • Roadmap to accelerate time value • Attain top management relevance and impact HP Analytics & Data Management Services • Discovery • Integration • Workload optimization • Platform – As a service – On premises – Managed • Analytics Merge traditional Business Intelligence with new Big Data technologies and transition to a more data-driven and agile enterprise.
  • 22. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 22 HPBigDataConference2015 August10-13 WestinWaterfrontHotel,Boston REGISTER TODAY! For general inquiries: [email protected] For inquires on becoming a sponsor: [email protected] #HPBigData2015 40+ technical sessions Network with hundreds of peers Hands-on hackathon Just Announced!
  • 23. © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.23 Community Edition • Free Download 1TB, 3 nodes my.vertica.com/evaluate Learn More About – and Try! - HP Vertica
  • 24. © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. ThankYou