GoodData – the Case Study #2: 
Big Data Pipeline for Analytics at Scale 
DB Technologies for Big Data @ FIT CVUT 
2014 GoodData Corporation. All Rights Reserved. 
November 19, 2014
GoodData Corporation
End to End, Analytics 
Platform as a Service 
Traditional BI 
Data Visualization 
Tableau, Qlikview, Spotfire, etc. 
Analytics Engine 
Cognos, Oracle, Business Objects, etc. 
Data Marts 
MySQL, PostgreSQL, etc. 
Data Warehouse 
Oracle, Teradata, Netezza, Microsoft, etc. 
ETL 
Informatica, DataStage, Boomi, Snaplogic, etc. 
Infrastructure 
Servers, Storage, Networking, etc. 
Data Collaboration 
Data Visualization 
Analytics Engine 
Data Marts 
Data Warehouse 
ELT / ETL 
Infrastructure
One Platform. Two Markets. 
For Your Customers 
Powered By GoodData Partner Program 
for disruptive ISVs including Zendesk, 
Switchfly, and Phizzle 
For Your Business 
Drive your business with your data. 
Experience and accelerators for 
Social, Sales, Marketing, Yammer
Our Focus
Our Customers
What The End Users See...
What Is In The Box...
End to End, Analytics 
Platform as a Service 
Traditional BI 
Data Visualization 
Tableau, Qlikview, Spotfire, etc. 
Analytics Engine 
Cognos, Oracle, Business Objects, etc. 
Data Marts 
MySQL, PostgreSQL, etc. 
Data Warehouse 
Oracle, Teradata, Netezza, Microsoft, etc. 
ETL 
Informatica, DataStage, Boomi, Snaplogic, etc. 
Infrastructure 
Servers, Storage, Networking, etc. 
Data Collaboration 
Data Visualization 
Analytics Engine 
Data Marts 
Data Warehouse 
ELT / ETL 
Infrastructure
GoodData Platform Zoom-In 
End to End, Analytics 
Platform as a Service 
Data Collaboration 
Data Visualization 
Analytics Engine 
Data Marts 
Data Warehouse 
ELT / ETL 
Infrastructure
GoodData Analytics Platform - The Data Pipeline
Let’s Start With The Outcome - The Insights
• User Experience 
○ Visual Appeal 
○ Ease of Use 
○ Performance 
• Analytical Power 
• Many Data Sources 
○ Need to cross analyze all of them 
○ Need to add/remove sources as needed 
• Cost Efficiency 
○ Computational density allowed by multi-tenancy
Let’s Start With The Outcome - The Insights 
● Analytical Engine / MAQL 
● Exploration, Visualization 
and Distribution Layer 
● Pluggable Database 
Backends 
● 10s of GB up to TBs
Behind The Scenes - The Big Data Pipeline 
• Large Data Throughput 
○ Close to Real-time Updates 
• Many Data Sources 
○ Need to cross analyze all of them 
○ Need to add/remove sources as needed 
• Agility 
○ Capture all data without knowing the analytical use case in advance 
• Cost Efficiency 
○ Computational density allowed by multi-tenancy
Behind The Scenes - The Big Data Pipeline 
• Big Data Store 
○ 100s of TBs per customer 
○ Persist All Incoming Data 
○ CSV, XML, JSON, ... 
• Immutable 
○ Append Only 
○ Keep Ingestion History 
• Technologies 
○ Amazon S3 
○ Cloud Files
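The append-only idea on this slide can be illustrated with a minimal Python sketch. It is not GoodData's implementation: the `AppendOnlyStore` class, its key layout, and the in-memory dictionary standing in for an object store such as Amazon S3 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class AppendOnlyStore:
    """Hypothetical in-memory stand-in for an object store (e.g. Amazon S3)."""

    _objects: Dict[str, bytes] = field(default_factory=dict)
    _history: List[dict] = field(default_factory=list)

    def ingest(self, source: str, payload: bytes, fmt: str) -> str:
        """Persist one incoming file under a new key; nothing is overwritten."""
        digest = hashlib.sha256(payload).hexdigest()[:12]
        # The running sequence number makes every key unique, so a re-delivered
        # file becomes a new object instead of replacing the earlier one.
        key = f"{source}/{len(self._history):08d}-{digest}.{fmt}"
        self._objects[key] = payload
        self._history.append({
            "key": key,
            "source": source,
            "format": fmt,
            "received_at": datetime.now(timezone.utc).isoformat(),
        })
        return key

    def read(self, key: str) -> bytes:
        return self._objects[key]

    def history(self, source: Optional[str] = None) -> List[dict]:
        """Return the ingestion history, optionally for a single source."""
        return [dict(e) for e in self._history
                if source is None or e["source"] == source]


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.ingest("crm", b"id,name\n1,Ann\n", "csv")
    store.ingest("web", b'{"event": "click"}', "json")
    for entry in store.history():
        print(entry["key"], entry["received_at"])
```

The store offers no update or delete operation, so the ingestion history doubles as a complete record of what arrived and when.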
Behind The Scenes - The Big Data Pipeline 
• Agile Data Warehouse 
○ 10s of TBs per customer 
○ Relational Model 
○ Semi-Cleansed 
○ Complete History Captured 
• Technologies 
○ HP Vertica 
○ GoodData BI Integration Services
Behind The Scenes - The Big Data Pipeline 
• Combine Input Stage Data Sets 
○ Mapping, Cleansing 
• Perform Data Transformations in Data Warehouse 
○ Benchmarking, Snapshotting, Sampling 
• Generate Data Mart Input Data 
○ Data Warehouse : Data Mart relation is typically 1 : N 
○ Tens of thousands of Data Marts in the Powered by GoodData (PbG, OEM) use case!
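The 1 : N relation between a Data Warehouse and its Data Marts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_data_marts`, the `tenant_id` column, and the sample rows are hypothetical and do not come from the GoodData platform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_into_data_marts(
    warehouse_rows: Iterable[dict],
    tenant_column: str = "tenant_id",  # hypothetical partitioning column
) -> Dict[str, List[dict]]:
    """Fan one warehouse table out into per-tenant data mart inputs (1 : N)."""
    marts: Dict[str, List[dict]] = defaultdict(list)
    for row in warehouse_rows:
        tenant = row[tenant_column]
        # Each data mart only needs its own rows, so the tenant column is dropped.
        marts[tenant].append(
            {k: v for k, v in row.items() if k != tenant_column}
        )
    return dict(marts)


if __name__ == "__main__":
    rows = [
        {"tenant_id": "acme", "ticket": 1, "status": "open"},
        {"tenant_id": "globex", "ticket": 7, "status": "closed"},
        {"tenant_id": "acme", "ticket": 2, "status": "closed"},
    ]
    for tenant, mart_rows in split_into_data_marts(rows).items():
        print(tenant, mart_rows)
```

In an OEM setting each tenant would correspond to one end customer of the ISV, which is how one warehouse can feed a very large number of data marts.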
Behind The Scenes - The Big Data Pipeline 
• GoodData BI Integration Services 
○ CloudConnect Runtime 
○ Ruby Runtime 
○ Data Integration Console 
Over 2M ETL jobs per week!
The Wrap-Up - The Big Data Pipeline 
Progression Through: 
• Big Data Store 
• Data Warehouse 
• Data Marts 
As a means to satisfy the end user: 
• User Experience 
• Analytical Power 
• Many Data Sources 
• Cost Efficiency
Questions?
Thank you!
mubashirkhan45461
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 

Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014

  • 1. GoodData – the Case Study #2: Big Data Pipeline for Analytics at Scale DB Technologies for Big Data @ FIT CVUT 2014 GoodData Corporation. All Rights Reserved. November 19 2014
  • 2. GoodData Corporation
  • 3. End to End, Analytics Platform as a Service
      • Traditional BI
          ○ Data Visualization: Tableau, Qlikview, Spotfire, etc.
          ○ Analytics Engine: Cognos, Oracle, Business Objects, etc.
          ○ Data Marts: MySQL, PostgreSQL, etc.
          ○ Data Warehouse: Oracle, Teradata, Netezza, Microsoft, etc.
          ○ ETL: Informatica, DataStage, Boomi, Snaplogic, etc.
          ○ Infrastructure: Servers, Storage, Networking, etc.
      • Analytics Platform as a Service
          ○ Data Collaboration
          ○ Data Visualization
          ○ Analytics Engine
          ○ Data Marts
          ○ Data Warehouse
          ○ ELT / ETL
          ○ Infrastructure
  • 4. One Platform. Two Markets.
      • For Your Customers: Powered By GoodData Partner Program for disruptive ISVs including Zendesk, Switchfly, and Phizzle
      • For Your Business: Drive your business with your data. Experience and accelerators for Social, Sales, Marketing, Yammer
  • 5. Our Focus
  • 7. What The End Users See...
  • 8. What The End Users See...
  • 9. What Is In The Box...
  • 10. End to End, Analytics Platform as a Service
      • Traditional BI
          ○ Data Visualization: Tableau, Qlikview, Spotfire, etc.
          ○ Analytics Engine: Cognos, Oracle, Business Objects, etc.
          ○ Data Marts: MySQL, PostgreSQL, etc.
          ○ Data Warehouse: Oracle, Teradata, Netezza, Microsoft, etc.
          ○ ETL: Informatica, DataStage, Boomi, Snaplogic, etc.
          ○ Infrastructure: Servers, Storage, Networking, etc.
      • Analytics Platform as a Service
          ○ Data Collaboration
          ○ Data Visualization
          ○ Analytics Engine
          ○ Data Marts
          ○ Data Warehouse
          ○ ELT / ETL
          ○ Infrastructure
  • 11. GoodData Platform Zoom-In
      • End to End, Analytics Platform as a Service
          ○ Data Collaboration
          ○ Data Visualization
          ○ Analytics Engine
          ○ Data Marts
          ○ Data Warehouse
          ○ ELT / ETL
          ○ Infrastructure
  • 12. GoodData Analytics Platform - The Data Pipeline
  • 13. Let’s Start With The Outcome - The Insights
  • 14. Let’s Start With The Outcome - The Insights
      • User Experience
          ○ Visual Appeal
          ○ Ease of Use
          ○ Performance
      • Analytical Power
      • Many Data Sources
          ○ Need to cross analyze all of them
          ○ Need to add/remove sources as needed
      • Cost Efficiency
          ○ Computational density allowed by multi-tenancy
  • 15. Let’s Start With The Outcome - The Insights
      • Analytical Engine / MAQL
      • Exploration, Visualization and Distribution Layer
      • Pluggable Database Backends
      • 10s of GB up to TBs
  • 16. Behind The Scenes - The Big Data Pipeline
      • Large Data Throughput
          ○ Close to Real-time Updates
      • Many Data Sources
          ○ Need to cross analyze all of them
          ○ Need to add/remove sources as needed
      • Agility
          ○ Capture all data without knowing the analytical use case in advance
      • Cost Efficiency
          ○ Computational density allowed by multi-tenancy
  • 17. Behind The Scenes - The Big Data Pipeline
      • Big Data Store
          ○ 100s of TBs per customer
          ○ Persist All Incoming Data
          ○ CSV, XML, JSON, ...
      • Immutable
          ○ Append Only
          ○ Keep Ingestion History
      • Technologies
          ○ Amazon S3
          ○ Cloud Files
  • 18. Behind The Scenes - The Big Data Pipeline
      • Agile Data Warehouse
          ○ 10s of TBs per customer
          ○ Relational Model
          ○ Semi-Cleansed
          ○ Complete History Captured
      • Technologies
          ○ HP Vertica
          ○ GoodData BI Integration Services
  • 19. Behind The Scenes - The Big Data Pipeline
      • Combine Input Stage Data Sets
          ○ Mapping, Cleansing
      • Perform Data Transformations in Data Warehouse
          ○ Benchmarking, Snapshotting, Sampling
      • Generate Data Mart Input Data
          ○ The Data Warehouse : Data Mart relation is typically 1 : N
          ○ 10s of thousands of Data Marts in the PbG (OEM) use case!
  • 20. Behind The Scenes - The Big Data Pipeline
      • GoodData BI Integration Services
          ○ CloudConnect Runtime
          ○ Ruby Runtime
          ○ Data Integration Console
      • Over 2M ETL jobs per week!
  • 21. The Wrap-Up - The Big Data Pipeline
      • Progression Through:
          ○ Big Data Store
          ○ Data Warehouse
          ○ Data Marts
      • As a means to satisfy the end user:
          ○ User Experience
          ○ Analytical Power
          ○ Many Data Sources
          ○ Cost Efficiency
  • 22. Questions?
  • 23. Thank you!
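
The progression the slides describe (an append-only Big Data Store, then a semi-cleansed relational Data Warehouse, then 1 : N Data Marts) can be illustrated with a small in-memory sketch. This is not GoodData's implementation: the class and function names, the `tenant` and `amount` fields, and the choice to split data marts per tenant are all assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RawRecord:
    """One ingested payload, kept verbatim with its ingestion metadata."""
    source: str
    payload: dict
    ingested_at: datetime


@dataclass
class AppendOnlyStore:
    """Stage 1 (hypothetical): records are only appended, never updated."""
    _log: list = field(default_factory=list)

    def append(self, source, payload):
        record = RawRecord(source, dict(payload), datetime.now(timezone.utc))
        self._log.append(record)
        return record

    def history(self):
        # Return a tuple so callers cannot mutate the ingestion history.
        return tuple(self._log)


def load_warehouse(store):
    """Stage 2 (hypothetical): map and lightly cleanse raw records into rows."""
    rows = []
    for rec in store.history():
        tenant = str(rec.payload.get("tenant", "")).strip().lower()
        if not tenant:
            continue  # skip rows that cannot be attributed to a tenant
        rows.append({
            "tenant": tenant,
            "source": rec.source,
            "amount": float(rec.payload.get("amount", 0)),
            "ingested_at": rec.ingested_at,
        })
    return rows


def build_data_marts(rows):
    """Stage 3 (hypothetical): one warehouse feeds N data mart inputs (1 : N).

    Splitting by tenant is an assumption; the slides only state the ratio.
    """
    marts = defaultdict(list)
    for row in rows:
        marts[row["tenant"]].append(row)
    return dict(marts)


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.append("crm.csv", {"tenant": "Acme ", "amount": "10.5"})
    store.append("web.json", {"tenant": "acme", "amount": 2})
    store.append("crm.csv", {"tenant": "Globex", "amount": "7"})
    marts = build_data_marts(load_warehouse(store))
    print(sorted(marts))
```

Because the store is never modified in place, the warehouse and data mart stages can be rebuilt from the full ingestion history at any time, which is one way to read the slides' point about capturing all data before the analytical use case is known.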