Cerebro: Bringing together data scientists and bi users - Royal Caribbean - Strata - London 2019
Royal Caribbean Cruises, Ltd.
• Founded in 1968
• Six companies employing over 65,000 people from 120 countries who have served over 50 million guests
• Fleet of over 55 ships and growing
• Countless industry “firsts” - such as rock climbing wall, ice skating, and surfing at sea
• Each brand delivering a unique Guest experience
• www.rclcorporate.com
What is Cerebro™
Cerebro™ is a project under Excalibur’s data program focused on delivering a next-generation data management platform.
Design Drivers and Architecture Principles
Cerebro™ is Cloud Native
Cloud-native data lake architecture leveraging vendor managed services
[Diagram: managed services (Azure Data Lake Store, Azure Data Factory) alongside container-based components]
Cerebro™ Leverages Different Storage Engines
Why there is a need for a Heterogeneous Data Lake

Object Store: Azure Data Lake Store (ADLS)
• Which data? Sensor data; financial data
• Which queries? Data science; BI; large analytical jobs
• Key considerations: Parquet and Arrow accelerate queries

Document Store
• Which data? Reference data; dynamic schema
• Which queries? Single record; small batches; mutations
• Key considerations: Ability to handle streaming workloads

Graph Store
• Which data? Relationships
• Which queries? Relationship analysis; mutations
• Key considerations: Flexibility and ability to handle complexity
Cerebro™ Leverages In-Memory Architecture
• Scalability via distributed in-memory compute layer and object storage
• Dremio and Spark anchor the in-memory computing layer
• Parquet and object store (ADLS) for the storage layer, plus MongoDB and Neo4j
• Dremio and Arrow Flight further accelerate access and in-memory processing (see the query sketch below)
[Diagram: compute layer and storage layer, today vs. future (with Arrow Flight)]
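To make the Arrow-based access path concrete, here is a minimal sketch of querying Dremio over Arrow Flight from Python with pyarrow. The endpoint, credentials, and the staging table name are illustrative assumptions; the deck does not specify them.

```python
from pyarrow import flight

# Minimal sketch: query Dremio over Arrow Flight with pyarrow.
# Host, port, credentials, and the table path are illustrative assumptions.
client = flight.FlightClient("grpc+tcp://dremio.example.internal:32010")
token = client.authenticate_basic_token("analyst", "<password>")
options = flight.FlightCallOptions(headers=[token])

sql = 'SELECT guest_id, total_spend FROM staging."casino".player_activity LIMIT 100'
info = client.get_flight_info(flight.FlightDescriptor.for_command(sql), options)

# Results stream back as Arrow record batches rather than row-by-row serialization.
reader = client.do_get(info.endpoints[0].ticket, options)
table = reader.read_all()
print(table.to_pandas().head())
```

Because the result set stays columnar (Arrow) end to end, there is no row-oriented conversion between Dremio and the client, which is the acceleration the slide refers to.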
Cerebro™ - Phase 1
• Initial release focused on ingestion of sources spanning current data silos
• Establishment of a Raw Zone with Landing and Staging Areas
• Physical storage is file based (CSV, Parquet) on Azure Data Lake Store (ADLS) to support variety and variability of data
• Staging Area requires users to be familiar with low-level data structures in order to execute queries joining disparate source systems (e.g. multiple PMS and Casino sources)
[Diagram: sources (Reservations, Customer Master, Property Management, Casino, Clickstream, Marketing) are ingested via batch, CDC, SFTP file, and RDBMS connections into the Raw Zone (Landing and Staging Areas) on a cloud object store, document store, and graph store; Standardized and Enriched Zones follow in the Transform and Consume flow. Cross-cutting capabilities: metadata management, data catalog, data ingestion, data integration, data virtualization, self-service BI, advanced analytics. Consumers: Data Engineers (operational analytics), BI Analysts (self-service dashboards), Data Scientists (advanced analytics), Data Stewards (compliance analytics).]
Data Pipeline – Phase 1
• Talend utilized to ingest data from a number of sources (RDBMS, file-based, API) into CSV files stored in the Landing Area (ADLS)
• Talend / Spark leveraged to create Parquet files in the Staging Area (ADLS); see the conversion sketch after this slide
• In-memory columnar format (Arrow) via Dremio accelerates SQL-based query access for data engineering and data science use cases
• Leverages data virtualization within Dremio to support simple ad-hoc integration and agile exploration
• Supports data science and advanced analytics (AI/ML) via Azure Databricks (Python, Scala, Java, R)
[Pipeline diagram: Ingest (Talend, Azure HDInsight) → Persist (Azure Data Lake Store) → Explore (Dremio, Azure Data Catalog) → Model/Predict (Azure Databricks with Python, Scala, Java, R). Roles: Data Engineers, Data Scientists.]
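As a concrete illustration of the Landing-to-Staging step, here is a minimal PySpark sketch that reads the CSV files dropped in the Landing Area and rewrites them as Parquet in the Staging Area. The ADLS paths, source system, and partition column are assumptions, not the team's actual job.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date

# Minimal sketch of the Landing -> Staging conversion.
# Paths, source name, and the partition column are illustrative assumptions.
spark = SparkSession.builder.appName("landing-to-staging").getOrCreate()

landing = "adl://<account>.azuredatalakestore.net/raw/landing/reservations/"
staging = "adl://<account>.azuredatalakestore.net/raw/staging/reservations/"

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(landing)
      .withColumn("ingest_date", current_date()))  # hypothetical partition column

# Columnar Parquet in the Staging Area is what Dremio/Arrow then accelerate.
(df.write
   .mode("overwrite")
   .partitionBy("ingest_date")
   .parquet(staging))
```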
Cerebro™ - Phase 2
• Implementation of a Standardized Zone based on a semantic view of entities that will be easier to query for casual users
• Introduction of MongoDB (Document) will allow the platform to support low-latency ingestion and consumption of customer data required to support downstream applications (Call Center); a minimal lookup sketch follows this list
• Dremio still leveraged to support analytical use cases involving customer data stored in MongoDB (Marketing)
• Introduction of Neo4j (Graph) will increase overall agility (relationships) as well as provide insights by leveraging advanced functionality (patterns, recommendations); see the graph sketch after the zone diagram
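Below is a minimal sketch of the kind of low-latency, single-record access the document store enables for call-center style applications, using pymongo. The connection string, database, collection, and field names are assumptions for illustration.

```python
from pymongo import MongoClient

# Minimal sketch of a low-latency customer lookup plus a small mutation.
# Connection string, database/collection, and field names are assumptions.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net/")
profiles = client["cerebro"]["customer_profiles"]

# Single-record read for a call-center screen (an index on customer_id is assumed).
profile = profiles.find_one(
    {"customer_id": "C-12345"},
    projection={"_id": 0, "name": 1, "loyalty_tier": 1, "reservations": 1},
)

# Small mutation: record the contact event made during the call.
profiles.update_one(
    {"customer_id": "C-12345"},
    {"$push": {"contact_events": {"channel": "call_center", "reason": "itinerary_change"}}},
)
```

The same collection remains queryable through Dremio for the analytical (Marketing) use cases noted above.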
[Diagram: same zone layout as Phase 1 (sources → Ingest → Raw Zone with Landing and Staging Areas → Transform into Standardized and Enriched Zones → Consume, on the cloud object store, document store, and graph store), now adding Developers building Downstream Applications as consumers alongside Data Engineers, BI Analysts, Data Scientists, and Data Stewards.]
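A minimal sketch of the relationship-centric functionality (patterns, recommendations) the graph store adds, using the Neo4j Python driver. The URI, credentials, and the guest/cruise data model are assumptions, not the actual Cerebro™ graph schema.

```python
from neo4j import GraphDatabase

# Minimal sketch of a collaborative-filtering style recommendation in Cypher.
# URI, credentials, and the (:Guest)-[:BOOKED]->(:Cruise) model are assumptions.
driver = GraphDatabase.driver("neo4j://<host>:7687", auth=("neo4j", "<password>"))

RECOMMEND = """
MATCH (g:Guest {guest_id: $guest_id})-[:BOOKED]->(c:Cruise)<-[:BOOKED]-(other:Guest),
      (other)-[:BOOKED]->(rec:Cruise)
WHERE NOT (g)-[:BOOKED]->(rec)
RETURN rec.name AS cruise, count(*) AS score
ORDER BY score DESC
LIMIT 5
"""

with driver.session() as session:
    for record in session.run(RECOMMEND, guest_id="G-98765"):
        print(record["cruise"], record["score"])

driver.close()
```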
Data Pipeline – Phase 2
• Talend used to develop pipelines that process (cleanse, integrate, harmonize) data sourced from the Raw Zone
• Data resulting from pipeline executions is persisted in the appropriate store(s) (ADLS, Neo4j and MongoDB) to support both analytical and operational requirements
• Services developed to be consumed by customer-facing applications and other downstream processes via managed APIs; a minimal service sketch follows this slide
[Pipeline diagram: Ingest/Process (Talend, Azure HDInsight, Azure Databricks, Azure Data Factory) → Persist (Azure Data Lake Store, MongoDB Atlas, Neo4j) → Explore/Visualize (Dremio, Azure Data Catalog, Power BI) → Model/Predict (Azure Databricks with Python, Scala, Java, R) → Services (Azure Functions, Apigee, Azure Kubernetes Service). Roles: Data Engineers, Data Scientists, BI Analysts, Data Stewards.]
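As one possible shape for those managed-API services, here is a minimal HTTP-triggered Azure Function in Python (decorator-based programming model) exposing a customer-profile endpoint. The route, auth level, and stubbed payload are assumptions; in practice the function would read from MongoDB/Neo4j and sit behind Apigee.

```python
import json
import azure.functions as func

# Minimal sketch of a customer-profile service exposed as a managed API.
# Route, auth level, and the stubbed payload are illustrative assumptions;
# the real service would read from MongoDB / Neo4j behind Apigee.
app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="customers/{customer_id}", methods=["GET"])
def get_customer(req: func.HttpRequest) -> func.HttpResponse:
    customer_id = req.route_params.get("customer_id")
    payload = {"customer_id": customer_id, "loyalty_tier": "Diamond", "open_reservations": 2}
    return func.HttpResponse(json.dumps(payload), mimetype="application/json")
```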
[Architecture diagram (Modern Data Platform / Modern Analytics): Data Sources → Ingest → Process → User Experience → Consumers.
• Data Sources: on-premises (Property Management, Customer Master, Reservations, Casino) and external (Clickstream, Customer Feedback, Campaign Management)
• Ingest: Batch Integration (Talend Big Data), Streaming Integration (Kafka on HDInsight, Azure Event Hubs), Applications; a streaming ingestion sketch follows this diagram
• Process and persistence: Spark on HDInsight, Azure Data Factory, Azure Data Lake Store, MongoDB Atlas, Neo4j Causal Cluster
• User Experience: Self-Service Data Analytics (Azure Data Catalog, DBeaver EE), Advanced Analytics, Data Services (Azure Functions, Azure Kubernetes Service)
• Consumers: Business Analysts, Data Scientists]
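To illustrate the streaming integration path (Kafka on HDInsight, or Azure Event Hubs exposed through its Kafka endpoint), here is a minimal Spark Structured Streaming sketch that lands clickstream events in the lake as Parquet. Broker addresses, topic name, the event schema, and the ADLS paths are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

# Minimal sketch of streaming clickstream ingestion into the Landing Area.
# Brokers, topic, schema, and paths are illustrative assumptions.
spark = SparkSession.builder.appName("clickstream-stream").getOrCreate()

schema = (StructType()
          .add("guest_id", StringType())
          .add("page", StringType())
          .add("event_time", TimestampType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "<broker1>:9092,<broker2>:9092")
          .option("subscribe", "clickstream")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write micro-batches as Parquet; checkpointing makes the stream restartable.
(events.writeStream
       .format("parquet")
       .option("path", "adl://<account>.azuredatalakestore.net/raw/landing/clickstream/")
       .option("checkpointLocation", "adl://<account>.azuredatalakestore.net/checkpoints/clickstream/")
       .trigger(processingTime="1 minute")
       .start())
```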