© 2019 Bloomberg Finance L.P. All rights reserved.
How data modelling helps serve billions of queries in millisecond latency with real-time streaming
NoSQL Day 2019 at DataWorks Summit DC
May 21, 2019
Udai Bhan Kashyap, Senior Software Engineer (Equities)
Amit Anand, Senior Software Engineer (Hadoop Infrastructure)
Agenda
About Bloomberg
Problem Definition
HBase and Data Modeling
Q&A
Bloomberg in a nutshell
Bloomberg by numbers
Founded in 1981
325,000 subscribers in 170+ countries
Over 19,000 employees in 192 locations
More News reporters than The New York Times + Washington Post + Chicago Tribune
Bloomberg Tech
5,500+ software engineers and growing
100+ engineers and data scientists devoted to machine learning
One of the largest private networks in the world
100B+ market data ticks per day, peak of 10 million+ messages/sec
News content from 125K+ sources
2M news stories each day (that's >500 news stories ingested/sec)
More than a billion emails and IB chats handled each day
Problem Definition
Serve billions of queries with low latency, transactional consistency, and real-time streaming
Low Latency
Fast access to and availability of the data are key for users and automated systems, given the market-impacting nature of the data
A wide variety of applications (e.g., Quantitative Analysis, Backtesting, Screening) require huge amounts of data, demanding a high-throughput, low-latency system
Real-Time Streaming
Our data and analytics are used by traders, fund managers, educational institutions …
And they impact the market!
Transactional Consistency
Google’s 2018 Annual Filing
Transactional Consistency
The numbers together tell the story:
“Revenue” = “Sales & Services Revenue” + “Other Revenue”
“Gross Profit” = “Revenue” - “Cost of Revenue”
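A minimal Python sketch of why these numbers must change together. The function name `is_consistent` and all figures are made up for illustration (they are not taken from the filing): a reader that observes only part of an update sees a row that breaks the identities above.

```python
from typing import Mapping


def is_consistent(row: Mapping[str, float], tol: float = 1e-6) -> bool:
    """Return True if a company/period row satisfies both identities above."""
    revenue = row["Sales & Services Revenue"] + row["Other Revenue"]
    gross_profit = row["Revenue"] - row["Cost of Revenue"]
    return (abs(row["Revenue"] - revenue) <= tol
            and abs(row["Gross Profit"] - gross_profit) <= tol)


# Made-up figures, for illustration only.
before = {"Sales & Services Revenue": 100.0, "Other Revenue": 20.0,
          "Revenue": 120.0, "Cost of Revenue": 50.0, "Gross Profit": 70.0}

# A restatement touches three numbers at once.
update = {"Sales & Services Revenue": 110.0, "Revenue": 130.0, "Gross Profit": 80.0}

after = {**before, **update}  # the whole update applied together
torn = {**before, "Sales & Services Revenue": 110.0, "Revenue": 130.0}  # half-applied

print(is_consistent(before), is_consistent(after), is_consistent(torn))  # True True False
```

Only the torn read fails, which is why all related figures need to be written in one atomic operation.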
HBase
Ordered key-value store
Distributed: provides horizontal scaling
Low-latency random access: necessary for interactive/real-time systems
Auto-sharding: operational ease!
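A toy, in-memory Python sketch of what "ordered key-value store" and "auto-sharding" mean in practice: keys are kept sorted as raw bytes, and a list of region split keys decides which region owns a key. The class and method names are hypothetical; this is not the HBase client API.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class OrderedKVStore:
    """Toy ordered key-value store: byte keys kept in sorted order and
    partitioned into regions by a list of split keys."""

    def __init__(self, split_keys: List[bytes]) -> None:
        # Region 0 covers keys below split_keys[0]; region i covers
        # [split_keys[i-1], split_keys[i]); the last region is unbounded.
        self.split_keys = sorted(split_keys)
        self._keys: List[bytes] = []
        self._values: Dict[bytes, bytes] = {}

    def region_for(self, key: bytes) -> int:
        # Number of split keys <= key is the index of the owning region.
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys lexicographically sorted
        self._values[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._values.get(key)  # random access by exact key

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Half-open range [start, stop), yielded in sorted key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]
```

In a real cluster each region lives on a Region Server and clients route requests by region boundaries; the `bisect_right` lookup plays that role here.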
HBase Architecture (diagram): an active HBase Master and a backup HBase Master, a ZooKeeper ensemble (three nodes shown), and Region Servers each hosting multiple Regions, on top of HDFS storage.
Data Modeling
Data Modeling - RDBMS
HBase Table - Main components
RowKey: analogous to an index in an RDBMS
Column Family
Column
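A toy Python model of how these components nest: row key, then column family, then column qualifier, then timestamped versions. The class name `SparseTable` is hypothetical; the sketch only illustrates that families are declared up front while columns appear per row on write.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class SparseTable:
    """Toy model of an HBase-style table:
    row key -> column family -> column qualifier -> timestamp -> value.
    """

    def __init__(self, families: Iterable[str]) -> None:
        self.families = frozenset(families)  # declared up front, like column families
        self.rows: Dict[bytes, Dict[str, Dict[bytes, Dict[int, bytes]]]] = defaultdict(
            lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row: bytes, family: str, qualifier: bytes, ts: int, value: bytes) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns are created on write, so each row can end up with its own set.
        self.rows[row][family][qualifier][ts] = value

    def get_latest(self, row: bytes, family: str, qualifier: bytes) -> Optional[bytes]:
        # Plain .get() calls avoid creating empty entries for missing cells.
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None
```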
Distribution
Distribution - Be in control
● Pre-splitting regions
○ Avoids hotspotting
○ Helps with the initial mass onboarding of data
○ Minimizes region splits after deployment
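A minimal sketch of computing pre-split points, assuming (hypothetically) that every row key starts with a one-byte salt in the range [0, N). With N buckets, N - 1 split keys give one region per salt value; a list like this is the kind of split-key array one would supply at table creation (for example through the HBase shell's `SPLITS` option).

```python
from typing import List


def salt_split_keys(num_buckets: int) -> List[bytes]:
    """Split keys for row keys that start with a 1-byte salt in [0, num_buckets)."""
    if not 1 <= num_buckets <= 256:
        raise ValueError("a 1-byte salt supports 1..256 buckets")
    # n buckets -> n regions -> n - 1 split keys; region i then holds salt byte i.
    return [bytes([i]) for i in range(1, num_buckets)]


print(salt_split_keys(4))  # [b'\x01', b'\x02', b'\x03']
```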
● Key distribution
○ Distribute unrelated keys across regions; related keys should stay together
○ Identify the components of the key that require distribution
○ Generate a hash from the identified components
○ Prepend the hash to your key
○ Do not allow a region split within the key components involved in distribution. For example, in our case, we don't let IBM's data be spread across multiple regions
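A minimal Python sketch of the steps above, under stated assumptions: the ticker is the only component that needs distribution, the salt is one byte taken from an MD5 digest, components are separated by a `0x00` byte, and 16 buckets are used. All names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 16   # assumption: matches the number of pre-split regions
SEP = b"\x00"      # assumption: separator that never occurs inside a ticker


def salt_for(ticker: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Hash only the component that needs distribution (the ticker), so every
    # row for one ticker gets the same salt and therefore the same region.
    digest = hashlib.md5(ticker.encode("utf-8")).digest()
    return digest[0] % num_buckets


def make_row_key(ticker: str, rest: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    # Layout: [1-byte salt][ticker][SEP][rest of the logical key]
    return bytes([salt_for(ticker, num_buckets)]) + ticker.encode("utf-8") + SEP + rest


def align_split_point(proposed: bytes) -> bytes:
    # Truncate a proposed split point to salt + ticker so that a region
    # boundary never falls inside one ticker's rows. Search from index 1
    # because the salt byte itself may equal SEP.
    idx = proposed.find(SEP, 1)
    return proposed if idx == -1 else proposed[:idx]
```

Because only the ticker is hashed, all of IBM's rows share one prefix and stay together, while different tickers spread over the buckets. HBase has configurable region split policies (for example `KeyPrefixRegionSplitPolicy`) aimed at keeping a key prefix intact; `align_split_point` only sketches the idea for a delimited prefix.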
Key/Index Design
The right key is important
HBase supports only one key (the row key) per table
Keys are lexicographically sorted
Unlike in an RDBMS, the key doesn't need to be part of the columns
Keys are sequences of bytes
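Because keys are raw bytes compared lexicographically, the encoding of each key component decides the sort order. A small Python illustration: numbers written as text sort incorrectly, while a fixed-width big-endian encoding (here via `struct`, for non-negative integers only) keeps numeric order.

```python
import struct


def encode_u64(n: int) -> bytes:
    # Fixed-width big-endian encoding: for non-negative integers, byte order
    # matches numeric order, so lexicographic key sorting stays correct.
    return struct.pack(">Q", n)


numbers = [100, 9, 10]
as_text = sorted(str(n).encode("ascii") for n in numbers)
as_fixed = sorted(encode_u64(n) for n in numbers)

print(as_text)                                        # [b'10', b'100', b'9']
print([struct.unpack(">Q", k)[0] for k in as_fixed])  # [9, 10, 100]
```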
Design a logical composite key rather than a hash key
A logical key allows predicates on the key's components
A logical key preserves the locality of related data
With a hash key, only “Get” (point lookup) APIs are practical; range scans over key components lose their meaning
Design small keys
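A minimal Python sketch of why a logical composite key supports predicates: a query on the leading components becomes a range scan from the prefix to the smallest key past it. The separator byte, the field names `PX_LAST` and `VOLUME`, and the ticker `IBMX` are hypothetical, and the linear filter stands in for a real store's seek.

```python
from typing import List, Optional

SEP = b"\x00"  # assumption: separator byte that never appears inside a component


def composite_key(*parts: str) -> bytes:
    return SEP.join(p.encode("utf-8") for p in parts)


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    # Smallest key greater than every key that starts with `prefix`:
    # strip trailing 0xFF bytes, then increment the last remaining byte.
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        return None  # scan to the end of the table
    return stripped[:-1] + bytes([stripped[-1] + 1])


def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    # Range [prefix, stop): linear filter for brevity; a real store seeks to `prefix`.
    stop = prefix_stop_row(prefix)
    return [k for k in sorted_keys if k >= prefix and (stop is None or k < stop)]


keys = sorted([
    composite_key("AAPL", "PX_LAST", "2019-05-21"),
    composite_key("IBM", "PX_LAST", "2019-05-20"),
    composite_key("IBM", "PX_LAST", "2019-05-21"),
    composite_key("IBM", "VOLUME", "2019-05-21"),
    composite_key("IBMX", "PX_LAST", "2019-05-21"),
])
ibm_prices = prefix_scan(keys, composite_key("IBM", "PX_LAST") + SEP)
```

Had the whole key been hashed, related rows would be scattered and no prefix would exist to scan, leaving only exact-key lookups.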
What should be in the Key?
Data items that require predicates
Data items that do not require atomic updates
Column Family and Columns
HBase doesn't demand a schema for a table, but column families are tied to the table and must be defined with it (at table creation, or later by altering the table)
Unlike in an RDBMS, columns are tied to the row (key), and each row can have its own set of columns
While the row key has no name, column families and columns are named (typically with printable ASCII characters)
Avoid long names for columns and column families
Column family and column names are stored in each cell's key (alongside the RowKey) and are carried with every data point
RDBMS: one row per key, one value per column
Key-1 | V1 | V2 | … | Vn
HBase: one cell per value
Key-1 CF1:CQ1 V1
…
Key-1 CF1:CQn Vn
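A rough Python estimate of why long names cost space, based on our understanding of HBase's classic KeyValue serialization (length prefixes, row, family, qualifier, timestamp, type, value). It ignores tags, data block encoding, and compression, which in practice can shrink repeated names considerably; the long family and qualifier names are hypothetical.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate bytes for one cell in the classic KeyValue layout,
    ignoring tags, data block encoding, and compression."""
    key_len = (2 + len(row)          # row length (short) + row key
               + 1 + len(family)     # family length (byte) + family name
               + len(qualifier)      # qualifier (its length is implied)
               + 8                   # timestamp (long)
               + 1)                  # key type (put, delete, ...)
    return 4 + 4 + key_len + len(value)  # key length + value length prefixes


value = b"\x00" * 8  # e.g. one 8-byte double
short_names = keyvalue_size(b"Key-1", b"CF1", b"CQ1", value)
long_names = keyvalue_size(b"Key-1", b"fundamentals", b"gross_profit_reported", value)
print(short_names, long_names)  # 39 66
```

With an 8-byte value, the longer names alone add 27 bytes to every cell, repeated for every data point.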
Column Family
Grouping of related data items
Each column family has its own store (separate storage files on disk)
Identifying Column Families
Analyze your dataset to identify different patterns (e.g., sizes and/or queries)
Each pattern can be a column family
What should be columns?
Data items requiring atomic updates or transactional behavior should be columns
The last component of the logical composite key can be a column
Key-1 CF1:CQ1 V1
…
Key-1 CF1:CQn Vn
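A minimal Python sketch of moving the last component of a logical composite key into the column qualifier, assuming a hypothetical key of (ticker, period, field). All fields of one (ticker, period) then share a row, so a single row-level mutation can update them together.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Logical composite key (ticker, period, field) -> value; names are hypothetical.
Record = Tuple[Tuple[str, str, str], float]


def to_wide_rows(records: Iterable[Record]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Use (ticker, period) as the row key and the field as the column qualifier."""
    rows: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    for (ticker, period, field), value in records:
        rows[(ticker, period)]["CF1:" + field] = value
    return dict(rows)
```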
Why are we doing all this?
ACID and HBase
Atomicity – An operation completely succeeds or completely fails
Consistency – A successful mutation is visible to all read clients
Isolation – Operations do not interfere with each other
Durability – A successful mutation shall not be lost (even after a system crash)!
HBase supports these properties at the row level
https://hbase.apache.org/acid-semantics.html
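A toy Python sketch of row-level atomicity: each row has its own lock, multi-column writes to one row are published as a whole, and a compare-and-set on one column guards an update (conceptually similar to HBase's check-and-mutate, though this is not its API). The class name is hypothetical, and the "readers never see a mix" property relies on CPython's atomic reference assignment.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional

Row = Dict[bytes, bytes]


class RowAtomicStore:
    """Toy store whose mutations are atomic within a single row only."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, Row] = {}
        self._locks = defaultdict(threading.Lock)  # one lock per row key
        self._guard = threading.Lock()  # protects creation of per-row locks

    def _row_lock(self, row: bytes):
        with self._guard:
            return self._locks[row]

    def _apply(self, row: bytes, columns: Row) -> None:
        # Copy-on-write: build the new row, then publish it with one assignment,
        # so readers see either the old row or the new one, never a mix.
        new_row = dict(self._rows.get(row, {}))
        new_row.update(columns)
        self._rows[row] = new_row

    def put(self, row: bytes, columns: Row) -> None:
        with self._row_lock(row):
            self._apply(row, columns)

    def get(self, row: bytes) -> Row:
        return dict(self._rows.get(row, {}))

    def check_and_put(self, row: bytes, column: bytes,
                      expected: Optional[bytes], columns: Row) -> bool:
        # Compare-and-set on one column; expected=None means "column absent".
        with self._row_lock(row):
            if self._rows.get(row, {}).get(column) != expected:
                return False
            self._apply(row, columns)
            return True
```

Nothing here coordinates two different rows; as on the slide, the guarantees stop at the row boundary.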
RDBMS vs. HBase
RDBMS: unlimited # rows, fixed # columns
HBase: unlimited # rows, varying and unlimited # columns
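RDBMS: one row with key columns D1 D2 … Dn and data columns C1 C2 … Cn, holding values d1 d2 … dn and c1 c2 … cn
HBase: the key values are concatenated into the row key, and each data column becomes its own cell
d1+d2+..dn CF1:C1 value
d1+d2+..dn CF1:C2 value
…
d1+d2+..dn CF1:Cn value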
Data flow architecture (diagram components: several Publishing sources, a Queue, Transformation, HBase, and Middleware)
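Streaming task architecture (diagram): a Retriever and Updaters sit between the Queue Cluster and the HBase Cluster. Diagram labels: MSG Set and ACK (Queue Cluster side); Groups of MSGs, ACKs, and Waiting for ACKs (between Retriever and Updaters); Apply Changes, Reads, and Updates (HBase Cluster side).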
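A minimal Python sketch of the retriever/updater flow. The diagram only shows labels, so the details are our assumptions: messages are (row key, columns) pairs, the retriever groups a message set by row key, updaters apply each group in parallel, and the message set is acknowledged to the queue only after every updater has finished. `apply_row` and `ack_message_set` are hypothetical callbacks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

Columns = Dict[bytes, bytes]
Message = Tuple[bytes, Columns]  # assumed shape: (row key, columns to write)


def process_message_set(messages: List[Message],
                        apply_row: Callable[[bytes, Columns], None],
                        ack_message_set: Callable[[], None],
                        workers: int = 4) -> None:
    # Retriever side: group the message set by row key (an assumption), so each
    # updater applies one row's changes as a single row-level mutation.
    groups: Dict[bytes, Columns] = defaultdict(dict)
    for row, columns in messages:
        groups[row].update(columns)  # later messages for a row win
    # Updaters: apply the groups in parallel; a completed future acts as the ACK.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_row, row, cols) for row, cols in groups.items()]
        wait(futures)  # waiting for ACKs from every updater
        for f in futures:
            f.result()  # re-raises an updater failure, so no ACK is sent below
    ack_message_set()  # ACK the whole set to the queue only after all rows are applied
```

Acknowledging only after all updates succeed means a failed set is redelivered rather than lost, at the cost of possibly reapplying some rows.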
Thank you!
We are hiring!
https://www.bloomberg.com/careers
Questions?
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Unlocking the Power of IVR: A Comprehensive Guide
Unlocking the Power of IVR: A Comprehensive GuideUnlocking the Power of IVR: A Comprehensive Guide
Unlocking the Power of IVR: A Comprehensive Guide
vikasascentbpo
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Build 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHSBuild 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHS
TECH EHS Solution
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Social Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTechSocial Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTech
Steve Jonas
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Top 10 IT Help Desk Outsourcing Services
Top 10 IT Help Desk Outsourcing ServicesTop 10 IT Help Desk Outsourcing Services
Top 10 IT Help Desk Outsourcing Services
Infrassist Technologies Pvt. Ltd.
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Unlocking the Power of IVR: A Comprehensive Guide
Unlocking the Power of IVR: A Comprehensive GuideUnlocking the Power of IVR: A Comprehensive Guide
Unlocking the Power of IVR: A Comprehensive Guide
vikasascentbpo
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Build 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHSBuild 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHS
TECH EHS Solution
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Social Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTechSocial Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTech
Steve Jonas
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 

How data modelling helps serve billions of queries in millisecond latency with real-time streaming

  • 1. How data modelling helps serve billions of queries in millisecond latency with real-time streaming. NoSQL Day 2019 at DataWorks Summit DC, May 21, 2019. Udai Bhan Kashyap, Senior Software Engineer (Equities); Amit Anand, Senior Software Engineer (Hadoop Infrastructure)
  • 2. Agenda: About Bloomberg; Problem Definition; HBase and Data Modeling; Q&A
  • 3. Bloomberg in a nutshell (Image source: Bloomberg)
  • 4. Bloomberg by numbers: Founded in 1981; 325,000 subscribers in 170+ countries; over 19,000 employees in 192 locations; more news reporters than The New York Times, The Washington Post, and the Chicago Tribune combined
  • 5. Bloomberg Tech: 5,500+ software engineers and growing; 100+ engineers and data scientists devoted to machine learning; one of the largest private networks in the world; 100B+ market data ticks per day, with a peak of 10 million+ messages/sec; news content from 125K+ sources; 2M news stories each day (that's >500 news stories ingested/sec); more than a billion emails and IB chats handled each day
  • 6. Problem Definition: Serve billions of queries with low latency, transactional consistency, and real-time streaming
  • 7. Low Latency: Fast access to, and availability of, the data is key for users and automated systems because of the market-impacting nature of the data. A wide variety of applications (e.g., quantitative analysis, backtesting, screening) require huge amounts of data, demanding a high-throughput, low-latency system. (Image source: Pixabay)
  • 8. Real-Time Streaming (Image source: Pixabay)
  • 9. Our data and analytics are used by traders, fund managers, educational institutions … and they impact the market! (Image source: Bloomberg)
  • 10. Transactional Consistency: Google's 2018 Annual Filing
  • 11. Transactional Consistency
  • 12. Transactional Consistency: The numbers together tell the story
  • 13. Transactional Consistency: "Revenue" = "Sales & Services Revenue" + "Other Revenue"; "Gross Profit" = "Revenue" - "Cost of Revenue"
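The two identities on slide 13 can be checked mechanically. The Python sketch below uses hypothetical field names and made-up round numbers (not figures from the filing) to show why transactional consistency matters here: a reader who sees a restatement half-applied, with Revenue updated but Gross Profit not yet updated, gets figures that contradict each other.

```python
def is_consistent(figures: dict[str, int]) -> bool:
    """Check both identities from slide 13 on one snapshot of figures."""
    revenue_ok = figures["revenue"] == (
        figures["sales_and_services_revenue"] + figures["other_revenue"]
    )
    gross_profit_ok = figures["gross_profit"] == (
        figures["revenue"] - figures["cost_of_revenue"]
    )
    return revenue_ok and gross_profit_ok


# Hypothetical, internally consistent figures (arbitrary units).
snapshot = {
    "sales_and_services_revenue": 90,
    "other_revenue": 10,
    "revenue": 100,
    "cost_of_revenue": 40,
    "gross_profit": 60,
}

# A restatement raises Other Revenue, which must also move Revenue and Gross Profit.
partial = dict(snapshot, other_revenue=20, revenue=110)  # Gross Profit not updated yet
complete = dict(partial, gross_profit=70)

print(is_consistent(snapshot))  # True
print(is_consistent(partial))   # False: a reader caught mid-update
print(is_consistent(complete))  # True
```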
  • 14. HBase
  • 15. HBase: Ordered key-value store; distributed, providing horizontal scaling; low-latency random access, necessary for interactive/real-time systems; auto-sharding, for operational ease!
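What "ordered" buys you is cheap range scans over adjacent keys. The sketch below is a single-process toy, not HBase: it keeps keys sorted with Python's bisect module so that a scan returns a contiguous slice of keys. The TICKER|DATE key layout is a hypothetical example, and distribution and auto-sharding are not modeled at all.

```python
import bisect
from typing import Optional


class OrderedKV:
    """Toy in-memory ordered key-value store (illustration only, not HBase)."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []          # kept in byte-wise sorted order
        self._values: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)     # insert while preserving order
        self._values[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._values.get(key)           # random access by key

    def scan(self, start: bytes, stop: bytes) -> list[tuple[bytes, bytes]]:
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKV()
store.put(b"MSFT|2019-05-20", b"v1")
store.put(b"IBM|2019-05-21", b"v2")
store.put(b"IBM|2019-05-20", b"v3")
store.put(b"AAPL|2019-05-20", b"v4")

# b"}" is the byte right after b"|", so this range covers every b"IBM|..." key.
ibm_keys = [k for k, _ in store.scan(b"IBM|", b"IBM}")]
print(ibm_keys)  # [b'IBM|2019-05-20', b'IBM|2019-05-21']
```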
  • 16. HBase Architecture (diagram): three ZooKeeper nodes; an active HBase Master and a backup HBase Master; three Region Servers, each hosting multiple Regions; HDFS storage underneath
  • 17. Data Modeling
  • 18. Data Modeling - RDBMS (Image source: http://www.spatialdbadvisor.com/data_models)
  • 19. HBase Table - Main components: RowKey (analogous to an index in an RDBMS); Column Family; Column
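A common way to picture an HBase table is as a sparse map of maps: row key, then column family, then column qualifier, then timestamp, then value. The Python sketch below models only that logical view in memory (sorting, storage and the real client API are left out); the helper names and the row, family and qualifier values are hypothetical.

```python
from collections import defaultdict
from typing import Optional


# Logical view of one table:
# row key -> column family -> column qualifier -> timestamp -> value
def new_table() -> dict:
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table: dict, row: bytes, family: bytes, qualifier: bytes,
        ts: int, value: bytes) -> None:
    table[row][family][qualifier][ts] = value


def get_latest(table: dict, row: bytes, family: bytes,
               qualifier: bytes) -> Optional[bytes]:
    # .get() does not trigger defaultdict's factory, so reads never create entries.
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    return versions[max(versions)] if versions else None  # newest timestamp wins


table = new_table()
put(table, b"IBM", b"cf1", b"price", 1, b"v1")
put(table, b"IBM", b"cf1", b"price", 2, b"v2")   # newer version of the same cell
put(table, b"AAPL", b"cf1", b"volume", 1, b"v3")  # rows need not share columns

print(get_latest(table, b"IBM", b"cf1", b"price"))   # b'v2'
print(get_latest(table, b"IBM", b"cf1", b"volume"))  # None
```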
  • 20. Distribution
  • 21. Distribution - Be in control: ● Pre-splitting regions ○ Avoids hotspotting ○ Helps with the initial mass on-boarding of data ○ Minimizes splits after deployment (Image source: Pixabay)
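HBase lets you supply explicit split keys when a table is created, which is the usual way to pre-split. As a sketch only, assuming every row key starts with a one-byte hash prefix (salt) as described on slide 22, the split keys can simply be evenly spaced byte values. The function name and the 256-bucket default are hypothetical choices for illustration.

```python
def split_points(num_regions: int, salt_buckets: int = 256) -> list[bytes]:
    """Evenly spaced one-byte split keys for rows prefixed with a salt byte.

    Salt values are assumed to lie in [0, salt_buckets). Region i then holds
    salts [i * salt_buckets // num_regions, (i + 1) * salt_buckets // num_regions).
    Creating num_regions regions needs num_regions - 1 split keys.
    """
    if not 1 <= salt_buckets <= 256:
        raise ValueError("a one-byte salt allows at most 256 buckets")
    if not 1 <= num_regions <= salt_buckets:
        raise ValueError("num_regions must be between 1 and salt_buckets")
    return [bytes([i * salt_buckets // num_regions]) for i in range(1, num_regions)]


print(split_points(4))        # [b'@', b'\x80', b'\xc0']
print(len(split_points(16)))  # 15
```

The resulting keys could then be passed at table creation, for example through the HBase shell's SPLITS option.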
  • 22. Distribution - Be in control: ● Key distribution ○ Distribute unrelated keys across regions, while keeping related keys together ○ Identify the components of the key that require distribution ○ Generate a hash based on the identified components ○ Prepend the hash to your key ○ Do not allow region splits to occur within the key components involved in distribution. For example, in our case, we won't let IBM's data be distributed across multiple regions (Image source: Pixabay)
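A minimal sketch of the key-distribution steps above, assuming the ticker is the component that requires distribution and the date is the rest of the key. It uses MD5 from Python's hashlib purely as a fast, well-spread hash (not for security) and keeps one byte of it as the prefix; the key layout, separator and bucket count are hypothetical. Because only the ticker is hashed, every row for one ticker shares the same prefix and stays contiguous, which is what makes it possible to keep IBM's data within a single region.

```python
import hashlib


def salted_key(ticker: str, date: str, buckets: int = 256) -> bytes:
    """Build <salt byte><ticker>|<date>, where the salt depends on the ticker only."""
    if not 1 <= buckets <= 256:
        raise ValueError("a one-byte salt allows at most 256 buckets")
    digest = hashlib.md5(ticker.encode("utf-8")).digest()  # hash the distributed component
    salt = digest[0] % buckets
    return bytes([salt]) + f"{ticker}|{date}".encode("utf-8")  # prepend the hash


k1 = salted_key("IBM", "2019-05-20")
k2 = salted_key("IBM", "2019-05-21")
print(k1[0] == k2[0])  # True: same ticker, same prefix, so the rows stay adjacent
print(k1[1:])          # b'IBM|2019-05-20'
```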
  • 23. Key/Index Design: The right key is important; HBase supports only one key per table; keys are lexicographically sorted; unlike in an RDBMS, the key doesn't need to be part of the columns; a key is a sequence of bytes (Image source: Pixabay)
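Because keys are plain byte sequences compared lexicographically, numbers stored as decimal text sort incorrectly. A common remedy, shown here as a Python sketch, is a fixed-width big-endian encoding, whose byte order matches numeric order for non-negative integers. Signed values would need extra handling (such as flipping the sign bit), which is not shown.

```python
import struct

# Keys are compared byte by byte, so decimal strings sort "wrong":
text_keys = sorted([b"9", b"10", b"100"])
print(text_keys)  # [b'10', b'100', b'9']


def encode_uint64(n: int) -> bytes:
    """8-byte big-endian encoding: byte order matches numeric order for n >= 0."""
    return struct.pack(">Q", n)


numeric_keys = sorted(encode_uint64(n) for n in (9, 10, 100))
print([struct.unpack(">Q", k)[0] for k in numeric_keys])  # [9, 10, 100]
```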
  • 24. Key/Index Design: Design a logical composite key as opposed to a hash key. A logical key allows predicates on the key's components and preserves the locality of related data; with a hash key, only "Get" (point lookup) APIs are useful, because range scans over hashed values are meaningless. Design small keys. (Image source: Pixabay)
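The sketch below illustrates what a logical composite key enables, using a sorted Python list as a stand-in for HBase's ordered keys. The ticker/date layout and the zero-byte separator are hypothetical choices: the separator keeps a scan for IBM from also matching a longer ticker such as IBMX, and ISO dates sort chronologically as text. A predicate on the leading component becomes a prefix scan, and a predicate on the next component narrows the range further.

```python
import bisect

SEP = b"\x00"  # sorts before every printable byte


def composite_key(ticker: str, date: str) -> bytes:
    return ticker.encode() + SEP + date.encode()


def prefix_stop(prefix: bytes) -> bytes:
    """Smallest byte string greater than every key that starts with prefix."""
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        raise ValueError("a prefix made only of 0xff bytes has no upper bound")
    return stripped[:-1] + bytes([stripped[-1] + 1])


def scan(keys: list[bytes], start: bytes, stop: bytes) -> list[bytes]:
    """Keys in [start, stop); keys must already be sorted."""
    return keys[bisect.bisect_left(keys, start):bisect.bisect_left(keys, stop)]


keys = sorted(composite_key(t, d) for t, d in [
    ("IBM", "2019-05-20"), ("IBM", "2019-05-21"),
    ("IBMX", "2019-05-20"), ("AAPL", "2019-05-20"),
])

# Predicate on the first component: all rows for IBM (a prefix scan).
ibm_prefix = b"IBM" + SEP
print(scan(keys, ibm_prefix, prefix_stop(ibm_prefix)))
# [b'IBM\x002019-05-20', b'IBM\x002019-05-21']

# Predicate on both components: IBM rows dated on or after 2019-05-21.
print(scan(keys, composite_key("IBM", "2019-05-21"), prefix_stop(ibm_prefix)))
# [b'IBM\x002019-05-21']

# With a hash key such as md5(ticker + date), related rows are scattered,
# so neither scan above is possible; only point lookups remain.
```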
  • 25. What should be in the Key? (Image sources: Pixabay)
  • 26. What should be in the Key? Data items that require predicates; data items that do not require atomic updates
  • 27. Column Family and Columns: HBase doesn't require a schema for a table, but column families are tied to the table and must be provided at table creation. Unlike in an RDBMS, columns are tied to the row (key), and the set of columns can be unique for each row. While the key has no name, columns and column families are named using ASCII characters. Avoid long names for columns and column families.
  • 28. Column Family and Columns: Column family and column names become part of each cell's key and are carried with every data point. RDBMS: Key-1 | V1 | V2 | … | Vn. HBase: Key-1 CF1:CQ1 V1; …; Key-1 CF1:CQn Vn
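To see why short names matter, the Python sketch below estimates the uncompressed size of one cell using HBase's classic KeyValue layout (key length, value length, row length, row, family length, family, qualifier, timestamp, key type, value). It deliberately ignores tags, compression and data block encoding, all of which change the real footprint; the row key and the names compared are hypothetical.

```python
def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate uncompressed size of one cell in the KeyValue layout.

    Assumed layout: key length (4) + value length (4) + row length (2) + row
    + family length (1) + family + qualifier + timestamp (8) + type (1) + value.
    Tags, compression and data block encoding are ignored.
    """
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    return 4 + 4 + key_len + len(value)


row = b"IBM|2019-05-20"            # 14-byte row key
value = (1234).to_bytes(8, "big")  # an 8-byte value

short = cell_size(row, b"d", b"rev", value)
long_ = cell_size(row, b"fundamentals", b"total_revenue_reported", value)
print(short, long_)  # 46 76
```

Under these assumptions the longer names add 30 bytes to every cell; across a billion cells that is roughly 30 GB before compression.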
  • 29. Column Family: Grouping of related data items; each column family has its own storage files
  • 30. Identifying Column Families: Analyze your dataset to identify different patterns (e.g., sizes and/or queries); each pattern can be a column family
  • 31. What should be columns?
  • 32. What should be columns? Data items requiring atomic updates or transactional behavior should be columns. The last component of a logical composite key can be a column: Key-1 CF1:CQ1 V1; …; Key-1 CF1:CQn Vn
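A minimal Python sketch of that transformation, using plain dictionaries in place of HBase: the last component of each composite key (here a hypothetical metric name) becomes the column qualifier, so figures that must change together, such as Revenue and Gross Profit for one company and fiscal year, land in the same row. Since HBase applies a single-row mutation atomically (slide 34), one Put can then update them together. The ticker, fiscal-year labels and "|" separator are illustrative only.

```python
def to_wide(cells: dict[tuple[str, ...], bytes],
            family: str = "cf1") -> dict[str, dict[str, bytes]]:
    """Move the last component of each composite key into the column qualifier.

    Input:  {(d1, ..., dn): value}              one row per full key ("tall")
    Output: {"d1|...|dn-1": {"cf1:dn": value}}  one row per key prefix ("wide")
    """
    rows: dict[str, dict[str, bytes]] = {}
    for key, value in cells.items():
        *prefix, last = key
        rows.setdefault("|".join(prefix), {})[f"{family}:{last}"] = value
    return rows


tall = {
    ("GOOGL", "FY2018", "revenue"): b"v1",
    ("GOOGL", "FY2018", "gross_profit"): b"v2",
    ("GOOGL", "FY2017", "revenue"): b"v3",
}
wide = to_wide(tall)
print(wide["GOOGL|FY2018"])  # {'cf1:revenue': b'v1', 'cf1:gross_profit': b'v2'}
```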
  • 33. Why are we doing all this?
  • 34. ACID and HBase: Atomicity: an operation completely succeeds or completely fails. Consistency: a successful mutation is visible to all read clients. Isolation: operations do not interfere with each other. Durability: a successful mutation shall not be lost (even after a system crash)! HBase supports these properties at the row level (https://hbase.apache.org/acid-semantics.html)
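The toy Python store below imitates the row-level guarantee: every read and multi-column write on a row holds that row's lock, so a reader sees all of a mutation or none of it, and a mutation that fails validation changes nothing. It is a single-process illustration with hypothetical class and method names, not how HBase implements ACID (HBase relies on its own row locks, write-ahead log and MVCC), and nothing here makes two different rows change atomically together.

```python
import threading


class RowAtomicStore:
    """Toy store with HBase-like row-level atomicity (illustration only)."""

    def __init__(self) -> None:
        self._rows: dict[bytes, dict[bytes, int]] = {}
        self._locks: dict[bytes, threading.Lock] = {}
        self._guard = threading.Lock()  # protects creation of rows and their locks

    def _row(self, row: bytes):
        """Return (lock, data) for a row, creating both on first use."""
        with self._guard:
            if row not in self._locks:
                self._locks[row] = threading.Lock()
                self._rows[row] = {}
            return self._locks[row], self._rows[row]

    def mutate_row(self, row: bytes, columns: dict[bytes, int]) -> None:
        """Apply every column update or none of them."""
        if not all(isinstance(v, int) for v in columns.values()):
            raise TypeError("all values must be int")  # fail before touching the row
        lock, data = self._row(row)
        with lock:
            data.update(columns)

    def read_row(self, row: bytes) -> dict[bytes, int]:
        lock, data = self._row(row)
        with lock:
            return dict(data)  # consistent snapshot of the whole row


store = RowAtomicStore()
row = b"GOOGL|FY2018"
store.mutate_row(row, {b"revenue": 100, b"cost": 40, b"gross_profit": 60})
results: list[int] = []


def writer() -> None:
    for i in range(1, 1001):
        # Revenue and Gross Profit always change in the same row mutation.
        store.mutate_row(row, {b"revenue": 100 + i, b"gross_profit": 60 + i})


def reader() -> None:
    violations = 0
    for _ in range(1000):
        r = store.read_row(row)
        if r[b"gross_profit"] != r[b"revenue"] - r[b"cost"]:
            violations += 1
    results.append(violations)


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0]

try:
    store.mutate_row(row, {b"revenue": 0, b"cost": "bad"})  # rejected as a whole
except TypeError:
    pass
print(store.read_row(row)[b"revenue"])  # 1100
```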
  • 35. RDBMS vs. HBase: an RDBMS table has an unlimited number of rows and a fixed number of columns; an HBase table has an unlimited number of rows and a varying, unlimited number of columns
  • 36. RDBMS vs. HBase (diagram): RDBMS: a table with columns D1 D2 … Dn C1 C2 … Cn and a row with values d1 d2 … dn c1 c2 … cn. HBase: d1+d2+..dn CF1:C1 value; d1+d2+..dn CF1:C2 value; d1+d2+..dn CF1:Cn value
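A Python sketch of the mapping in this diagram, with hypothetical column names: the key columns D1..Dn are concatenated into the row key (using "+" as on the slide; a real design would use a separator or fixed-width fields), and each remaining column Ci becomes a qualifier in one column family. Columns that are NULL in the relational row simply produce no cell, which is one way HBase rows end up with varying sets of columns.

```python
from typing import Optional


def rdbms_row_to_cells(row: dict[str, Optional[str]], key_columns: list[str],
                       family: str = "CF1") -> list[tuple[str, str, str]]:
    """Convert one relational row into (row key, family:qualifier, value) cells.

    Key columns are assumed to be non-NULL.
    """
    row_key = "+".join(row[c] for c in key_columns)  # d1+d2+..+dn
    return [
        (row_key, f"{family}:{col}", value)
        for col, value in row.items()
        if col not in key_columns and value is not None  # NULLs are not stored
    ]


full = {"D1": "d1", "D2": "d2", "C1": "c1", "C2": "c2"}
sparse = {"D1": "d1", "D2": "d3", "C1": None, "C2": "c4"}
print(rdbms_row_to_cells(full, ["D1", "D2"]))
# [('d1+d2', 'CF1:C1', 'c1'), ('d1+d2', 'CF1:C2', 'c2')]
print(rdbms_row_to_cells(sparse, ["D1", "D2"]))
# [('d1+d3', 'CF1:C2', 'c4')]
```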
  • 37. Data flow architecture (diagram): Publishing, Queue, Transformation, HBase, Middleware; the diagram shows multiple Publishing and Transformation instances
  • 38. Streaming task architecture (diagram): components Retriever, Updaters, Queue Cluster, and HBase Cluster; flow labels MSG Set, ACK, Groups of MSGs, ACKs, Waiting for ACKs, Apply Changes, Reads, and Updates
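The slide shows only component names and flow labels, so the following is one plausible reading, sketched in Python with hypothetical names: a retriever takes a message set from the queue, splits it into groups (here by row key, which is an assumption), hands each group to an updater that applies it as a single-row change, waits for every updater's ACK, and only then acknowledges the message set. The apply_row and ack_queue callables stand in for the HBase write and the queue client.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    row_key: str
    column: str
    value: int


def group_by_row(messages: list[Message]) -> dict[str, dict[str, int]]:
    """Group a message set by row key; a later message for the same column wins."""
    groups: dict[str, dict[str, int]] = defaultdict(dict)
    for m in messages:
        groups[m.row_key][m.column] = m.value
    return dict(groups)


def process_message_set(messages: list[Message],
                        apply_row: Callable[[str, dict[str, int]], None],
                        ack_queue: Callable[[int], None],
                        workers: int = 4) -> None:
    """Retriever step: fan groups out to updaters, wait for all ACKs, then ACK."""
    groups = group_by_row(messages)
    with ThreadPoolExecutor(max_workers=workers) as updaters:
        futures = [updaters.submit(apply_row, r, cols) for r, cols in groups.items()]
        for f in futures:
            f.result()  # the updater's ACK; re-raises if its update failed
    ack_queue(len(messages))  # reached only if every group was applied


applied: dict[str, dict[str, int]] = {}
lock = threading.Lock()
acked: list[int] = []


def apply_row(row: str, cols: dict[str, int]) -> None:
    with lock:  # stand-in for one atomic HBase row mutation
        applied.setdefault(row, {}).update(cols)


msgs = [
    Message("IBM|2019-05-20", "price", 1),
    Message("IBM|2019-05-20", "volume", 2),
    Message("AAPL|2019-05-20", "price", 3),
    Message("IBM|2019-05-20", "price", 4),
]
process_message_set(msgs, apply_row, acked.append)
print(applied["IBM|2019-05-20"])  # {'price': 4, 'volume': 2}
print(acked)                      # [4]
```

If any updater raises, the exception propagates and the message set is never acknowledged, so it would presumably be redelivered; that at-least-once behavior only works if updates are idempotent, which is an assumption rather than something the slide states.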
  • 39. Thank you! We are hiring!! https://www.bloomberg.com/careers Questions?