The Design and Implementation of Modern Column-Oriented Databases
An unexamined query plan is not worth executing - Socrates the DBA
Outline of the paper
Introduction - What are columnar stores? Benefits and materialization issues.
Column store architectures - Common architecture patterns for column stores.
History, trends and performance issues - Predecessors of modern column-oriented databases; discussions on MonetDB and VectorWise.
Column store internals and advanced techniques - Vectorized execution, compiled queries, compression, late materialization.
What is a column-oriented store?
Column-store systems completely vertically partition a database into a collection of
individual columns that are stored separately.
Data transfer costs are often the major performance bottleneck in database systems. At the
same time, database schemas are becoming more complex, with fat tables containing
hundreds of attributes; a column-store is likely to be much more efficient at executing
queries that touch only a subset of a table’s attributes.
Benefits
Fetch the columns you need.
Easier to compress data of similar type and kind.
Direct operation on compressed data.
Database cracking and adaptive indexing.
Performs really well with vectorized queries and compiled queries
using SIMD.
Sorted columnar data yields faster filters.
Various layouts in columnar stores
Storing one column per file, with implicit or explicit row ids.
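As a rough sketch of this layout (hypothetical row type and column names, not from the paper), each column can be written out as its own array of (row id, value) pairs:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source rows: (id, name, price).
struct Row { int64_t id; std::string name; double price; };

// One "file" per column; each value is paired with an explicit row id
// so the relation can be stitched back together later.
struct ColumnFiles {
    std::vector<std::pair<int64_t, std::string>> name_col;
    std::vector<std::pair<int64_t, double>>      price_col;
};

ColumnFiles decompose(const std::vector<Row>& rows) {
    ColumnFiles out;
    for (const Row& r : rows) {
        out.name_col.emplace_back(r.id, r.name);
        out.price_col.emplace_back(r.id, r.price);
    }
    return out;
}
```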
Various layouts in columnar stores
Instead of keeping each column in a separate file,
store one file made up of row-group segments.
Each row group contains a chunk of every column.
This makes seeks for reconstructing the entire
relation easier.
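A hypothetical in-memory picture of this layout (names are ours; formats such as Parquet organize files in a similar way):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One file holds several row groups; each row group holds one chunk per
// column for that slice of rows, so a whole row can be reassembled by
// seeking within a single row group.
struct ColumnChunk { std::string column_name; std::vector<int64_t> values; };
struct RowGroup    { size_t first_row; std::vector<ColumnChunk> chunks; };
struct ColumnarFile { std::vector<RowGroup> row_groups; };
```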
Various layouts in columnar stores
Vertical partitioning stores a group of
columns together, usually the ones that are
heavily queried by OLAP-style workloads.
SQL Server provides columnstore
indexes for doing this.
Materialization tradeoff - Reconstruction of
rows
Row store - One seek per record. (When location of the required page is known)
Column store - n columns require n seeks to construct a record.
However, as more and more records are accessed, the transfer time begins to dominate
the seek time, and a column-oriented approach begins to perform better than a row-
oriented approach. For this reason, column-stores are typically used in analytic
applications, with queries that scan a large fraction of individual tables and compute
aggregates or other statistics over them.
Databases always try to reduce random seeks.
If a seek must happen, then the scan after the
seek position should yield as much desired
data as possible.
Summary
Different layouts - column per file, group of columns per file and row groups
Materialization tradeoff - more seeks might be used for reconstruction of rows.
Problem - Do I have to materialize
intermediate results when each
operator is applied?
Using variables to store intermediate results after each operator is applied is bad.
Storing results in variables means flushing them to main memory and, for each subsequent
operation, loading them back into the caches from main memory: the Von Neumann bottleneck.
Solutions
What is getting pushed or pulled?
Data is pushed or pulled towards the
operators.
To avoid storing intermediate results
after each operator is applied,
operators are chained so that
operations can be applied a relation at a
time or a vector at a time.
Operate on data while it is in the cache
and avoid flushing intermediate results
to main memory.
Pull-based query model
Push-based query model
Volcano iterator model - a pull-based model
It is a pull-based model.
Data is pulled by operators.
Control flows downwards and data
flows upwards.
Demand driven.
Implemented using an iterator pattern.
Problem - too many virtual function
calls.
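A minimal sketch of the iterator model (illustrative operator interface, not any real engine's API): each call to next() pulls one tuple, and the per-tuple virtual call is exactly the overhead the slide refers to.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Each operator exposes next(); callers pull one value at a time.
struct Operator {
    virtual std::optional<int64_t> next() = 0;
    virtual ~Operator() = default;
};

struct Scan : Operator {
    const std::vector<int64_t>& col;
    size_t pos = 0;
    explicit Scan(const std::vector<int64_t>& c) : col(c) {}
    std::optional<int64_t> next() override {
        if (pos == col.size()) return std::nullopt;
        return col[pos++];
    }
};

struct Filter : Operator {          // keep values > 10
    Operator& child;
    explicit Filter(Operator& c) : child(c) {}
    std::optional<int64_t> next() override {
        while (auto v = child.next())   // control flows down, data flows up
            if (*v > 10) return v;
        return std::nullopt;
    }
};
```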
Push model
Data is pushed towards operators.
Each stage calls .consume() on the next
operator.
Data driven.
Implemented using a visitor pattern.
Control and data both flow upwards.
Reduced number of calls, as flow is
from bottom to top.
The pull model requires many .next() calls
just to reach the predicate; if the row is
then discarded, all those function calls
were wasted.
Example of push model
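A minimal push-style sketch (hypothetical Consumer interface, not taken from the linked paper): the scan drives execution and pushes each value into its parent via consume(); a value that fails the predicate is dropped immediately, with no wasted next() calls.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Parent operators implement consume(); producers push values into them.
struct Consumer {
    virtual void consume(int64_t v) = 0;
    virtual ~Consumer() = default;
};

struct FilterThenPrint : Consumer {
    void consume(int64_t v) override {
        if (v > 10) std::printf("%lld\n", static_cast<long long>(v));
    }
};

void scan(const std::vector<int64_t>& col, Consumer& parent) {
    for (int64_t v : col) parent.consume(v);   // data (and control) flow upward
}
```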
Column stores work better with the push
model.
https://arxiv.org/pdf/1610.09166v1.pdf
Vectorized Queries
Why introduced?
The Volcano iterator model processes rows one at a time.
If the query execution code is too big and general, it has
too many CPU instructions; if an instruction-cache miss is
paid for every row, the cost is too high. Thus, batch the
rows together so that even if an instruction-cache miss
occurs, it is amortized over a whole batch of records.
Execution code written generically to support any schema
type often carries too much abstraction and branching,
and thus too many instructions.
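As an illustration (a hypothetical primitive, not a specific engine's API), a vector-at-a-time selection looks like one function call per batch rather than per row:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One call handles a whole batch, so interpretation overhead and
// instruction-cache misses are paid once per batch instead of once per row.
// The tight loop is also easy for the compiler to unroll or auto-vectorize.
size_t select_greater_than(const int64_t* col, size_t n, int64_t threshold,
                           std::vector<uint32_t>& out_positions) {
    out_positions.clear();
    for (size_t i = 0; i < n; ++i)
        if (col[i] > threshold) out_positions.push_back(static_cast<uint32_t>(i));
    return out_positions.size();
}
```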
Compiled Queries
Compiled queries attack the instruction-cache-miss problem by generating query execution code
tailored to the query; they treat it as a compiler problem. Load only those instructions which
are required for the query.
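For intuition, the code a query compiler might emit for something like SELECT SUM(a) FROM t WHERE b > 10 is a single fused loop specialized to that query (a sketch, not any system's actual generated code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One fused loop, specialized to this query's columns and predicate,
// with no generic interpretation logic left in the hot path.
int64_t compiled_query(const std::vector<int64_t>& a, const std::vector<int64_t>& b) {
    int64_t sum = 0;
    for (size_t i = 0; i < a.size(); ++i)
        if (b[i] > 10) sum += a[i];
    return sum;
}
```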
Column stores work best when
compiled queries are generated to
execute on data as vectors while
adhering to the push model.
Spark does that.
Column stores and hybrid execution
Reduced interpretation overhead. The number of function calls performed by the query
interpreter goes down by a factor equal to the vector size, compared to the tuple-at-a-time
model.
Better cache locality.
Compiler optimization opportunities - using SIMD, loop pipelining, loop unrolling
Compression
It is easier to compress data of the same kind and type.
Compression techniques used widely in columnar stores:
Delta Encoding - Store a base value and deltas afterwards. Works well on large
monotonic values.
Dictionary Encoding - Encode a set of values as integers. Good for enums and strings with a
restricted domain.
Run Length Encoding - Store the start position and the number of times the item appears
after that position. Good for columns with repeated values, status flags, etc. (see the sketch after this list)
Bit Vector Encoding - Basically a bitmap per distinct value.
Patching Technique - In dictionary encoding, encode the outliers with an escape value at the
beginning.
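A minimal sketch of run-length encoding as described above (illustrative Run layout, not the paper's):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Each run records the value, the start position, and how many consecutive
// times the value appears from that position.
struct Run { int64_t value; size_t start; size_t length; };

std::vector<Run> rle_encode(const std::vector<int64_t>& col) {
    std::vector<Run> runs;
    for (size_t i = 0; i < col.size(); ) {
        size_t j = i;
        while (j < col.size() && col[j] == col[i]) ++j;   // extend the current run
        runs.push_back({col[i], i, j - i});
        i = j;
    }
    return runs;
}
```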
Working on compressed data
Create an interface to define engines
which are more compression aware and
can operate on compressed data.
Example: a query engine with support
for working on RLE data could have
methods such as getSize(), isSorted(),
isOneValue(), getDistinctValues(),
getSum(), etc.
Straightforward operations such as
SUM, AGG, and MUL can easily be
performed directly on RLE-encoded data.
A compression block contains a
buffer of column data in
compressed format and provides
an API that allows the buffer to be
accessed by query operators in
several ways…
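For example, a compression-aware getSum() can work directly on the runs without decompressing anything; a sketch, assuming the Run layout from the RLE example above:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same run layout as the RLE sketch above.
struct Run { int64_t value; size_t start; size_t length; };

// Sum over the compressed representation: one multiply-add per run
// instead of one add per row.
int64_t rle_sum(const std::vector<Run>& runs) {
    int64_t sum = 0;
    for (const Run& r : runs) sum += r.value * static_cast<int64_t>(r.length);
    return sum;
}
```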
Late Materialization
Stitching together columns from the same or various tables to form the result set.
Late materialisation advantages
Late materialisation has four main advantages:
1. Due to selection and aggregation operations, it may be possible to avoid
materialising some tuples altogether
2. It avoids decompression of data to reconstruct tuples, meaning we can still operate
directly on compressed data where applicable
3. It improves cache performance when operating directly on column data
4. Vectorized optimisations have a higher impact on performance for fixed-length
attributes. With columns, we can take advantage of this for any fixed-width
columns. Once we move to row-based representation, any variable-width attribute
in the row makes the whole tuple variable-width.
Hybrid materialization? Materialize the right side early and late-materialize the left side, as it is
already sorted.
Joins
The most straightforward way to implement a column-oriented join is for (only) the
columns that compose the join predicate to be input to the join. In the case of hash
joins (which is the typical join algorithm used) this results in much more compact
hash tables which in turn results in much better access patterns during probing; a
smaller hash table leads to fewer cache misses.
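A sketch of such a join (illustrative function, not from the paper): only the two key columns are read, and the output is a list of matching positions that later operators can use to fetch the remaining columns.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Build the hash table from the left key column only, probe with the right
// key column, and emit (left position, right position) pairs. The compact
// hash table (keys + positions) is what improves probe access patterns.
std::vector<std::pair<uint32_t, uint32_t>>
join_positions(const std::vector<int64_t>& left_key,
               const std::vector<int64_t>& right_key) {
    std::unordered_multimap<int64_t, uint32_t> ht;
    for (uint32_t i = 0; i < left_key.size(); ++i) ht.emplace(left_key[i], i);

    std::vector<std::pair<uint32_t, uint32_t>> out;
    for (uint32_t j = 0; j < right_key.size(); ++j) {
        auto [lo, hi] = ht.equal_range(right_key[j]);
        for (auto it = lo; it != hi; ++it) out.emplace_back(it->second, j);
    }
    return out;
}
```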
Joins (Jive Join)
Redundant column representation
Columns that are sorted according to a particular attribute can be filtered much more
quickly on that attribute. By storing several copies of each column sorted by
attributes heavily used in an application’s query workload, substantial performance
gains can be achieved. C-store calls groups of columns sorted on a particular
attribute projections.
Database cracking and adaptive
indexing
An alternative to sorting columns up front is to adaptively and incrementally sort columns as
a side effect of query processing. “Each query partially reorganizes the columns it touches to
allow future queries to access data faster.” For example, if the column has already been cracked on
the value 10 and a query has a predicate A ≥ n where n ≥ 10, it only has to search and crack the
last part of the column. In the following example, query Q1 cuts the column into three pieces and
then query Q2 further refines the partitioning.
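A very simplified cracking sketch (illustrative data structure; real cracker indexes are more involved): answering a predicate on a value partitions only the relevant piece of the column around that value and remembers the split point, so later queries touch less data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

struct CrackedColumn {
    std::vector<int64_t> data;          // cracker copy of the column
    std::map<int64_t, size_t> cracks;   // value -> first position holding data >= value

    // Answer "A >= v": crack on v if needed, return the first qualifying position.
    size_t crack_ge(int64_t v) {
        auto it = cracks.find(v);
        if (it != cracks.end()) return it->second;        // already cracked on v
        // Only the piece between the neighbouring cracks needs to be touched.
        size_t lo = 0, hi = data.size();
        auto next = cracks.lower_bound(v);
        if (next != cracks.begin()) lo = std::prev(next)->second;
        if (next != cracks.end())   hi = next->second;
        auto mid = std::partition(data.begin() + lo, data.begin() + hi,
                                  [v](int64_t x) { return x < v; });
        size_t pos = static_cast<size_t>(mid - data.begin());
        cracks[v] = pos;
        return pos;
    }
};
```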
Group-by, Aggregation and Arithmetic
Operations
Group-by. Group-by is typically a hash-table based operation in modern column-stores and
thus it exploits similar properties as discussed in the previous section. In particular, we may
create a compact hash table, i.e., where only the grouped attribute is used, leading to
better access patterns when probing.
Aggregations. Aggregation operations make heavy use of the columnar layout. In particular,
they can work on only the relevant column with tight for-loops. For example, assume sum(),
min(), max(), avg() operators; such an operator only needs to scan the relevant column (or
intermediate result which is also in a columnar form), maximizing the utilization of memory
bandwidth.
Arithmetic operations. Other operators that may be used in the select clause of an SQL
query, i.e., math operators (such as +, -, *, /), also exploit the columnar layout to perform those
actions efficiently. However, in these cases, because such operators typically need to operate
on groups of columns, e.g., select A+B+C from R ..., they typically have to materialize
intermediate results for each action. For example, in our previous example, an inter=add(A,B)
operator will work over columns A and B, creating an intermediate result column which will
then be fed to another res=add(C,inter) operator in order to perform the addition with
column C and to produce the final result. Vectorization helps in minimizing the memory
footprint of intermediate results at any given time, but it has been shown that it may also be
beneficial to transform intermediate results on the fly into column-groups in order to work with
(vectors of) multiple columns [101], avoiding materialization of intermediate results completely.
In our example above, we can create a column-group of the qualifying tuples from all columns
(A, B, C) and perform the sum operation in one go.
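A sketch of the column-at-a-time evaluation of A+B+C (an illustrative helper, not the paper's code): one add produces a materialized intermediate column, which a second add combines with C.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Column-wise addition; the returned vector is the materialized intermediate
// result column. Vectorized execution would run this over small batches so
// the intermediate stays in cache.
std::vector<int64_t> add(const std::vector<int64_t>& x, const std::vector<int64_t>& y) {
    std::vector<int64_t> out(x.size());
    for (size_t i = 0; i < x.size(); ++i) out[i] = x[i] + y[i];
    return out;
}

// Usage: auto inter = add(A, B); auto res = add(inter, C);
```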
Updates/Deletes
Deleting and updating data in compressed columns is difficult. Common approaches:
An append-only log with multi-versioning, plus frequent compaction.
Separate read-optimized and write-optimized stores (ROS and WOS).
Using an in-memory sorted data structure for updates and deletes, then flushing the changes later.
Running the same query on the delta and the last snapshot separately and merging the results.
Conclusions
As is evident from the plethora of these features, modern column stores go beyond simply
storing data one column at a time; they provide a completely new database architecture
and execution engine tailored for modern hardware and data analytics.
Compression is much more effective when applied one column at a time, and vectorization
and block processing help minimize cache misses and instruction misses even more when
carrying data one column at a time.
Late materialization is ambitious but cannot be done for more complex queries. Much
research is still needed in this area.
Editor's Notes

  • #3: What is a column store? Benefits. Physical layouts.
  • #4: Each column in a separate file.
  • #7, #8: Explicit ids are good for materialization - to stitch together the columns to obtain the rows.
  • #10: The other kind of materialization is the intermediate storing of data in variables.
  • #18: The idea is to filter data quickly and as early as possible.