21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
Data processing in ClickHouse
Nikolai Kochetov, ClickHouse developer
Agenda
› Data layout and compression
› In-memory layout and data processing
› Pipelining and parallelism
› Specialized data structures
Data layout and compression
Column-Oriented DBMS
General ideas
› Each column is stored in a separate file (or several files)
› Only affected columns are read
› Columnar data representation in memory
Additional concepts
› Sparse index
› Per-column compression
Compression
Highly customizable in the CREATE TABLE statement
CREATE TABLE codec_example
(
`dt` DateTime, -- default CODEC is LZ4
`dt_none` DateTime CODEC(NONE),
`dt_lz4_4` DateTime CODEC(LZ4HC(4)),
`dt_zstd` DateTime CODEC(ZSTD),
`dt_dd_lz4` DateTime CODEC(DoubleDelta, LZ4HC) -- combined
)
ENGINE = MergeTree
ORDER BY dt
Compression ratio vs decompression speed
Intel Xeon E3-1225V3, enwik8 https://quixdb.github.io/squash-benchmark
Compression ratio
SELECT
column,
formatReadableSize(column_data_compressed_bytes) AS compressed,
formatReadableSize(column_data_uncompressed_bytes) AS uncompressed,
column_data_uncompressed_bytes / column_data_compressed_bytes AS r
FROM system.parts_columns
WHERE (table = 'codec_example') AND active ORDER BY r ASC
┌─column────┬─compressed─┬─uncompressed─┬──────────────────r─┐
│ dt_none   │ 67.73 MiB  │ 67.70 MiB    │  0.999618408127124 │
│ dt        │ 3.06 MiB   │ 67.70 MiB    │ 22.156958788868835 │
│ dt_lz4_4  │ 3.06 MiB   │ 67.70 MiB    │ 22.156958788868835 │
│ dt_zstd   │ 1.08 MiB   │ 67.70 MiB    │  62.91648262048673 │
│ dt_dd_lz4 │ 938.17 KiB │ 67.70 MiB    │  73.89642182401099 │
└───────────┴────────────┴──────────────┴────────────────────┘
Data transformation chain
Time series data -> Delta -> Delta -> variable length encoding
Time series data -> DoubleDelta
Time series data -> DoubleDelta -> LZ4HC
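The chain above can be sketched in a few lines of C++. This is a simplified illustration and not ClickHouse's DoubleDelta codec: all function names are hypothetical, and the variable-length step is assumed to be zigzag plus 7-bit varints, whereas the real codec uses its own bit-packed format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: replace each value with its difference from the previous one.
// The first value is kept as the difference from an implicit 0.
// Assumes the differences fit into int64_t (true for realistic timestamps).
std::vector<int64_t> deltaEncode(const std::vector<int64_t> & values)
{
    std::vector<int64_t> out;
    out.reserve(values.size());
    int64_t prev = 0;
    for (int64_t v : values)
    {
        out.push_back(v - prev);
        prev = v;
    }
    return out;
}

// Inverse of deltaEncode: a running sum.
std::vector<int64_t> deltaDecode(const std::vector<int64_t> & deltas)
{
    std::vector<int64_t> out;
    out.reserve(deltas.size());
    int64_t acc = 0;
    for (int64_t d : deltas)
    {
        acc += d;
        out.push_back(acc);
    }
    return out;
}

// Zigzag maps small negative and positive numbers to small unsigned numbers.
uint64_t zigzag(int64_t v)
{
    return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

// Assumed variable-length encoding: 7 payload bits per byte, high bit means "more bytes follow".
std::vector<uint8_t> varintEncode(const std::vector<int64_t> & values)
{
    std::vector<uint8_t> out;
    for (int64_t v : values)
    {
        uint64_t u = zigzag(v);
        while (u >= 0x80)
        {
            out.push_back(static_cast<uint8_t>(u) | 0x80);
            u >>= 7;
        }
        out.push_back(static_cast<uint8_t>(u));
    }
    return out;
}

// Time series data -> Delta -> Delta -> variable length encoding
std::vector<uint8_t> doubleDeltaEncode(const std::vector<int64_t> & timestamps)
{
    assert(deltaDecode(deltaEncode(timestamps)) == timestamps); // the transform is lossless
    return varintEncode(deltaEncode(deltaEncode(timestamps)));
}
```

For evenly spaced timestamps the second delta pass leaves mostly zeros, which is why the variable-length step, and a general-purpose compressor such as LZ4HC after it, has so little left to store.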
Read time
Test query
SELECT dt_dd_lz4 FROM codec_example FORMAT Null
Enable system.query_log
SET log_queries = 1
xml config:
clickhouse.tech/docs/en/operations/server_settings/settings/#server_settings-query-log
Drop FS cache
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
Read time
Profile events are in system.query_log
SELECT
pe.Names,
pe.Values
FROM system.query_log
ARRAY JOIN ProfileEvents AS pe
WHERE event_date = today() AND type = 'QueryFinish'
AND query_id = '...'
┌─pe.Names──────────────────────────────┬─pe.Values─┐
│ DiskReadElapsedMicroseconds │ 123970 │
│ RealTimeMicroseconds │ 596084 │
...
Read time
A higher compression ratio means
› less I/O and more CPU time
› less wall-clock time for I/O-bound queries
Read time
select sum(halfMD5(halfMD5(dt))) from codec_example
For CPU-bound queries, decompression time is usually insignificant
In-memory layout and data processing
Data processing
› Data is processed in blocks
› A block stores slices of columns
› A column is represented by one or several buffers
Integers
› Single buffer
› Stores zero at position -1
› Extra 15 bytes are allocated at array’s tail
Strings
› Buffers with data and offsets
› Offsets are prefix sums of sizes
› A zero byte is stored at the end of each string
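The data-plus-offsets layout can be sketched as follows. This is a minimal illustration with hypothetical names (`StringColumn`, `insert`, `at`), using `std::vector` in place of ClickHouse's own padded buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified string column: all strings live in one byte buffer,
// each followed by a terminating zero byte. offsets[i] is the position just past
// the terminator of string i, i.e. a prefix sum of (size + 1).
struct StringColumn
{
    std::vector<char> chars;
    std::vector<uint64_t> offsets;

    void insert(const std::string & s)
    {
        chars.insert(chars.end(), s.begin(), s.end());
        chars.push_back('\0');
        offsets.push_back(chars.size());
    }

    size_t size() const { return offsets.size(); }

    std::string at(size_t i) const
    {
        assert(i < offsets.size());
        // A buffer with a zero stored at position -1 would let offsets[i - 1] be read
        // for i == 0 without a branch; this sketch uses an explicit branch instead.
        uint64_t begin = (i == 0) ? 0 : offsets[i - 1];
        uint64_t end = offsets[i];
        return std::string(chars.data() + begin, end - begin - 1); // drop the terminator
    }
};
```

Because the offsets are prefix sums, the size of string i is `offsets[i] - offsets[i - 1] - 1`, and any row can be located without scanning the data buffer.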
Arrays
› Stored the same way as Strings: data plus offsets
› Offsets are stored in a separate file on the file system
N-dimensional Arrays
› An N-dimensional Array is an Array of (N-1)-dimensional Arrays
› N-dimensional offsets are offsets into the (N-1)-dimensional offsets
› A natural generalization of 1-dimensional Arrays
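For the two-dimensional case, the nesting of offsets can be sketched like this. The struct and member names are hypothetical; the point is only that the outer offsets index into the inner offsets, which in turn index into the flat values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column of type Array(Array(Int64)).
struct NestedArrayColumn
{
    std::vector<int64_t> values;         // all innermost elements, flattened
    std::vector<uint64_t> inner_offsets; // end position in `values` of each inner array
    std::vector<uint64_t> outer_offsets; // end position in `inner_offsets` of each row

    void insert(const std::vector<std::vector<int64_t>> & row)
    {
        for (const std::vector<int64_t> & inner : row)
        {
            values.insert(values.end(), inner.begin(), inner.end());
            inner_offsets.push_back(values.size());
        }
        outer_offsets.push_back(inner_offsets.size());
    }

    std::vector<std::vector<int64_t>> at(size_t i) const
    {
        assert(i < outer_offsets.size());
        size_t inner_begin = (i == 0) ? 0 : outer_offsets[i - 1];
        size_t inner_end = outer_offsets[i];

        std::vector<std::vector<int64_t>> row;
        for (size_t j = inner_begin; j < inner_end; ++j)
        {
            size_t begin = (j == 0) ? 0 : inner_offsets[j - 1];
            size_t end = inner_offsets[j];
            row.emplace_back(values.begin() + begin, values.begin() + end);
        }
        return row;
    }
};
```

Adding a third dimension would add one more offsets vector on top, pointing into `outer_offsets`, without changing how `values` is stored.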
Functions
Concepts
› Pure (with some exceptions)
› Strong typing
› Multiple overloads
Per-column execution
› Fewer virtual calls
› SIMD optimizations
› Complicates UDFs
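A rough sketch of what per-column execution buys: the virtual call happens once per column, and the inner loop is a plain loop the compiler can optimize. The interface below is hypothetical and much simpler than ClickHouse's function interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical function interface: execute() receives a whole column,
// so the virtual dispatch cost is paid once per block, not once per row.
struct IColumnFunction
{
    virtual ~IColumnFunction() = default;
    virtual std::vector<uint64_t> execute(const std::vector<std::string> & column) const = 0;
};

struct FunctionLength : IColumnFunction
{
    std::vector<uint64_t> execute(const std::vector<std::string> & column) const override
    {
        std::vector<uint64_t> result(column.size());
        // Tight, non-virtual inner loop over the column.
        for (size_t i = 0; i < column.size(); ++i)
            result[i] = column[i].size();
        return result;
    }
};

std::vector<uint64_t> applyToBlock(const IColumnFunction & fn, const std::vector<std::string> & column)
{
    std::vector<uint64_t> result = fn.execute(column); // single virtual call
    assert(result.size() == column.size());            // one output value per input row
    return result;
}
```

The same shape also hints at why UDFs are harder: a user-supplied function has to accept and return whole columns rather than single values.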
SIMD operations
int memcmpSmallAllowOverflow15(const Char * a, size_t a_size,
                               const Char * b, size_t b_size)
{
    size_t min_size = std::min(a_size, b_size);
    for (size_t offset = 0; offset < min_size; offset += 16)
    {
        /// Compare 16 bytes at once; bit i of the mask is set if bytes i are equal
        uint16_t mask = _mm_movemask_epi8(_mm_cmpeq_epi8(
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + offset)),
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + offset))));
        mask = ~mask; /// invert within 16 bits: set bits now mark different bytes
        if (mask) /// some bytes are different
        {
            /// Find and compare first different bytes
            ...
        }
    }
    return detail::cmp(a_size, b_size);
}
SIMD operations
Main loop for memcmpSmallAllowOverflow15
0xa187fc0 : add $0x10,%r8 ; offset += 16
0xa187fc4 : cmp %r9,%r8 ; if (offset >= min_size)
0xa187fc7 : jae 0xa188008 ; exit loop
0xa187fc9 : movdqu (%rdx,%r8,1),%xmm0 ; xmm0 = a[offset] (16 bytes)
0xa187fcf : movdqu (%rdi,%r8,1),%xmm1 ; xmm1 = b[offset] (16 bytes)
0xa187fd5 : pcmpeqb %xmm1,%xmm0 ; xmm0 = (xmm0 == xmm1)
; (16 bytes at once)
0xa187fd9 : pmovmskb %xmm0,%eax ; mask = `bit mask from xmm0`
0xa187fdd : xor $0xffff,%ax ; mask = ~mask
0xa187fe1 : je 0xa187fc0 ; if (mask == 0)
; continue loop
Pipelining and parallelism
Query Pipeline
SELECT avg(length(URL)) FROM hits WHERE URL != ''
Independent execution steps
› Read column URL
› Calculate expression URL != ''
› Filter column URL
› Calculate function length(URL)
› Calculate aggregate function avg
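The steps listed above can be sketched as independent functions over columns. Everything here is hypothetical and simplified: blocks are plain vectors of strings, reading (step 1) is assumed to have happened already, and an empty input yields 0.0 for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Strings = std::vector<std::string>;

// Calculate expression URL != '' as a column of 0/1 flags.
std::vector<uint8_t> notEmpty(const Strings & url)
{
    std::vector<uint8_t> mask(url.size());
    for (size_t i = 0; i < url.size(); ++i)
        mask[i] = !url[i].empty();
    return mask;
}

// Filter column URL: keep only the rows whose flag is set.
Strings filterColumn(const Strings & column, const std::vector<uint8_t> & mask)
{
    assert(column.size() == mask.size());
    Strings out;
    for (size_t i = 0; i < column.size(); ++i)
        if (mask[i])
            out.push_back(column[i]);
    return out;
}

// Calculate function length(URL).
std::vector<uint64_t> lengthOf(const Strings & url)
{
    std::vector<uint64_t> out(url.size());
    for (size_t i = 0; i < url.size(); ++i)
        out[i] = url[i].size();
    return out;
}

// Calculate aggregate function avg: the state is a (sum, count) pair.
struct AvgState
{
    uint64_t sum = 0;
    uint64_t count = 0;

    void add(const std::vector<uint64_t> & values)
    {
        for (uint64_t v : values)
            sum += v;
        count += values.size();
    }

    double result() const { return count ? static_cast<double>(sum) / count : 0.0; }
};

// Chains the steps block by block.
double avgLengthOfNonEmpty(const std::vector<Strings> & blocks)
{
    AvgState state;
    for (const Strings & url : blocks)
        state.add(lengthOf(filterColumn(url, notEmpty(url))));
    return state.result();
}
```

Each function only consumes the output of the previous one, which is what makes the steps independent enough to be arranged as pipeline stages.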
Query Pipeline
SELECT avg(length(URL)) FROM hits WHERE URL != ''
Properties
› Arbitrary graph
› Support parallel execution
› Dynamically changeable
Parallel Execution
SELECT avg(length(URL)) FROM hits WHERE URL != ''
Parallelism by data
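Parallelism by data relies on aggregate states that can be computed per slice and merged afterwards. The sketch below shows only that decomposition, with hypothetical names; the loop over workers runs sequentially here and stands in for separate threads, since each iteration touches no shared state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Partial state of avg for one slice of the data.
struct PartialAvg
{
    uint64_t sum = 0;
    uint64_t count = 0;

    void merge(const PartialAvg & other)
    {
        sum += other.sum;
        count += other.count;
    }

    double result() const { return count ? static_cast<double>(sum) / count : 0.0; }
};

// Work of one hypothetical worker: aggregate its own slice [begin, end).
PartialAvg aggregateSlice(const std::vector<uint64_t> & lengths, size_t begin, size_t end)
{
    assert(begin <= end && end <= lengths.size());
    PartialAvg state;
    for (size_t i = begin; i < end; ++i)
        state.sum += lengths[i];
    state.count = end - begin;
    return state;
}

double parallelAvg(const std::vector<uint64_t> & lengths, size_t num_workers)
{
    assert(num_workers > 0);
    std::vector<PartialAvg> partials(num_workers);
    size_t chunk = (lengths.size() + num_workers - 1) / num_workers;

    // Independent iterations: a real executor would run them on separate threads.
    for (size_t w = 0; w < num_workers; ++w)
    {
        size_t begin = std::min(w * chunk, lengths.size());
        size_t end = std::min(begin + chunk, lengths.size());
        partials[w] = aggregateSlice(lengths, begin, end);
    }

    // Final merge of the partial states.
    PartialAvg total;
    for (const PartialAvg & p : partials)
        total.merge(p);
    return total.result();
}
```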
Parallel Execution
SELECT hex(SHA256(*)) FROM (
SELECT hex(SHA256(*)) FROM (
SELECT hex(SHA256(*)) FROM (
SELECT URL FROM hits ORDER BY URL ASC)))
Vertical parallelism
Dynamic pipeline modification
Sometimes we need to change the pipeline during execution
A full sort keeps all query data in memory
Set max_bytes_before_external_sort = <some limit> to spill to disk once the limit is exceeded
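The switch from an in-memory sort to an external one can be sketched as below. This is an illustration under stated assumptions, not ClickHouse's implementation: the class name is hypothetical, the limit is counted in rows rather than bytes, and the "disk" is simulated by a vector of sorted runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

class SpillingSorter
{
public:
    explicit SpillingSorter(size_t max_rows_in_memory) : limit(max_rows_in_memory)
    {
        assert(limit > 0);
    }

    void add(int64_t value)
    {
        buffer.push_back(value);
        if (buffer.size() >= limit)
            spill(); // limit reached: from now on, sort runs and merge them later
    }

    size_t spilledRuns() const { return runs.size(); }

    std::vector<int64_t> finish()
    {
        if (runs.empty())
        {
            // The limit was never reached: plain in-memory sort.
            std::sort(buffer.begin(), buffer.end());
            return buffer;
        }
        spill(); // flush the remainder as the last run
        return mergeRuns();
    }

private:
    struct Cursor
    {
        int64_t value;
        size_t run;
        size_t pos;
    };

    struct CursorGreater
    {
        bool operator()(const Cursor & a, const Cursor & b) const { return a.value > b.value; }
    };

    void spill()
    {
        if (buffer.empty())
            return;
        std::sort(buffer.begin(), buffer.end());
        runs.push_back(std::move(buffer)); // stands in for writing a temporary file
        buffer.clear();
    }

    // K-way merge of the sorted runs using a min-heap of cursors.
    std::vector<int64_t> mergeRuns() const
    {
        std::priority_queue<Cursor, std::vector<Cursor>, CursorGreater> heap;
        for (size_t r = 0; r < runs.size(); ++r)
            heap.push(Cursor{runs[r][0], r, 0}); // runs are never empty, see spill()

        std::vector<int64_t> out;
        while (!heap.empty())
        {
            Cursor c = heap.top();
            heap.pop();
            out.push_back(c.value);
            if (c.pos + 1 < runs[c.run].size())
                heap.push(Cursor{runs[c.run][c.pos + 1], c.run, c.pos + 1});
        }
        return out;
    }

    size_t limit;
    std::vector<int64_t> buffer;
    std::vector<std::vector<int64_t>> runs;
};
```

The decision to spill is taken while data is already flowing, which is the kind of change that cannot be planned when the pipeline is first built.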
Query Pipeline
SELECT avg(length(URL)) + 1
FROM hits WHERE URL != ''
WITH TOTALS SETTINGS extremes = 1
┌─plus(avg(length(URL)), 1)─┐
│          85.3475007793562 │
└───────────────────────────┘
Totals:
┌─plus(avg(length(URL)), 1)─┐
│          85.3475007793562 │
└───────────────────────────┘
Extremes:
┌─plus(avg(length(URL)), 1)─┐
│          85.3475007793562 │
│          85.3475007793562 │
└───────────────────────────┘
Specialized data structures
Task analysis
Task example: string search.
Possible aspects of a task
› Approximate or exact search
› Substring or regexp
› Single or multiple needles
› Single or multiple haystacks
› Short or long strings
› Bytes, unicode code points, real words
A specialized algorithm can be created for every combination of options
Concepts
› Take the best implementations
  Example: simdjson, pdqsort
› Improve existing algorithms
  Volnitsky -> MultiVolnitsky
  memcpy -> memcpySmallAllowReadWriteOverflow15
› Use specializations that fit the case
  40 hash table implementations for GROUP BY
› Test performance on real data
  Per-commit tests on a real (obfuscated) dataset with page hits
› Profiling
GROUP BY
› Hash table
› Parallel
› Merging in a single thread
GROUP BY
Two-level
› Split data into 256 buckets
› Merging in multiple threads
› More efficient for remote queries
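A minimal sketch of the two-level idea, counting occurrences per key. The names and the bucket function (top 8 bits of a multiplicative hash) are assumptions made for illustration; the merge loop is sequential here, but every bucket is merged independently of the others.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr size_t NUM_BUCKETS = 256;

// Hypothetical two-level table: 256 independent hash tables, selected by hash bits.
struct TwoLevelCounts
{
    std::array<std::unordered_map<uint64_t, uint64_t>, NUM_BUCKETS> buckets;

    // Assumed bucket function: top 8 bits of a multiplicative hash.
    static size_t bucketOf(uint64_t key)
    {
        return static_cast<size_t>((key * 0x9E3779B97F4A7C15ULL) >> 56);
    }

    void add(uint64_t key, uint64_t n = 1) { buckets[bucketOf(key)][key] += n; }

    uint64_t count(uint64_t key) const
    {
        const auto & bucket = buckets[bucketOf(key)];
        auto it = bucket.find(key);
        return it == bucket.end() ? 0 : it->second;
    }
};

// Bucket b of the result depends only on bucket b of every partial table,
// so different buckets can be merged by different threads without locking.
void mergeBucket(TwoLevelCounts & result, const std::vector<TwoLevelCounts> & partials, size_t b)
{
    assert(b < NUM_BUCKETS);
    for (const TwoLevelCounts & partial : partials)
        for (const auto & kv : partial.buckets[b])
            result.buckets[b][kv.first] += kv.second;
}

TwoLevelCounts mergeAll(const std::vector<TwoLevelCounts> & partials)
{
    TwoLevelCounts result;
    for (size_t b = 0; b < NUM_BUCKETS; ++b) // independent iterations
        mergeBucket(result, partials, b);
    return result;
}
```

Since every partial table uses the same bucket function, a given key always lands in the same bucket number, which is what makes the per-bucket merge safe.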
Hash table specializations
› 8-bit or 16-bit key
lookup table
› 32, 64, 128, 256 bit key
32-bit hash for aggregating, 64-bit hash for merging
› several fixed size keys
represented as single integer if possible
› string key
store pre-calculated hash in hash table
small string optimization
› LowCardinality key
pre-calculated hash for dictionaries
pre-calculated bucket for consecutively repeated dictionaries
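Two of the specializations above are simple enough to sketch directly: a lookup table for an 8-bit key, and several fixed-size keys packed into a single integer. The names are hypothetical and the aggregate is a plain count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 8-bit key: no hashing at all, the key itself indexes an array of 256 states.
struct UInt8KeyCounts
{
    std::array<uint64_t, 256> counts{}; // zero-initialized

    void add(const std::vector<uint8_t> & keys)
    {
        for (uint8_t k : keys)
            ++counts[k];
    }
};

// Several fixed-size keys: two 32-bit keys fit into one 64-bit integer,
// so GROUP BY a, b can use the same table as a single 64-bit key.
uint32_t firstKey(uint64_t packed) { return static_cast<uint32_t>(packed >> 32); }
uint32_t secondKey(uint64_t packed) { return static_cast<uint32_t>(packed); }

uint64_t packKeys(uint32_t a, uint32_t b)
{
    uint64_t packed = (static_cast<uint64_t>(a) << 32) | b;
    assert(firstKey(packed) == a && secondKey(packed) == b); // packing is reversible
    return packed;
}
```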
Conclusion
› Specialized algorithms and data structures are necessary for the best performance
› Use the same ideas in your projects
› Contribute: https://github.com/ClickHouse/ClickHouse
Thank you!
Q&A

13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
Athens Big Data
 
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
Athens Big Data
 
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
Athens Big Data
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
Athens Big Data
 

21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution

  • 2. Data processing into ClickHouse
    Nikolai Kochetov, ClickHouse developer
  • 3. Agenda
    › Data layout and compression
    › In-memory layout and data processing
    › Pipelining and parallelism
    › Specialized data structures
  • 4. Data layout and compression
  • 5. Column-Oriented DBMS
    General ideas
    › Each column is stored in a separate file (or several files)
    › Only affected columns are read
    › Columnar data representation in memory
    Additional concepts
    › Sparse index
    › Per-column compression
  • 6. Compression
    Highly customizable in the CREATE TABLE statement
        CREATE TABLE codec_example
        (
            `dt` DateTime, -- default CODEC is LZ4
            `dt_none` DateTime CODEC(NONE),
            `dt_lz4_4` DateTime CODEC(LZ4HC(4)),
            `dt_zstd` DateTime CODEC(ZSTD),
            `dt_dd_lz4` DateTime CODEC(DoubleDelta, LZ4HC) -- combined
        )
        ENGINE = MergeTree
        ORDER BY dt
  • 7. Compression ratio vs decompression speed
    Intel Xeon E3-1225V3, enwik8
    https://quixdb.github.io/squash-benchmark
  • 8. Compression ratio
        SELECT
            column,
            formatReadableSize(column_data_compressed_bytes) AS compressed,
            formatReadableSize(column_data_uncompressed_bytes) AS uncompressed,
            column_data_uncompressed_bytes / column_data_compressed_bytes AS r
        FROM system.parts_columns
        WHERE (table = 'codec_example') AND active
        ORDER BY r ASC

        ┌─column────┬─compressed─┬─uncompressed─┬──────────────────r─┐
        │ dt_none   │ 67.73 MiB  │ 67.70 MiB    │  0.999618408127124 │
        │ dt        │ 3.06 MiB   │ 67.70 MiB    │ 22.156958788868835 │
        │ dt_lz4_4  │ 3.06 MiB   │ 67.70 MiB    │ 22.156958788868835 │
        │ dt_zstd   │ 1.08 MiB   │ 67.70 MiB    │  62.91648262048673 │
        │ dt_dd_lz4 │ 938.17 KiB │ 67.70 MiB    │  73.89642182401099 │
        └───────────┴────────────┴──────────────┴────────────────────┘
  • 9. Data transformation chain
    Time series data
  • 10. Data transformation chain
    Time series data -> Delta
  • 11. Data transformation chain
    Time series data -> Delta -> Delta
  • 12. Data transformation chain
    Time series data -> Delta -> Delta -> variable length encoding
    Time series data -> DoubleDelta
    Time series data -> DoubleDelta -> LZ4HC
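The chain on slides 9 to 12 can be sketched in a few lines. This is a minimal illustration, assuming that DoubleDelta amounts to applying the delta step twice, as the slides suggest. The function names are hypothetical and are not ClickHouse's codec API, and the variable-length bit packing that follows in the chain is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not ClickHouse code): replace each value with its
// difference from the previous value. The first value is kept as-is.
std::vector<int64_t> deltaEncode(const std::vector<int64_t> & values)
{
    std::vector<int64_t> out;
    out.reserve(values.size());
    int64_t prev = 0;
    for (int64_t v : values)
    {
        out.push_back(v - prev);
        prev = v;
    }
    return out;
}

// Inverse of deltaEncode: a running sum restores the original values.
std::vector<int64_t> deltaDecode(const std::vector<int64_t> & deltas)
{
    std::vector<int64_t> out;
    out.reserve(deltas.size());
    int64_t acc = 0;
    for (int64_t d : deltas)
    {
        acc += d;
        out.push_back(acc);
    }
    return out;
}

// "Delta -> Delta": for regularly spaced timestamps most outputs become
// zero or tiny, which is what makes the later encoding steps effective.
std::vector<int64_t> doubleDeltaEncode(const std::vector<int64_t> & values)
{
    return deltaEncode(deltaEncode(values));
}

std::vector<int64_t> doubleDeltaDecode(const std::vector<int64_t> & encoded)
{
    return deltaDecode(deltaDecode(encoded));
}
```

For timestamps that advance by a nearly constant step, everything after the first two outputs is close to zero, so a variable-length encoding or LZ4HC has very little left to store.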
  • 13. Read time
    Test query
        SELECT dt_dd_lz4 FROM codec_example FORMAT Null
    Enable system.query_log
        SET log_queries = 1
        xml config: clickhouse.tech/docs/en/operations/server_settings/settings/#server_settings-query-log
    Drop FS cache
        $ echo 3 | sudo tee /proc/sys/vm/drop_caches
  • 14. Read time
    Profile events are in system.query_log
        SELECT pe.Names, pe.Values
        FROM system.query_log
        ARRAY JOIN ProfileEvents AS pe
        WHERE event_date = today() AND type = 'QueryFinish' AND query_id = '...'

        ┌─pe.Names──────────────────────────────┬─pe.Values─┐
        │ DiskReadElapsedMicroseconds           │    123970 │
        │ RealTimeMicroseconds                  │    596084 │
        ...
  • 15. Read time
    A higher compression ratio means
    › less IO and more CPU time
    › less real time for IO-bound queries
  • 16. Read time
        SELECT sum(halfMD5(halfMD5(dt))) FROM codec_example
    For CPU-bound queries, decompression time is usually insignificant
  • 18. Data processing
    › Data is processed in blocks
    › A block stores slices of columns
    › A column is represented by one or several buffers
  • 19. Integers
    › Single buffer
    › Stores zero at position -1
    › Extra 15 bytes are allocated at the array’s tail
  • 20. Strings
    › Buffers with data and offsets
    › Offsets are prefix sums of sizes
    › A zero byte is stored at the end of each string
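The string layout on slide 20 can be illustrated with a small stand-in type. This is a sketch under stated assumptions, not ClickHouse's column class: `StringColumnSketch` and its members are hypothetical names, and plain `std::vector` replaces the padded buffers described on slide 19.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical, simplified string column: one byte buffer plus end offsets.
struct StringColumnSketch
{
    std::vector<char> chars;        // all strings back to back, each followed by a zero byte
    std::vector<uint64_t> offsets;  // offsets[i] = position right after string i (prefix sums of sizes)

    void insert(std::string_view s)
    {
        chars.insert(chars.end(), s.begin(), s.end());
        chars.push_back('\0');
        offsets.push_back(chars.size());
    }

    size_t size() const { return offsets.size(); }

    std::string_view at(size_t i) const
    {
        assert(i < offsets.size());
        // With a zero stored at position -1 of the offsets buffer (slide 19),
        // this branch would be unnecessary: offsets[i - 1] would be valid for i == 0.
        uint64_t begin = (i == 0) ? 0 : offsets[i - 1];
        uint64_t length = offsets[i] - begin - 1;  // exclude the trailing zero byte
        return std::string_view(chars.data() + begin, length);
    }
};
```

Reading row `i` needs only two neighbouring offsets, so random access stays constant time even though the strings have different lengths.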
  • 21. Arrays
    › Stored the same way as Strings
    › Offsets are stored in a separate file on FS
  • 22. N-dimensional Arrays
    › An N-dimensional Array is an Array of (N-1)-dimensional Arrays
    › N-dimensional Offsets are Offsets for (N-1)-dimensional Offsets
    › Natural generalization of 1-dimensional Arrays
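For the two-dimensional case, the nesting of offsets described on slide 22 might look like the following sketch. `Array2DColumnSketch` is a hypothetical name; the outer offsets index into the inner offsets, which in turn index into the flattened data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a column of type Array(Array(Int32)).
struct Array2DColumnSketch
{
    std::vector<int32_t> data;            // all innermost elements, flattened
    std::vector<uint64_t> inner_offsets;  // end of each inner array within data
    std::vector<uint64_t> outer_offsets;  // end of each row within inner_offsets

    void insert(const std::vector<std::vector<int32_t>> & row)
    {
        for (const auto & inner : row)
        {
            data.insert(data.end(), inner.begin(), inner.end());
            inner_offsets.push_back(data.size());
        }
        outer_offsets.push_back(inner_offsets.size());
    }

    std::vector<std::vector<int32_t>> at(size_t row) const
    {
        assert(row < outer_offsets.size());
        uint64_t first_inner = (row == 0) ? 0 : outer_offsets[row - 1];
        uint64_t last_inner = outer_offsets[row];

        std::vector<std::vector<int32_t>> result;
        for (uint64_t i = first_inner; i < last_inner; ++i)
        {
            uint64_t begin = (i == 0) ? 0 : inner_offsets[i - 1];
            uint64_t end = inner_offsets[i];
            result.emplace_back(data.begin() + begin, data.begin() + end);
        }
        return result;
    }
};
```

Adding a third dimension would add one more offsets vector on top, pointing into `outer_offsets`, which is the generalization the slide refers to.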
  • 23. Functions
    Concepts
    › Pure (with some exceptions)
    › Strong typing
    › Multiple overloads
    Per-column execution
    › Fewer virtual calls
    › SIMD optimizations
    › Complicates UDFs
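A rough sketch of what per-column execution buys: the virtual dispatch happens once per column slice, and the work itself is a plain loop that a compiler can vectorize. The interface below is hypothetical and much simpler than ClickHouse's real function interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface: one virtual call processes a whole column slice.
struct IColumnFunctionSketch
{
    virtual ~IColumnFunctionSketch() = default;
    virtual std::vector<uint64_t> execute(
        const std::vector<uint64_t> & a,
        const std::vector<uint64_t> & b) const = 0;
};

struct PlusSketch final : IColumnFunctionSketch
{
    std::vector<uint64_t> execute(
        const std::vector<uint64_t> & a,
        const std::vector<uint64_t> & b) const override
    {
        assert(a.size() == b.size());
        std::vector<uint64_t> result(a.size());
        // Tight loop without calls or branches: a candidate for auto-vectorization.
        for (size_t i = 0; i < a.size(); ++i)
            result[i] = a[i] + b[i];
        return result;
    }
};
```

With a per-row interface the same addition would pay for one virtual call per value; here the cost of the call is spread over the whole block.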
  • 24. SIMD operations
        int memcmpSmallAllowOverflow15(const Char * a, size_t a_size, const Char * b, size_t b_size)
        {
            size_t min_size = std::min(a_size, b_size);
            for (size_t offset = 0; offset < min_size; offset += 16)
            {
                /// Compare 16 bytes at once
                uint16_t mask = _mm_movemask_epi8(_mm_cmpeq_epi8(
                    _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + offset)),
                    _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + offset))));
                /// Invert within 16 bits: `if (~mask)` would promote to int and always be true
                mask = ~mask;
                if (mask) /// a set bit means some bytes are different
                {
                    /// Find and compare first different bytes
                    ...
                }
            }
            return detail::cmp(a_size, b_size);
        }
  • 25. SIMD operations
    Main loop for memcmpSmallAllowOverflow15
        0xa187fc0 : add      $0x10,%r8            ; offset += 16
        0xa187fc4 : cmp      %r9,%r8              ; if (offset >= min_size)
        0xa187fc7 : jae      0xa188008            ;     exit loop
        0xa187fc9 : movdqu   (%rdx,%r8,1),%xmm0   ; xmm0 = a[offset] (16 bytes)
        0xa187fcf : movdqu   (%rdi,%r8,1),%xmm1   ; xmm1 = b[offset] (16 bytes)
        0xa187fd5 : pcmpeqb  %xmm1,%xmm0          ; xmm0 = (xmm0 == xmm1) (16 bytes at once)
        0xa187fd9 : pmovmskb %xmm0,%eax           ; mask = `bit mask from xmm0`
        0xa187fdd : xor      $0xffff,%ax          ; mask = ~mask
        0xa187fe1 : je       0xa187fc0            ; if (mask == 0) continue loop
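The following self-contained sketch shows one possible way to compare 16 bytes at a time when both buffers have 15 readable bytes of padding after their data. It is not the ClickHouse implementation and does not reproduce the branch elided on slide 24. The names `firstDifference16`, `compareWithPadding`, and `padded` are hypothetical, and a scalar fallback is used when SSE2 is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Index of the first differing byte among 16 bytes, or 16 if all are equal.
inline unsigned firstDifference16(const unsigned char * a, const unsigned char * b)
{
#if defined(__SSE2__)
    unsigned equal_mask = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(
        _mm_loadu_si128(reinterpret_cast<const __m128i *>(a)),
        _mm_loadu_si128(reinterpret_cast<const __m128i *>(b)))));
    unsigned diff = ~equal_mask & 0xFFFFu;  // bit i is set when byte i differs
    if (diff == 0)
        return 16;
    unsigned pos = 0;
    while (!(diff & 1u))
    {
        diff >>= 1;
        ++pos;
    }
    return pos;
#else
    for (unsigned i = 0; i < 16; ++i)
        if (a[i] != b[i])
            return i;
    return 16;
#endif
}

// Returns -1, 0 or 1. Precondition: 15 bytes past the end of both buffers are readable.
int compareWithPadding(const unsigned char * a, size_t a_size, const unsigned char * b, size_t b_size)
{
    size_t min_size = a_size < b_size ? a_size : b_size;
    for (size_t offset = 0; offset < min_size; offset += 16)
    {
        unsigned pos = firstDifference16(a + offset, b + offset);
        if (pos < 16)
        {
            size_t idx = offset + pos;
            if (idx >= min_size)
                break;  // the difference lies in the padding, so the common prefix is equal
            return a[idx] < b[idx] ? -1 : 1;
        }
    }
    return a_size < b_size ? -1 : (a_size > b_size ? 1 : 0);
}

// Test helper: copies a C string into a buffer with 15 zero bytes of padding.
std::vector<unsigned char> padded(const char * s)
{
    size_t n = std::strlen(s);
    std::vector<unsigned char> buf(n + 15, 0);
    std::memcpy(buf.data(), s, n);
    return buf;
}
```

The padding is what allows the loop to load a full 16 bytes on the last iteration without a separate tail loop, which matches the extra 15 bytes mentioned on slide 19.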
  • 27. Query Pipeline
        SELECT avg(length(URL)) FROM hits WHERE URL != ''
    Independent execution steps
    › Read column URL
    › Calculate expression URL != ''
    › Filter column URL
    › Calculate function length(URL)
    › Calculate aggregate function avg
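The steps listed on slide 27 can be written as separate functions that each take and return a column slice. This is only an illustration: `std::vector` stands in for real column types, reading from storage is replaced by passing a block in, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using StringColumn = std::vector<std::string>;  // stand-in for a slice of column URL

// Step: calculate expression URL != '' (one 0/1 byte per row).
std::vector<uint8_t> notEmpty(const StringColumn & url)
{
    std::vector<uint8_t> mask(url.size());
    for (size_t i = 0; i < url.size(); ++i)
        mask[i] = !url[i].empty();
    return mask;
}

// Step: filter column URL by the mask.
StringColumn filter(const StringColumn & url, const std::vector<uint8_t> & mask)
{
    assert(url.size() == mask.size());
    StringColumn result;
    for (size_t i = 0; i < url.size(); ++i)
        if (mask[i])
            result.push_back(url[i]);
    return result;
}

// Step: calculate function length(URL).
std::vector<uint64_t> length(const StringColumn & url)
{
    std::vector<uint64_t> result;
    result.reserve(url.size());
    for (const auto & s : url)
        result.push_back(s.size());
    return result;
}

// Step: aggregate function avg, kept as a state that accumulates across blocks.
struct AvgState
{
    uint64_t sum = 0;
    uint64_t count = 0;

    void add(const std::vector<uint64_t> & values)
    {
        for (uint64_t v : values)
            sum += v;
        count += values.size();
    }

    double result() const
    {
        return count ? static_cast<double>(sum) / static_cast<double>(count) : 0.0;
    }
};

// One block flows through all steps in order.
void processBlock(const StringColumn & url, AvgState & state)
{
    state.add(length(filter(url, notEmpty(url))));
}
```

Because each step only depends on the output of the previous one, the steps can be arranged as nodes of a graph and scheduled independently.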
  • 28. Query Pipeline
        SELECT avg(length(URL)) FROM hits WHERE URL != ''
    Properties
    › Arbitrary graph
    › Supports parallel execution
    › Dynamically changeable
  • 29. Parallel Execution
        SELECT avg(length(URL)) FROM hits WHERE URL != ''
    Parallelism by data
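Parallelism by data can be pictured as several copies of the same pipeline, each working on its own range of rows and producing a partial aggregation state that is merged at the end. The sketch below uses `std::thread` and hypothetical names; it assumes the data is already in memory and is not how ClickHouse schedules its processors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

struct AvgState
{
    uint64_t sum = 0;
    uint64_t count = 0;

    void merge(const AvgState & other)
    {
        sum += other.sum;
        count += other.count;
    }
};

// One copy of the pipeline: filter URL != '' and aggregate length(URL) over rows [begin, end).
AvgState processRange(const std::vector<std::string> & urls, size_t begin, size_t end)
{
    AvgState state;
    for (size_t i = begin; i < end; ++i)
    {
        if (!urls[i].empty())
        {
            state.sum += urls[i].size();
            ++state.count;
        }
    }
    return state;
}

double parallelAvgLength(const std::vector<std::string> & urls, size_t num_threads)
{
    assert(num_threads > 0);
    std::vector<AvgState> partial(num_threads);
    std::vector<std::thread> workers;
    size_t chunk = (urls.size() + num_threads - 1) / num_threads;

    for (size_t t = 0; t < num_threads; ++t)
    {
        size_t begin = std::min(t * chunk, urls.size());
        size_t end = std::min(begin + chunk, urls.size());
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&urls, &partial, t, begin, end] {
            partial[t] = processRange(urls, begin, end);
        });
    }
    for (auto & worker : workers)
        worker.join();

    AvgState total;
    for (const auto & state : partial)
        total.merge(state);
    return total.count ? static_cast<double>(total.sum) / static_cast<double>(total.count) : 0.0;
}
```

The key requirement is that the aggregate function has a mergeable state: avg keeps a sum and a count rather than a running average.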
  • 30. Parallel Execution
        SELECT hex(SHA256(*)) FROM (
            SELECT hex(SHA256(*)) FROM (
                SELECT hex(SHA256(*)) FROM (
                    SELECT URL FROM hits ORDER BY URL ASC)))
    Vertical parallelism
  • 31-35. Dynamic pipeline modification
    Sometimes the pipeline needs to change during execution
    Sort stores all query data in memory
    Set max_bytes_before_external_sort = <some limit>
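The idea behind switching to an external sort can be sketched as follows: once the buffered data exceeds a limit, it is sorted and spilled as a run, and the runs are merged at the end. This is an assumption-laden illustration with hypothetical names. The limit is counted in rows rather than bytes, and the spilled runs are kept in memory as vectors where a real implementation would write temporary files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

class ExternalSortSketch
{
public:
    explicit ExternalSortSketch(size_t max_rows_in_memory) : limit(max_rows_in_memory) {}

    // Accept one block; spill when the in-memory buffer grows past the limit.
    void consume(const std::vector<int64_t> & block)
    {
        buffer.insert(buffer.end(), block.begin(), block.end());
        if (buffer.size() > limit)
            spill();
    }

    // Merge all sorted runs with a min-heap of (value, run index, position in run).
    std::vector<int64_t> finish()
    {
        if (!buffer.empty())
            spill();  // simplification: the remainder is treated as one more run

        using Item = std::tuple<int64_t, size_t, size_t>;
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
        for (size_t r = 0; r < runs.size(); ++r)
            heap.emplace(runs[r][0], r, size_t{0});  // runs are never empty

        std::vector<int64_t> result;
        while (!heap.empty())
        {
            auto [value, run, pos] = heap.top();
            heap.pop();
            result.push_back(value);
            if (pos + 1 < runs[run].size())
                heap.emplace(runs[run][pos + 1], run, pos + 1);
        }
        return result;
    }

    size_t spilledRuns() const { return runs.size(); }

private:
    void spill()
    {
        std::sort(buffer.begin(), buffer.end());
        runs.push_back(std::move(buffer));  // stands in for writing a temporary file
        buffer.clear();
    }

    size_t limit;
    std::vector<int64_t> buffer;
    std::vector<std::vector<int64_t>> runs;
};
```

The decision to spill is made while data is already flowing, which is why the pipeline has to be changeable during execution: the merge step is only needed once the first run has been written.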
  • 36. Query Pipeline
        SELECT avg(length(URL)) + 1 FROM hits WHERE URL != '' WITH TOTALS SETTINGS extremes = 1

        ┌─plus(avg(length(URL)), 1)─┐
        │          85.3475007793562 │
        └───────────────────────────┘
        Totals:
        ┌─plus(avg(length(URL)), 1)─┐
        │          85.3475007793562 │
        └───────────────────────────┘
        Extremes:
        ┌─plus(avg(length(URL)), 1)─┐
        │          85.3475007793562 │
        │          85.3475007793562 │
        └───────────────────────────┘
  • 38. Task analysis
    Task example: string search. Possible aspects of a task
    › Approximate or exact search
    › Substring or regexp
    › Single or multiple needles
    › Single or multiple haystacks
    › Short or long strings
    › Bytes, unicode code points, real words
    A specialized algorithm can be created for every combination of options
  • 39. Concepts
    › Take the best implementations
      Example: simdjson, pdqsort
    › Improve existing algorithms
      Volnitsky -> MultiVolnitsky
      memcpy -> memcpySmallAllowReadWriteOverflow15
    › Use more optimal specializations
      40 hash table implementations for GROUP BY
    › Test performance on real data
      Per-commit tests on real (obfuscated) dataset with page hits
    › Profiling
  • 40. GROUP BY
    › Hash table
    › Parallel
    › Merging in a single thread
  • 41. GROUP BY
    Two level
    › Split data into 256 buckets
    › Merging in multiple threads
    › More efficient for remote queries
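A two-level table can be sketched as 256 small hash tables selected by a few bits of the key's hash. Because a given key always lands in the same bucket number, bucket `b` from every partial result can be merged without touching any other bucket. The names and the hash mixing constant below are illustrative assumptions, with `std::unordered_map` standing in for a specialized hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr size_t NUM_BUCKETS = 256;

using Bucket = std::unordered_map<uint64_t, uint64_t>;  // key -> count
using TwoLevelTable = std::vector<Bucket>;               // always NUM_BUCKETS entries

// Hypothetical bucket selection: mix the key and take the top 8 bits (0..255).
inline size_t bucketOf(uint64_t key)
{
    uint64_t h = key * 0x9E3779B97F4A7C15ULL;
    return static_cast<size_t>(h >> 56);
}

// First stage: each thread or server builds its own partial table, here computing count() per key.
TwoLevelTable aggregate(const std::vector<uint64_t> & keys)
{
    TwoLevelTable table(NUM_BUCKETS);
    for (uint64_t key : keys)
        ++table[bucketOf(key)][key];
    return table;
}

// Second stage: merge bucket b of all partial tables.
// Calls for different b share no data, so they can run on different threads.
Bucket mergeBucket(const std::vector<TwoLevelTable> & partials, size_t b)
{
    assert(b < NUM_BUCKETS);
    Bucket merged;
    for (const auto & table : partials)
        for (const auto & [key, count] : table[b])
            merged[key] += count;
    return merged;
}
```

With a single-level table the merge has to walk one shared structure in one thread; splitting by bucket turns the merge into 256 independent tasks.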
  • 42. Hash table specializations
    › 8-bit or 16-bit key
      lookup table
    › 32, 64, 128, 256 bit key
      32-bit hash for aggregating, 64-bit hash for merging
    › several fixed size keys
      represented as single integer if possible
    › string key
      store pre-calculated hash in hash table
      small string optimization
    › LowCardinality key
      pre-calculated hash for dictionaries
      pre-calculated bucket for consecutively repeated dictionaries
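The first specialization on the slide is the simplest to illustrate: with an 8-bit key there are only 256 possible values, so the key itself can index a fixed array and no hashing or collision handling is needed. The type below is a hypothetical sketch that computes count() per key.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lookup-table "hash table" for UInt8 keys.
struct UInt8CountTable
{
    std::array<uint64_t, 256> counts{};  // zero-initialized; the key is the index

    void add(const std::vector<uint8_t> & keys)
    {
        for (uint8_t key : keys)
            ++counts[key];
    }

    // Merging two partial results is an element-wise sum.
    void merge(const UInt8CountTable & other)
    {
        for (size_t i = 0; i < counts.size(); ++i)
            counts[i] += other.counts[i];
    }
};
```

The same approach works for 16-bit keys with a 65536-entry array, at which point the table still fits comfortably in CPU cache.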
  • 43. Conclusion
    › Specialized algorithms and data structures are necessary for the best performance
    › Use the same ideas in your projects
    › Contribute: https://github.com/ClickHouse/ClickHouse