© Copyright EnterpriseDB Corporation, 2015. All Rights Reserved. 1
Query Optimizations For Partitioned Tables
Ashutosh Bapat
@PGCONF INDIA 2018

Query Optimization Techniques
• Partition pruning
• Run-time partition pruning
• Partition-wise join
• Partition-wise aggregation
• Partition-wise sorting/ordering

Partitioned Table
Unpartitioned table t1 (c1 int, c2 int, …)
Partitioned table t1 (c1 int, c2 int, …):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
• Partition 4 FOR VALUES FROM (300) TO (400)
Example rows: (10, …), (90, …), (80, …), (325, …), (375, …)

Partition Pruning
Partitioned table t1 (c1 int, c2 int, …):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
• Partition 4 FOR VALUES FROM (300) TO (400)
Partition bounds based elimination:
SELECT * FROM t1 WHERE c1 = 350;
SELECT * FROM t1 WHERE c1 BETWEEN 150 AND 250;
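The bounds based elimination on this slide can be sketched in a few lines of Python. This is only an illustration, not PostgreSQL's planner code: `PARTITIONS`, `prune_eq` and `prune_between` are hypothetical names, and the sketch assumes range partitions with an inclusive lower bound and an exclusive upper bound.

```python
# Hypothetical model of t1's range partitions:
# (name, lower bound inclusive, upper bound exclusive).
PARTITIONS = [
    ("Partition 1", 0, 100),
    ("Partition 2", 100, 200),
    ("Partition 3", 200, 300),
    ("Partition 4", 300, 400),
]


def prune_eq(value):
    """Partitions that can hold rows with c1 = value."""
    return [name for name, lo, hi in PARTITIONS if lo <= value < hi]


def prune_between(low, high):
    """Partitions that can hold rows with c1 BETWEEN low AND high (inclusive)."""
    # A partition [lo, hi) overlaps [low, high] when lo <= high and low < hi.
    return [name for name, lo, hi in PARTITIONS if lo <= high and low < hi]


print(prune_eq(350))            # WHERE c1 = 350
print(prune_between(150, 250))  # WHERE c1 BETWEEN 150 AND 250
```

With these bounds the first query only needs Partition 4 and the second only needs Partitions 2 and 3; every other partition is skipped without being scanned.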

Run-time Partition Pruning
Partitioned table t1 (c1 int, c2 int, …):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
• Partition 4 FOR VALUES FROM (300) TO (400)
Partition bounds based elimination:
PREPARE scan_t1(int) AS
SELECT * FROM t1 WHERE c1 = $1;
EXECUTE scan_t1(350);
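The difference from plan-time pruning is that the value of $1 is unknown when the statement is prepared, so elimination has to wait until execution. A minimal Python sketch of that idea follows; it is not how the PostgreSQL executor is written, and `prepare_scan_t1`, `PARTITIONS` and `ROWS` are hypothetical names with made-up sample data.

```python
# Hypothetical partition bounds (lower inclusive, upper exclusive) and c1 values.
PARTITIONS = {
    "Partition 1": (0, 100),
    "Partition 2": (100, 200),
    "Partition 3": (200, 300),
    "Partition 4": (300, 400),
}
ROWS = {
    "Partition 1": [10, 90, 80],
    "Partition 2": [],
    "Partition 3": [],
    "Partition 4": [325, 375],
}


def prepare_scan_t1():
    """Stand-in for PREPARE: the plan keeps every partition as a candidate."""
    candidates = list(PARTITIONS)  # nothing can be pruned without the parameter

    def execute(param):
        # Run-time pruning: keep only partitions whose bounds admit c1 = param.
        scanned = [p for p in candidates
                   if PARTITIONS[p][0] <= param < PARTITIONS[p][1]]
        rows = [c1 for p in scanned for c1 in ROWS[p] if c1 == param]
        return scanned, rows

    return execute


scan_t1 = prepare_scan_t1()
print(scan_t1(350))  # only Partition 4 is scanned
print(scan_t1(80))   # only Partition 1 is scanned
```

The same prepared plan serves both executions; only the set of partitions actually scanned changes with the parameter.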

Partition-wise Join
t1 JOIN t2 ON t1.c1 = t2.c1
Partitioned table t1 (c1 int, ...):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
Partitioned table t2 (c1 int, ...):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
Partitioned join: each partition of t1 is joined with the partition of t2 that has the same bounds
• Partition 1 of t1 JOIN Partition 1 of t2, FROM (0) TO (100)
• Partition 2 of t1 JOIN Partition 2 of t2, FROM (100) TO (200)
• Partition 3 of t1 JOIN Partition 3 of t2, FROM (200) TO (300)
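To make the idea concrete, here is a small Python sketch of a partition-wise equi-join on the partition key. It is an illustration under simplifying assumptions (both tables share identical range bounds, rows are reduced to their c1 value, values outside all bounds are ignored); `BOUNDS`, `partition`, `hash_join` and `partition_wise_join` are hypothetical names.

```python
from collections import defaultdict

# Shared range bounds of t1 and t2: (lower inclusive, upper exclusive).
BOUNDS = [(0, 100), (100, 200), (200, 300)]


def partition(rows):
    """Route each c1 value to the partition whose bounds contain it."""
    parts = {b: [] for b in BOUNDS}
    for c1 in rows:
        for lo, hi in BOUNDS:
            if lo <= c1 < hi:
                parts[(lo, hi)].append(c1)
                break
    return parts


def hash_join(left, right):
    """Equi-join two lists of c1 values using a hash table on the right side."""
    table = defaultdict(list)
    for r in right:
        table[r].append(r)
    return [(l, r) for l in left for r in table.get(l, [])]


def partition_wise_join(t1_parts, t2_parts):
    """Join partitions with matching bounds pairwise and append the results."""
    result = []
    for bounds in BOUNDS:
        result.extend(hash_join(t1_parts[bounds], t2_parts[bounds]))
    return result


t1 = [10, 150, 150, 250, 90]
t2 = [150, 250, 260, 10]
joined = partition_wise_join(partition(t1), partition(t2))
print(sorted(joined))
```

Because matching rows always live in partitions with the same bounds, the three small joins together produce the same rows as one join over the whole tables, while each hash table only has to hold one partition.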

TPCH Vs. Partition-wise Join (Scale 20)
• Scale 20
• Schema changes
  – lineitems PARTITION BY RANGE(l_orderkey)
  – orders PARTITION BY RANGE(o_orderkey)
  – Each with 17 partitions
• GUCs
  – work_mem = 1GB
  – effective_cache_size = 8GB
  – shared_buffers = 8GB
  – enable_partition_wise_join = on

TPCH Vs. Partition-wise Join (Scale 20) (linear time scale)
[Bar chart. X axis: TPCH queries (scale 20): Q3, Q4, Q5, Q7, Q10, Q12, Q18. Y axis: query execution time (seconds), 0 to 800. Series: unpartitioned tables; partitioned tables without PWJ; partitioned tables with PWJ.]

TPCH Vs. Partition-wise Join (Scale 20) (scaled execution time)
[Bar chart. X axis: TPCH queries (scale 20): Q3, Q4, Q5, Q7, Q10, Q12, Q18. Y axis: scaled query execution time, ticks at 1, 10, 100. Series: unpartitioned tables; partitioned tables without PWJ; partitioned tables with PWJ.]

TPCH Vs. partition-wise join: observations
● Join strategy change from join between partitioned tables to join between partitions
  – MergeJoin to HashJoin: Q3, Q5, Q10, Q12, Q18
  – HashJoin to parameterized NestLoop join: Q4
● Different join strategies for different partition-joins: Q7
● Change in order of joining partitioned tables
  – Q3, Q5, Q10, Q12, Q18, Q7

TPCH Vs. Partition-wise Join (Scale 300)
Reported by Rafia Sabih
• Scale 300
• Schema changes
  – lineitems PARTITION BY RANGE(l_orderkey)
  – orders PARTITION BY RANGE(o_orderkey)
  – Each with 106 partitions
• GUCs
  – work_mem = 1GB
  – effective_cache_size = 10GB
  – shared_buffers = 10GB
  – enable_partition_wise_join = on
  – max_parallel_workers_per_gather = 4

TPCH Vs. Partition-wise Join (Scale 300)
[Bar chart. X axis: TPCH queries (scale 300): Q3, Q4, Q5, Q10, Q12. Y axis: query execution time (seconds), 0 to 5000. Series: unpartitioned tables; partitioned tables without PWJ; partitioned tables with PWJ.]
Partitioning makes queries faster

Partition pruning and Partition-wise Join
SELECT … FROM t1 JOIN t2 ON t1.c1 = t2.c1
WHERE t1.c1 > 100 AND t2.c1 < 200
Partitioned table t1 (c1 int, ...):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
Partitioned table t2 (c1 int, ...):
• Partition 1 FOR VALUES FROM (0) TO (100)
• Partition 2 FOR VALUES FROM (100) TO (200)
• Partition 3 FOR VALUES FROM (200) TO (300)
Partitioned join without pruning: Partitions 1, 2 and 3 of t1, each joined with the matching partition of t2
Partitioned join with pruning: only Partition 2 of t1, FROM (100) TO (200), joined with Partition 2 of t2, FROM (100) TO (200)
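Why only the Partition 2 pair survives can be checked with a short Python sketch. It assumes integer keys and identical bounds on both tables, and it simply intersects the partitions kept on each side; PostgreSQL may reach the same result by a different route (for example by deriving implied conditions through the join clause). The helper names are hypothetical.

```python
# Shared bounds of t1 and t2: (lower inclusive, upper exclusive), integer keys.
BOUNDS = [(0, 100), (100, 200), (200, 300)]


def may_hold_greater_than(bounds, value):
    """True if an integer partition [lo, hi) can contain some c1 > value."""
    _lo, hi = bounds
    return hi - 1 > value


def may_hold_less_than(bounds, value):
    """True if an integer partition [lo, hi) can contain some c1 < value."""
    lo, _hi = bounds
    return lo < value


# WHERE t1.c1 > 100 AND t2.c1 < 200
t1_kept = [b for b in BOUNDS if may_hold_greater_than(b, 100)]
t2_kept = [b for b in BOUNDS if may_hold_less_than(b, 200)]

# With an inner join on the partition key, a pair can only produce rows
# if the partition survives pruning on both sides.
pairs_to_join = [b for b in BOUNDS if b in t1_kept and b in t2_kept]

print(t1_kept)        # partitions of t1 that may hold c1 > 100
print(t2_kept)        # partitions of t2 that may hold c1 < 200
print(pairs_to_join)  # partition pairs that still need to be joined
```

Each predicate alone leaves two partitions on its own table, but only the pair with bounds FROM (100) TO (200) remains on both sides.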

Partition-wise aggregation and grouping

Full partition-wise aggregation
t1 JOIN t2 ON t1.c1 = t2.c1
Partitioned table t1 (c1 int, ...): Partition 1, Partition 2, Partition 3
Partitioned table t2 (c1 int, ...): Partition 1, Partition 2, Partition 3
[Diagram: Group by t1.c1 over the whole join, and Group by t1.c1 over each of the joins between Partition 1, Partition 2 and Partition 3 of t1 and t2.]
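Full partition-wise aggregation works because grouping on the partition key never splits a group across partitions. The Python sketch below shows this for a plain count(*) grouped by c1 on a single table (the join is left out for brevity); `BOUNDS`, `partition` and `full_partition_wise_count` are hypothetical names and the data is made up.

```python
from collections import Counter

# Range bounds: (lower inclusive, upper exclusive).
BOUNDS = [(0, 100), (100, 200), (200, 300)]


def partition(rows):
    """Route each c1 value to the partition whose bounds contain it."""
    parts = {b: [] for b in BOUNDS}
    for c1 in rows:
        for lo, hi in BOUNDS:
            if lo <= c1 < hi:
                parts[(lo, hi)].append(c1)
                break
    return parts


def full_partition_wise_count(parts):
    """count(*) GROUP BY c1, finished inside each partition, then appended."""
    result = []
    for bounds in BOUNDS:
        # Every group lives in exactly one partition, so no combine step is needed.
        result.extend(sorted(Counter(parts[bounds]).items()))
    return result


rows = [10, 10, 150, 250, 250, 250, 90]
print(full_partition_wise_count(partition(rows)))
```

The appended per-partition results are already the final answer, which is what distinguishes the full variant from the partial one.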

Partial partition-wise aggregation
t1 JOIN t2 ON t1.c1 = t2.c1
Partitioned table t1 (c1 int, ...): Partition 1, Partition 2, Partition 3
Partitioned table t2 (c1 int, ...): Partition 1, Partition 2, Partition 3
[Diagram: Group by t1.c2 over the whole join, and Partial Aggregation with Group by t1.c2 over each of the joins between Partition 1, Partition 2 and Partition 3, followed by one Full Aggregation with Group by t1.c2.]
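When the grouping column is not the partition key, the same group can appear in several partitions, so each partition can only produce partial results that a final step combines. A minimal Python sketch for count(*) grouped by c2, with made-up rows and hypothetical helper names:

```python
from collections import Counter

# Hypothetical (c1, c2) rows per partition; c2 is not the partition key,
# so the same c2 value shows up in more than one partition.
PARTS = {
    "Partition 1": [(10, "a"), (90, "b")],
    "Partition 2": [(150, "a"), (160, "a")],
    "Partition 3": [(250, "b")],
}


def partial_aggregate(rows):
    """Partial Aggregation: count rows per c2 inside one partition."""
    return Counter(c2 for _c1, c2 in rows)


def full_aggregate(partials):
    """Full Aggregation: add up the partial counts of each group."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


partials = [partial_aggregate(rows) for rows in PARTS.values()]
result = full_aggregate(partials)
print(result)
```

The final step only sees one row per group per partition instead of every input row, which is where the saving comes from.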

Example
Source: Jeevan Chalke's partition-wise aggregate proposal
Query: SELECT a, count(*) FROM plt1 GROUP BY a;
plt1: partitioned table with 3 foreign partitions, each with 1M rows
Query returns 30 rows, 10 rows per partition

enable_partition_wise_agg set to false
QUERY PLAN
-------------------------------------
HashAggregate
  Group Key: plt1.a
  -> Append
     -> Foreign Scan on fplt1_p1
     -> Foreign Scan on fplt1_p2
     -> Foreign Scan on fplt1_p3
Planning time: 0.251 ms
Execution time: 6499.018 ms (~6.5 s)

enable_partition_wise_agg set to true
QUERY PLAN
-------------------------------------------------------
Append
  -> Foreign Scan: Aggregate on (public.fplt1_p1 plt1)
  -> Foreign Scan: Aggregate on (public.fplt1_p2 plt1)
  -> Foreign Scan: Aggregate on (public.fplt1_p3 plt1)
Planning time: 0.370 ms
Execution time: 945.384 ms (~0.9 s)

7x faster

Partition-wise sorting
t1 JOIN t2 ON t1.c1 = t2.c1
Partitioned table t1 (c1 int, ...): Partition 1, Partition 2, Partition 3
Partitioned table t2 (c1 int, ...): Partition 1, Partition 2, Partition 3
[Diagram: Sort by c1 over the whole join, and Sort by c1 over each of the joins between Partition 1, Partition 2 and Partition 3 of t1 and t2.]
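One way to read this slide: with range partitions whose bounds do not overlap, sorting each partition on the partition key and appending the results in bound order already yields a fully sorted output. The slides do not say how the proposed patch implements this, so the Python sketch below is only an illustration of that property; `partition_wise_sort` and the sample data are hypothetical.

```python
# Hypothetical c1 values per partition, listed in ascending bound order:
# FROM (0) TO (100), FROM (100) TO (200), FROM (200) TO (300).
PARTS_IN_BOUND_ORDER = [
    [80, 10, 90],
    [150, 120],
    [250, 210, 299],
]


def partition_wise_sort(parts_in_bound_order):
    """Sort each partition separately and append them in bound order."""
    out = []
    for part in parts_in_bound_order:
        # Every value here is larger than all values of earlier partitions,
        # so appending the sorted partition keeps the output sorted.
        out.extend(sorted(part))
    return out


print(partition_wise_sort(PARTS_IN_BOUND_ORDER))
```

Each sort works on a fraction of the data, and no merge step over the whole result is needed.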

Parallel Append and Partitioning
[Diagram: the joins between Partition 1, Partition 2 and Partition 3 of the two tables, each with its own Group by t1.c1, are executed by Worker Backends 1, 2 and 3 (Worker Backend 1 appears twice in the diagram) under a Leader Backend.]
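The combination of per-partition work and parallel append can be imitated in Python with a thread pool. This is a loose analogy, not PostgreSQL's implementation: threads stand in for worker backends, the calling thread plays the leader, and `aggregate_partition` and `parallel_append` are hypothetical names operating on made-up data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical c1 values per partition, in bound order.
PARTITIONS = [
    [10, 10, 90],
    [150],
    [250, 250, 250],
]


def aggregate_partition(rows):
    """Work handed to one worker: count(*) GROUP BY c1 within one partition."""
    return sorted(Counter(rows).items())


def parallel_append(partitions, workers=3):
    """Leader: hand partitions to workers and append their results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input partitions.
        per_partition = list(pool.map(aggregate_partition, partitions))
    return [row for part in per_partition for row in part]


print(parallel_append(PARTITIONS))
```

Since each partition is an independent unit of work, the subplans under the append can make progress at the same time instead of one after another.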

TPCH Vs. PWJ + PA
[Bar chart. X axis: TPCH queries (scale 20): Q3, Q4, Q5, Q7, Q10, Q12, Q18. Y axis: query execution time (seconds), 0 to 800. Series: unpartitioned tables; partitioned tables without PWJ; partitioned tables with PWJ; partitioned tables with PWJ and PA.]

Query Optimization Techniques - patches
• Committed patches
  – Basic partition-wise join - Ashutosh Bapat (EDB)
  – Parallel append - Amit Khandekar (EDB)
• Patches submitted on hackers and being reviewed
  – Partition pruning - Amit Langote (NTT)
  – Run-time partition pruning - Beena Emerson (EDB)
  – Partition-wise aggregation - Jeevan Chalke (EDB)
  – Partition-wise sorting/ordering - Ronan Dunklau, Julien Rouhaud (Dalibo)
• Benchmarking by Rafia Sabih (EDB)
Query optimization techniques for partitioned tables.
