Slides for the session “Introduction to PostgreSQL Architecture” (PostgreSQLアーキテクチャ入門) at INSIGHT OUT 2011, held October 19 to 21, 2011.
For details on INSIGHT OUT 2011, see:
http://www.insight-tec.com/insight-out-2011.html
Materials for the 29th study session: “A Very Basic Introduction to PostgreSQL Recovery” (PostgreSQLのリカバリ超入門)
See also http://www.interdb.jp/pgsql (Coming soon!)
For beginners. Explains how PostgreSQL's WAL, CHECKPOINT, and online backup work.
Next, see: http://www.slideshare.net/satock/29shikumi-backup
Postgres Toolkit is a collection of scripts and utilities that allows database administrators to perform complicated PostgreSQL management tasks with single commands. It focuses on frequent tasks like monitoring performance, checking configuration, and managing backups. The open source toolkit currently contains 13 scripts that work on Linux systems and PostgreSQL versions 9.0 through 9.4. It can be installed with a single curl command and includes utilities like pt-config to manage configuration files and pt-session-profiler to monitor long-running queries.
This document provides an overview of PL/Proxy, a database partitioning system implemented as a PostgreSQL procedural language extension. PL/Proxy allows applications to perform database operations like inserts, updates, deletes and queries across multiple PostgreSQL database partitions in a transparent manner. It works by routing operations to the appropriate partition based on the value of a partitioning key. The document discusses PL/Proxy concepts, areas of application, example usage, installation, backend and frontend functions, configuration options and more.
The document describes how mobile voice input can be used for data analysis in BI ASK. It shows the user selecting voice input in Siri, then analyzing sales data by saying "sales" which displays total sales. The user can further analyze by product category or units sold, with the results displayed in bubble charts on the mobile device.
mod_auth_ticket - Bringing Single-Sign-On to lighttpd
Taisuke Yamada
Explains mod_auth_ticket, a newly developed lighttpd module that makes any website SSO-enabled. Also discusses the experience of developing for lighttpd and evaluates the strength of the cryptography used by this module.
41. Stado Cluster #2
• Multiple spindles to take advantage of a multi-core processor, even within a single server.
– Multiple tablespaces, each assigned to a dedicated disk.
[Diagram: Coordinator, Agent, and a single Postgres instance; tablespaces tblspc1, tblspc2, tblspc3; databases testdb1, testdb2, testdb4.]
Copyright 2012 Uptime Technologies, LLC. All rights reserved.
42. DBT-3 Schema
• Scale factor “10”
– More than 25 GB, including indexes.
Table Name  Purpose                      # of Records    Size (bytes)   Partition Key
customer    Customer master                 1,500,000     289,046,528   c_custkey
lineitem    Order line items               59,986,052   8,727,838,720   l_orderkey
nation      Nation master                          25           8,192   -
orders      Orders                         15,000,000   2,074,853,376   o_orderkey
part        Part master                     2,000,000     332,816,384   p_partkey
partsupp    Part-supplier detail master     8,000,000   1,400,143,872   ps_partkey
region      Region master                           5           8,192   -
supplier    Supplier master                   100,000      17,858,560   -
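Assuming the Size column is in bytes (8,192 matches PostgreSQL's default 8 kB page), the tables alone add up to roughly 12.8 GB, which suggests indexes make up most of the rest of the “>25GB” figure. A quick check of that arithmetic, with the sizes copied from the DBT-3 schema table (the helper names are illustrative only):

```python
# Table sizes, copied from the DBT-3 (scale factor 10) schema table.
# Assumption: the Size column is in bytes.
TABLE_SIZES = {
    "customer": 289_046_528,
    "lineitem": 8_727_838_720,
    "nation": 8_192,
    "orders": 2_074_853_376,
    "part": 332_816_384,
    "partsupp": 1_400_143_872,
    "region": 8_192,
    "supplier": 17_858_560,
}

# Tables that have a partition key in the schema table (others show "-").
PARTITIONED = {"customer", "lineitem", "orders", "part", "partsupp"}


def total_bytes(sizes):
    """Sum of table sizes, indexes excluded."""
    return sum(sizes.values())


def share(sizes, name):
    """Fraction of all table data held by one table."""
    return sizes[name] / total_bytes(sizes)


if __name__ == "__main__":
    total = total_bytes(TABLE_SIZES)
    print(f"tables only: {total:,} bytes ({total / 1e9:.2f} GB)")
    print(f"lineitem share: {share(TABLE_SIZES, 'lineitem'):.0%}")
    partitioned = sum(TABLE_SIZES[t] for t in PARTITIONED)
    print(f"partitioned tables: {partitioned / total:.1%} of table data")
```

Under the same assumption, lineitem alone holds about 68% of the table data, and the five tables with a partition key hold nearly all of it.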
43. Stado Test #1
• Data loading
• Creating indexes
• Aggregation (count, sum)
• Aggregation + GROUP BY (daily sales, sales by country)
• JOIN (local, within a node)
• JOIN (cross-node, between nodes)
44. Summary – Cluster #1
[Chart: Stado Performance Comparison (Stado Cluster #1). Y-axis: execution time (compared to standalone); x-axis: workload type (Loading Tables, Building Indexes, Query 1 to Query 5); series: Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes.]
45. Summary – Cluster #2
[Chart: Stado Performance Comparison (Stado Cluster #2). Y-axis: execution time (compared to standalone); x-axis: workload type (Loading Tables, Building Indexes, Query 1 to Query 5); series: Standalone, 1-disk, 2-disks, 4-disks.]
46. Loading Tables
• Loading tables did not take advantage of the cluster.
– 8 tables were loaded using gs-loader (for Stado) or psql (for Standalone).
– Parallel loading appears to be a highly CPU-intensive task.
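When a partitioned table is loaded, every row has to be routed to a node by its partition key. The sketch below is a hypothetical illustration of that routing step only, assuming hash partitioning with `zlib.crc32` modulo the node count as a stand-in hash; it is not Stado's or gs-loader's actual code, and `node_for`/`route_rows` are made-up names.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def node_for(key, num_nodes: int) -> int:
    """Hypothetical hash-modulo placement; not Stado's real hash function."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def route_rows(rows: Iterable[dict], key_column: str,
               num_nodes: int) -> Dict[int, List[dict]]:
    """Group rows into per-node batches by hashing each row's partition key.

    Every row is hashed once, so this step's CPU cost grows with the
    number of rows loaded.
    """
    batches: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        batches[node_for(row[key_column], num_nodes)].append(row)
    return dict(batches)


if __name__ == "__main__":
    orders = [{"o_orderkey": k, "o_totalprice": k * 10.0} for k in range(1, 1001)]
    batches = route_rows(orders, "o_orderkey", num_nodes=4)
    for node, batch in sorted(batches.items()):
        print(f"node {node}: {len(batch)} rows")
```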
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Loading Tables).]
47. Building Indexes
• Building indexes could benefit from adding servers and/or spindles.
– Elapsed time scales linearly as nodes are added.
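“Linear scalability” here means that with n nodes, execution time relative to standalone falls roughly as 1/n. A small helper for reading the charts that way; the relative times in the usage block are hypothetical, not values read from the slides:

```python
def speedup(relative_time: float) -> float:
    """Speedup over standalone, given execution time relative to standalone."""
    return 1.0 / relative_time


def efficiency(relative_time: float, nodes: int) -> float:
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(relative_time) / nodes


if __name__ == "__main__":
    # Hypothetical relative times, NOT taken from the charts.
    for nodes, rel in [(1, 1.0), (2, 0.5), (4, 0.3), (8, 0.2)]:
        print(f"{nodes} node(s): speedup {speedup(rel):.2f}x, "
              f"efficiency {efficiency(rel, nodes):.0%}")
```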
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Building Indexes).]
48. Aggregation (Query 1, 2)
• Simple scan and aggregation can take advantage of the cluster.
– Executing SUM() and COUNT() on the “orders” table.
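A common way to spread SUM() and COUNT() over partitions is two-phase aggregation: each node aggregates only its own rows, and the coordinator adds up the small partial results. The slides do not show Stado's actual plan, so treat this as a sketch of the general technique; `Partial`, `local_aggregate`, and `combine` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Partial:
    """Partial aggregate produced by one node."""
    total: float
    count: int


def local_aggregate(values: Iterable[float]) -> Partial:
    """Phase 1: each node scans only its own partition."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return Partial(total, count)


def combine(partials: List[Partial]) -> Partial:
    """Phase 2: the coordinator merges the per-node results."""
    return Partial(sum(p.total for p in partials),
                   sum(p.count for p in partials))


if __name__ == "__main__":
    # Hypothetical o_totalprice values spread over three nodes.
    partitions = [[10.0, 20.0], [5.0], [1.0, 2.0, 3.0]]
    result = combine([local_aggregate(p) for p in partitions])
    print(result.total, result.count, result.total / result.count)
```

Only one small `Partial` per node crosses the network, which is why this kind of query can scale; an AVG has to be derived from the combined sum and count rather than by averaging per-node averages.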
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Query 1).]
49. Aggregation + GROUP BY (Query 3)
• “Aggregation + GROUP BY” can take advantage of the cluster as well.
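The same two-phase idea extends to GROUP BY: each node produces one partial sum per group key, and the coordinator merges them by key. A sketch with hypothetical (nation, sales) rows; it illustrates the general technique, not Stado's implementation:

```python
from typing import Dict, Iterable, List, Tuple


def local_group_sum(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Per-node GROUP BY: sum amounts per group key over local rows only."""
    sums: Dict[str, float] = {}
    for key, amount in rows:
        sums[key] = sums.get(key, 0.0) + amount
    return sums


def merge_groups(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Coordinator step: add up each group's per-node sums."""
    merged: Dict[str, float] = {}
    for part in partials:
        for key, value in part.items():
            merged[key] = merged.get(key, 0.0) + value
    return merged


if __name__ == "__main__":
    # Hypothetical rows held by two nodes.
    node1 = [("JAPAN", 100.0), ("FRANCE", 50.0), ("JAPAN", 5.0)]
    node2 = [("JAPAN", 20.0), ("BRAZIL", 10.0)]
    totals = merge_groups([local_group_sum(node1), local_group_sum(node2)])
    for nation, sales in sorted(totals.items()):
        print(nation, sales)
```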
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Query 3).]
50. JOIN (local) (Query 4)
• A “local join” between partitioned tables on their partition key could scale.
– “orders” and “lineitem” are joined on “orderkey”.
– More CPU power is needed.
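Because “orders” and “lineitem” are both partitioned on the order key, matching rows end up on the same node, so each node can join its own partitions and no rows need to move. The sketch assumes both tables use the same hypothetical hash-modulo placement (`zlib.crc32` as a stand-in, not Stado's real function); all helper names are illustrative.

```python
import zlib
from typing import Dict, List, Tuple


def node_for(key: int, num_nodes: int) -> int:
    """Hypothetical placement rule shared by both tables (hash modulo)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition(rows: List[dict], key: str, num_nodes: int) -> List[List[dict]]:
    """Split a table into per-node partitions by its partition key."""
    parts: List[List[dict]] = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[node_for(row[key], num_nodes)].append(row)
    return parts


def local_join(orders: List[dict], lineitems: List[dict]) -> List[Tuple[int, int]]:
    """Hash join on o_orderkey = l_orderkey over the rows held by one node."""
    by_key: Dict[int, dict] = {o["o_orderkey"]: o for o in orders}  # build
    return [(by_key[li["l_orderkey"]]["o_orderkey"], li["l_linenumber"])
            for li in lineitems if li["l_orderkey"] in by_key]      # probe


def colocated_join(orders: List[dict], lineitems: List[dict],
                   num_nodes: int) -> List[Tuple[int, int]]:
    """Join node by node; the coordinator only concatenates the results."""
    o_parts = partition(orders, "o_orderkey", num_nodes)
    l_parts = partition(lineitems, "l_orderkey", num_nodes)
    results: List[Tuple[int, int]] = []
    for o_part, l_part in zip(o_parts, l_parts):
        results.extend(local_join(o_part, l_part))
    return results


if __name__ == "__main__":
    orders = [{"o_orderkey": k} for k in range(1, 101)]
    lineitems = [{"l_orderkey": k, "l_linenumber": n}
                 for k in range(1, 101) for n in (1, 2)]
    print(len(colocated_join(orders, lineitems, num_nodes=4)))
```

The per-node joins give the same result as one big join, but each node does its own share of the CPU work, which fits the slide's note that more CPU power helps here.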
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Query 4).]
51. JOIN (cross-node) (Query 5)
• A “cross-node join” may not be able to scale.
– In particular, a large cross-node join is likely to be critical to SQL performance.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Query 5).]
52. Behind the scenes: “cross-node” join
• Collect all rows (Row Shipping) from the entire cluster into a temp table.
• Process the JOIN that can be done within the node.
• Process the JOIN against the temp table.
• Row Shipping is very expensive!
[Diagram: Node #1 holds customer (1), orders (1), and lineitem (1) and runs JOIN (1) over its local partitions, then JOIN (2) against a temp table filled by Row Shipping from Node #2, Node #3, and Node #4, which hold the customer, orders, and lineitem partitions (2) to (4).]
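A rough way to see why Row Shipping is expensive: every node except the receiving one must copy its share of the rows into the temp table. Under the simplifying assumption that rows are spread evenly, about (N-1)/N of the table crosses the network, so adding nodes increases the fraction shipped. The function name is hypothetical, and lineitem's row count is used purely as an example size.

```python
def shipped_rows(total_rows: int, num_nodes: int) -> int:
    """Rows copied into one node's temp table, assuming an even spread.

    The receiving node already holds about total_rows / num_nodes rows;
    every other node ships its share over the network.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return total_rows - total_rows // num_nodes


if __name__ == "__main__":
    LINEITEM_ROWS = 59_986_052  # row count from the DBT-3 schema table
    for n in (1, 2, 4, 8):
        print(f"{n} node(s): {shipped_rows(LINEITEM_ROWS, n):,} rows shipped")
```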
53. Resource usage: “cross-node” join
http://www2.uptimeforce.com/pgstatview2/20b267567a8f134b970070cc1a212a05/
54. Stado Test #2
• DBT-3 Queries
– Q1 - Aggregations on a single partitioned table.
– Q2 - Joining two partitioned tables on a partition key, and also joining dimension tables.
– Q4 - Joining two partitioned tables on a partition key, and aggregation.
– Q6 - Conditional scan on a single partitioned table, and aggregation.
– Q11 - Joining a partitioned table and dimension tables, and aggregation.
– Q12 - Joining two partitioned tables, and aggregation.
– Q16 - Joining two partitioned tables on a partition key, also joining dimension tables, and aggregation.
55. Summary – Cluster #1
[Chart: Stado Performance Comparison (Stado Cluster #1). Y-axis: execution time (compared to standalone); x-axis: workload type (Q1, Q2, Q4, Q6, Q11, Q12, Q16); series: Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes.]
56. Summary – Cluster #2
[Chart: Stado Performance Comparison (Stado Cluster #2). Y-axis: execution time (compared to standalone); x-axis: workload type (Q1, Q2, Q4, Q6, Q11, Q12, Q16); series: Standalone, 1-disk, 2-disks, 4-disks.]
57. Q1
• Simple aggregation with a conditional scan on a partitioned table, “lineitem”.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q1).]
58. Q2
• Joining partitioned tables, “part” and “partsupp”, on a partition key, and dimension tables.
– However, the 1-node and 8-node AWS clusters showed some strange behavior.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q2).]
59. Q4
• Joining two partitioned tables, “orders” and “lineitem”, on the partition key “orderkey”.
– Q4 failed to complete on the AWS cluster.
– Q4 was much faster on the single server, even with a single disk.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q4).]
60. Q6
• Simple conditional scan on a partitioned table, “lineitem”.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q6).]
61. Q11
• Joining a partitioned table and dimension tables, “partsupp”, “supplier”, and “nation”, with aggregation and sort.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q11).]
62. Q12
• Joining two partitioned tables, “orders” and “lineitem”, with aggregation and sort.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q12).]
63. Q16
• Joining another two partitioned tables, “part” and “partsupp”, with aggregation and sort.
[Charts: Stado Performance Comparison for Stado Cluster #1 (Standalone, 1-node, 2-nodes, 4-nodes, 8-nodes) and Stado Cluster #2 (Standalone, 1-disk, 2-disks, 4-disks). Y-axis: execution time (compared to standalone); x-axis: workload type (Q16).]
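64. Miscellaneous
• SQL compatibility is mostly fine.
– The 22 DBT-3 queries run almost as-is.
– Some need explicit type casts or explicit precedence (otherwise parse errors and the like occur).
• Note that Stado has no equivalent of statement_timeout.
– The SET statement_timeout command cannot be sent.
– During Row Shipping (while a COPY command runs), statement_timeout in postgresql.conf did not take effect either (under investigation).
• EXPLAIN does not clearly show whether Row Shipping occurs or how much, so you need to keep that in mind yourself.
– “Even so, EXPLAIN is still your friend.”
• Future work
– Enhanced and improved log output
– Better management commands (e.g., redistributing partitions)
– Better documentation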