Copyright © 2016 NTT DATA Corporation
December 2, 2016
NTT Data Corporation
Ayumi Ishii
Application of PostgreSQL to large social
infrastructure
PGCONF.ASIA 2016
How to use PostgreSQL in social infrastructure
Positioning of smart meter management system
[Figure: smart meters (SM) report through aggregation devices to the smart meter management system (★) in the data center. Surrounding systems shown: wheeling management system, fee calculation for new menu, other power companies, billing processing, member management system, reward points system, switching support system, and the Organization for Cross-regional Coordination of Transmission Operators.]
Main processing and mission of the system
main processing (5 million datasets per 30 min, within 10 minutes)
• validate
• save data: 5 million tuple INSERT
• calculation
• save calculated data: 5 million tuple INSERT
Mission 1: finish the main processing within 10 minutes
Mission 2: store the data
• 240 million additional tuples per day
• must be saved for 24 months
Mission 3: large scale SELECT
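The volume figures on this slide can be checked with a little arithmetic. The sketch below only restates numbers from the slide; the constant names are illustrative, and the reading that the 240 million tuples per day correspond to one 5-million-tuple stream over 48 half-hour slots is an assumption.

```python
# Figures taken from the slide; names are illustrative only.
SLOT_MINUTES = 30
DATASETS_PER_SLOT = 5_000_000
INSERT_STREAMS = 2            # raw data + calculated data
TIME_LIMIT_SECONDS = 10 * 60  # "within 10 minutes"

slots_per_day = 24 * 60 // SLOT_MINUTES
tuples_per_load = DATASETS_PER_SLOT * INSERT_STREAMS
tuples_per_day_one_stream = DATASETS_PER_SLOT * slots_per_day
required_rate = tuples_per_load / TIME_LIMIT_SECONDS

print(f"slots per day:            {slots_per_day}")
print(f"tuples per load:          {tuples_per_load:,}")
print(f"tuples per day (1 stream): {tuples_per_day_one_stream:,}")
print(f"required rate:            {required_rate:,.0f} tuples/s")
```

Under these assumptions the 10-minute window implies a sustained rate in the tens of thousands of tuples per second, which is why the INSERT path gets most of the tuning attention in the following slides.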
Mission
1. Load 10 million datasets within 10 minutes !
2. Must save data for 24 months !
3. Stabilize large scale SELECT performance !
(1) Load 10 million datasets within 10 minutes !
Data model
data: [Device ID] [Date] [Electricity Usage]
ex) Device ID 1 used 500 at 1:00 on August 1st.
Method 1: UPDATE model
UPDATE new data for each device, daily
  Device ID  Day  0:00  0:30  1:00  1:30  …
  1          8/1  100   300   500
  2          8/1  200   400
Frequent UPDATEs are unfavorable for PostgreSQL in terms of performance
Data model
Method 2: INSERT model
INSERT new data for each device, every 30 mins
  Device ID  Date      Value
  1          8/1 0:00  100
  1          8/1 0:30  300
  1          8/1 1:00  500
  …          …         …
○ performance
× data size
Method 2 (INSERT model) was selected over Method 1 (UPDATE model) based on performance
Performance factors
pre research regarding performance factors
• number of tuples in one transaction?
• multiplicity?
• parameters?
• data type?
• restrictions (constraints)?
• index?
• version?
• how to load to partition table?
Performance factors
  Factor                                Setting
  number of tuples in one transaction   10000
  multiplicity                          8
  parameter                             wal_buffers = 1GB
  data type                             minimum
  restriction                           minimum
  index                                 minimum
  version                               9.4
  how to load to partition table        direct load to partition child table
(factors are grouped on the slide into DB design and performance tuning)
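The two loading settings above (10000 tuples per transaction, multiplicity 8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual loader: `load_batch` is a hypothetical callable standing in for one transaction that writes a batch directly into a partition child table, and Python threads stand in for the 8 parallel sessions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str, int]  # (device_id, timestamp, value); illustrative shape

TUPLES_PER_TRANSACTION = 10_000  # from the slide
MULTIPLICITY = 8                 # from the slide


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive batches of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_parallel(
    rows: Iterable[Row],
    load_batch: Callable[[List[Row]], int],
    size: int = TUPLES_PER_TRANSACTION,
    workers: int = MULTIPLICITY,
) -> int:
    """Run `load_batch` once per batch on `workers` threads.

    `load_batch` is hypothetical: it should open a transaction, write the
    batch, commit, and return the number of rows written.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_batch, batches(rows, size)))
```

In a real loader each worker would hold its own database connection; the batch size and worker count are the knobs the slide reports as tuned.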
Bottleneck Analysis with perf
19.83% postgres postgres [.] XLogInsert ★
6.45% postgres postgres [.] LWLockRelease
4.41% postgres postgres [.] PinBuffer
3.03% postgres postgres [.] LWLockAcquire
WAL is the bottleneck!
[Figure: WAL records go from the WAL buffer (memory) to the WAL file (disk I/O). The buffer is written out on commit or when the buffer is full.]
wal_buffers parameter
“The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.”
(from the PostgreSQL documentation)
wal_buffers
[Figure: impact of wal_buffers. Bar chart of time for wal_buffers = 16MB vs. 1GB. ※INSERT only (except SELECT)]
PostgreSQL version
9.3 → 9.4
・WAL performance improved
・JSONB
・GIN performance improved
・CONCURRENTLY option
Version upgrade
• We had originally planned to use 9.3, but changed to 9.4.
[Figure: impact of the version upgrade. Bar chart of time for 9.3 vs. 9.4. ※INSERT only (except SELECT)]
Result
  time      9.3, 16MB  9.3, 1GB  9.4, 1GB
  ■INSERT   0:07:57    0:06:59   0:05:49
  ■others   0:03:29    0:03:29   0:03:29
target accomplished!!
other processes are already tuned.
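The result can be checked against the 10-minute target from Mission 1. The sketch below assumes the chart is a stacked bar, so that the total per configuration is INSERT time plus the time of the other processes; that reading is an assumption, and the variable names are illustrative.

```python
from datetime import timedelta

TARGET = timedelta(minutes=10)             # Mission 1 time limit
OTHERS = timedelta(minutes=3, seconds=29)  # other processes, same in every run

# INSERT times as read from the chart labels.
insert_times = {
    "9.3, 16MB": timedelta(minutes=7, seconds=57),
    "9.3, 1GB": timedelta(minutes=6, seconds=59),
    "9.4, 1GB": timedelta(minutes=5, seconds=49),
}

totals = {config: t + OTHERS for config, t in insert_times.items()}
meets_target = {config: total <= TARGET for config, total in totals.items()}

for config, total in totals.items():
    status = "OK" if meets_target[config] else "over target"
    print(f"{config}: {total} ({status})")
```

Under this reading, only the combination of version 9.4 and wal_buffers = 1GB fits inside the 10-minute window.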
(2) Must save data for 24 months !
data size: 108TB
Reduce data size by selecting the best data type
• Integer
  Use the smallest data type that can cover the range and precision
  Type      Precision    Size
  SMALLINT  4 digits     2 bytes
  INTEGER   9 digits     4 bytes
  BIGINT    18 digits    8 bytes
  NUMERIC   1000 digits  3 or 6 or 8 + ceiling(digits / 4) * 2 bytes
• Boolean
  Use BOOLEAN instead of CHAR(1)
  Type      Available data         Size
  CHAR(1)   string (length is 1)   5 bytes
  BOOLEAN   true or false          1 byte
Reduce the data size by changing column order
• alignment
• PostgreSQL does not store a value across an alignment boundary, so padding is inserted
  ex) within one 8-byte unit: column_1 (4 byte) + PADDING (4 byte), then column_2 (8 byte) starts in the next unit
  Column    Type
  column_1  integer
  column_2  timestamp without time zone
  column_3  integer
  column_4  smallint
  column_5  timestamp without time zone
  column_6  smallint
  column_7  timestamp without time zone
Layout in the original column order (each row is 8 bytes):
  column_1 | PADDING
  column_2
  column_3 | column_4 | PADDING
  column_5
  column_6 | PADDING
  column_7
Layout after reordering (each row is 8 bytes):
  column_2
  column_5
  column_7
  column_1 | column_3
  column_4 | column_6
tuple size: 72 → 60 bytes
ex) 12 bytes / 1 tuple → 2.8GB/day!
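The 72 and 60 byte figures can be reproduced with a small alignment calculation. The sketch below is a simplified model, not PostgreSQL code: it assumes a 23-byte tuple header padded to 24 bytes (64-bit platform, no null bitmap), uses the type sizes and alignments of smallint, integer, and timestamp, and ignores the line pointer and the rounding of each stored tuple to an 8-byte multiple. The 240 million tuples per day come from the earlier slide.

```python
HEADER_BYTES = 23  # heap tuple header; assumed: no null bitmap
MAXALIGN = 8       # assumed 64-bit platform

# type name -> (size in bytes, alignment in bytes)
TYPES = {"smallint": (2, 2), "integer": (4, 4), "timestamp": (8, 8)}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def tuple_size(column_types: list) -> int:
    """Header plus column data, with padding before each column as needed."""
    offset = align_up(HEADER_BYTES, MAXALIGN)
    for name in column_types:
        size, alignment = TYPES[name]
        offset = align_up(offset, alignment) + size
    return offset


original = ["integer", "timestamp", "integer", "smallint",
            "timestamp", "smallint", "timestamp"]
# Widest alignment first, so no padding is needed between columns.
reordered = sorted(original, key=lambda t: TYPES[t][1], reverse=True)

saved_per_tuple = tuple_size(original) - tuple_size(reordered)
saved_per_day = saved_per_tuple * 240_000_000  # tuples per day from the slides

print(tuple_size(original), tuple_size(reordered), saved_per_tuple)
print(f"{saved_per_day / 1e9:.2f} GB/day")
```

Sorting columns by descending alignment is one simple ordering that removes the padding in this example; other orderings with the same property would give the same size.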
Change data model
We adopted INSERT model considering the performance
• However, data size is large, making it difficult to store long term
→ convert model for old data
  num  data                  select frequency  update frequency  policy                       model
  1    1st day ~ 65th day    high              high              performance is the priority  INSERT
  2    66th day ~ 24 months  low               low               data size is the priority    UPDATE
Change data model
INSERT model
  ID  timestamp  value
  1   8/1 0:00   100
  2   8/1 0:00   100
  1   8/1 0:30   300
  2   8/1 0:30   200
  1   8/1 1:00   500
  2   8/1 1:00   300
  …   …          …
  1   8/1 22:30  1000
  2   8/1 22:30  800
  1   8/1 23:00  1100
  2   8/1 23:00  900
  1   8/1 23:30  1200
  2   8/1 23:30  1000
UPDATE model
  ID  date  0:00  0:30  1:00  …  22:30  23:00  23:30
  1   8/1   100   300   500   …  1000   1100   1200
  2   8/1   100   200   300   …  800    900    1000
remove duplicated data (ID, timestamp)
num of tuples/day: 240 million → 5 million
size: 22GB → 3GB
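The conversion from the INSERT model to the UPDATE model is a pivot from one row per (ID, timestamp) to one row per (ID, date) with 48 half-hour value slots. The sketch below shows the idea in plain Python; the function name and the in-memory representation are illustrative assumptions, and the slides do not say how the conversion was implemented in the database.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional, Tuple

SLOTS_PER_DAY = 48  # one value every 30 minutes


def to_update_model(
    rows: Iterable[Tuple[int, datetime, int]],
) -> Dict[Tuple[int, date], List[Optional[int]]]:
    """Pivot (id, timestamp, value) rows into one 48-slot row per (id, date)."""
    wide = defaultdict(lambda: [None] * SLOTS_PER_DAY)
    for device_id, ts, value in rows:
        slot = ts.hour * 2 + ts.minute // 30  # 0:00 -> 0, 0:30 -> 1, ..., 23:30 -> 47
        wide[(device_id, ts.date())][slot] = value
    return dict(wide)


rows = [
    (1, datetime(2016, 8, 1, 0, 0), 100),
    (2, datetime(2016, 8, 1, 0, 0), 100),
    (1, datetime(2016, 8, 1, 0, 30), 300),
    (1, datetime(2016, 8, 1, 23, 30), 1200),
]
wide = to_update_model(rows)
print(len(rows), "narrow rows ->", len(wide), "wide rows")
```

Each wide row stores the ID and date once instead of 48 times, which is the duplicated data the slide refers to; 240 million narrow rows per day divided by 48 slots gives the 5 million wide rows.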
result
reduce data size
[Figure: bar chart of data size (TB). before: 108, after: 11]
(3) Stabilize large scale SELECT performance !
Stabilize the performance of 10 million SELECT statements!
“stable performance” is important
• Performance degradation caused by sudden changes in the execution plan is a problem
• control execution plans: pg_hint_plan
• lock statistical information: pg_dbms_stats
→ stable performance
Before using pg_hint_plan & pg_dbms_stats
In most cases, the optimizer generates the best execution plan
Fixing the execution plan does not always bring a good result
• The best execution plan at this time may not be the best in the future.
However, it is necessary to reduce the risk.
If the execution plan suddenly changes during operation, performance may be reduced.
• SELECT immediately after batch, before ANALYZE
• SELECT from a lot of tables (JOIN)
• …
→ Understand the demerits and use these extensions
pg_dbms_stats
[Figure: the planner generates plans from the “locked” statistics held by pg_dbms_stats instead of PostgreSQL's original statistics.]
pg_dbms_stats in this system
• usage data is stored in day partitions (one table per day), each with locked statistics
• set locked statistics for each new table by COPYing existing locked statistics
• some statistics are different depending on each child table
  • table’s OID, table name
  • partition key, date
We can reliably get the best plan even without using ANALYZE.
Replacing statistics that should be changed according to table
• Create assumed dummy data
• ANALYZE dummy data
  Column         Statistic
  partition key  Most Common Value
  Date           Histogram
Ex) “8/1 0:00”, “8/1 0:30”, “8/1 1:00”
48 patterns per day. Uniform distribution.
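The assumed dummy data for one day partition can be sketched as below: every device reports once in each of the 48 half-hour slots, which gives the uniform distribution mentioned above. The function name and row shape are illustrative assumptions; the slides do not show how the dummy data was actually produced. Rows like these would then be loaded into a table and ANALYZEd to obtain the statistics.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import List, Tuple

SLOTS_PER_DAY = 48  # "48 patterns per day"


def dummy_rows(day: date, devices: int) -> List[Tuple[int, datetime]]:
    """Return one (device_id, timestamp) row per device and half-hour slot."""
    start = datetime.combine(day, datetime.min.time())
    return [
        (device_id, start + timedelta(minutes=30 * slot))
        for slot in range(SLOTS_PER_DAY)
        for device_id in range(1, devices + 1)
    ]


rows = dummy_rows(date(2016, 8, 1), devices=3)
per_slot = Counter(ts for _, ts in rows)
print(len(rows), "rows,", len(per_slot), "distinct timestamps")
```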
Mission
1. Load 10 million datasets within 10 minutes !
2. Must save data for 24 months !
3. Stabilize large scale SELECT performance !
COMPLETE
conclusion
The 20th anniversary of PostgreSQL
PostgreSQL has finally evolved far enough to be adopted in large scale social infrastructure.
Both PostgreSQL technical knowledge and business application knowledge are necessary to be successful in difficult and large scale projects.
Pre research and know-how are important to get the most out of PostgreSQL.
Copyright © 2011 NTT DATA Corporation
Copyright © 2016 NTT DATA Corporation
Ad

More Related Content

What's hot (20)

Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Masahiko Sawada
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
Jihoon Son
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
Kohei KaiGai
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
Kohei KaiGai
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Sumeet Singh
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
Kohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
Kohei KaiGai
 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
Wei-Yu Chen
 
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSPostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
VMware Tanzu
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
Ryu Kobayashi
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
 
Dataflow shuffle service
Dataflow shuffle service Dataflow shuffle service
Dataflow shuffle service
Yuta Hono
 
Aws meetup (sep 2015) exprimir cada centavo
Aws meetup (sep 2015)   exprimir cada centavoAws meetup (sep 2015)   exprimir cada centavo
Aws meetup (sep 2015) exprimir cada centavo
Sebastian Montini
 
myHadoop 0.30
myHadoop 0.30myHadoop 0.30
myHadoop 0.30
Glenn K. Lockwood
 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Cloudera, Inc.
 
データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)
Takumi Asai
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
Ryousei Takano
 
Building Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC ContainerBuilding Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC Container
Hitoshi Sato
 
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Masahiko Sawada
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
Jihoon Son
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
Kohei KaiGai
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
Kohei KaiGai
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Sumeet Singh
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
Kohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
Kohei KaiGai
 
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSPostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
VMware Tanzu
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
Ryu Kobayashi
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
 
Dataflow shuffle service
Dataflow shuffle service Dataflow shuffle service
Dataflow shuffle service
Yuta Hono
 
Aws meetup (sep 2015) exprimir cada centavo
Aws meetup (sep 2015)   exprimir cada centavoAws meetup (sep 2015)   exprimir cada centavo
Aws meetup (sep 2015) exprimir cada centavo
Sebastian Montini
 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Cloudera, Inc.
 
データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)
Takumi Asai
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
Ryousei Takano
 
Building Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC ContainerBuilding Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC Container
Hitoshi Sato
 

Viewers also liked (20)

Application of postgre sql to large social infrastructure jp
Application of postgre sql to large social infrastructure jpApplication of postgre sql to large social infrastructure jp
Application of postgre sql to large social infrastructure jp
NTT DATA OSS Professional Services
 
ブロックチェーンの仕組みと動向(入門編)
ブロックチェーンの仕組みと動向(入門編)ブロックチェーンの仕組みと動向(入門編)
ブロックチェーンの仕組みと動向(入門編)
NTT DATA OSS Professional Services
 
Apache Hadoop 2.8.0 の新機能 (抜粋)
Apache Hadoop 2.8.0 の新機能 (抜粋)Apache Hadoop 2.8.0 の新機能 (抜粋)
Apache Hadoop 2.8.0 の新機能 (抜粋)
NTT DATA OSS Professional Services
 
20170303 java9 hadoop
20170303 java9 hadoop20170303 java9 hadoop
20170303 java9 hadoop
NTT DATA OSS Professional Services
 
商用ミドルウェアのPuppet化で気を付けたい5つのこと
商用ミドルウェアのPuppet化で気を付けたい5つのこと商用ミドルウェアのPuppet化で気を付けたい5つのこと
商用ミドルウェアのPuppet化で気を付けたい5つのこと
NTT DATA OSS Professional Services
 
今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~
今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~
今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~
NTT DATA OSS Professional Services
 
Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)
Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)
Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)
NTT DATA OSS Professional Services
 
データ活用をもっともっと円滑に! ~データ処理・分析基盤編を少しだけ~
データ活用をもっともっと円滑に!~データ処理・分析基盤編を少しだけ~データ活用をもっともっと円滑に!~データ処理・分析基盤編を少しだけ~
データ活用をもっともっと円滑に! ~データ処理・分析基盤編を少しだけ~
NTT DATA OSS Professional Services
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本
Sotaro Kimura
 
Apache NiFiと 他プロダクトのつなぎ方
Apache NiFiと他プロダクトのつなぎ方Apache NiFiと他プロダクトのつなぎ方
Apache NiFiと 他プロダクトのつなぎ方
Sotaro Kimura
 
値型と参照型
値型と参照型値型と参照型
値型と参照型
chocolamint
 
Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...
Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...
Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...
NTT DATA OSS Professional Services
 
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Mark Rittman
 
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
NTT DATA OSS Professional Services
 
PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)
PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)
PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)
NTT DATA OSS Professional Services
 
本当にあったHadoopの恐い話 Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...
本当にあったHadoopの恐い話Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...本当にあったHadoopの恐い話Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...
本当にあったHadoopの恐い話 Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...
NTT DATA OSS Professional Services
 
SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)
SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)
SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)
NTT DATA OSS Professional Services
 
サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)
サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)
サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)
NTT DATA OSS Professional Services
 
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Sotaro Kimura
 
PostgreSQLコミュニティに飛び込もう
PostgreSQLコミュニティに飛び込もうPostgreSQLコミュニティに飛び込もう
PostgreSQLコミュニティに飛び込もう
NTT DATA OSS Professional Services
 
商用ミドルウェアのPuppet化で気を付けたい5つのこと
商用ミドルウェアのPuppet化で気を付けたい5つのこと商用ミドルウェアのPuppet化で気を付けたい5つのこと
商用ミドルウェアのPuppet化で気を付けたい5つのこと
NTT DATA OSS Professional Services
 
今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~
今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~
今からはじめるPuppet 2016 ~ インフラエンジニアのたしなみ ~
NTT DATA OSS Professional Services
 
Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)
Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)
Hadoopエコシステムの最新動向とNTTデータの取り組み (OSC 2016 Tokyo/Spring 講演資料)
NTT DATA OSS Professional Services
 
データ活用をもっともっと円滑に! ~データ処理・分析基盤編を少しだけ~
データ活用をもっともっと円滑に!~データ処理・分析基盤編を少しだけ~データ活用をもっともっと円滑に!~データ処理・分析基盤編を少しだけ~
データ活用をもっともっと円滑に! ~データ処理・分析基盤編を少しだけ~
NTT DATA OSS Professional Services
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本
Sotaro Kimura
 
Apache NiFiと 他プロダクトのつなぎ方
Apache NiFiと他プロダクトのつなぎ方Apache NiFiと他プロダクトのつなぎ方
Apache NiFiと 他プロダクトのつなぎ方
Sotaro Kimura
 
値型と参照型
値型と参照型値型と参照型
値型と参照型
chocolamint
 
Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...
Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...
Hadoopのメンテナンスリリースバージョンをリリースしてみた (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo...
NTT DATA OSS Professional Services
 
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Mark Rittman
 
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
NTT DATA OSS Professional Services
 
PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)
PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)
PostgreSQLでpg_bigmを使って日本語全文検索 (MySQLとPostgreSQLの日本語全文検索勉強会 発表資料)
NTT DATA OSS Professional Services
 
本当にあったHadoopの恐い話 Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...
本当にあったHadoopの恐い話Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...本当にあったHadoopの恐い話Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...
本当にあったHadoopの恐い話 Blockはどこへきえた? (Hadoop / Spark Conference Japan 2016 ライトニングトー...
NTT DATA OSS Professional Services
 
SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)
SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)
SIプロジェクトでのインフラ自動化の事例 (第1回 Puppetユーザ会 発表資料)
NTT DATA OSS Professional Services
 
サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)
サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)
サポートメンバは見た! Hadoopバグワースト10 (adoop / Spark Conference Japan 2016 ライトニングトーク発表資料)
NTT DATA OSS Professional Services
 
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Sotaro Kimura
 
Ad

Similar to Application of postgre sql to large social infrastructure (20)

BigData @ comScore
BigData @ comScoreBigData @ comScore
BigData @ comScore
eaiti
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
DataWorks Summit/Hadoop Summit
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
NETWAYS
 
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Xpand IT
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQuery
Thuyen Ho
 
Apache Druid Design and Future prospect
Apache Druid Design and Future prospectApache Druid Design and Future prospect
Apache Druid Design and Future prospect
c-bslim
 
Hypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLHypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQL
Yuzuko Hosoya
 
Balogh gyorgy big_data
Balogh gyorgy big_dataBalogh gyorgy big_data
Balogh gyorgy big_data
LogDrill
 
Big Data and PostgreSQL
Big Data and PostgreSQLBig Data and PostgreSQL
Big Data and PostgreSQL
PGConf APAC
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB
 
M|18 Analytics in the Real World, Case Studies and Use Cases
M|18 Analytics in the Real World, Case Studies and Use CasesM|18 Analytics in the Real World, Case Studies and Use Cases
M|18 Analytics in the Real World, Case Studies and Use Cases
MariaDB plc
 
Sensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process ControlSensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process Control
TIBCO_Software
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scale
Itai Yaffe
 
Tw Bizcases
Tw BizcasesTw Bizcases
Tw Bizcases
Praveen Kumar Peddi
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
MapR Technologies
 
How to Suceed in Hadoop
How to Suceed in HadoopHow to Suceed in Hadoop
How to Suceed in Hadoop
Precisely
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
oysteing
 
Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Syncsort & comScore Big Data Warehouse Meetup Sept 2013Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Steven Totman
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Seeling Cheung
 
BigData @ comScore
BigData @ comScoreBigData @ comScore
BigData @ comScore
eaiti
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...



Application of PostgreSQL to large social infrastructure

  • 1. Copyright © 2016 NTT DATA Corporation December 2, 2016 NTT Data Corporation Ayumi Ishii Application of PostgreSQL to large social infrastructure PGCONF.ASIA 2016
  • 2. How to use PostgreSQL in social infrastructure
  • 3. Positioning of the smart meter management system. [Diagram labels] smart meters (SM), aggregation devices, smart meter management system (★, in the data center), wheeling management system, fee calculation for new menu, billing processing, member management system, reward points system, switching support system, other power companies, Organization for Cross-regional Coordination of Transmission Operators.
  • 4. Main processing and mission of the system. Main processing: 5 million datasets arrive every 30 minutes; they are validated and saved, then the calculation runs and the calculated data is saved, all within 10 minutes. Mission 1: a 5 million tuple INSERT for the raw data plus a 5 million tuple INSERT for the calculated data. Mission 2: 240 million additional tuples per day, which must be kept for 24 months. Mission 3: large-scale SELECT.
  • 5. Mission: 1. Load 10 million datasets within 10 minutes! 2. Must save data for 24 months! 3. Stabilize large-scale SELECT performance!
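
As a rough sanity check on the figures in slides 4 and 5, the required load rate and the number of retained tuples can be worked out directly. This is only back-of-the-envelope arithmetic: the 730-day approximation of 24 months and all variable names are assumptions, not values from the slides.

```python
# Back-of-the-envelope figures; variable names are illustrative only.
DATASETS_PER_INTERVAL = 5_000_000   # raw datasets arriving every 30 minutes
INTERVALS_PER_DAY = 48              # 24 h / 30 min
LOAD_WINDOW_SECONDS = 10 * 60       # Mission 1: finish within 10 minutes
RETENTION_DAYS = 730                # assumption: 24 months ~ 2 x 365 days

# Mission 1 loads raw and calculated data: 5 million + 5 million tuples.
tuples_per_load = 2 * DATASETS_PER_INTERVAL
required_rate = tuples_per_load / LOAD_WINDOW_SECONDS

# Mission 2: raw tuples added per day and kept over the retention period.
tuples_per_day = DATASETS_PER_INTERVAL * INTERVALS_PER_DAY
retained_tuples = tuples_per_day * RETENTION_DAYS

print(f"{required_rate:,.0f} tuples/s")
print(f"{tuples_per_day:,} tuples/day")
print(f"{retained_tuples:,} tuples retained")
```

The per-day figure reproduces the 240 million tuples quoted on slide 4; the sustained rate is an average and ignores the time spent on validation and calculation.
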
  • 6. (1) Load 10 million datasets within 10 minutes! [The processing diagram from slide 4 is repeated with Mission 1 highlighted (★).]
  • 7. Data model. Data: [Device ID] [Date] [Electricity Usage], e.g. device ID 1 used 500 at 1:00 on August 1st. Method 1: UPDATE model. UPDATE new data for each device, daily.
       Device ID | Day | 0:00 | 0:30 | 1:00 | 1:30 | …
       1         | 8/1 | 100  | 300  | 500  |      |
       2         | 8/1 | 200  | 400  |      |      |
       Frequent UPDATEs are unfavorable for PostgreSQL in terms of performance.
  • 8. Data model. Method 2: INSERT model. INSERT new data for each device, every 30 minutes. Compared with Method 1 (UPDATE model, table on slide 7): ○ performance, × data size.
       Device ID | Date     | Value
       1         | 8/1 0:00 | 100
       1         | 8/1 0:30 | 300
       1         | 8/1 1:00 | 500
       …         | …        | …
  • 9. Data model. Same comparison as slide 8. The INSERT model (Method 2) was selected based on performance.
  • 10. Performance factors. Preliminary research on performance factors: number of tuples in one transaction? multiplicity? parameters? data type? restrictions (constraints)? index? version? how to load into a partitioned table?
  • 11. Performance factors, grouped on the slide under "DB design" and "performance tuning": number of tuples in one transaction: 10000; multiplicity: 8; parameter: wal_buffers=1GB; data type: minimum; restrictions (constraints): minimum; index: minimum; version: 9.4; direct load to the partition child table.
  • 12. Performance factors. Same settings as slide 11.
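
The two load-shaping settings on slide 11 (10000 tuples per transaction, multiplicity 8) can be pictured as a work plan for one 30-minute load. The sketch below is hypothetical: the slides do not say how the real loader assigned transactions to workers, so the round-robin assignment and the `plan_batches` helper are assumptions made for illustration.

```python
# Illustrative only: split one 30-minute load into transactions and workers.
TUPLES_PER_TRANSACTION = 10_000  # from slide 11
WORKERS = 8                      # "multiplicity" from slide 11


def plan_batches(total_tuples, batch_size=TUPLES_PER_TRANSACTION, workers=WORKERS):
    """Return, per worker, a list of (start, end) row ranges, one per transaction."""
    plan = [[] for _ in range(workers)]
    for i, start in enumerate(range(0, total_tuples, batch_size)):
        end = min(start + batch_size, total_tuples)
        plan[i % workers].append((start, end))  # assumed round-robin assignment
    return plan


plan = plan_batches(5_000_000)
print(sum(len(batches) for batches in plan), "transactions in total")
print([len(batches) for batches in plan], "transactions per worker")
```

With these numbers a 5 million tuple load becomes 500 transactions, which is the kind of sizing the preliminary research on slide 10 would have compared against smaller or larger batches.
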
  • 13. Bottleneck analysis with perf:
       19.83%  postgres  postgres  [.] XLogInsert ★
        6.45%  postgres  postgres  [.] LWLockRelease
        4.41%  postgres  postgres  [.] PinBuffer
        3.03%  postgres  postgres  [.] LWLockAcquire
       WAL is the bottleneck! [Diagram] WAL is held in the WAL buffer in memory and written to the WAL file on disk (disk I/O) on commit or when the buffer is full.
  • 14. wal_buffers parameter. "The auto-tuning selected by the default setting of -1 should give reasonable results in most cases." (PostgreSQL documentation)
  • 15. wal_buffers. [Chart: impact of wal_buffers; load time for 16MB vs. 1GB; INSERT only, SELECT excluded.]
  • 16. PostgreSQL version, 9.3 → 9.4: WAL performance improved; JSONB; GIN performance improved; CONCURRENTLY option.
  • 17. Version upgrade. We had originally planned to use 9.3, but changed to 9.4. [Chart: impact of the version upgrade; load time for 9.3 vs. 9.4; INSERT only, SELECT excluded.]
  • 18. Result. [Stacked bar chart: time for INSERT plus other processing.]
       Configuration | INSERT  | Others
       9.3, 16MB     | 0:07:57 | 0:03:29
       9.3, 1GB      | 0:06:59 | 0:03:29
       9.4, 1GB      | 0:05:49 | 0:03:29
       The other processes were already tuned. Target accomplished!!
  • 19. (2) Must save data for 24 months! [The processing diagram from slide 4 is repeated with Mission 2 highlighted (★).]
  • 20. 108TB
  • 21. Reduce data size by selecting the best data type. Integer: use the smallest data type that can cover the range and precision. Boolean: use BOOLEAN instead of CHAR(1).
       Type     | Precision   | Size
       SMALLINT | 4 digits    | 2 bytes
       INTEGER  | 9 digits    | 4 bytes
       BIGINT   | 18 digits   | 8 bytes
       NUMERIC  | 1000 digits | 3 or 6 or 8 + ceiling(digits / 4) * 2 bytes
       Type     | Available data     | Size
       CHAR(1)  | string of length 1 | 5 bytes
       BOOLEAN  | true or false      | 1 byte
  • 22. Reduce the data size by changing column order. Alignment: PostgreSQL does not store a value across an alignment boundary. Example with an 8-byte boundary: column_1 (4 bytes) is followed by 4 bytes of padding, and column_2 (8 bytes) starts on the next boundary.
  • 23. Example of reordering columns.
       Column   | Type
       column_1 | integer
       column_2 | timestamp without time zone
       column_3 | integer
       column_4 | smallint
       column_5 | timestamp without time zone
       column_6 | smallint
       column_7 | timestamp without time zone
       Original order, in 8-byte units (72 bytes): column_1 + padding | column_2 | column_3 + column_4 + padding | column_5 | column_6 + padding | column_7
       Reordered, in 8-byte units (60 bytes): column_2 | column_5 | column_7 | column_1 + column_3 | column_4 + column_6
       12 bytes saved per tuple → 2.8 GB/day!
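
The 72 → 60 byte figures on slide 23 can be reproduced with a small alignment calculation. This is a sketch under stated assumptions: a 24-byte tuple header, the usual sizes and alignments for `smallint` (2), `integer` (4), and `timestamp` (8), and no rounding of the whole tuple at the end. The `tuple_size` helper is hypothetical and is not a PostgreSQL function.

```python
# (size, alignment) in bytes for the three types on slide 23.
TYPE_INFO = {
    "smallint": (2, 2),
    "integer": (4, 4),
    "timestamp": (8, 8),
}
HEADER_BYTES = 24  # assumed tuple header size, already 8-byte aligned


def tuple_size(column_types):
    """Tuple length in bytes for the given column order, including padding."""
    offset = HEADER_BYTES
    for type_name in column_types:
        size, align = TYPE_INFO[type_name]
        offset += (-offset) % align  # pad up to the column's alignment boundary
        offset += size
    return offset


original = ["integer", "timestamp", "integer", "smallint",
            "timestamp", "smallint", "timestamp"]
# Widest columns first, so no padding is needed between columns.
reordered = ["timestamp", "timestamp", "timestamp",
             "integer", "integer", "smallint", "smallint"]

saved_per_tuple = tuple_size(original) - tuple_size(reordered)
saved_per_day = saved_per_tuple * 240_000_000  # tuples per day from slide 4

print(tuple_size(original), tuple_size(reordered), saved_per_tuple)
print(f"{saved_per_day / 1e9:.2f} GB/day")
```

Ordering columns from widest to narrowest alignment is the general rule this example illustrates; the daily saving follows from multiplying the per-tuple saving by the 240 million tuples added each day.
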
  • 24. Change data model. We adopted the INSERT model for its performance. However, its data size is large, which makes long-term storage difficult, so old data is converted to the UPDATE model.
       No. | Data                 | SELECT frequency | UPDATE frequency | Policy                      | Model
       1   | 1st day ~ 65th day   | high             | high             | performance is the priority | INSERT
       2   | 66th day ~ 24 months | low              | low              | data size is the priority   | UPDATE
  • 25. Change data model. Converting removes the duplicated data (ID, timestamp). Number of tuples per day: 240 million → 5 million. Size: 22GB → 3GB.
       INSERT model:
       ID | Timestamp | Value
       1  | 8/1 0:00  | 100
       2  | 8/1 0:00  | 100
       1  | 8/1 0:30  | 300
       2  | 8/1 0:30  | 200
       1  | 8/1 1:00  | 500
       2  | 8/1 1:00  | 300
       …  | …         | …
       1  | 8/1 22:30 | 1000
       2  | 8/1 22:30 | 800
       1  | 8/1 23:00 | 1100
       2  | 8/1 23:00 | 900
       1  | 8/1 23:30 | 1200
       2  | 8/1 23:30 | 1000
       UPDATE model:
       ID | Date | 0:00 | 0:30 | 1:00 | … | 22:30 | 23:00 | 23:30
       1  | 8/1  | 100  | 300  | 500  | … | 1000  | 1100  | 1200
       2  | 8/1  | 100  | 200  | 300  | … | 800   | 900   | 1000
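
The conversion on slide 25 is a pivot from one row per (ID, timestamp) to one row per (ID, date) with 48 half-hour slots. The sketch below shows that reshaping in memory only; the `to_update_model` helper and the year 2016 in the sample data are assumptions, and the slides do not describe how the actual conversion job was implemented.

```python
from collections import defaultdict
from datetime import datetime

SLOTS_PER_DAY = 48  # one value per 30 minutes


def to_update_model(rows):
    """Pivot (device_id, timestamp, value) rows into one row per (device_id, date).

    Each output value is a list of 48 slots; slots with no reading stay None.
    """
    wide = defaultdict(lambda: [None] * SLOTS_PER_DAY)
    for device_id, ts, value in rows:
        slot = ts.hour * 2 + ts.minute // 30  # 0:00 -> 0, 0:30 -> 1, ..., 23:30 -> 47
        wide[(device_id, ts.date())][slot] = value
    return dict(wide)


# A few rows from the INSERT-model table on slide 25 (year assumed).
insert_rows = [
    (1, datetime(2016, 8, 1, 0, 0), 100),
    (2, datetime(2016, 8, 1, 0, 0), 100),
    (1, datetime(2016, 8, 1, 0, 30), 300),
    (2, datetime(2016, 8, 1, 0, 30), 200),
    (1, datetime(2016, 8, 1, 23, 30), 1200),
    (2, datetime(2016, 8, 1, 23, 30), 1000),
]
update_rows = to_update_model(insert_rows)
print(len(insert_rows), "->", len(update_rows), "rows")
```

Because the device ID and date are stored once per day instead of once per reading, the row count drops by a factor of 48, which matches the 240 million → 5 million reduction on the slide.
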
  • 26. Result: data size reduced. [Bar chart: data size in TB] Before: 108. After: 11.
  • 27. (3) Stabilize large-scale SELECT performance! [The processing diagram from slide 4 is repeated with Mission 3 highlighted (★).]
  • 28. Stabilize the performance of 10 million SELECT statements! "Stable performance" is important: performance degradation caused by sudden changes in the execution plan is a problem. Control execution plans: pg_hint_plan. Lock statistical information: pg_dbms_stats. Together they give stable performance.
  • 29. Before using pg_hint_plan & pg_dbms_stats. In most cases, the optimizer generates the best execution plan, and fixing the execution plan does not always bring good results: the best execution plan at this time may not be the best in the future. However, it is necessary to reduce the risk that the execution plan suddenly changes during operation and performance drops, for example: SELECT immediately after a batch, before ANALYZE; SELECT from a lot of tables (JOIN); … → Understand the demerits and then use these extensions.
  • 30. pg_dbms_stats. [Diagram] The planner generates plans from the "locked" statistics held by pg_dbms_stats rather than from PostgreSQL's original statistics.
  • 31. pg_dbms_stats in this system. [Diagram] Usage data is stored as a partition set of day tables, each with locked statistics; the locked statistics are COPYed to each new table. We can reliably get the best plan even without using ANALYZE. Some statistics differ for each child table: table's OID, table name; partition key, date.
  • 32. Replacing statistics that should change according to the table: create assumed dummy data, then ANALYZE the dummy data. Column and statistic: partition key → most common value; date → histogram. E.g. "8/1 0:00", "8/1 0:30", "8/1 1:00": 48 patterns per day, uniform distribution.
  • 33. Mission COMPLETE: 1. Load 10 million datasets within 10 minutes! 2. Must save data for 24 months! 3. Stabilize large-scale SELECT performance!
  • 34. Conclusion. The 20th anniversary of PostgreSQL: PostgreSQL has finally evolved to the point of being adopted in large-scale social infrastructure. Both PostgreSQL technical knowledge and business application knowledge are necessary to succeed in difficult, large-scale projects. Preliminary research and know-how are important to get the most out of PostgreSQL.