POSTGRES INDEXES:
A DEEP DIVE
INTRODUCTION
Bartosz Sypytkowski
▪ @Horusiath
▪ b.sypytkowski@gmail.com
▪ bartoszsypytkowski.com
AGENDA
▪ B+Tree: how databases manage data
▪ How indexes work
▪ Query plans & execution
▪ Types of indexes
B+TREE
[Diagram: a B+Tree built from 8KB pages, each starting with a HEADER. Root page: keys 4, 32. Inner pages: 4, 25 and 32, 49. Leaf pages: keys 4 to 64, each stored with its DATA.]
B+TREE
[Diagram: the same B+Tree, with sibling pages on each level connected by links.]
SEARCH BY EQUALITY
[Diagram: the same linked B+Tree, used to illustrate a lookup by a single key.]
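An equality lookup on the tree above can be sketched in Python. This is a minimal in-memory illustration, not how PostgreSQL implements pages: the `Leaf` and `Inner` classes, the `row-N` values, and the rule that each separator key equals the lowest key of its child are assumptions made to mirror the diagram.

```python
from bisect import bisect_right

class Leaf:
    """Hypothetical leaf page: sorted keys with their row data."""
    def __init__(self, keys):
        self.keys = keys
        self.values = [f"row-{k}" for k in keys]  # placeholder DATA

class Inner:
    """Hypothetical inner page: one separator key per child page."""
    def __init__(self, keys, children):
        self.keys = keys          # lowest key reachable through each child
        self.children = children

def find_leaf(page, key):
    """Descend from the root to the only leaf that may contain key."""
    while isinstance(page, Inner):
        # last child whose separator is <= key (first child if key is smaller)
        idx = max(bisect_right(page.keys, key) - 1, 0)
        page = page.children[idx]
    return page

def search(root, key):
    """Return the row stored under key, or None if it is absent."""
    leaf = find_leaf(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None

# The tree from the slides: root (4, 32), inner pages (4, 25) and (32, 49).
root = Inner([4, 32], [
    Inner([4, 25], [Leaf([4, 8, 16, 20]), Leaf([25, 27, 30])]),
    Inner([32, 49], [Leaf([32, 42, 43, 44]), Leaf([49, 62, 64])]),
])

print(search(root, 43))  # found in the third leaf
print(search(root, 5))   # not present
```

Each lookup touches one page per tree level, which is where the O(log N) access time of a B-Tree index comes from.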
RANGE SCAN
[Diagram: 8KB pages, each with a HEADER, linked to their siblings. Root: 4, 32. Inner pages: 4, 25 and 32, 49. Leaf pages: 4, 8, 16, 20 | 25, 27, 30 | 32, 42, 43, 44 | 49, 62, 64.]
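A range scan over the linked leaf pages can be sketched as follows. This is a simplified illustration under stated assumptions: the `Leaf` class and `link` helper are hypothetical, and the root-to-leaf descent is replaced by a binary search over the first key of each leaf.

```python
from bisect import bisect_right

class Leaf:
    """Hypothetical leaf page with a link to its right sibling."""
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this page
        self.next = None   # sibling link

def link(leaves):
    """Connect each leaf to its right neighbour."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves

def range_scan(leaves, low, high):
    """Yield keys with low <= key <= high by following sibling links."""
    # Stand-in for the tree descent: last leaf whose first key is <= low.
    first_keys = [leaf.keys[0] for leaf in leaves]
    leaf = leaves[max(bisect_right(first_keys, low) - 1, 0)]
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return          # past the upper bound: stop scanning
            if key >= low:
                yield key
        leaf = leaf.next        # continue on the next page, no re-descent

leaves = link([
    Leaf([4, 8, 16, 20]),
    Leaf([25, 27, 30]),
    Leaf([32, 42, 43, 44]),
    Leaf([49, 62, 64]),
])
result = list(range_scan(leaves, 20, 43))
print(result)
```

The tree is descended only once, to find the starting leaf; after that the scan moves sideways through the links.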
OVERFLOW PAGES
[Diagram: a page whose oversized DATA continues in linked OVERFLOW pages.]
POSTGRESQL
TOAST TABLES
public.books
book_id  title        content
1        'Haikus'     'In the twilight rain these brilliant-hued hibiscus - A lovely sunset.'
2        'Moby Dick'  (stored out of line in the TOAST table)

pg_toast.pg_toast_{$OID}
chunk_id  chunk_seq  chunk_data
16403     0          0xFF2F000000436
16403     1          0x6167F27521F25B
16403     2          0x23FB21030E6F6
16403     3          0x7974686108676
-- Find the name of the corresponding TOAST table
select relname
from pg_class
where oid = (
  select reltoastrelid
  from pg_class
  where relname = 'books');
if(compress(content).len > 2000)
TUPLE IDs
[Diagram: Table Heap pages hold keys with their DATA (4, 8, 16, 20, 25, 27, 30, 32). Index Storage pages hold the same keys, each paired with a TID that links to the row in the Table Heap.]
TID = <block id, tuple offset>
select ctid from my_table;
QUERY EXECUTION
select *
from books
where author_id = 10;
SEQ SCAN
[Diagram: pages M1, T1 to T7 (Table Heap) and I1 to I3 (Index Storage).]
1. (Hopefully) sequential I/O
2. Scans all of the table's pages
3. Doesn't use index pages
create index on books(publication_date);
select publication_date
from books
where publication_date > '2020/01/01';
INDEX ONLY SCAN
1. Sequential I/O over index pages
2. Doesn't read the table's pages (as long as the visibility map marks them all-visible)
create index on books(publication_date);
select title, publication_date
from books
where publication_date > '2020/01/01';
INDEX SCAN
1. Uses the index to find the first matching entry and the table page it points to
2. Positions the read cursor on that page
3. Keeps following index entries to table pages until the condition no longer holds
create index on books
using gist(description_lex);
select title, publication_date
from books
where description_lex @@ 'epic';
BITMAP SCAN
1. Using the index, create a bitmap of matching pages
2. Random I/O over pages covered by the bitmap
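The two steps of a bitmap scan can be sketched in Python. This is a simplified, page-level model: the `heap` dictionary, the sample rows and the `bitmap_scan` helper are assumptions for illustration, and a Python set stands in for the bitmap.

```python
def bitmap_scan(index_tids, heap, predicate):
    """index_tids: (block, offset) pairs returned by the index.
    heap: dict mapping block number -> list of rows on that page."""
    # Step 1: collapse the matching TIDs into a set of page numbers.
    bitmap = {block for block, _offset in index_tids}
    # Step 2: read each marked page once, in physical order, and recheck
    # every row on it, because the bitmap only remembers pages.
    results = []
    for block in sorted(bitmap):
        for row in heap[block]:
            if predicate(row):
                results.append(row)
    return sorted(bitmap), results

heap = {
    0: ["epic saga", "short tale"],
    1: ["quiet drama"],
    2: ["epic quest", "epic fail"],
}
# TIDs arrive in index order, which need not match the heap order.
tids = [(2, 0), (0, 0), (2, 1)]
pages, rows = bitmap_scan(tids, heap, lambda row: "epic" in row)
print(pages, rows)
```

Page 2 is read only once even though two index entries point to it, and page 1 is skipped entirely. The recheck step corresponds to the `Recheck Cond` line that appears in bitmap heap scan plans.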
INCLUDE & PARTIAL INDEXES
create index ix_books_by_author
on books(author_id)
include (created_at)
where author_id is not null;
[Diagram: Table Heap pages hold keys 4 to 61 with their DATA. Index Storage entries (4, 16, 25, 32) each carry a TID plus INC, the duplicated columns listed in the include clause.]
B-TREE INDEX
create index ix_users_birthdate
on users(birthdate desc);
1. Default
2. Access time: always O(log N)
3. Supports most of the index features
HASH INDEX
create index ix_orders_no
on orders using hash(order_no);
[Diagram: a meta page, bucket pages (Bucket 0, Bucket 1), an overflow page and a bitmap page. Bucket and overflow pages hold hash entries, each paired with a TID.]
1. Access time: O(1) – O(N)
2. Doesn't shrink in size*
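Why access time ranges from O(1) to O(N) can be sketched with a toy model. The `HashIndex` class, its fixed bucket count and its four-entry pages are assumptions made for this illustration; integer keys are used because Python hashes small integers to themselves, which keeps the example deterministic.

```python
class HashIndex:
    """Toy hash index: a fixed number of buckets, each a chain of pages."""

    def __init__(self, bucket_count=2, page_capacity=4):
        self.page_capacity = page_capacity
        # Every bucket starts with one primary page; overflow pages follow.
        self.buckets = [[[]] for _ in range(bucket_count)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, tid):
        pages = self._bucket(key)
        if len(pages[-1]) >= self.page_capacity:
            pages.append([])               # chain a new overflow page
        pages[-1].append((hash(key), tid))  # store the hash code, not the key

    def lookup(self, key):
        """Return candidate TIDs; every page of the chain may be read."""
        wanted = hash(key)
        return [tid
                for page in self._bucket(key)
                for stored, tid in page
                if stored == wanted]

index = HashIndex()
for key in range(10):
    index.insert(key, (0, key))

print(index.lookup(6))
print(len(index.buckets[0]))  # pages in the chain of bucket 0
```

A lookup in a bucket with a single page is one page read, while a bucket with a long overflow chain degrades towards a linear scan.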
BRIN INDEX
create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);
[Diagram: each index entry summarises one range of table pages by its minimum and maximum value.]
pages     min         max
0..127    2020/01/01  2020/01/21
128..255  2020/01/21  2020/03/14
256..383  2020/02/09  2020/07/28
384..511  2020/03/17  2020/11/10
COLUMN-TUPLE CORRELATION
select tablename, attname, correlation
from pg_stats
where tablename = 'film';
tablename  attname       correlation
film       film_id       0.9979791
film       title         0.9979791
film       description   0.04854498
film       release_year  1
film       rating        0.1953281
film       last_update   1
film       fulltext      <null>
BRIN INDEX
1. Imprecise
2. Very small in size
3. Good for columns aligned with tuple insert order and immutable records
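Why BRIN depends on correlation between column values and tuple order can be sketched in Python. The `build_brin` and `candidate_ranges` helpers, the 10 rows per page and the 4 pages per range are assumptions chosen to keep the example small.

```python
def build_brin(values, pages_per_range, rows_per_page):
    """Summarise consecutive page ranges as (first_page, last_page, min, max)."""
    rows_per_range = pages_per_range * rows_per_page
    summary = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        first_page = start // rows_per_page
        last_page = first_page + pages_per_range - 1
        summary.append((first_page, last_page, min(chunk), max(chunk)))
    return summary

def candidate_ranges(summary, low, high):
    """Page ranges whose [min, max] overlaps [low, high]; they may hold
    matches and must be scanned, all others are skipped."""
    return [(first, last)
            for first, last, lo, hi in summary
            if lo <= high and hi >= low]

# Values stored in insert order: correlation close to 1.
ordered = list(range(1000))
# The same values scattered across the table: correlation close to 0.
scattered = sorted(range(1000), key=lambda v: v % 40)

ordered_hits = candidate_ranges(build_brin(ordered, 4, 10), 100, 130)
scattered_hits = candidate_ranges(build_brin(scattered, 4, 10), 100, 130)
print(ordered_hits)
print(len(scattered_hits))
```

With ordered data only the ranges around the requested values survive. With scattered data every range spans almost the whole domain, so the index excludes nothing.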
BLOOM INDEX
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
BLOOM FILTER
INSERT(C) = h1(C) mod len | h2(C) mod len | .. | hn(C) mod len
INSERT(D) = h1(D) mod len | h2(D) mod len | .. | hn(D) mod len
CONTAINS(A) = h1(A) mod len | h2(A) mod len | .. | hn(A) mod len → FALSE
CONTAINS(B) = h1(B) mod len | h2(B) mod len | .. | hn(B) mod len → MAYBE?
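The INSERT and CONTAINS operations above can be sketched as a small Python class. This is an illustrative Bloom filter, not the code of the `bloom` extension: the salted SHA-256 hashing and the class itself are assumptions, and the defaults simply reuse `length=80` and 2 hashes from the index definition.

```python
import hashlib

class BloomFilter:
    def __init__(self, length=80, hashes=2):
        self.length = length  # number of bits in the signature
        self.hashes = hashes  # number of hash functions per value
        self.bits = 0         # the bit array, kept as one integer

    def _signature(self, value):
        """Set bit h_i(value) mod length for every hash function h_i."""
        signature = 0
        for i in range(self.hashes):
            # Salting one hash with i stands in for n independent hashes.
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            position = int.from_bytes(digest[:8], "big") % self.length
            signature |= 1 << position
        return signature

    def insert(self, value):
        self.bits |= self._signature(value)

    def contains(self, value):
        """False means definitely absent; True only means maybe present."""
        signature = self._signature(value)
        return (self.bits & signature) == signature

bloom = BloomFilter()
bloom.insert("C")
bloom.insert("D")
print(bloom.contains("C"), bloom.contains("D"))
```

An inserted value is always reported as present. A value that was never inserted is usually rejected, but it can collide with bits set by other values, which is the false positive case.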
BLOOM INDEX
1. Small in size
2. Good for exclusion/narrowing
3. False positive ratio: hur.st/bloomfilter/
create extension bloom;
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
-- length: number of bits per record
-- col1: number of hashes for each column
GiST INDEX
create index ix_books_content on books using gist(content_lex);
GiST INDEX
GEO POINTS
image: https://postgrespro.com/blog/pgsql/4175817
GiST INDEX
TSVECTOR
-- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gist(description_lex);
select * from film where description_lex @@ 'epic';
Query Plan
Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416)
  Recheck Cond: (description_lex @@ '''epic'''::tsquery)
  -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0)
       Index Cond: (description_lex @@ '''epic'''::tsquery)
GiST INDEX
TSVECTOR
[Diagram: a signature tree. Inner pages hold bit signatures (011011, 110111; 010011, 011010; 100110, 110001). Leaf pages hold lexemes (aab, aac, aba, adf, azf, bac, brc, caa, cdl, cff, fre, klm), each paired with a TID.]
GiST INDEX
1. Supports specialized operators
2. Index is not updated during deletes
SP-GiST INDEX
create index ix_files_path on files using spgist(path);
SP-GiST INDEX
GEO POINTS
image: https://postgrespro.com/blog/pgsql/4220639
SP-GiST INDEX
TEXT
-- spgist can be created on text column but not on nvarchar
create index idx_film_title on film using spgist(description);
select * from film
where description like 'A Fast-Paced% in New Orleans';
Query Plan
Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416)
  Filter: (description ~~ 'A Fast-Paced%'::text)
  -> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0)
       Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text))
SP-GiST INDEX
1. Just like GiST, but faster for some operations…
2. … but unable to perform some others
3. Indexed space is partitioned into non-overlapping regions
GIN INDEX
create index ix_books_content on books using gin(content_lex);
GIN INDEX
-- gin cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gin(description_lex);
select * from film where description_lex @@ 'epic';
Query Plan
Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416)
  Recheck Cond: (description_lex @@ '''epic'''::tsquery)
  -> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0)
       Index Cond: (description_lex @@ '''epic'''::tsquery)
GIN INDEX
[Diagram: a B-tree of lexemes (call, dome, ever, faucet, gather, gone, leather, omit). Each lexeme points to the TIDs of the rows containing it, stored inline as a posting list when there are few, or in a separate posting tree when there are many.]
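The lexeme-to-TIDs mapping can be sketched as an inverted index in Python. This is a simplified model: the sample documents are made up for illustration, whitespace splitting replaces `to_tsvector`, and plain sorted lists stand in for both posting lists and posting trees.

```python
from collections import defaultdict

def build_gin(documents):
    """documents: dict of TID -> text.
    Returns a dict of lexeme -> sorted posting list of TIDs."""
    postings = defaultdict(set)
    for tid, text in documents.items():
        for lexeme in text.lower().split():
            postings[lexeme].add(tid)
    return {lexeme: sorted(tids) for lexeme, tids in postings.items()}

def search_all(index, *lexemes):
    """TIDs of rows containing every lexeme, like 'a & b' in a tsquery."""
    lists = [set(index.get(lexeme, ())) for lexeme in lexemes]
    return sorted(set.intersection(*lists)) if lists else []

documents = {
    (1, 1): "call ever",
    (1, 2): "dome",
    (2, 1): "ever gather gone omit",
    (3, 1): "call",
    (3, 2): "gather omit",
}
gin = build_gin(documents)
print(gin["ever"])
print(search_all(gin, "gather", "omit"))
```

Each lexeme is stored once no matter how many rows contain it, so a search only has to intersect the posting lists of the requested lexemes.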
GIN INDEX
1. Reads usually faster than GiST
2. Writes usually slower than GiST
3. Index size greater than GiST
RUM INDEX
create index ix_books_content on books using rum(content_lex);
RUM INDEX
[Diagram: the same lexeme tree as in the GIN index, with additional positional information stored next to each TID in the posting lists and posting trees.]
RUM INDEX
-- similarity ranking
select description_lex <=> to_tsquery('epic') as similarity
from books;
-- find description with 2 words located one after another
select * from books
where description_lex @@ to_tsquery('hello <-> world');
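How stored positions make a `hello <-> world` query answerable from the index alone can be sketched in Python. This is an illustrative model rather than RUM's actual layout: the sample documents, the whitespace tokenisation and both helper functions are assumptions.

```python
from collections import defaultdict

def build_positional_index(documents):
    """documents: dict of TID -> text.
    Returns lexeme -> {TID: [positions]}, with positions starting at 1."""
    index = defaultdict(dict)
    for tid, text in documents.items():
        for position, lexeme in enumerate(text.lower().split(), start=1):
            index[lexeme].setdefault(tid, []).append(position)
    return index

def followed_by(index, first, second):
    """TIDs where second appears directly after first."""
    hits = []
    second_postings = index.get(second, {})
    for tid, positions in index.get(first, {}).items():
        next_positions = set(second_postings.get(tid, ()))
        # A match needs the second lexeme exactly one position later.
        if any(p + 1 in next_positions for p in positions):
            hits.append(tid)
    return sorted(hits)

documents = {
    (1, 1): "hello world",
    (1, 2): "world hello",
    (2, 1): "hello big world",
    (2, 2): "say hello world again",
}
positional = build_positional_index(documents)
print(followed_by(positional, "hello", "world"))
```

All four rows contain both words, so an index without positions would return every row as a candidate. The positions narrow the result to the rows where the words are adjacent and in order.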
RUM INDEX
1. GIN on steroids (bigger but more capable)
2. Allows querying for terms and their relative positions in text
3. Supports Index Scan and EXCLUDE
create extension rum;
create index ix_books_content
on books using rum(content_lex);
SUMMARY
THANK YOU
