How Tables and Indexes Are Stored On Disk
And how they are queried
Storage concepts
● Table
● Row_id
● Page
● IO
● Heap data structure
● Index data structure b-tree
● Example of a query
Logical Table
[Figure: the table drawn logically as rows and labeled columns]
IO
● An IO operation (input/output) is a read request to the disk.
● We try to minimize IOs as much as possible.
● An IO can fetch 1 page or more, depending on the disk partitions and other factors.
● An IO cannot read a single row; it reads a whole page with many rows in it, and you get the other rows for free.
● You want to minimize the number of IOs because they are expensive.
● Some IOs in operating systems go to the operating system cache and not to disk.
[Figure: the table stored on disk as pages of three rows each]
Page 0: 1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977 | 3,30,Ali,5/2/1982,$300,000
Page 1: (Rows 4,5,6) …
Page 2: (Rows 7,8,9) …
…
Page 333: more rows … 1000,10000,Eddard,1/27/1999,$250,000
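The page and IO rules above can be sketched in Python. This is a minimal, hypothetical model, not how any real database reads from disk: the Disk class and its read_page and read_row methods are made-up names, pages hold three rows as in the figure, and a plain dictionary stands in for the operating system cache (a real cache is bounded and evicts pages).

```python
ROWS_PER_PAGE = 3  # as in the figure; real databases use 8 KB or 16 KB pages holding many rows


class Disk:
    """Hypothetical disk that stores rows in pages and counts page reads (IOs)."""

    def __init__(self, rows, rows_per_page=ROWS_PER_PAGE):
        # Pack rows into pages: rows 1-3 -> page 0, rows 4-6 -> page 1, ...
        self.pages = [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]
        self.rows_per_page = rows_per_page
        self.disk_reads = 0
        self.cache = {}  # stand-in for the OS page cache (unbounded here, for simplicity)

    def read_page(self, page_no):
        # A cached page is served from memory and costs no disk IO.
        if page_no in self.cache:
            return self.cache[page_no]
        self.disk_reads += 1  # one IO returns the whole page
        self.cache[page_no] = self.pages[page_no]
        return self.pages[page_no]

    def read_row(self, row_id):
        # There is no way to read just one row: fetch its page, then pick the row out.
        page = self.read_page((row_id - 1) // self.rows_per_page)
        return next(row for row in page if row[0] == row_id)


rows = [(r, r * 10) for r in range(1, 1001)]  # hypothetical (row_id, emp_id) pairs
disk = Disk(rows)

print(disk.read_row(1))  # (1, 10): 1 disk read, page 0 now cached
print(disk.read_row(2))  # (2, 20): came along with page 0, no new disk read
print(disk.read_row(4))  # (4, 40): page 1, second disk read
print(len(disk.pages), disk.disk_reads)  # 334 2
```

Asking for row 2 right after row 1 costs nothing extra, because the single IO for row 1 already brought in the whole page.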
Heap
● The heap is the data structure where the table is stored, with all its pages one after another.
● This is where the actual data is stored, including everything (every column of every row).
● Traversing the heap is expensive, as we need to read so much data to find what we want.
● That is why we need indexes, which tell us exactly what part of the heap we need to read: which page(s) of the heap to pull.
[Figure: the heap as consecutive pages: Page 0 (1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977 | 3,30,Ali,5/2/1982,$300,000), Page 1 (Rows 4,5,6), Page 2 (Rows 7,8,9), …, Page 333 (… 1000,10000,Eddard,1/27/1999,$250,000)]
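A full scan of the heap can be sketched like this, assuming the same three-row pages and counting one IO per page read. The rows, the generated names, and the scan_heap helper are hypothetical; the scan stops at the first match only because emp_id is assumed to be unique.

```python
ROWS_PER_PAGE = 3  # as in the figure

# Hypothetical heap: consecutive pages of (row_id, emp_id, name) rows.
rows = [(r, r * 10, f"emp{r}") for r in range(1, 1001)]
heap = [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]


def scan_heap(heap, emp_id):
    """Full scan: read every page in order until a row with emp_id turns up."""
    ios = 0
    for page in heap:
        ios += 1  # each page pulled from the heap is one IO
        for row in page:
            if row[1] == emp_id:  # emp_id assumed unique, so stop at the first match
                return row, ios
    return None, ios  # not found, after reading the entire heap


print(scan_heap(heap, 20))     # ((2, 20, 'emp2'), 1)
print(scan_heap(heap, 10000))  # ((1000, 10000, 'emp1000'), 334)
print(scan_heap(heap, 5))      # (None, 334)
```

The last row and a missing value are the worst cases: both force a read of all 334 pages.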
Index
● An index is another data structure, separate from the heap, that has “pointers” to the heap.
● It holds part of the data and is used to quickly search for something.
● You can index on one column or more.
● Once you find a value in the index, you go to the heap to fetch the rest of the row, where everything is stored.
● The index tells you EXACTLY which page to fetch from the heap, instead of taking the hit of scanning every page in the heap.
● The index is also stored as pages, and it costs IO to pull the entries of the index.
● The smaller the index, the more of it fits in memory, and the faster the search.
● A popular data structure for indexes is the B-tree; learn more about it in the B-tree section.
[Figure: an index on EMP_ID beside the heap; each index entry reads EMP_ID (row_id, heap page)]
Index on EMP_ID
Page 0: 10 (1,0) | 20 (2,0) | 30 (3,0)
        40 (4,1) | 50 (5,1) | 60 (6,1)
        70 (7,2) | 80 (8,2) | 90 (9,2)
Page 1: 100 (10,3) | 110 (11,3) | 120 (12,3)
        130 (13,4) | 140 (14,4) | 150 (15,4)
        160 (16,5) | 170 (17,5) | 180 (18,5)
…
        9920 (992,331) | 9930 (993,331) | 9940 (994,331)
        9950 (995,332) | 9960 (996,332) | 9970 (997,332)
        9980 (998,333) | 9990 (999,333) | 10000 (1000,333)
Heap: Page 0 (1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977 | 3,30,Ali,5/2/1982,$300,000), Page 1 (Rows 4,5,6), Page 2 (Rows 7,8,9), …, Page 333 (… 1000,10000,Eddard,1/27/1999,$250,000)
IO1 on the index to find the page/row in the index.
IO2 on the heap to pull exactly the page(s) we found.
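The two IOs in the figure (IO1 on the index, IO2 on the heap) can be sketched as follows. This is a simplified model under stated assumptions, not a real B-tree: index pages hold nine entries and heap pages three rows, as drawn; the list first_keys (the first EMP_ID on each index page) is assumed to already be in memory, standing in for the upper levels of a B-tree; and lookup is a hypothetical helper.

```python
from bisect import bisect_right

ROWS_PER_PAGE = 3     # heap rows per page, as in the figure
ENTRIES_PER_PAGE = 9  # index entries per page, as in the figure

rows = [(r, r * 10) for r in range(1, 1001)]  # hypothetical (row_id, emp_id) pairs
heap = [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]

# Index on emp_id: sorted entries of (emp_id, (row_id, heap_page)), e.g. (40, (4, 1)).
entries = sorted((emp_id, (row_id, (row_id - 1) // ROWS_PER_PAGE)) for row_id, emp_id in rows)
index_pages = [entries[i:i + ENTRIES_PER_PAGE] for i in range(0, len(entries), ENTRIES_PER_PAGE)]
# First key of each index page, assumed cached (the job the upper B-tree levels do).
first_keys = [page[0][0] for page in index_pages]


def lookup(emp_id):
    """Return (row, ios): one IO on the index, then one IO on exactly one heap page."""
    ios = 0
    slot = bisect_right(first_keys, emp_id) - 1  # which index page could hold emp_id
    if slot < 0:
        return None, ios
    ios += 1  # IO1: read one index page
    for key, (row_id, heap_page) in index_pages[slot]:
        if key == emp_id:
            ios += 1  # IO2: read exactly the heap page the index pointed to
            for row in heap[heap_page]:
                if row[0] == row_id:
                    return row, ios
    return None, ios


print(lookup(10000))  # ((1000, 10000), 2)
print(lookup(40))     # ((4, 40), 2)
```

Finding EMP_ID 10000 this way costs 2 IOs, against the 334 page reads a full heap scan needs in the same setup (1000 rows at three per page).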
No Index
[Figure: the heap with no index, Page 0 (1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977 | 3,30,Ali,5/2/1982,$300,000) through Page 333 (… 1000,10000,Eddard,1/27/1999,$250,000)]
[Figure: Index on EMP_ID. Index Page 0 holds 10 (1,0) | 20 (2,0) | 30 (3,0), 40 (4,1) | 50 (5,1) | 60 (6,1), 70 (7,2) | 80 (8,2) | 90 (9,2); further index pages run through Page N ("10000; …"); heap Page 333 holds … 1000,10000,Eddard,1/27/1999,$250,000]
Notes
● Sometimes the heap table can be organized around a single index. This is
called a clustered index or an Index Organized Table.
● In databases that support clustered indexes, the primary key is usually the clustered index unless otherwise specified.
● MySQL InnoDB always has a primary key (clustered index); other indexes point to the primary key “value”.
● Postgres only has secondary indexes, and all indexes point directly to the row_id (the tuple ID, exposed as ctid), which lives in the heap.
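The difference between the two designs can be sketched with Python dictionaries standing in for the indexes. This is a hypothetical illustration, not either engine's real code: the table, the names, the helper functions, and the (page, slot) pair used as the Postgres row location are assumptions (Postgres actually stores a tuple ID made of a block number and an item offset).

```python
# Hypothetical employees: (emp_id primary key, name).
rows = [(10, "Hussein"), (20, "Adam"), (30, "Ali")]
ROWS_PER_PAGE = 2  # arbitrary, small enough that the rows span more than one page

# InnoDB-style: the table IS the clustered index, so the PK leads straight to the row.
clustered = {emp_id: (emp_id, name) for emp_id, name in rows}
# A secondary index stores the primary key *value*, not a physical location.
innodb_name_index = {name: emp_id for emp_id, name in rows}


def innodb_find_by_pk(emp_id):
    return clustered[emp_id], ["clustered index"]


def innodb_find_by_name(name):
    pk = innodb_name_index[name]  # 1) secondary index -> primary key value
    return clustered[pk], ["name index", "clustered index"]  # 2) clustered index -> row


# Postgres-style: rows live in a heap; every index, including the PK, stores a
# physical (page, slot) location that plays the role of the row_id.
heap = [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]
location = {row: (i // ROWS_PER_PAGE, i % ROWS_PER_PAGE) for i, row in enumerate(rows)}
pg_pk_index = {row[0]: location[row] for row in rows}
pg_name_index = {row[1]: location[row] for row in rows}


def postgres_find_by_pk(emp_id):
    page, slot = pg_pk_index[emp_id]  # the PK index is just another secondary index
    return heap[page][slot], ["pk index", "heap"]


def postgres_find_by_name(name):
    page, slot = pg_name_index[name]  # 1) index -> row location
    return heap[page][slot], ["name index", "heap"]  # 2) fetch that heap page


print(innodb_find_by_name("Ali"))    # ((30, 'Ali'), ['name index', 'clustered index'])
print(postgres_find_by_name("Ali"))  # ((30, 'Ali'), ['name index', 'heap'])
```

In the InnoDB-style layout a lookup by name walks two indexes (name index, then clustered index), while a lookup by primary key needs only the clustered index. In the Postgres-style layout every index, primary key included, goes straight to a heap page.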
Storage concepts - Summary
● Table
● Row_id
● Page
● IO
● Heap data structure
● Index data structure b-tree
● Example of a query