How Tables and Indexes Are Stored On Disk

1. Tables and indexes are stored on disk in pages with multiple rows per page to reduce I/O operations.
2. An index stores pointers to rows in the heap to allow quickly looking up specific rows without scanning the entire heap.
3. When querying on an indexed column, the database first looks up the row location in the index, then performs I/O to fetch the row data from the referenced heap page. This is more efficient than scanning the entire

Uploaded by bimo
Copyright © All Rights Reserved

How tables and indexes are stored on disk
And how they are queried

Storage concepts

● Table
● Row_id
● Page
● IO
● Heap data structure
● Index data structure (b-tree)
● Example of a query
Logical Table

emp_id  emp_name  emp_dob   emp_salary
2000    Hussein   1/2/1988  $100,000
3000    Adam      3/2/1977  $200,000
4000    Ali       5/2/1982  $300,000

(Each vertical field, such as emp_id, is a column; each horizontal entry is a row.)

Row_ID
● Internal and system-maintained.
● In certain databases (MySQL InnoDB) it is the same as the primary key, but other databases like Postgres have a system column row_id (tuple_id).

row_id  emp_id  emp_name  emp_dob   emp_salary
1       2000    Hussein   1/2/1988  $100,000
2       3000    Adam      3/2/1977  $200,000
3       4000    Ali       5/2/1982  $300,000


Page
● Depending on the storage model (row vs column store), the rows are stored and read in logical pages.
● The database doesn’t read a single row; it reads a page or more in a single IO, and we get a lot of rows in that IO.
● Each page has a size (e.g. 8KB in Postgres, 16KB in MySQL).
● Assume each page holds 3 rows in this example; with 1001 rows you will need 1001/3 ≈ 333.7, i.e. 334 pages (Page 0 to Page 333).

row_id  emp_id  emp_name  emp_dob    emp_salary
1       10      Hussein   1/2/1988   $100,000
2       20      Adam      3/2/1977   $200,000
3       30      Ali       5/2/1982   $300,000
...     ...     ...       ...        ...
1000    10000   Eddard    1/27/1999  $250,000

Page 0:    1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977,$200,000 | 3,30,Ali,5/2/1982,$300,000
Page 1:    ( Rows 4,5,6 ) ...
Page 2:    ( Rows 7,8,9 ) ...
...
Page 333:  More rows ... 1000,10000,Eddard,1/27/1999,$250,000

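The page arithmetic on this slide can be checked with a short sketch. It assumes the slide's simplification of exactly 3 rows per page, 1-based row_ids, and 0-based page numbers; the function names `pages_needed` and `page_of` are made up for this illustration and do not come from any database.

```python
import math

ROWS_PER_PAGE = 3  # assumption taken from the slide's example, not a real page size


def pages_needed(row_count: int, rows_per_page: int = ROWS_PER_PAGE) -> int:
    """Number of pages required to hold row_count rows (last page may be partly empty)."""
    return math.ceil(row_count / rows_per_page)


def page_of(row_id: int, rows_per_page: int = ROWS_PER_PAGE) -> int:
    """0-based page number that holds a 1-based row_id."""
    return (row_id - 1) // rows_per_page


print(pages_needed(1001))  # 334
print(page_of(1))          # 0
print(page_of(4))          # 1
print(page_of(1000))       # 333
```

Real pages hold as many rows as fit in their byte size, so the fixed count of 3 is only a teaching device.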
IO
● An IO operation (input/output) is a read request to the disk.
● We try to minimize this as much as possible.
● An IO can fetch 1 page or more, depending on the disk partitions and other factors.
● An IO cannot read a single row; it reads a page with many rows in it, so you get the other rows for free.
● You want to minimize the number of IOs, as they are expensive.
● Some IOs are served from the operating system cache and never reach the disk.

Page 0:    1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977,$200,000 | 3,30,Ali,5/2/1982,$300,000
Page 1:    ( Rows 4,5,6 ) ...
Page 2:    ( Rows 7,8,9 ) ...
...
Page 333:  More rows ... 1000,10000,Eddard,1/27/1999,$250,000

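To make the "you cannot read a single row" point concrete, here is a minimal sketch of a paged table with an IO counter. `PagedHeap` is a hypothetical in-memory stand-in, not a real storage engine, and the rows carry only (row_id, emp_id) to keep the output short.

```python
ROWS_PER_PAGE = 3  # assumption from the slides


class PagedHeap:
    """Hypothetical in-memory stand-in for a table file split into fixed-size pages."""

    def __init__(self, rows, rows_per_page=ROWS_PER_PAGE):
        # Slice the row list into consecutive pages.
        self.pages = [rows[i:i + rows_per_page]
                      for i in range(0, len(rows), rows_per_page)]
        self.io_count = 0

    def read_page(self, page_no):
        """One IO: returns every row on the page, never a single row."""
        self.io_count += 1
        return self.pages[page_no]


# (row_id, emp_id) pairs following the slide's pattern: emp_id = row_id * 10
rows = [(row_id, row_id * 10) for row_id in range(1, 1001)]
heap = PagedHeap(rows)

page = heap.read_page(0)
print(page)           # [(1, 10), (2, 20), (3, 30)]
print(heap.io_count)  # 1
```

Even if only row 1 was wanted, the single read returned rows 2 and 3 as well, which is why the unit of cost is the page and not the row.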
Heap
● The heap is the data structure where the table is stored, with all its pages one after another.
● This is where the actual data is stored, including everything.
● Traversing the heap is expensive, as we need to read so much data to find what we want.
● That is why we need indexes: they tell us exactly which part of the heap we need to read, i.e. which page(s) of the heap we need to pull.

Heap
Page 0:    1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977,$200,000 | 3,30,Ali,5/2/1982,$300,000
Page 1:    ( Rows 4,5,6 ) ...
Page 2:    ( Rows 7,8,9 ) ...
...
Page 333:  More rows ... 1000,10000,Eddard,1/27/1999,$250,000

Index
● An index is another data structure, separate from the heap, that holds “pointers” into the heap.
● It contains part of the data and is used to search quickly for something.
● You can index on one column or more.
● Once you find a value in the index, you go to the heap to fetch the rest of the row, since everything is stored there.
● The index tells you EXACTLY which page to fetch from the heap, instead of taking the hit of scanning every page in the heap.
● The index is also stored as pages, and it costs IO to pull the entries of the index.
● The smaller the index, the more of it fits in memory and the faster the search.
● B-trees are a popular data structure for indexes; learn more on that in the b-tree section.
Index on EMP_ID (each entry reads: emp_id (row_id, heap page))

Page 0:
  10 (1,0) | 20 (2,0) | 30 (3,0)
  40 (4,1) | 50 (5,1) | 60 (6,1)
  70 (7,2) | 80 (8,2) | 90 (9,2)
Page 1:
  100 (10,3) | 110 (11,3) | 120 (12,3)
  130 (13,4) | 140 (14,4) | 150 (15,4)
  160 (16,5) | 170 (17,5) | 180 (18,5)
...
Page N:
  9920 (992,331) | 9930 (993,331) | 9940 (994,331)
  9950 (995,332) | 9960 (996,332) | 9970 (997,332)
  9980 (998,333) | 9990 (999,333) | 10000 (1000,333)

Heap
Page 0:    1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977,$200,000 | 3,30,Ali,5/2/1982,$300,000
Page 1:    ( Rows 4,5,6 ) ...
Page 2:    ( Rows 7,8,9 ) ...
...
Page 333:  More rows ... 1000,10000,Eddard,1/27/1999,$250,000

IO1 on the index to find the page/row in the index.
IO2 on the heap to pull exactly the page(s) we found.

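The index entries in the diagram can be generated from the heap layout. The sketch below is one way to do it under assumptions read off the slides: 3 heap rows per page, 9 index entries per index page, and emp_id = row_id * 10. `build_index` is a hypothetical helper, and the result is a flat list of sorted pages rather than a real b-tree.

```python
ROWS_PER_PAGE = 3           # heap rows per page, as in the slides
ENTRIES_PER_INDEX_PAGE = 9  # assumption read off the diagram (3 lines of 3 entries)


def build_index(rows):
    """rows: iterable of (row_id, emp_id).

    Returns a list of index pages; each page is a sorted list of
    (emp_id, (row_id, heap_page)) entries.
    """
    entries = sorted(
        (emp_id, (row_id, (row_id - 1) // ROWS_PER_PAGE))
        for row_id, emp_id in rows
    )
    return [entries[i:i + ENTRIES_PER_INDEX_PAGE]
            for i in range(0, len(entries), ENTRIES_PER_INDEX_PAGE)]


rows = [(row_id, row_id * 10) for row_id in range(1, 1001)]
index_pages = build_index(rows)

print(index_pages[0][:3])   # [(10, (1, 0)), (20, (2, 0)), (30, (3, 0))]
print(index_pages[1][0])    # (100, (10, 3))
print(index_pages[-1][-1])  # (10000, (1000, 333))
```

Each entry stores only the indexed value and a pointer, so 1000 rows fit in far fewer index pages than heap pages, which is the "smaller index fits in memory" point from the bullets.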
No Index

SELECT * FROM EMP WHERE EMP_ID = 10000;

Heap
Page 0:    1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977,$200,000 | 3,30,Ali,5/2/1982,$300,000
Page 1:    ( Rows 4,5,6 ) ...
Page 2:    ( Rows 7,8,9 ) ...
...
Page 333:  More rows ... 1000,10000,Eddard,1/27/1999,$250,000

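Without an index, the only way to answer this query is to walk the heap page by page. A minimal sketch of that scan, assuming 3 rows per page and rows reduced to (row_id, emp_id); `build_heap` and `full_scan` are hypothetical names for this illustration.

```python
ROWS_PER_PAGE = 3  # assumption from the slides


def build_heap(row_count):
    """Heap as a list of pages; each row is (row_id, emp_id) with emp_id = row_id * 10."""
    rows = [(row_id, row_id * 10) for row_id in range(1, row_count + 1)]
    return [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]


def full_scan(heap_pages, emp_id):
    """Read pages in order until emp_id is found. Returns (row, pages_read)."""
    pages_read = 0
    for page in heap_pages:
        pages_read += 1          # every page touched costs one IO
        for row in page:
            if row[1] == emp_id:
                return row, pages_read
    return None, pages_read


heap_pages = build_heap(1000)
row, ios = full_scan(heap_pages, 10000)
print(row, ios)  # (1000, 10000) 334
```

The sketch stops at the first match, which only helps when the value happens to sit early in the heap. EMP_ID 10000 lives on the last page, so every one of the 334 pages is read.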
With Index

SELECT * FROM EMP WHERE EMP_ID = 10000;

Index on EMP_ID
Page 0:
  10 (1,0) | 20 (2,0) | 30 (3,0)
  40 (4,1) | 50 (5,1) | 60 (6,1)
  70 (7,2) | 80 (8,2) | 90 (9,2)
Page 1:
  100 (10,3) | 110 (11,3) | 120 (12,3)
  130 (13,4) | 140 (14,4) | 150 (15,4)
  160 (16,5) | 170 (17,5) | 180 (18,5)
...
Page N:
  9920 (992,331) | 9930 (993,331) | 9940 (994,331)
  9950 (995,332) | 9960 (996,332) | 9970 (997,332)
  9980 (998,333) | 9990 (999,333) | 10000 (1000,333)

Entry found in the index: 10000 (1000,333)

With Index (continued)

SELECT * FROM EMP WHERE EMP_ID = 10000;

10000 (1000,333)
Fetch page 333, and pull the row with EMP_ID 10000.

Heap
Page 0:    1,10,Hussein,1/2/1988,$100,000 | 2,20,Adam,3/2/1977,$200,000 | 3,30,Ali,5/2/1982,$300,000
Page 1:    ( Rows 4,5,6 ) ...
Page 2:    ( Rows 7,8,9 ) ...
...
Page 333:  More rows ... 1000,10000,Eddard,1/27/1999,$250,000

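The two-step lookup on these slides (IO1 on the index, IO2 on the heap) might be sketched as follows. This is a simplification, not a b-tree: it assumes the first key of every index page is already cached in memory (the `first_keys` list, a stand-in for the upper levels of a real tree), 3 heap rows per page, and 9 entries per index page. All names are hypothetical.

```python
from bisect import bisect_right

ROWS_PER_PAGE = 3           # assumption from the slides
ENTRIES_PER_INDEX_PAGE = 9  # assumption read off the diagram

# Heap: pages of (row_id, emp_id) rows, with emp_id = row_id * 10.
rows = [(row_id, row_id * 10) for row_id in range(1, 1001)]
heap_pages = [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]

# Index: sorted (emp_id, row_id, heap_page) entries, split into pages.
entries = sorted((emp_id, row_id, (row_id - 1) // ROWS_PER_PAGE)
                 for row_id, emp_id in rows)
index_pages = [entries[i:i + ENTRIES_PER_INDEX_PAGE]
               for i in range(0, len(entries), ENTRIES_PER_INDEX_PAGE)]
first_keys = [page[0][0] for page in index_pages]  # assumed to be cached in memory


def index_lookup(emp_id):
    """Returns (row, io_count): at most one index-page read plus one heap-page read."""
    io_count = 0
    slot = bisect_right(first_keys, emp_id) - 1  # index page that could hold emp_id
    if slot < 0:
        return None, io_count
    index_page = index_pages[slot]
    io_count += 1                                # IO 1: read the index page
    for key, row_id, heap_page_no in index_page:
        if key == emp_id:
            heap_page = heap_pages[heap_page_no]
            io_count += 1                        # IO 2: read exactly one heap page
            for row in heap_page:
                if row[0] == row_id:
                    return row, io_count
    return None, io_count


print(index_lookup(10000))  # ((1000, 10000), 2)
```

Two page reads replace the 334 a heap scan would need for the same query. A real b-tree usually reads more than one index page per lookup, one per level that is not cached.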
Notes
● Sometimes the heap table can be organized around a single index. This is called a clustered index or an Index Organized Table.
● The primary key is usually a clustered index unless otherwise specified.
● MySQL InnoDB tables always have a clustered index (the primary key); other indexes point to the primary key “value”.
● Postgres only has secondary indexes, and all indexes point directly to the row_id, which lives in the heap.
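The difference between the last two notes can be illustrated with plain dictionaries. This is only a sketch of where each index points, not of how either engine is implemented; the function names are hypothetical, and the sample rows come from the Logical Table slide.

```python
# Sample rows from the Logical Table slide: (emp_id, emp_name, emp_dob, emp_salary)
employees = [
    (2000, "Hussein", "1/2/1988", 100_000),
    (3000, "Adam", "3/2/1977", 200_000),
    (4000, "Ali", "5/2/1982", 300_000),
]

# InnoDB-style: the rows live inside the primary-key (clustered) index,
# and a secondary index stores the primary key value.
clustered = {row[0]: row for row in employees}
name_to_pk = {row[1]: row[0] for row in employees}


def innodb_style_lookup(name):
    pk = name_to_pk[name]   # step 1: secondary index -> primary key value
    return clustered[pk]    # step 2: clustered index -> full row


# Postgres-style: the rows live in a heap, and every index stores a
# tuple id, modelled here as (page number, slot within the page).
ROWS_PER_PAGE = 3  # assumption from the slides
heap = [employees[i:i + ROWS_PER_PAGE]
        for i in range(0, len(employees), ROWS_PER_PAGE)]
name_to_tid = {row[1]: (page_no, slot)
               for page_no, page in enumerate(heap)
               for slot, row in enumerate(page)}


def postgres_style_lookup(name):
    page_no, slot = name_to_tid[name]  # index -> tuple id
    return heap[page_no][slot]         # heap page -> full row


print(innodb_style_lookup("Adam"))    # (3000, 'Adam', '3/2/1977', 200000)
print(postgres_style_lookup("Adam"))  # (3000, 'Adam', '3/2/1977', 200000)
```

Both lookups return the same row; what differs is the pointer kept in the secondary index (a primary key value versus a physical location in the heap).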
Storage concepts - Summary

● Table
● Row_id
● Page
● IO
● Heap data structure
● Index data structure (b-tree)
● Example of a query
