03-Storage1 Slides


Homework #1 is due September 12th @ 11:59pm

Project #0 is due September 12th @ 11:59pm

Project #1 will be released on September 13th

15-445/645 (Fall 2021)


We now understand what a database looks like at a logical level and how to write queries to read/write data (e.g., using SQL).

We will next learn how to build software that manages a database (i.e., a DBMS).


Relational Databases
Storage
Execution
Concurrency Control
Recovery
Distributed Databases
Potpourri



The DBMS assumes that the primary storage location of the database is on non-volatile disk.

The DBMS's components manage the movement of data between non-volatile and volatile storage.




Random access on non-volatile storage is usually much slower than sequential access.

The DBMS will want to maximize sequential access.
→ Algorithms try to reduce the number of writes to random pages so that data is stored in contiguous blocks.
→ A group of pages allocated at the same time is called an extent.


Allow the DBMS to manage databases that exceed the amount of memory available.

Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.

Random access on disk is usually much slower than sequential access, so the DBMS will want to maximize sequential access.
[Figure: a database file laid out as a directory followed by pages 1, 2, 3, 4, 5, …, each starting with a header.]


The DBMS can use memory mapping (mmap) to map the contents of a file into the address space of a program.

The OS is responsible for moving the pages of the file in and out of memory, so the DBMS doesn't need to worry about it.
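The memory-mapping idea can be sketched in Python. This is a minimal illustration, not how any particular DBMS does it: the 4 KB `PAGE_SIZE`, the temporary file, and the `read_page` helper are all assumptions made for the example.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch

# Build a small 4-page file; each page is filled with its own page number.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for page_no in range(4):
        f.write(bytes([page_no]) * PAGE_SIZE)


def read_page(mapped, page_no):
    """Slice one page out of the mapping; the OS loads it on first touch."""
    start = page_no * PAGE_SIZE
    return mapped[start:start + PAGE_SIZE]


with open(path, "rb") as f:
    # Length 0 maps the whole file into this process's address space.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        page3 = read_page(mapped, 3)

os.remove(path)
```

The program never issues an explicit read for page 3; it only touches memory, and the OS decides when to bring the page in and when to evict it.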




What if we allow multiple threads to access the mmap files to hide page fault stalls?

This works well enough for read-only access. It is complicated when there are multiple writers…


There are some solutions to this problem:
→ madvise: Tell the OS how you expect to read certain pages.
→ mlock: Tell the OS that memory ranges cannot be paged out.
→ msync: Tell the OS to flush memory ranges out to disk.
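Two of these hints can be tried from Python's `mmap` module, shown here as a rough sketch. `madvise` is only exposed on some platforms, so the call is guarded; `flush` plays the role of msync. Python's `mmap` module does not expose mlock, so it is left out. The temporary file and its four-page size are assumptions for the example.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE  # OS page size on this machine

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * (4 * PAGE_SIZE))

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mapped:
        # madvise: hint that the mapping will be scanned sequentially.
        # Guarded because the method and constant are platform-dependent.
        if hasattr(mapped, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
            mapped.madvise(mmap.MADV_SEQUENTIAL)

        # Dirty the second page, then flush only that page (msync-style).
        mapped[PAGE_SIZE:PAGE_SIZE + 5] = b"dirty"
        mapped.flush(PAGE_SIZE, PAGE_SIZE)

with open(path, "rb") as f:
    f.seek(PAGE_SIZE)
    on_disk = f.read(5)

os.remove(path)
```

Even with these calls, the DBMS is only giving hints or forcing a flush; it still does not choose which page the OS evicts next.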




The DBMS (almost) always wants to control things itself and can do a better job than the OS.
→ Flushing dirty pages to disk in the correct order.
→ Specialized prefetching.
→ Buffer replacement policy.
→ Thread/process scheduling.

The OS is not your friend.


Problem #1: How the DBMS represents the database in files on disk.

Problem #2: How the DBMS manages its memory and moves data back-and-forth from disk.




File Storage
Page Layout
Tuple Layout



The DBMS stores a database as one or more files on disk, typically in a proprietary format.
→ The OS doesn't know anything about the contents of these files.

Early systems in the 1980s used custom filesystems on raw storage.
→ Some "enterprise" DBMSs still support this.
→ Most newer DBMSs do not do this.


The storage manager is responsible for maintaining a database's files.
→ Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages.

It organizes the files as a collection of pages.
→ Tracks data read/written to pages.
→ Tracks the available space.


A page is a fixed-size block of data.
→ It can contain tuples, meta-data, indexes, log records…
→ Most systems do not mix page types.
→ Some systems require a page to be self-contained.

Each page is given a unique identifier.
→ The DBMS uses an indirection layer to map page IDs to physical locations.
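One simple form of this indirection, sketched under the assumption of a single database file with fixed 4 KB pages, is to compute the byte offset as `page_id * PAGE_SIZE`. The `PageFile` class and its method names are hypothetical and exist only for this illustration.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed database page size


class PageFile:
    """Hypothetical single-file store: page_id maps to offset page_id * PAGE_SIZE."""

    def __init__(self, path):
        # "w+b" creates/truncates the file and allows reads and writes.
        self.file = open(path, "w+b")
        self.num_pages = 0

    def allocate_page(self):
        """Append a zeroed page and return its page ID."""
        page_id = self.num_pages
        self.write_page(page_id, bytes(PAGE_SIZE))
        self.num_pages += 1
        return page_id

    def write_page(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        self.file.seek(page_id * PAGE_SIZE)  # the indirection step
        self.file.write(data)

    def read_page(self, page_id):
        self.file.seek(page_id * PAGE_SIZE)
        return self.file.read(PAGE_SIZE)

    def close(self):
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "demo.db")
store = PageFile(path)
first = store.allocate_page()
second = store.allocate_page()
store.write_page(second, b"B" * PAGE_SIZE)
page_bytes = store.read_page(second)
untouched = store.read_page(first)
store.close()
```

Callers only ever hold page IDs, so the mapping could later be changed (for example, to span several files) without touching them.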


There are three different notions of "pages" in a DBMS:
→ Hardware Page (usually 4KB)
→ OS Page (usually 4KB)
→ Database Page (512B-16KB)

A hardware page is the largest block of data for which the storage device can guarantee failsafe writes.




A heap file is an unordered collection of pages with tuples that are stored in random order.
→ Create / Get / Write / Delete Page
→ Must also support iterating over all pages.

Two ways to represent a heap file:
→ Linked List
→ Page Directory


It is easy to find pages if there is only a single heap file.

Need meta-data to keep track of what pages exist in multiple files and which ones have free space.

[Figure: a database file containing Page0, Page1, Page2, Page3, Page4.]


Maintain a header page at the beginning of the file that stores two pointers:
→ HEAD of the free page list.
→ HEAD of the data page list.

Each page keeps track of how many free slots it currently has.
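The linked-list organization can be modeled in memory as follows. This is a sketch under one stated assumption that the slides do not spell out: a page stays on the free list until its last slot is used, then moves to the data list. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Page:
    page_id: int
    free_slots: int
    next_page: Optional[int] = None  # ID of the next page in the same list


@dataclass
class HeaderPage:
    free_head: Optional[int] = None  # HEAD of the free page list
    data_head: Optional[int] = None  # HEAD of the data page list


class LinkedListHeapFile:
    def __init__(self):
        self.header = HeaderPage()
        self.pages: Dict[int, Page] = {}

    def add_free_page(self, page_id, slots):
        """Push a new empty page onto the front of the free list."""
        self.pages[page_id] = Page(page_id, slots, self.header.free_head)
        self.header.free_head = page_id

    def insert_tuple(self):
        """Use one slot in the first free page; relink the page once it is full."""
        page_id = self.header.free_head
        if page_id is None:
            raise RuntimeError("no page with free space")
        page = self.pages[page_id]
        page.free_slots -= 1
        if page.free_slots == 0:
            self.header.free_head = page.next_page  # unlink from the free list
            page.next_page = self.header.data_head  # push onto the data list
            self.header.data_head = page_id
        return page_id


heap = LinkedListHeapFile()
heap.add_free_page(0, slots=2)
heap.add_free_page(1, slots=1)
placed = [heap.insert_tuple() for _ in range(3)]
```

Finding a page with free space only requires reading the header and the first free page, but locating a specific page means walking a list one page at a time.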




The DBMS maintains special pages that track the location of data pages in the database files.

The directory also records the number of free slots per page.

Must make sure that the directory pages are in sync with the data pages.
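A page directory can be sketched as a lookup table from page ID to location and free-slot count. The `PageDirectory` class, its method names, and the example offsets are assumptions for illustration; a real directory would itself be stored in pages on disk.

```python
class PageDirectory:
    """Hypothetical directory: page_id -> [file offset, free slot count]."""

    def __init__(self):
        self.entries = {}

    def register(self, page_id, offset, free_slots):
        self.entries[page_id] = [offset, free_slots]

    def locate(self, page_id):
        """Return where the page lives in the database file."""
        return self.entries[page_id][0]

    def find_page_with_space(self):
        """Return the first registered page that still has a free slot."""
        for page_id, (_offset, free_slots) in self.entries.items():
            if free_slots > 0:
                return page_id
        return None

    def record_insert(self, page_id):
        """Call whenever a tuple is added so the directory stays in sync."""
        if self.entries[page_id][1] == 0:
            raise ValueError("page has no free slots")
        self.entries[page_id][1] -= 1


directory = PageDirectory()
directory.register(0, offset=0, free_slots=0)
directory.register(1, offset=4096, free_slots=2)
directory.register(100, offset=409600, free_slots=5)

target = directory.find_page_with_space()
directory.record_insert(target)
directory.record_insert(target)
next_target = directory.find_page_with_space()
```

The `record_insert` call is where the "in sync" requirement shows up: if a data page changes and the directory entry is not updated with it, the free-slot counts become wrong.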


Every page contains a header of meta-data about the page's contents.
→ Page Size
→ Checksum
→ DBMS Version
→ Transaction Visibility
→ Compression Information

Some systems require pages to be self-contained (e.g., Oracle).
For any page storage architecture, we now need to decide how to organize the data inside of the page.
→ We are still assuming that we are only storing tuples.

Two approaches:
→ Tuple-oriented
→ Log-structured






How to store tuples in a page?

Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.
→ What happens if we delete a tuple?
→ What happens if we have a variable-length attribute?
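The strawman can be written down in a few lines, which makes its limits easier to see. This is a sketch assuming fixed-length tuples and a 4 KB page; `AppendOnlyPage` is a hypothetical name.

```python
class AppendOnlyPage:
    """Strawman page: fixed-length tuples packed back to back."""

    def __init__(self, tuple_size, page_size=4096):
        self.tuple_size = tuple_size
        self.data = bytearray(page_size)
        self.num_tuples = 0  # the only bookkeeping the strawman keeps

    def append(self, tuple_bytes):
        """Write a tuple after the last one and return its byte offset."""
        if len(tuple_bytes) != self.tuple_size:
            raise ValueError("strawman only handles fixed-length tuples")
        start = self.num_tuples * self.tuple_size
        if start + self.tuple_size > len(self.data):
            raise ValueError("page is full")
        self.data[start:start + self.tuple_size] = tuple_bytes
        self.num_tuples += 1
        return start


page = AppendOnlyPage(tuple_size=8)
offsets = [page.append(bytes([i]) * 8) for i in range(3)]
```

Because the write position is derived only from the tuple count, deleting a tuple in the middle leaves a hole that the counter cannot describe, and a variable-length tuple breaks the offset arithmetic.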


The most common layout scheme is called slotted pages.

The slot array maps "slots" to the tuples' starting position offsets.

The header keeps track of:
→ The # of used slots
→ The offset of the starting location of the last slot used.

[Figure: a slotted page containing a header, a slot array, and Tuples #1 to #4.]
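A slotted page can be sketched with Python's `struct` module. The layout below is one possible arrangement, not a specific system's format: the slot array grows forward from the header, tuples grow backward from the end of the page, and each slot also stores the tuple length (an assumption added so that variable-length tuples can be read back). `SlottedPage` and the format constants are hypothetical.

```python
import struct

PAGE_SIZE = 4096
HEADER_FMT = "<HH"  # (number of used slots, offset of the last tuple written)
SLOT_FMT = "<HH"    # (tuple offset, tuple length)
HEADER_SIZE = struct.calcsize(HEADER_FMT)
SLOT_SIZE = struct.calcsize(SLOT_FMT)


class SlottedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        # Empty page: zero slots, and the tuple area starts at the page end.
        self._write_header(0, PAGE_SIZE)

    def _read_header(self):
        return struct.unpack_from(HEADER_FMT, self.data, 0)

    def _write_header(self, num_slots, last_offset):
        struct.pack_into(HEADER_FMT, self.data, 0, num_slots, last_offset)

    def insert(self, tuple_bytes):
        """Store a tuple and return its slot number."""
        num_slots, last_offset = self._read_header()
        slot_array_end = HEADER_SIZE + (num_slots + 1) * SLOT_SIZE
        new_offset = last_offset - len(tuple_bytes)
        if new_offset < slot_array_end:
            raise ValueError("page is full")
        self.data[new_offset:last_offset] = tuple_bytes
        slot_pos = HEADER_SIZE + num_slots * SLOT_SIZE
        struct.pack_into(SLOT_FMT, self.data, slot_pos, new_offset, len(tuple_bytes))
        self._write_header(num_slots + 1, new_offset)
        return num_slots

    def get(self, slot):
        """Follow the slot array entry to the tuple's bytes."""
        num_slots, _ = self._read_header()
        if not 0 <= slot < num_slots:
            raise IndexError("no such slot")
        slot_pos = HEADER_SIZE + slot * SLOT_SIZE
        offset, length = struct.unpack_from(SLOT_FMT, self.data, slot_pos)
        return bytes(self.data[offset:offset + length])


page = SlottedPage()
s0 = page.insert(b"alice")
s1 = page.insert(b"variable-length value")
```

Tuples are addressed by slot number rather than by byte position, so the page can move a tuple internally and only update its slot entry.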
The DBMS needs a way to keep track of individual tuples.

Each tuple is assigned a unique record identifier.
→ Most common: page_id + offset/slot
→ Can also contain file location info.

An application cannot rely on these IDs to mean anything.
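A record identifier built from a page ID and a slot number can be sketched as a small value type. The 16-bit slot field and the `pack_rid` / `unpack_rid` helpers are assumptions for this example; real systems choose their own widths and may add file information.

```python
from typing import NamedTuple

SLOT_BITS = 16  # assumed width of the slot field


class RecordId(NamedTuple):
    page_id: int
    slot: int


def pack_rid(rid):
    """Combine page ID and slot number into one integer."""
    if not 0 <= rid.slot < (1 << SLOT_BITS):
        raise ValueError("slot does not fit in SLOT_BITS")
    return (rid.page_id << SLOT_BITS) | rid.slot


def unpack_rid(value):
    """Split a packed record ID back into (page_id, slot)."""
    return RecordId(value >> SLOT_BITS, value & ((1 << SLOT_BITS) - 1))


rid = RecordId(page_id=7, slot=3)
packed = pack_rid(rid)
restored = unpack_rid(packed)
```

The packed value describes only where the tuple currently lives, which is why applications should not treat it as a stable key.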


A tuple is essentially a sequence of bytes.

It's the job of the DBMS to interpret those bytes into attribute types and values.


Each tuple is prefixed with a header that contains meta-data about it.
→ Visibility info (concurrency control)
→ Bit Map for NULL values.

We do not need to store meta-data about the schema.

[Figure: tuple layout: Header, then Attribute Data.]
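The header-plus-data layout can be sketched with a one-byte NULL bitmap followed by the attribute bytes. Several things here are assumptions for illustration: the three-column `SCHEMA`, the choice to store no bytes for NULL attributes, and the omission of visibility info. The schema lives outside the tuple, standing in for the catalog.

```python
import struct

# Hypothetical schema kept in the catalog, not in each tuple:
# (attribute name, struct format code) for fixed-length attributes.
SCHEMA = [("a", "i"), ("b", "i"), ("c", "d")]


def encode_tuple(values):
    """Serialize values as [1-byte NULL bitmap][attribute data]."""
    null_bitmap = 0
    body = b""
    for position, ((_name, code), value) in enumerate(zip(SCHEMA, values)):
        if value is None:
            null_bitmap |= 1 << position  # mark NULL, store no bytes
        else:
            body += struct.pack("<" + code, value)
    return bytes([null_bitmap]) + body


def decode_tuple(raw):
    """Interpret raw bytes using the schema and the NULL bitmap."""
    null_bitmap = raw[0]
    offset = 1
    values = []
    for position, (_name, code) in enumerate(SCHEMA):
        if null_bitmap & (1 << position):
            values.append(None)
            continue
        (value,) = struct.unpack_from("<" + code, raw, offset)
        values.append(value)
        offset += struct.calcsize("<" + code)
    return values


raw = encode_tuple([1, None, 2.5])
decoded = decode_tuple(raw)
```

Without `SCHEMA`, `raw` is just 13 opaque bytes; the types and names come entirely from the catalog side.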


Attributes are typically stored in the order that you specify them when you create the table.

This is done for software engineering reasons (i.e., simplicity).

However, it might be more efficient to lay them out differently.

    CREATE TABLE foo (
      a INT PRIMARY KEY,
      b INT NOT NULL,
      c INT,
      d DOUBLE,
      e FLOAT
    );

[Figure: tuple layout: Header, a, b, c, d, e.]
The DBMS can physically denormalize (e.g., "pre join") related tuples and store them together in the same page.
→ Potentially reduces the amount of I/O for common workload patterns.
→ Can make updates more expensive.

    CREATE TABLE foo (
      a INT PRIMARY KEY,
      b INT NOT NULL
    );

    CREATE TABLE bar (
      c INT PRIMARY KEY,
      a INT REFERENCES foo (a)
    );

[Figure: a foo tuple (Header, a, b) and its matching bar tuples (Header, c, a) are stored as one physical tuple: Header a b c c c …]

Not a new idea.
→ IBM System R did this in the 1970s.
→ Several NoSQL DBMSs do this without calling it physical denormalization.
The database is organized in pages.
Different ways to track pages.
Different ways to store pages.
Different ways to store tuples.


Log-Structured Storage
Value Representation
Storage Models
