Notes 02 - Hardware
Yousef M. Elmehdwi
Slides: adapted from courses taught by Hector Garcia-Molina, Stanford, Shun Yan
Cheung, Andy Pavlo, Paris Koutris, & Leonard McMillan
Overview
Outline
Today
Hardware: Disks
Access Times
Optimizations
Other Topics:
Storage costs
Using secondary storage
Disk failures
Disk-Oriented Architecture
The Disk-Oriented Architecture is a fundamental approach in a Database
Management System (DBMS)
The DBMS assumes that the primary storage location of the database is
on non-volatile storage (e.g., HDD, SSD).
The database is stored in a file as a collection of fixed length blocks called
slotted pages on disk.
The DBMS’s components manage the movement of data between
non-volatile and volatile storage.
The system uses an in-memory (volatile) buffer pool to cache blocks
fetched from disk.
The buffer pool's job is to manage the movement of those blocks back and forth between
disk and memory.
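The caching role of the buffer pool can be illustrated with a minimal sketch. The class name BufferPool, the 4 KB block size, the LRU eviction policy, and the read-only behavior are all assumptions made for illustration; the slides do not prescribe a replacement policy, and a real DBMS would also track modified blocks and write them back to disk.

```python
import os
import tempfile
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size in bytes


class BufferPool:
    """Toy read-only buffer pool caching fixed-length blocks of one file (LRU eviction assumed)."""

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity          # maximum number of blocks held in memory
        self.frames = OrderedDict()       # block_id -> bytes, ordered by recency of use
        self.disk_reads = 0               # counts real fetches from disk

    def get_block(self, block_id):
        if block_id in self.frames:       # hit: serve the block from memory
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        with open(self.path, "rb") as f:  # miss: fetch the block from disk
            f.seek(block_id * BLOCK_SIZE)
            data = f.read(BLOCK_SIZE)
        self.disk_reads += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the least recently used block
        self.frames[block_id] = data
        return data


def demo():
    # Build a small three-block "database file" filled with a, b, c.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for ch in (b"a", b"b", b"c"):
            f.write(ch * BLOCK_SIZE)
        path = f.name
    try:
        pool = BufferPool(path, capacity=2)
        first = pool.get_block(0)[:1]
        pool.get_block(0)                 # second access is a cache hit
        pool.get_block(1)
        pool.get_block(2)                 # pool is full, so block 0 is evicted
        pool.get_block(0)                 # block 0 must be fetched from disk again
        return first, pool.disk_reads
    finally:
        os.remove(path)
```

In demo(), five block requests cause only four disk reads, because the repeated request for block 0 is served from memory.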
Classification of Physical Storage Media
Volatile storage loses its contents when power is switched off: if you pull
the power from the machine, the data is lost.
Supports fast random access with byte-addressable locations.
This means that the program can jump to any byte address and get the
data that is there.
For our purposes, we will always refer to this storage class as memory
Non-Volatile Storage
Non-volatile means that the storage device does not need to be provided
continuous power in order for the device to retain the bits that it is
storing
Contents persist even when power is switched off.
Block/page addressable.
To read a value at a particular offset, the program first has to load the 4
KB page that holds the value into memory.
Traditionally better at sequential access
(reading multiple contiguous chunks of data at the same time).
We will refer to this as disk. We will not make a (major) distinction
between solid-state storage (SSD) and spinning hard drives (HDD).
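Block addressing can be made concrete with a small sketch: to get one value, the program locates the page containing the offset, loads that whole page, and only then picks the value out of it. The helper name read_u32_at and the 4-byte little-endian encoding are assumptions for illustration, and the sketch assumes the value does not straddle a page boundary.

```python
import os
import struct
import tempfile

PAGE_SIZE = 4096  # the 4 KB page size used in the text


def read_u32_at(path, offset):
    """Read a 4-byte little-endian integer at a byte offset by loading its whole page first."""
    page_no, within = divmod(offset, PAGE_SIZE)
    with open(path, "rb") as f:
        f.seek(page_no * PAGE_SIZE)
        page = f.read(PAGE_SIZE)           # the device hands back a full page
    return struct.unpack_from("<I", page, within)[0]


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(3 * PAGE_SIZE))      # three zero-filled pages
        f.seek(2 * PAGE_SIZE + 100)
        f.write(struct.pack("<I", 12345))  # value stored on page 2 at offset 100
        path = f.name
    try:
        return read_u32_at(path, 2 * PAGE_SIZE + 100)
    finally:
        os.remove(path)
```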
Persistent Memory
There is also a new class of storage devices that is becoming more
popular, called persistent memory.
Persistent memory (PMEM) is a solid-state, high-performance,
byte-addressable memory device that resides on the memory bus.
These devices are designed to be the best of both worlds: almost as fast
as DRAM with the persistence of disk.
We will not cover these devices in this course.
Disks
Components of a Disk
Terminology: cylinder
One track from each surface will be under the head for that surface and
will therefore be readable and writable.
When you have the same track number on each surface stacked above
each other, they form a cylinder.
The tracks that are under the heads at the same time are said to form a
cylinder.
In other words, a cylinder consists of the set of tracks that are vertically
aligned on all surfaces.
Cylinder i consists of the i-th track of every surface.
The disk heads do not need to move when accessing (reading or writing) data in the
same cylinder.
Disk Storage Characteristics
Accessing the Disk
The time between the moment at which the command to read a
block is issued and the time that the contents of the block appear in main
memory is called the access time.
The access time is also called the latency of the disk.
Basic operations:
READ: transfer data from disk to buffer
WRITE: transfer data from buffer to disk
Reading a disk block:
Reading a block from disk requires the disk to start spinning
Disk arm has to be moved to the correct track of the disk
The disk head must wait until the right location on the track is found
Then, the disk block can be read from disk and copied to memory
access time = seek time + rotational delay + transfer time + other delay
Other Delays:
CPU time to issue I/O
Contention for controller
Different programs can be using the disk
Contention for bus, memory
Different programs can be transferring data
These delays are negligible compared to seek time + rotational delay
+ transfer time
“Typical” Value: 0
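As a rough sketch, the access-time formula is a plain sum of its components. The function name and the sample figures in the usage note are illustrative assumptions, not measurements of any particular disk.

```python
def access_time_ms(seek_ms, rotational_delay_ms, transfer_ms, other_ms=0.0):
    """Sum the components of one block access.

    other_ms defaults to 0, the "typical" value for the other delays.
    """
    return seek_ms + rotational_delay_ms + transfer_ms + other_ms
```

For example, with an assumed 9 ms seek, 4.17 ms rotational delay, and 0.1 ms transfer, access_time_ms(9, 4.17, 0.1) gives roughly 13.27 ms. Seek time and rotational delay dominate; the transfer itself is a small fraction.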
Average Rotational Delay
On average, the desired sector will be about halfway around the
circle when the heads arrive at its cylinder.
Average rotational delay is the time for 1/2 revolution.
Example: Given a rotation speed of 7200 RPM
One rotation = (60 s × 1000 ms/s) / 7200 = 8.33 ms
Average rotational latency = 8.33 ms / 2 ≈ 4.17 ms
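The 7200 RPM example can be checked with a few lines of code. The function names are illustrative; the only input is the rotation speed.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm  # 60 s x 1000 ms/s, divided by revolutions per minute


def average_rotational_delay_ms(rpm):
    """On average the desired sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2
```

For 7200 RPM this gives about 8.33 ms per rotation and about 4.17 ms of average rotational delay.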
Accessing the Disk
Data transfer rate: the rate at which data can be retrieved from or stored
to the disk.
Transfer rate: # bits transferred/sec
We can calculate the transfer time by dividing the size of the block by
the transfer rate, with both expressed in the same units (bits or bytes).
Given a transfer rate, transfer time = block size / transfer rate
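A short sketch of the transfer-time formula follows. It assumes the block size is given in bytes and the rate in bytes per second; the 100 MB/s rate in the usage note is a made-up figure chosen only to show the scale.

```python
def transfer_time_ms(block_size_bytes, transfer_rate_bytes_per_sec):
    """Transfer time = block size / transfer rate, converted to milliseconds."""
    return block_size_bytes / transfer_rate_bytes_per_sec * 1000
```

With a 4096-byte block and an assumed rate of 100,000,000 bytes/sec, transfer_time_ms(4096, 100_000_000) is about 0.041 ms.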
Steps to access data on a disk
1. Move the disk heads to the cylinder that holds the desired block
Time to move the heads = seek time
2. Wait for the desired sector to arrive under the disk head
Time to wait for a sector = rotational delay
3. Transfer the data from sector to main memory (through the disk
controller)
Arranging Blocks on Disk
If we do things right
Rule of Thumb
Cost for Writing similar to Reading
The process of writing a block is, in its simplest form, quite similar to
reading a block
. . . unless we want to verify the write!
Then we need to add a (full) rotation + block size / transfer rate
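The extra cost of verification can be sketched as follows: after writing, the disk must turn one full revolution to bring the block back under the head, and then the block is transferred again to be checked. The function name and the sample figures are illustrative assumptions.

```python
def write_time_ms(seek_ms, rotational_delay_ms, transfer_ms, rotation_ms, verify=False):
    """Cost of writing one block; a plain write costs the same as a read."""
    total = seek_ms + rotational_delay_ms + transfer_ms
    if verify:
        # Wait one full rotation, then read the block back to check it.
        total += rotation_ms + transfer_ms
    return total
```

With assumed values of 25 ms seek, 8.33 ms rotational delay, 0.117 ms transfer, and a 16.66 ms rotation, a plain write costs about 33.45 ms and a verified write about 50.22 ms.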
To Modify a Block?
Megatron 747 Disk (old)
Example
Rotate at 3600 RPM
Only 1 surface
16 MB usable capacity (usable capacity excludes the gaps)
128 cylinders
seek time:
average = 25 ms.
adjacent cylinders = 5 ms.
1 KB block = 1 sector
10% overhead between blocks
gaps represent 10% of the circle and
sectors represent the remaining 90%
1 KB blocks = sectors
10% overhead between blocks
capacity = 16 MB = 2^20 × 16 = 2^24 bytes
# cylinders = 128 = 2^7
bytes/cylinder = total capacity / total # cylinders = (2^20 × 16) / 128 = 2^24 / 2^7 = 2^17 bytes = 128 KB
# blocks/cylinder = capacity of each cylinder / size of block = 128 KB / 1 KB = 128
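The capacity arithmetic for the Megatron 747 can be verified directly. The constant names are illustrative; the values are the ones given in the example.

```python
CAPACITY_BYTES = 16 * 2**20   # 16 MB usable capacity
CYLINDERS = 128               # 2^7 cylinders
BLOCK_BYTES = 2**10           # 1 KB block = 1 sector

bytes_per_cylinder = CAPACITY_BYTES // CYLINDERS
blocks_per_cylinder = bytes_per_cylinder // BLOCK_BYTES
```

Because this disk has only one surface, each cylinder is a single track, so there are also 128 blocks (sectors) per track.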
Transfer time is the time it takes the sectors of the block and any gaps between them to rotate past the head.
Divide the amount of data by the transfer speed to find the transfer time.
Access time = T4 = 25 + 16.66/2 + 0.117 × 1 + 0.13 × 3 = 33.83 ms
Compare to T1 = 33.45ms
Q) The time to read a full track is?
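The T1 and T4 figures can be reproduced from the disk parameters with a short sketch. It assumes 128 sectors per track (one surface, 128 blocks per cylinder), average seek plus half a rotation before the transfer, and that reading n consecutive sectors passes n - 1 sector-plus-gap slots followed by one final sector. The names are illustrative. Results agree with the slide values up to rounding, since the slides round 16.66, 0.117, and 0.13.

```python
RPM = 3600
SECTORS_PER_TRACK = 128
GAP_FRACTION = 0.10            # gaps take 10% of the circle
AVG_SEEK_MS = 25.0

rotation_ms = 60_000 / RPM                   # about 16.67 ms per revolution
slot_ms = rotation_ms / SECTORS_PER_TRACK    # one sector plus its gap, about 0.130 ms
sector_ms = slot_ms * (1 - GAP_FRACTION)     # the sector alone, about 0.117 ms


def read_time_ms(n_sectors):
    """Average time to read n consecutive 1 KB sectors on the Megatron 747."""
    transfer = (n_sectors - 1) * slot_ms + sector_ms
    return AVG_SEEK_MS + rotation_ms / 2 + transfer
```

Here read_time_ms(1) corresponds to T1 and read_time_ms(4) to T4. Quadrupling the amount of data adds well under 1 ms, because seek time and rotational delay dominate.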
Optimizations (in controller or O.S.)
Disk Scheduling
Pre-fetching (Double Buffer)
Double Buffering Algorithm
Problem
Have a File
Sequence of Blocks B1, B2, ...
Have a Program
Process B1
Process B2
Process B3
...
Single Buffer Solution (Naïve Solution)
1 Read B1 → Buffer
2 Process Data in Buffer
3 Read B2 → Buffer
4 Process Data in Buffer
5 ...
Single Buffer Solution
Let:
P = time to process/block
R = time to read in 1 block
n = # blocks
1. Read B1 → Buffer ⇒ R
2. Process Data in Buffer ⇒ P
3. Read B2 → Buffer ⇒ R
4. Process Data in Buffer ⇒ P
Time to process n blocks = n(P + R)
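A minimal simulation of the single-buffer loop shows where n(P + R) comes from: the program never reads and processes at the same time, so the two costs simply add up for every block. The function name is illustrative.

```python
def single_buffer_time(n, read_time, process_time):
    """Total time when each block is read into the one buffer and then processed."""
    clock = 0
    for _ in range(n):
        clock += read_time      # Read Bi -> Buffer
        clock += process_time   # Process Data in Buffer
    return clock
```

For example, with n = 10, R = 2, and P = 3 the total is 10 × (2 + 3) = 50 time units.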
Double Buffering Solution
Double Buffer Solution
Let:
P = time to process/block
R = time to read in 1 block
n = # blocks
Say P ≥ R
What is processing time?
Double buffering time = R+nP
Single buffer time = n(R+P)
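The R + nP result can be checked with a small timeline simulation. This is a sketch under stated assumptions: there are exactly two buffers, the read of a block may start only once the previous read has finished and a buffer is free, and a block is processed only after it has been read and the previous block has been processed. The function name is illustrative.

```python
def double_buffer_time(n, read_time, process_time):
    """Finish time when reading the next block overlaps processing the current one."""
    if n == 0:
        return 0
    read_done = [0] * n   # time at which block i is fully in a buffer
    proc_done = [0] * n   # time at which block i has been processed
    for i in range(n):
        prev_read = read_done[i - 1] if i >= 1 else 0
        buffer_free = proc_done[i - 2] if i >= 2 else 0   # only two buffers exist
        read_done[i] = max(prev_read, buffer_free) + read_time
        prev_proc = proc_done[i - 1] if i >= 1 else 0
        proc_done[i] = max(read_done[i], prev_proc) + process_time
    return proc_done[-1]
```

With n = 10, R = 2, and P = 3 (so P ≥ R), the simulation gives 2 + 10 × 3 = 32 time units, compared with 50 for the single buffer. Only the first read is not hidden behind processing.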
Using disk array to accelerate disk access
Techniques: multiple disks
Block Striping
Store blocks of a file over multiple disks
High read and write speed
Mirror disk
Store the same data on multiple disks
Mirrored disks contain identical content
Read operation: up to n times as fast with n disks
Write operation: about the same as 1 disk
RAID
Redundant Arrays of Inexpensive Disks
A technique that uses multiple disks to increase both the speed of access
and the reliability of storage.
RAID can be implemented using a hardware controller or a software
controller.
Different levels provide different solutions at different price points.
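Striping and mirroring can be sketched with in-memory stand-ins for disks. The round-robin placement in stripe_location and the rotating choice of copy in MirroredDisks.read are assumptions made for illustration; the slides do not specify a layout or a read policy, and this is not a model of any particular RAID level.

```python
def stripe_location(block_no, n_disks):
    """Round-robin striping (assumed layout): returns (disk index, block index on that disk)."""
    return block_no % n_disks, block_no // n_disks


class MirroredDisks:
    """Toy mirror: every disk (modeled as a dict) holds identical content."""

    def __init__(self, n_disks):
        self.disks = [dict() for _ in range(n_disks)]
        self.next_reader = 0

    def write(self, block_no, data):
        for disk in self.disks:            # every copy must be updated
            disk[block_no] = data

    def read(self, block_no):
        disk_index = self.next_reader      # spread reads across the copies
        self.next_reader = (self.next_reader + 1) % len(self.disks)
        return disk_index, self.disks[disk_index][block_no]
```

With four disks, consecutive blocks 0 to 3 land on four different disks and can be transferred in parallel. In the mirror, a write touches every disk, which is why writes are no faster than on one disk, while successive reads can be served by different copies.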
Disk Failures
We consider ways in which disks can fail and what can be done to mitigate
these failures:
Intermittent failure: the disk is OK, but due to electrical fluctuations a disk read (or disk
write) operation fails (a one-time event)
Intermittent read failure (Cause: power fluctuation/failure)
Intermittent write failure (Cause: power fluctuation/failure)
Media decay (Disk surface worn out)
A sector is worn
The sector is part of a block and it can no longer be used
Permanent failure (Disk crash)
The disk head has scratched the platter(s)
Data on the whole disk is lost
Coping with Read/Write Failures
Detection
Read (verify) after writing data
Better: Use checksum
Correction
Redundancy
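Checksum-based detection can be sketched as follows. The choice of CRC-32 (via Python's zlib.crc32), the 4-byte trailer layout, and the function names are assumptions for illustration; the slides only say that a checksum is used.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Return the bytes to store: the payload followed by its CRC-32 (4 bytes, big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def read_block(stored: bytes) -> bytes:
    """Recompute the checksum on read and raise if it does not match the stored one."""
    payload, recorded = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(payload) != recorded:
        raise IOError("checksum mismatch: block is corrupt")
    return payload
```

A checksum only detects the failure. Correcting it requires redundancy, for example rereading the block or fetching a second copy of it.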
Coping with media decay
Coping with Disk Crash
Only way to recover from a disk crash: Redundancy (e.g., backup copy)
Different ways to achieve redundancy
Exact copy (mirror)
RAID