Notes 02 - Hardware

The document outlines the hardware aspects of advanced database organization, focusing on data storage in database management systems (DBMS). It discusses the importance of understanding disk-oriented architecture, access times, and optimizations related to managing large volumes of data efficiently. Key topics include the classification of storage media, the characteristics of disks, and the processes involved in accessing and modifying data on disks.

CS525-04/05: Advanced Database Organization

Notes 2: Hardware

Yousef M. Elmehdwi

Department of Computer Science


Illinois Institute of Technology

August 23rd 2023

Slides: adapted from courses taught by Hector Garcia-Molina (Stanford), Shun Yan
Cheung, Andy Pavlo, Paris Koutris, & Leonard McMillan

1 / 62
Overview

In CS425, we learned what a database looks like at the logical level and how to write queries that read and write its data.
In CS525, we will learn how to build the software that manages a database.

2 / 62
Outline

Study of data storage in a database management system
We shall learn the basic techniques for managing data within the computer.
There are two issues we must address, both related to how a DBMS deals with very large amounts of data efficiently:
How does a computer system store and manage very large volumes of data?
What representations and data structures best support efficient manipulation of this data?

3 / 62
Today

Hardware: Disks
Access Times
Optimizations
Other Topics:
Storage costs
Using secondary storage
Disk failures

4 / 62
Disk-Oriented Architecture
The Disk-Oriented Architecture is a fundamental approach in a Database
Management System (DBMS)
The DBMS assumes that the primary storage location of the database is
on non-volatile storage (e.g., HDD, SSD).
The database is stored in a file as a collection of fixed length blocks called
slotted pages on disk.
The DBMS’s components manage the movement of data between
non-volatile and volatile storage.
The system uses an in-memory (volatile) buffer pool to cache blocks
fetched from disk.
Its job is to manage the movement of those blocks back and forth between
disk and memory.

Understanding how data is stored on non-volatile storage is crucial: it directly impacts how the system responds to queries and handles data modifications.
To understand this further, we need to distinguish between volatile and non-volatile storage.
5 / 62
Typical Storage Hierarchy

We will focus on a disk-oriented DBMS architecture, which assumes that the primary storage location of the database is on non-volatile disk.
At the top of the storage hierarchy are the devices closest to the CPU.
This is the fastest storage, but it is also the smallest and most expensive.
The farther a storage device is from the CPU, the larger its capacity and the slower its access.
These devices are also cheaper per GB.

6 / 62
Classification of Physical Storage Media

Storage can be classified into:
Volatile storage
Non-volatile storage
Non-volatile memory [1]
These devices are designed to be the best of both worlds: almost as fast as DRAM but with the persistence of disk. We will not cover these devices.
Factors affecting the choice of storage media include:
Speed with which data can be accessed
Cost per unit of data
Reliability

[1] Simulations of Ultralow-Power Nonvolatile Cells for Random-Access Memory


7 / 62
Volatile Storage

Volatile means that if you pull the power from the machine, then the data
is lost.
Loses contents when power is switched off
Supports fast random access with byte-addressable locations.
This means that the program can jump to any byte address and get the
data that is there.
For our purposes, we will always refer to this storage class as memory

8 / 62
Non-Volatile Storage

Non-volatile means that the storage device does not need to be provided
continuous power in order for the device to retain the bits that it is
storing
Contents persist even when power is switched off.
Block/page addressable.
To read a value at a particular offset, the program first has to load the 4 KB page that holds the value into memory.
Traditionally better at sequential access,
i.e., reading multiple contiguous chunks of data in a single pass
We will refer to this as disk. We will not make a (major) distinction
between solid-state storage (SSD) or spinning hard drives (HDD).

9 / 62
Persistent Memory

There is also a new class of storage devices that are becoming more
popular called Persistent memory.
Persistent memory (PMEM) is a solid-state high-performance
byte-addressable memory device that resides on the memory bus
These devices are designed to be the best of both worlds: almost as fast
as DRAM with the persistence of disk.
We will not cover these devices in this course.

10 / 62
Disks

A high-level design goal of the DBMS is to support databases that exceed the amount of memory available.
The DBMS stores information on (“hard”) disks.
This has major implications for DBMS design!
READ: transfer data from disk to main memory (RAM).
WRITE: transfer data from RAM to disk.
Disk reads and writes are expensive relative to in-memory operations, so they must be managed carefully to avoid long stalls and performance degradation.

12 / 62
Disks

The use of non-volatile storage is one of the important characteristics of a DBMS.
To motivate many of the ideas used in DBMS implementation, we must examine the operation of disks in detail.

13 / 62
Disks

Secondary storage device of choice
Random access vs. sequential access
Sequential: read the data contiguously
Random: read the data from anywhere at any time
Data is stored and retrieved in units called disk blocks or pages
Retrieval time depends on the location of the data on the disk
Therefore, the relative placement of pages on disk has a major impact on DBMS performance! Why?

14 / 62
Components of a Disk

A disk contains multiple platters (usually 2 surfaces per platter)
Platter: circular hard surface on which data is stored by inducing magnetic changes
Platters rotate (7200 RPM - 15000 RPM)
RPM (Rotations Per Minute)
Usually, the disk contains read/write heads that can read from or write to all surfaces simultaneously
All disk heads move at the same time (in or out)

15 / 62
Disks

The surface of a platter is divided into circular tracks
50K-100K or more tracks per platter on typical hard disks
Each track is divided into sectors [1]. Sectors are separated from each other by blank spaces (gaps)
Gaps are non-magnetic and are used to identify the start of a sector
A sector is the smallest unit of data that can be read or written.
Sector size is typically 512 bytes
Typical sectors per track: 500 to 1000 on inner tracks, 1000 to 2000 on outer tracks
A disk block (disk page) [2] is usually composed of a number of consecutive sectors (determined by the operating system)
Data are read/written in units of a disk block (or disk page)
A disk block is the same size as a memory block or page.
Block size: 4K-64K bytes

[1] A sector is a physical unit of the disk.
[2] A block is a logical unit, a creation of whatever software system (OS or DBMS) is using the disk.
16 / 62
Top View of a Platter

17 / 62
Terminology: cylinder

One track from each surface will be under the head for that surface and
will therefore be readable and writable.
When you have the same track number on each surface stacked above
each other, they form a cylinder.
The tracks that are under the heads at the same time are said to form a
cylinder.
In other words, a cylinder consists of the set of tracks that are vertically
aligned on all surfaces.
Cylinder i consists of ith track of all the surfaces.
Disk head does not need to move when accessing (read/write) data in the
same cylinder
18 / 62
Disk Storage Characteristics

# cylinders = # tracks per surface
e.g., 10 tracks per surface ⇒ 10 cylinders, which we can refer to as cylinder 0 through cylinder 9
# tracks per cylinder = # heads = 2 × # platters
Average # sectors per track
# bytes per sector
⇒ disk capacity = # cylinders × # tracks per cylinder × average # sectors per track × # bytes per sector
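
As a rough illustration of how these characteristics combine into a capacity figure, here is a minimal Python sketch. The function name and the drive parameters are hypothetical, chosen only to make the arithmetic easy to follow; they do not describe a real drive.

```python
def disk_capacity_bytes(cylinders, tracks_per_cylinder,
                        avg_sectors_per_track, bytes_per_sector):
    """Capacity = cylinders x tracks/cylinder x sectors/track x bytes/sector."""
    return (cylinders * tracks_per_cylinder
            * avg_sectors_per_track * bytes_per_sector)

# Hypothetical drive (assumed values): 4 platters = 8 surfaces,
# 65,536 cylinders, 1,024 sectors per track on average, 512-byte sectors.
capacity = disk_capacity_bytes(
    cylinders=65_536,
    tracks_per_cylinder=8,
    avg_sectors_per_track=1_024,
    bytes_per_sector=512,
)
print(capacity // 2**30, "GiB")
```

Because outer tracks hold more sectors than inner ones, the sketch uses an average sector count per track, as the list above does.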

19 / 62
Accessing the Disk

The time between the moment the command to read a block is issued and the moment the contents of the block appear in main memory is called the latency of the disk, also known as the access time.

21 / 62
Accessing the Disk

Basic operations:
READ: transfer data from disk to buffer
WRITE: transfer data from buffer to disk
Reading a disk block:
Reading a block from disk requires the disk to start spinning
Disk arm has to be moved to the correct track of the disk
The disk head must wait until the right location on the track is found
Then, the disk block can be read from disk and copied to memory

22 / 62
Accessing the Disk

access time = seek time + rotational delay + transfer time + other delay

Other Delays:
CPU time to issue I/O
Contention for controller
Different programs can be using the disk
Contention for bus, memory
Different programs can be transferring data
These delays are negligible compared to seek time + rotational delay
+ transfer time
“Typical” Value: 0

23 / 62
Accessing the Disk

access time = seek time + rotational delay + transfer time


Seek time: time to move the arm to position disk head on the right track
(position the read/write head at the proper cylinder)
Seek time includes both the time required for the head to physically move
and the time for the drive’s electronics to settle.
Seek time can be 0 if the heads happen already to be at the proper
cylinder.
If not, the heads require some minimum time to start moving and to stop
again, plus additional time that is roughly proportional to the distance
traveled.
The average seek time is often used as a way to characterize the speed of
the disk.

24 / 62
Accessing the Disk

access time = seek time + rotational delay + transfer time


The platters on which data is stored need to rotate to the correct position
before data can be read or written.
rotational delay: time to wait/take for sector to rotate under the disk
head
i.e., wait for the beginning of the block
how long it takes to get to the correct sector

25 / 62
Average Rotational Delay

On average, the desired sector will be about halfway around the circle when the heads arrive at its cylinder.
Average rotational delay is the time for 1/2 revolution
Example: a disk rotating at 7200 RPM
One rotation = $\frac{60\,\text{s} \times 1000\,\text{ms/s}}{7200} = 8.33$ ms
Average rotational latency = $\frac{8.33}{2} \approx 4.17$ ms
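
The same calculation can be sketched in Python for other spindle speeds. The function names are illustrative, and the 15000 RPM value is simply the upper end of the range mentioned earlier in these notes.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm

def avg_rotational_delay_ms(rpm):
    """On average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2

for rpm in (7200, 15000):
    print(rpm, "RPM:",
          round(rotation_time_ms(rpm), 2), "ms per rotation,",
          round(avg_rotational_delay_ms(rpm), 2), "ms average delay")
```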

26 / 62
Accessing the Disk

access time = seek time + rotational delay + transfer time


Transfer time is the time it takes to read or write the actual data from/to
the disk once the correct track is positioned and the data is under the
read/write head.
i.e., Transfer time is the time it takes the sectors of the block and any gaps
between them to rotate past the head.
It’s influenced by the disk’s data transfer rate and the size of the data
being read or written.

Data transfer rate: the rate at which data can be retrieved from or stored
to the disk.
Transfer rate: # bits transferred/sec
We can calculate the transfer time by dividing the block size by the transfer rate.
Given a transfer rate, the transfer time = $\frac{\text{block size}}{\text{transfer rate}}$
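
To see how the three terms of the access-time formula compare in magnitude, here is a small Python sketch. All numbers are assumptions for illustration (9 ms seek, 7200 RPM, 4 KB block, 100 MB/s transfer rate expressed in bytes per second rather than bits), not measurements of a particular disk.

```python
def transfer_time_ms(block_bytes, rate_bytes_per_s):
    """Transfer time = block size / transfer rate, converted to ms."""
    return block_bytes / rate_bytes_per_s * 1000

def access_time_ms(seek_ms, rpm, block_bytes, rate_bytes_per_s):
    """access time = seek time + rotational delay + transfer time."""
    rotational_delay_ms = 60_000 / rpm / 2  # half a revolution on average
    return (seek_ms + rotational_delay_ms
            + transfer_time_ms(block_bytes, rate_bytes_per_s))

# Assumed example values, for illustration only.
t_transfer = transfer_time_ms(4096, 100_000_000)
t_access = access_time_ms(seek_ms=9, rpm=7200,
                          block_bytes=4096, rate_bytes_per_s=100_000_000)
print(f"transfer: {t_transfer:.3f} ms, access: {t_access:.2f} ms")
```

With these assumed values the transfer term is a small fraction of a millisecond, while seek and rotation together take several milliseconds.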

27 / 62
Steps to access data on a disk

1. Move the disk heads to the desired cylinder


Time to seek a cylinder = seek time

28 / 62
Steps to access data on a disk

2. Wait for the desired sector to arrive under the disk head
Time to wait for a sector = rotational delay

29 / 62
Steps to access data on a disk

3. Transfer the data from sector to main memory (through the disk
controller)

30 / 62
Accessing the Disk

Seek time and rotational delay dominate.


Key to lower I/O cost: reduce seek/rotation delays!

31 / 62
Arranging Blocks on Disk

So far: One (Random) Block Access


What about: Reading “Next” block?
Blocks in a file should be arranged sequentially on disk (by “next”) to
minimize seek and rotational delays.
Next block concept:
blocks on same track, followed by
blocks on same cylinder, followed by
blocks on adjacent cylinder
For a sequential scan, pre-fetching several blocks at a time is a big win.

32 / 62
If we do things right

(e.g., Double Buffer, Stagger Blocks...)


The time to get blocks should be proportional to the size of the blocks; the seek time and rotational latency then become negligible
time to get block = $\frac{\text{block size}}{\text{transfer rate}}$ + negligible
Negligible:
skip gap
switch track
once in a while, next cylinder

33 / 62
Rule of Thumb

Sequential access pattern


Successive requests are for successive disk blocks
Disk seek required only for first block
Random access pattern
Successive requests are for blocks that can be anywhere on disk
Each access requires a seek
Transfer rates are low since a lot of time is wasted in seeks

Random I/O: expensive
Sequential I/O: much less expensive
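
A simple cost model makes the gap concrete. This Python sketch assumes an idealized disk where a sequential scan pays the seek and rotational delay once, while every random access pays both again; the timing values are hypothetical.

```python
def random_read_ms(n_blocks, seek_ms, rot_delay_ms, transfer_ms):
    """Every block pays a seek, a rotational delay, and a transfer."""
    return n_blocks * (seek_ms + rot_delay_ms + transfer_ms)

def sequential_read_ms(n_blocks, seek_ms, rot_delay_ms, transfer_ms):
    """Only the first block pays the seek and rotational delay."""
    return seek_ms + rot_delay_ms + n_blocks * transfer_ms

# Assumed timings: 10 ms seek, 4 ms rotational delay, 0.1 ms per block.
t_random = random_read_ms(1000, 10, 4, 0.1)
t_sequential = sequential_read_ms(1000, 10, 4, 0.1)
print(f"random: {t_random:.0f} ms, sequential: {t_sequential:.0f} ms")
```

The model ignores track and cylinder switches during the sequential scan, which the notes treat as negligible.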

34 / 62
Cost for Writing similar to Reading

The process of writing a block is, in its simplest form, quite similar to reading a block
. . . unless we want to verify!
Then we need to add a (full) rotation + $\frac{\text{block size}}{\text{transfer rate}}$

35 / 62
To Modify a Block?

It is not possible to modify a block on disk directly. Rather, even if we wish to modify only a few bytes, we must do the following:
1 Read Block into Memory
2 Modify in Memory
3 Write Block
4 [Verify?]
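
The four steps can be sketched against an ordinary file standing in for the disk. This is an illustration under assumptions: the 4096-byte block size and the helper name `modify_block` are made up for the example, and the verification step re-reads through the operating system, so it may be served from the OS cache rather than from the physical medium.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes

def modify_block(path, block_no, offset, new_bytes):
    """Change a few bytes of one block via read, modify, write, verify."""
    if offset + len(new_bytes) > BLOCK_SIZE:
        raise ValueError("change must fit inside one block")
    with open(path, "r+b") as f:
        f.seek(block_no * BLOCK_SIZE)
        block = bytearray(f.read(BLOCK_SIZE))               # 1. read block into memory
        block[offset:offset + len(new_bytes)] = new_bytes   # 2. modify in memory
        f.seek(block_no * BLOCK_SIZE)
        f.write(block)                                      # 3. write the whole block
        f.flush()
        os.fsync(f.fileno())                                # ask the OS to push it to disk
        f.seek(block_no * BLOCK_SIZE)
        return f.read(BLOCK_SIZE) == bytes(block)           # 4. verify (optional)

# Demo on a temporary two-block file filled with zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(2 * BLOCK_SIZE))
    path = tmp.name
ok = modify_block(path, block_no=1, offset=10, new_bytes=b"hello")
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
```

Even though only five bytes change, the whole 4096-byte block is read and written back.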

36 / 62
Megatron 747 Disk (old)

Example
Rotate at 3600 RPM
Only 1 surface
16 MB usable capacity (usable capacity excludes the gaps)
128 cylinders
seek time:
average = 25 ms.
adjacent cylinders = 5 ms.
1 KB block = 1 sector
10% overhead between blocks
gaps represent 10% of the circle and
sectors represent the remaining 90%

37 / 62
Megatron 747 Disk (old)

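1 KB blocks = sectors
10% overhead between blocks
capacity = 16 MB = $2^{20} \times 16 = 2^{24}$ bytes
# cylinders = 128 = $2^{7}$
bytes/cylinder = $\frac{\text{total capacity}}{\text{total \# cylinders}} = \frac{2^{20} \times 16}{128} = \frac{2^{24}}{2^{7}} = 2^{17}$ bytes = 128 KB
# blocks/cylinder = $\frac{\text{capacity of each cylinder}}{\text{size of block}} = \frac{128\ \text{KB}}{1\ \text{KB}} = 128$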
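
The same arithmetic, written as a short Python check using the Megatron 747 parameters from the example (16 MB usable capacity, 128 cylinders, 1 KB blocks). The variable names are illustrative.

```python
capacity_bytes = 16 * 2**20   # 16 MB usable capacity
cylinders = 128               # 2**7
block_bytes = 2**10           # 1 KB block = 1 sector

bytes_per_cylinder = capacity_bytes // cylinders
blocks_per_cylinder = bytes_per_cylinder // block_bytes

print(bytes_per_cylinder // 2**10, "KB per cylinder")
print(blocks_per_cylinder, "blocks per cylinder")
```

Since this disk has only one surface, each cylinder is a single track, so there are also 128 blocks per track.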

38 / 62
Megatron 747 Disk (old)

3600 RPM → 60 revolutions/sec → 1 rev. = 16.66 msec.
Time over useful data = 16.66 × 0.9 = 14.99 ms
Time over gaps = 16.66 × 0.1 = 1.66 ms
Transfer time [1] for 1 block = $\frac{14.99}{128} = 0.117$ ms
Transfer time for 1 block + gap = $\frac{16.66}{128} = 0.13$ ms

[1] Transfer time is the time it takes the sectors of the block and any gaps between them to rotate past the head. Divide the amount of data by the transfer speed to find the transfer time.
39 / 62
Megatron 747 Disk (old)

Access time ($T_1$) = time to read one random block
$T_1$ = seek + rotational delay + transfer time for 1 block
$T_1 = 25 + \frac{16.66}{2} + 0.117 = 33.45$ ms
Why did we not use the time it takes to transfer 1 block + gap here?

40 / 62
Megatron 747 Disk (old)

Suppose the OS deals with 4 KB blocks
Access time = $T_4 = 25 + \frac{16.66}{2} + 0.117 \times 1 + 0.13 \times 3 = 33.83$ ms
Compare to $T_1 = 33.45$ ms
Q) The time to read a full track is?
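
The $T_1$ and $T_4$ calculations can be reproduced with a short Python sketch built from the Megatron 747 parameters (3600 RPM, 25 ms average seek, 128 blocks per track, 10% gap overhead). The function name `read_time_ms` is illustrative. The slides truncate intermediate values such as 16.66 ms, so the last digit of a result can differ slightly from the unrounded computation here.

```python
RPM = 3600
AVG_SEEK_MS = 25
BLOCKS_PER_TRACK = 128
GAP_FRACTION = 0.10  # gaps take 10% of the circle

rev_ms = 60_000 / RPM                                      # one revolution
block_ms = rev_ms * (1 - GAP_FRACTION) / BLOCKS_PER_TRACK  # one block, no gap
block_gap_ms = rev_ms / BLOCKS_PER_TRACK                   # one block plus its gap

def read_time_ms(n_sectors):
    """Random read of n consecutive 1 KB sectors.

    The first n-1 sectors are each followed by a gap the head must
    cross; the last sector needs no trailing gap.
    """
    return (AVG_SEEK_MS + rev_ms / 2
            + (n_sectors - 1) * block_gap_ms + block_ms)

print(f"T1 = {read_time_ms(1):.2f} ms")
print(f"T4 = {read_time_ms(4):.2f} ms")
```

Quadrupling the amount of data read adds well under half a millisecond, because seek time and rotational delay dominate.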

41 / 62
Optimizations (in controller or O.S.)

Effective ways to speed up disk accesses:


Disk Scheduling Algorithms
e.g., elevator algorithm
Track (or larger) Buffer
Pre-fetch (a.k.a. Double buffering)
Disk Arrays
On Disk Cache

43 / 62
Disk Scheduling

The disk controller can order the requests to minimize seeks
Situation: many read/write requests are pending at any one moment in time
Question (service policy): in which order should the disk controller process (service) the requests?
The order in which the disk operations are serviced can affect performance
Naïve (but fair) service: First Come First Serve
Fair but inefficient (e.g., zig-zag read pattern)
Commonly used disk scheduling algorithm: the “elevator” algorithm
Elevator scheduling for a disk:
The disk head sweeps in and out (like an elevator)
When the disk head is on cylinder k:
- The disk services all requests for that cylinder before moving to the next cylinder
Efficient but unfair
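
The difference between the two policies can be sketched in Python. This is a simplified model: it orders a fixed snapshot of pending cylinder requests (no new arrivals during the sweep), and the head reverses at the last pending request rather than at the edge of the disk. The request list and starting cylinder are made-up example values.

```python
def fcfs_order(requests):
    """First Come First Serve: service requests in arrival order."""
    return list(requests)

def elevator_order(requests, head, moving_up=True):
    """Sweep in the current direction, then reverse for the rest."""
    here = [c for c in requests if c == head]
    lower = sorted((c for c in requests if c < head), reverse=True)
    upper = sorted(c for c in requests if c > head)
    return here + (upper + lower if moving_up else lower + upper)

def head_travel(order, head):
    """Total number of cylinders the head crosses for a service order."""
    total = 0
    for cylinder in order:
        total += abs(cylinder - head)
        head = cylinder
    return total

pending = [98, 183, 37, 122, 14, 124, 65, 67]  # assumed request queue
start = 53                                     # assumed head position

fcfs = fcfs_order(pending)
elevator = elevator_order(pending, start)
print("FCFS travel:    ", head_travel(fcfs, start))
print("Elevator travel:", head_travel(elevator, start))
```

For this queue the elevator order covers fewer than half the cylinders that FCFS does. In exchange, requests for cylinders 37 and 14 wait until the upward sweep is finished, which is the unfairness noted above.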

44 / 62
Pre-fetching (Double Buffer)

Another suggestion for speeding up some secondary-memory algorithms is called double buffering.
In some scenarios, we can predict the order in which blocks will be
requested from disk by some process.
Pre-fetching (double buffering) is the method of fetching the necessary
blocks into the buffer in advance
Requires enough buffer space
Speedup factor up to n, where n is the number of blocks requested by a
process

45 / 62
Double Buffering Algorithm

Problem
Have a File
Sequence of Blocks B1, B2, ...
Have a Program
Process B1
Process B2
Process B3
...

46 / 62
Single Buffer Solution (Naïve Solution)

1 Read B1 → Buffer
2 Process Data in Buffer
3 Read B2 → Buffer
4 Process Data in Buffer
5 ...

47 / 62
Single Buffer Solution

Let:
P = time to process/block
R = time to read in 1 block
n = # blocks
1. Read B1 → Buffer ⇒ R
2. Process Data in Buffer ⇒ P
3. Read B2 → Buffer ⇒ R
4. Process Data in Buffer ⇒ P
Time to process n blocks = n(P+R)

48 / 62
Double Buffering Solution

The program allocates two buffers to process data from a file
Data is read into one buffer
When that buffer is full, the program processes its data while, at the same time, more data is read into the other buffer
Swap the buffers when the data in the current buffer has been processed
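
One way to sketch this overlap in Python is with a single background reader thread: while the main thread processes the current buffer, the reader fills the other one. The helper name and the `read_block` and `process` callbacks are hypothetical stand-ins for real disk reads and real work, and an in-memory list plays the role of the file.

```python
from concurrent.futures import ThreadPoolExecutor

def process_with_double_buffering(read_block, process, n_blocks):
    """Process blocks 0..n_blocks-1, prefetching the next block meanwhile."""
    results = []
    if n_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as reader:
        pending = reader.submit(read_block, 0)      # start filling the first buffer
        for i in range(n_blocks):
            current = pending.result()              # wait until this buffer is full
            if i + 1 < n_blocks:
                # Start filling the other buffer before processing this one.
                pending = reader.submit(read_block, i + 1)
            results.append(process(current))        # overlaps with the read above
    return results

# Demo: five 4-byte "blocks" held in memory; processing sums the bytes.
blocks = [bytes([i]) * 4 for i in range(5)]
results = process_with_double_buffering(lambda i: blocks[i], sum, len(blocks))
print(results)
```

At most two blocks are in flight at any time: the one being processed and the one being read.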

49 / 62
Double Buffer Solution

Let:
P = time to process/block
R = time to read in 1 block
n = # blocks
Say P ≥ R
What is processing time?
Double buffering time = R+nP
Single buffer time = n(R+P)
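
The two formulas can be compared numerically with a small Python sketch. The values of n, R, and P are assumptions picked for illustration, and the double-buffering formula is only applied under the stated condition P ≥ R.

```python
def single_buffer_time(n, R, P):
    """Each block is read, then processed, with no overlap: n(R + P)."""
    return n * (R + P)

def double_buffer_time(n, R, P):
    """With P >= R every read after the first hides behind processing: R + nP."""
    if P < R:
        raise ValueError("this formula assumes P >= R")
    return R + n * P

# Assumed values: 100 blocks, 10 ms to read a block, 12 ms to process it.
n, R, P = 100, 10, 12
t_single = single_buffer_time(n, R, P)
t_double = double_buffer_time(n, R, P)
print(t_single, t_double, round(t_single / t_double, 2))
```

Only the first read is exposed; every later read completes while the previous block is still being processed.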

53 / 62
Using disk array to accelerate disk access

Speed of access and reliability of disks can be increased by simply using multiple disks.
Reliability is the ability of the disk system to accommodate a single- or multi-disk failure and still remain available to the users.
Why use multiple disks?
Multiple disks → multiple disk heads
Multiple outputs = increased data rate

54 / 62
Techniques: multiple disks

Block Striping
Store blocks of a file over multiple disks
High read and write speed
Mirror disk
Store the same data on multiple disks
Mirrored disks contain identical content
Read operation: up to n times as fast (with n mirrored disks)
Write operation: about the same as 1 disk
RAID
Redundant Arrays of Inexpensive Disks
A family of schemes that use multiple disks to increase both the speed of access and the reliability of disks.
RAID can be implemented using a hardware controller or a software
controller.
Different levels provide different solutions at different price points.
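
As an illustration of block striping, the sketch below maps a file's logical block numbers onto disks in round-robin order. Round-robin placement is one common choice and is assumed here; the notes do not prescribe a particular mapping, and the function name is hypothetical.

```python
def stripe_location(block_no, n_disks):
    """Return (disk index, block index on that disk) for a logical block."""
    return block_no % n_disks, block_no // n_disks

# Six consecutive blocks of a file spread over 4 disks.
layout = [stripe_location(b, 4) for b in range(6)]
print(layout)
```

Consecutive blocks land on different disks, so a sequential scan can keep all four sets of heads busy at once.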

55 / 62
Disk Failures

We consider ways in which disks can fail and what can be done to mitigate
these failures:
The disk is OK. But: due to electrical fluctuations, a disk read (or disk
write) operation failed (a one time event)
Intermittent read failure (Cause: power fluctuations/failure)
Intermittent write failure (Cause: power fluctuation/failure)
Media decay (Disk surface worn out)
A sector is worn
The sector is part of a block and it can no longer be used
Permanent failure (Disk crash)
The disk head has scratched the platter(s)
Data on the whole disk is lost

56 / 62
Coping with Read/Write Failures

Detection
Read (verify) after writing data
Better: Use checksum
Correction
Redundancy
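
A minimal sketch of checksum-based detection, using CRC-32 from Python's standard `zlib` module. The on-disk layout (payload followed by a 4-byte checksum) and the function names are assumptions for the example; a checksum detects corruption but cannot repair it, which is why correction needs redundancy.

```python
import zlib

def add_checksum(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload before writing the block."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def read_verified(stored: bytes) -> bytes:
    """Recompute the CRC-32 on read and reject the block on mismatch."""
    payload, stored_crc = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(payload) != stored_crc:
        raise IOError("checksum mismatch: block is corrupted")
    return payload

stored = add_checksum(b"some block contents")
corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]  # flip one bit

print(read_verified(stored))
try:
    read_verified(corrupted)
except IOError as err:
    print("detected:", err)
```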

57 / 62
Coping with media decay

Handling media decay: Replacing bad sectors/blocks


Disk has a number of spare blocks
When writing a block fails n times
Mark block as bad
Replace block with one of the spare blocks
Effect of bad sectors/blocks: The disk capacity is reduced
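
The remapping procedure can be sketched as a toy simulation. Everything here is hypothetical: the class name, the `worn` set that simulates decayed blocks, and the choice to place spare blocks at the physical addresses after the regular ones. Real drives perform this remapping inside their own firmware.

```python
class RemappingDisk:
    """Toy disk that remaps a block to a spare after repeated write failures."""

    def __init__(self, n_blocks, n_spares, max_retries=3):
        self.spares = list(range(n_blocks, n_blocks + n_spares))
        self.max_retries = max_retries
        self.remap = {}     # logical block -> spare physical block
        self.data = {}      # physical block -> contents
        self.worn = set()   # simulated decayed blocks: writes always fail

    def _try_write(self, physical, payload):
        if physical in self.worn:
            return False
        self.data[physical] = payload
        return True

    def write(self, logical, payload):
        """Write a block and return the physical block actually used."""
        physical = self.remap.get(logical, logical)
        for _ in range(self.max_retries):
            if self._try_write(physical, payload):
                return physical
        # All retries failed: mark the block as bad by remapping it.
        if not self.spares:
            raise IOError("no spare blocks left")
        spare = self.spares.pop(0)
        self.remap[logical] = spare
        if not self._try_write(spare, payload):
            raise IOError("spare block is also bad")
        return spare

disk = RemappingDisk(n_blocks=8, n_spares=2)
disk.worn.add(3)                 # pretend block 3 has decayed
print(disk.write(3, b"x"))       # lands on a spare block
print(disk.write(2, b"y"))       # healthy block, written in place
```

Each remapping consumes one spare, so the pool of usable blocks shrinks as the media decays.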

58 / 62
Coping with Disk Crash

Only way to recover from a disk crash: Redundancy (e.g., backup copy)
Different ways to achieve redundancy
Exact copy (mirror)
RAID

59 / 62
Summary

Secondary storage, mainly disks


I/O times
I/Os should be avoided, especially random ones

60 / 62
Reading

Chapter 2: data storage (in the Assignments & Projects/reading folder), except Sections 2.3.3, 2.3.4, 2.3.5, 2.4.4, 2.5.4, 2.6

61 / 62
Next

File and System Structure (Database Storage)

62 / 62
