CSCI 5333: DBMS

Chapter 12: Physical Storage Systems


Chapter 13: Data Storage Structures

Lecture Content
• Physical Storage Media Classification
• Storage Hierarchy
• RAID
• Improving Performance and Reliability
• RAID Levels
• Hot Swapping
• File Organization
• Organization of Records in Files
• Data-Dictionary Storage
Physical Storage Media Classification
 Speed with which data can be accessed
 Cost per unit of data
 Reliability
 data loss on power failure or system crash
 physical failure of the storage device
 Can differentiate storage into:
 volatile storage: loses contents when
power is switched off
 non-volatile storage:
 Contents persist even when power is
switched off.
 Includes secondary and tertiary storage, as well as battery-backed main memory.
Physical Storage Media
 Cache – fastest and most costly form of storage; volatile; managed by the computer system hardware.

 Main memory:
 fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^–9 seconds)
 generally too small (or too expensive) to store the entire
database
 capacities of up to a few gigabytes are widely used currently
 Capacities have gone up and per-byte costs have decreased
steadily and rapidly (roughly factor of 2 every 2 to 3 years)
 Volatile — contents of main memory are usually lost if a
power failure or system crash occurs.
Physical Storage Media (Cont.)
 Flash memory
 Data survives power failure
 Data can be written at a location only once, but location can be erased and
written to again
 Can support only a limited number (10K – 1M) of write/erase cycles.
 Erasing of memory has to be done to an entire bank of memory
 Reads are roughly as fast as main memory
 But writes are slow (few microseconds), erase is slower
 Widely used in embedded devices such as digital cameras, phones, and USB
keys
Physical Storage Media (Cont.)
 Magnetic-disk
 Data is stored on spinning disk, and read/written
magnetically
 Primary medium for the long-term storage of data;
typically stores entire database.
 Data must be moved from disk to main memory for
access, and written back for storage
 Much slower access than main memory (more on
this later)
 direct-access – possible to read data on disk in any
order, unlike magnetic tape
 Survives power failures and system crashes
 disk failure can destroy data, but is rare
Physical Storage Media (Cont.)
 Optical storage
 Non-volatile, data is read optically from a spinning disk
using a laser
 CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular
forms
 Blu-ray disks: 27 GB to 54 GB
 Write-once, read-many (WORM) optical disks used for archival storage (CD-R, DVD-R, DVD+R)
 Multiple write versions also available (CD-RW, DVD-RW,
DVD+RW, and DVD-RAM)
 Reads and writes are slower than with magnetic disk
 Juke-box systems, with large numbers of removable
disks, a few drives, and a mechanism for automatic
loading/unloading of disks available for storing large
volumes of data
Physical Storage Media (Cont.)
 Tape storage
 Non-volatile, used primarily for backup (to
recover from disk failure), and for archival data
 sequential-access – much slower than disk
 very high capacity (40 to 300 GB tapes available)
 tape can be removed from drive, so storage costs are much cheaper than disk, but drives are expensive
 Tape jukeboxes available for storing massive
amounts of data
 hundreds of terabytes (1 terabyte = 10^12 bytes) to even multiple petabytes (1 petabyte = 10^15 bytes)

[Figure: IBM tape storage]
Flash Storage
 We can have NOR flash or NAND flash
 NAND flash
 used widely for storage, since it is much cheaper than NOR
flash
 requires page-at-a-time read (page: 512 bytes to 4 KB)
 transfer rate around 20 MB/sec
 solid state disks: use multiple flash storage devices to
provide higher transfer rate of 100 to 200 MB/sec
 erase is very slow (1 to 2 milliseconds)
 erase block contains multiple pages
 remapping of logical page addresses to physical page addresses avoids waiting for erase (see the sketch after this list)
 translation table tracks mapping
 also stored in a label field of flash page
 remapping carried out by flash translation layer
 after 100,000 to 1,000,000 erases, erase block becomes
unreliable and cannot be used
 wear leveling
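As a rough illustration of the remapping idea, here is a minimal sketch of a logical-to-physical page map (a toy stand-in, not a real flash translation layer; the table layout and the naive allocator are assumptions, and real FTLs also handle wear leveling and background erasing):

    flash = {}                 # physical page number -> page contents
    translation_table = {}     # logical page number -> physical page number
    next_free = 0              # naive allocator for fresh (already-erased) pages

    def write_page(logical_page, data):
        """Redirect the write to a fresh physical page instead of erasing in place."""
        global next_free
        physical = next_free
        next_free += 1
        flash[physical] = data
        translation_table[logical_page] = physical   # the old page becomes garbage,
                                                     # to be erased later in bulk

    def read_page(logical_page):
        return flash[translation_table[logical_page]]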
Storage Hierarchy
 primary storage: Fastest media but volatile
(cache, main memory).
 secondary storage: next level in hierarchy, non-
volatile, moderately fast access time
 also called on-line storage
 E.g. flash memory, magnetic disks
 tertiary storage: lowest level in hierarchy, non-
volatile, slow access time
 also called off-line storage
 E.g. magnetic tape, optical storage
RAID
Redundant Arrays of Independent Disks
 RAID: Redundant Arrays of Independent Disks
 disk organization techniques that manage a large number of disks, providing a view of a single disk of
 high capacity and speed by using multiple disks in parallel, and
 high reliability by storing data redundantly, so that data can be recovered even if a disk fails
 The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific
single disk will fail.
 Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any
failure.
 E.g., a system with 100 disks, each with an MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of only 100,000 / 100 = 1,000 hours (approx. 41 days), since with roughly independent failures the system MTTF is about the single-disk MTTF divided by the number of disks
 Originally a cost-effective alternative to large, expensive disks
 The “I” in RAID originally stood for “inexpensive”
 Today RAIDs are used for their higher reliability and bandwidth.
 The “I” is interpreted as independent.

RAID
Improvement of Reliability via Redundancy
 Redundancy – store extra information that can be used to rebuild information lost in a disk failure
 E.g., Mirroring (or shadowing)
 Duplicate every disk. Logical disk consists of two physical disks.
 Every write is carried out on both disks
 Reads can take place from either disk
 If one disk in a pair fails, data still available in the other
 Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired
 Probability of combined event is very small
 Except for dependent failure modes such as fire or building collapse or electrical power surges
 Mean time to data loss depends on mean time to failure and mean time to repair
 E.g., an MTTF of 100,000 hours and a mean time to repair of 10 hours give a mean time to data loss of 500 × 10^6 hours (or about 57,000 years) for a mirrored pair of disks, ignoring dependent failure modes (see the worked check below)
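A quick back-of-the-envelope check of that figure, using the standard approximation mean time to data loss ≈ MTTF² / (2 × MTTR) for a mirrored pair with independent failures:

    mttf = 100_000                  # hours: mean time to failure of one disk
    mttr = 10                       # hours: mean time to repair a failed disk
    mtdl = mttf ** 2 / (2 * mttr)   # approximate mean time to data loss for the pair
    print(mtdl)                     # 500,000,000.0 hours, i.e. 500 * 10^6
    print(mtdl / (24 * 365))        # roughly 57,000 years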
RAID
Improvement in Performance via Parallelism
 Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.
 Improve transfer rate by striping data across multiple disks.
 Bit-level striping – split the bits of each byte across multiple disks
 In an array of eight disks, write bit i of each byte to disk i.
 Each access can read data at eight times the rate of a single disk.
 But seek/access time worse than for a single disk
 Bit level striping is not used much any more
 Block-level striping – with n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch after this list)
 Requests for different blocks can run in parallel if the blocks reside on different disks
 A request for a long sequence of blocks can utilize all disks in parallel
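A minimal illustration of that mapping, assuming blocks and disks are numbered starting from 1 as on the slide:

    def disk_for_block(i, n):
        """Disk (1-based) holding block i of a file striped across n disks."""
        return (i % n) + 1

    # With n = 4 disks, consecutive blocks round-robin across all of them:
    print([disk_for_block(i, 4) for i in range(1, 9)])   # [2, 3, 4, 1, 2, 3, 4, 1]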
RAID Levels
 Schemes to provide redundancy at lower cost by using disk striping combined with
parity bits
 Different RAID organizations, or RAID levels, have differing cost, performance and
reliability characteristics
 RAID Level 0: Block striping; non-redundant.
 Used in high-performance applications where data loss is not critical.
 RAID Level 1: Mirrored disks with block striping
 Offers best write performance.
 Popular for applications such as storing log files in a database system.
RAID Levels (Cont.)
 RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping.
 RAID Level 3: Bit-Interleaved Parity
 a single parity bit is enough for error correction, not just detection, since we
know which disk has failed
 When writing data, corresponding parity bits must also be computed and written to
a parity bit disk
 To recover data in a damaged disk, compute XOR of bits from other disks (including
parity bit disk)
RAID Levels (Cont.)
 RAID Level 3 (Cont.)
 Faster data transfer than with a single disk, but fewer I/Os per second since every
disk has to participate in every I/O.
 Subsumes Level 2 (provides all its benefits, at lower cost).
 RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity
block on a separate disk for corresponding blocks from N other disks.
 When writing data block, corresponding block of parity bits must also be computed
and written to parity disk
 To find the value of a damaged block, compute the XOR of the corresponding blocks (including the parity block) from the other disks (see the sketch after this list).
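A minimal sketch of parity computation and single-disk recovery via XOR (hypothetical 4-byte blocks on three data disks plus one parity disk; not tied to any particular RAID product):

    data_blocks = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]

    def xor_blocks(blocks):
        """Byte-wise XOR of equal-length blocks."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    parity = xor_blocks(data_blocks)      # written to the parity disk

    # If disk 1 fails, its block is the XOR of the surviving data blocks and the parity block.
    recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
    assert recovered == data_blocks[1]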
RAID Levels (Cont.)
 RAID Level 5: Block-Interleaved Distributed Parity; partitions data and
parity among all N + 1 disks, rather than storing data in N disks and
parity in 1 disk.
 E.g., with 5 disks, parity block for nth set of blocks is stored on disk
(n mod 5) + 1, with the data blocks stored on the other 4 disks.
RAID Levels (Cont.)
 RAID Level 5 (Cont.)
 Higher I/O rates than Level 4.
 Block writes occur in parallel if the blocks and their parity blocks are on different
disks.
 Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.
 RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant
information to guard against multiple disk failures.
 Better reliability than Level 5 at a higher cost; not used as widely.
Choice of RAID Level
 Factors in choosing RAID level
 Monetary cost
 Performance: Number of I/O operations per second, and bandwidth during normal operation
 Performance during failure
 Performance during rebuild of failed disk
 Including time taken to rebuild failed disk
 RAID 0 is used only when data safety is not important
 E.g., data can be recovered quickly from other sources
 Levels 2 and 4 are never used since they are subsumed by 3 and 5
 Level 3 is not used anymore since bit-striping forces single block reads to access all disks, wasting disk
arm movement, which block striping (level 5) avoids
 Level 6 is rarely used since levels 1 and 5 offer adequate safety for almost all applications
 So competition is between 1 and 5 only
Choice of RAID Level (Cont.)
 Level 1 provides much better write performance than level 5
 Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes (see the parity-update sketch after this list)
 Level 1 preferred for high update environments such as log disks
 Level 1 has a higher storage cost than level 5
 disk drive capacities are increasing rapidly (50%/year) whereas disk access times have decreased much less (about 3× in 10 years)
 I/O requirements have increased greatly, e.g. for Web servers
 When enough disks have been bought to satisfy required rate of I/O, they often have
spare storage capacity
 so there is often no extra monetary cost for Level 1!
 Level 5 is preferred for applications with low update rate,
and large amounts of data
 Level 1 is preferred for all other applications
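The 2-read/2-write cost follows from the parity update rule new_parity = old_parity XOR old_data XOR new_data; a minimal sketch with hypothetical one-byte blocks:

    def small_write(old_data, old_parity, new_data):
        """Update one data block under block-interleaved parity: read old data and
        old parity (2 reads), then write new data and new parity (2 writes)."""
        new_parity = old_parity ^ old_data ^ new_data
        return new_data, new_parity

    d = [0x10, 0x0f, 0xaa]                  # three data "blocks" of one byte each
    p = d[0] ^ d[1] ^ d[2]                  # their parity block
    d[1], p = small_write(d[1], p, 0x55)    # overwrite block 1
    assert p == d[0] ^ d[1] ^ d[2]          # parity is still consistent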
Hardware Issues
 Software RAID: RAID implementations done entirely in software, with no special
hardware support
 Hardware RAID: RAID implementations with special hardware
 Use non-volatile RAM to record writes that are being executed
 Beware: power failure during write can result in corrupted disk
 E.g. failure after writing one block but before writing the second in a mirrored system
 Such corrupted data must be detected when power is restored
 Recovery from corruption is similar to recovery from failed disk
 NV-RAM helps to efficiently detect potentially corrupted blocks
 Otherwise all blocks of disk must be read and compared with mirror/parity block
Hardware Issues (Cont.)
 Hot swapping: replacement of disk while system is running, without power down
 Supported by some hardware RAID systems,
 reduces time to recovery, and improves availability greatly
 Many systems maintain spare disks which are kept online, and used as replacements for
failed disks immediately on detection of failure
 Reduces time to recovery greatly
 Many hardware RAID systems ensure that a single point of failure will not stop the
functioning of the system by using
 Redundant power supplies with battery backup
 Multiple controllers and multiple interconnections to guard against
controller/interconnection failures
File Organization
 The database is stored as a collection of files. Each file is a sequence
of records. A record is a sequence of fields.
 One approach:
 assume record size is fixed
 each file has records of one particular type only
 different files are used for different relations
 Fixed-length records are easiest to implement; we will consider variable-length records later.
Fixed-Length Records
 Addition of record:
 Store record i starting from byte n × (i – 1), where n is the size of each record.
 Record access is simple but records may cross blocks
 Modification: do not allow records to cross block boundaries
 Deletion of record i – alternatives (see the sketch after this list):
 move records i + 1, . . ., n to i, . . ., n – 1
 move record n to i
 do not move records, but link all free records on a free list
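A minimal sketch of the byte-offset scheme and the "move record n to i" deletion, using an in-memory byte array as a stand-in for the file (the record size is an arbitrary choice for illustration):

    RECORD_SIZE = 16                      # n: fixed record size in bytes
    file_bytes = bytearray()

    def append_record(record):
        """Append a record; record i then occupies bytes n*(i-1) .. n*i - 1."""
        file_bytes.extend(record.ljust(RECORD_SIZE, b"\x00"))

    def read_record(i):
        offset = RECORD_SIZE * (i - 1)
        return bytes(file_bytes[offset:offset + RECORD_SIZE])

    def delete_record(i):
        """Overwrite slot i with the last record, then drop the last slot."""
        n_records = len(file_bytes) // RECORD_SIZE
        file_bytes[RECORD_SIZE * (i - 1):RECORD_SIZE * i] = read_record(n_records)
        del file_bytes[RECORD_SIZE * (n_records - 1):]

    append_record(b"rec-1"); append_record(b"rec-2"); append_record(b"rec-3")
    delete_record(1)
    print(read_record(1))                 # slot 1 now holds the bytes of rec-3 (NUL padded)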
Free Lists
 Store the address of the first deleted record in the file header.
 Use this first record to store the address of the second deleted record, and so on
 Can think of these stored addresses as pointers since they “point” to the location of a
record.
 More space-efficient representation: reuse the space for normal attributes of free records to store the pointers. (No pointers stored in in-use records; see the sketch below.)
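A minimal free-list sketch over an array of fixed-length record slots (the header variable and the -1 sentinel are assumptions for illustration):

    slots = ["rec0", "rec1", "rec2", "rec3", "rec4"]   # fixed-length record slots
    header_free = -1                                   # file header: first free slot (-1 = none)

    def delete(i):
        """Free slot i by linking it at the head of the free list."""
        global header_free
        slots[i] = ("FREE", header_free)   # reuse the record's space to hold the next pointer
        header_free = i

    def insert(record):
        """Reuse a free slot if one exists, otherwise append at the end."""
        global header_free
        if header_free != -1:
            i = header_free
            header_free = slots[i][1]      # follow the chain to the next free slot
            slots[i] = record
            return i
        slots.append(record)
        return len(slots) - 1

    delete(1); delete(3)                   # free list is now: 3 -> 1 -> none
    print(insert("new-rec"))               # reuses slot 3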
Variable-Length Records
 Variable-length records arise in database systems in several ways:
 Storage of multiple record types in a file.
 Record types that allow variable lengths for one or more fields.
 Record types that allow repeating fields (used in some older data models).
 Byte string representation
 Attach an end-of-record (⊥) control character to the end of each record
 Difficulty with deletion
 Difficulty with growth
Variable-Length Records - Slotted Page Structure
 Slotted page header contains:
 number of record entries
 end of free space in the block
 location and size of each record
 Records can be moved around within a page to keep them contiguous with no empty space between them; the entry in the header must be updated (see the sketch after this list).
 Pointers should not point directly to record — instead they should point to
the entry for the record in header.
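A minimal in-memory slotted-page sketch (hypothetical class and field names; a real page packs the header, slot entries, and records into one fixed-size disk block):

    PAGE_SIZE = 4096

    class SlottedPage:
        """In this sketch the 'header' is the entries list plus free_end; a real header
        stores the number of entries, the end of free space, and the (offset, size) of
        each record. Records grow from the end of the page toward the header."""
        def __init__(self):
            self.page = bytearray(PAGE_SIZE)
            self.free_end = PAGE_SIZE      # end of free space
            self.entries = []              # slot id -> (offset, size)

        def insert(self, record):
            self.free_end -= len(record)
            self.page[self.free_end:self.free_end + len(record)] = record
            self.entries.append((self.free_end, len(record)))
            return len(self.entries) - 1   # callers keep the slot id, never a raw offset

        def read(self, slot):
            offset, size = self.entries[slot]
            return bytes(self.page[offset:offset + size])

    p = SlottedPage()
    s = p.insert(b"a variable-length record")
    print(p.read(s))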
Variable-Length Records (Cont.)
 Fixed-length representation:
 reserved space
 pointers
 Reserved space – can use fixed-length records of a known maximum length;
unused space in shorter records filled with a null or end-of-record symbol.
Variable-Length Records
 Variable-length records arise in database systems in several ways:
 Storage of multiple record types in a file.
 Record types that allow variable lengths for one or more fields such as strings
(varchar)
 Record types that allow repeating fields (used in some older data models).
 Attributes are stored in order
 Variable-length attributes represented by fixed-size (offset, length) pairs, with the actual data stored after all fixed-length attributes
 Null values represented by a null-value bitmap (see the sketch after this list)
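A minimal sketch of such a record layout for a hypothetical schema with one fixed-length attribute (an integer id) and two variable-length string attributes; the field widths are assumptions:

    import struct

    def encode_record(id_num, name, dept):
        """Fixed-length part: null bitmap, id, then an (offset, length) pair per string;
        the variable-length data follows the fixed-length part."""
        null_bitmap = (name is None) << 0 | (dept is None) << 1
        fixed_size = 1 + 4 + 2 * 4            # bitmap + 4-byte id + two (offset, length) pairs
        body = b""
        pairs = []
        for value in (name, dept):
            data = (value or "").encode()
            pairs.append((fixed_size + len(body), len(data)))   # offset within the record
            body += data
        return struct.pack("<Bi4H", null_bitmap, id_num,
                           pairs[0][0], pairs[0][1], pairs[1][0], pairs[1][1]) + body

    rec = encode_record(10101, "Srinivasan", "Comp. Sci.")
    print(rec[13:13 + 10])                    # b'Srinivasan' starts right after the fixed part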
Variable-Length Records
 Pointer method
 A variable-length record is represented by a list of fixed-length records,
chained together via pointers.
 Can be used even if the maximum record length is not known
 Disadvantage of the pointer structure: space is wasted in all records except the first in a chain.
Variable-Length Records
 Solution is to allow two kinds of block in file:
 Anchor block – contains the first records of chains
 Overflow block – contains records other than those that are the first records of chains.
Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata:
that is, data about data, such as
 Information about relations
 names of relations
 names and types of attributes of each relation
 names and definitions of views
 integrity constraints
 User and accounting information, including passwords
 Statistical and descriptive data
 number of tuples in each relation
 Physical file organization information
 How relation is stored (sequential/hash/…)
 Physical location of relation
 operating system file name or
 disk addresses of blocks containing records of the relation
 Information about indices (Chapter 11)
Relational Representation of System Metadata
 Relational representation on disk
 Specialized data structures designed for efficient access, in memory
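For instance (a hypothetical catalog layout, not the textbook's exact figure), the on-disk representation might itself consist of relations such as Relation_metadata(relation_name, number_of_attributes, storage_organization, location) and Attribute_metadata(relation_name, attribute_name, domain_type, position, length), which the system loads into specialized in-memory structures at startup for fast access.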
Thank You
for your
Attention!!