
Chapter 7

Relational database design (RDD) models information and data into a set of tables with rows
and columns. Each row of a relation/table represents a record, and each column represents
an attribute of data. The Structured Query Language (SQL) is used to manipulate relational
databases. The design of a relational database is composed of four stages, where the data
are modeled into a set of related tables. The stages are:

• Define attributes

• Define primary keys

• Define relationships

• Normalization
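
As an illustration of the first three stages, here is a minimal SQL sketch; the department and instructor tables and their column names are assumptions made for illustration, not definitions taken from these notes:

    -- Stage 1: define attributes; Stage 2: define primary keys
    CREATE TABLE department (
        dept_name VARCHAR(20) PRIMARY KEY,
        building  VARCHAR(15)
    );

    CREATE TABLE instructor (
        ID        VARCHAR(5)  PRIMARY KEY,
        name      VARCHAR(20) NOT NULL,
        salary    NUMERIC(8,2),
        -- Stage 3: define relationships by referencing another table's key
        dept_name VARCHAR(20) REFERENCES department (dept_name)
    );

The fourth stage, normalization, then refines such a schema by decomposing any table that still contains redundancy, which is the topic of the rest of this chapter.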
Decomposition - In a DBMS, decomposition is the process of dividing a table into several smaller tables in order to remove redundancy, inconsistencies, and anomalies from the database. Decomposition must always be lossless, so that the information in the original relation can be reconstructed exactly from the decomposed relations. If a relation is decomposed improperly, problems such as information loss may result.

Properties of Decomposition

Decomposition must have the following properties:

1. Decomposition Must be Lossless

2. Dependency Preservation

3. Lack of Data Redundancy

1. Decomposition Must be Lossless

Decomposition must always be lossless, which means that no information may be lost from the decomposed relations. This guarantees that joining the decomposed relations yields exactly the relation that was originally decomposed.
2. Dependency Preservation

Functional dependencies are crucial constraints on a database, and every dependency must be satisfied by at least one of the decomposed tables. If {P → Q} holds, then the attribute sets P and Q are functionally dependent, and the dependency is easiest to check when both sets appear in the same relation. A decomposition is dependency-preserving only if the functional dependencies are maintained in this way. This property also lets us validate updates without having to compute the natural join of the decomposed relations.
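
For example (an illustrative schema, not one from these notes): consider R(A, B, C) with functional dependencies A → B and B → C. Decomposing R into R1(A, B) and R2(B, C) preserves both dependencies, since each can be checked within a single table. Decomposing R into R1(A, B) and R2(A, C) instead is still lossless, but B → C is no longer preserved: checking it would require joining R1 and R2.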

3. Lack of Data Redundancy

Data redundancy is commonly described as repetition of the same data. According to this property, the decomposed relations must not suffer from data redundancy. Careless decomposition may cause problems with the data in the database as a whole. Normalization is the usual way to achieve a decomposition that is free of data redundancy.

Decomposition is of two major types in DBMS:

1. Lossless Decomposition

Let R be a relation schema, and let R1 and R2 form a decomposition of R; that is, viewing R, R1, and R2 as sets of attributes, R = R1 ∪ R2. We say that the decomposition is a lossless decomposition if there is no loss of information by replacing R with the two relation schemas R1 and R2. Loss of information occurs if it is possible to have an instance of a relation r(R) that includes information that cannot be represented if, instead of the instance of r(R), we must use instances of r1(R1) and r2(R2). More precisely, we say the decomposition is lossless if, for all legal (we shall formally define "legal" in Section 7.2.2) database instances, relation r contains the same set of tuples as the result of the following relational algebra expression:

Π_R1(r) ⋈ Π_R2(r)

A decomposition is said to be lossless when the original relation R can be reconstructed by joining the decomposed tables. It is the preferred kind of decomposition, since no information is lost from the relation when we decompose it: the lossless join yields exactly the original relation.

For example, let A be a relation schema with instance a, and suppose A is decomposed into A1, A2, A3, ..., An with corresponding instances a1, a2, a3, ..., an. If a1 ⋈ a2 ⋈ a3 ⋈ ... ⋈ an = a, the decomposition is known as a lossless-join decomposition.

2. Lossy Decomposition

Consider a relation X that is decomposed into n sub-relations X1, X2, X3, ..., Xn. When we naturally join these sub-relations, we either obtain exactly the original relation X or we lose information in the process. If we do not get back the same relation X after joining X1, X2, ..., Xn, the decomposition is known as a lossy decomposition in DBMS.

In a lossy decomposition, the natural join of the sub-relations contains extraneous (spurious) tuples, so that:

X1 ⋈ X2 ⋈ X3 ⋈ ... ⋈ Xn ⊃ X

Here, the operator ⋈ denotes the natural join.
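
A minimal SQL sketch of a lossy decomposition (the employee table, its columns, and the sample rows are assumptions made purely for illustration, and CREATE TABLE ... AS SELECT syntax varies slightly across database systems):

    -- Original relation: employee(ID, name, street, city)
    CREATE TABLE employee (ID INT, name VARCHAR(20), street VARCHAR(20), city VARCHAR(20));
    INSERT INTO employee VALUES (1001, 'Kim', 'Main St',  'Perryridge');
    INSERT INTO employee VALUES (2002, 'Kim', 'North St', 'Hampton');

    -- Decompose on the non-key attribute "name"
    CREATE TABLE emp1 AS SELECT ID, name FROM employee;            -- emp1(ID, name)
    CREATE TABLE emp2 AS SELECT name, street, city FROM employee;  -- emp2(name, street, city)

    -- Rejoining yields 4 rows, a strict superset of the original 2 rows,
    -- because the two employees named 'Kim' can no longer be told apart
    SELECT * FROM emp1 NATURAL JOIN emp2;   -- emp1 ⋈ emp2 ⊃ employee, i.e., lossy

Had we decomposed on the key ID instead (for example into (ID, name) and (ID, street, city)), the natural join over ID would return exactly the original two rows, making the decomposition lossless.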

Chapter 12
Physical Storage Media is used for storing (writing /recording/saving) and retrieving
(reading/opening) data. A storage medium is the physical material on which data is stored.
A storage device is the computer hardware that records and retrieves data from a storage
medium.

Physical storage media can be classified according to several criteria –

• Accessing speed
• Cost per unit of data
• Reliability of the storage device

Cache: the most costly and fastest form of storage. Cache memory is usually very small, and its use is managed by the computer system hardware.

Main Memory (MM): the storage area for data available to be operated on. Main memory may contain tens of gigabytes of data on a personal computer, and even hundreds to thousands of gigabytes of data in large server systems. It is generally too small (or too expensive) for storing the entire database for very large databases, although many enterprise databases can fit in main memory. However:
• General-purpose machine instructions operate on main memory.
• The contents of main memory are usually lost in a power failure or crash (main memory is volatile).
• It is usually too small and too expensive to hold the entire database for very large databases.

Flash memory: EEPROM (electrically erasable programmable read-only memory). Flash memory differs from main memory in that stored data are retained even if power is turned off (or fails); that is, it is non-volatile. Flash memory has a lower cost per byte than main memory, but a higher cost per byte than magnetic disks. Flash memory is widely used for data storage in devices such as cameras and cell phones. It is also used for storing data in "USB flash drives," also known as "pen drives," which can be plugged into the Universal Serial Bus (USB) slots of computing devices. Flash memory is also increasingly used as a replacement for magnetic disks in personal computers as well as in servers. A solid-state drive (SSD) uses flash memory internally to store data but provides an interface similar to a magnetic disk, allowing data to be stored or retrieved in units of a block; such an interface is called a block-oriented interface. Block sizes typically range from 512 bytes to 8 kilobytes.

• Data in flash memory survive power failures.
• Reading data from flash memory takes about 10 nanoseconds (roughly as fast as reading from main memory), while writing is more complicated: a single write takes about 4-10 microseconds.
• To overwrite what has been written, one has to first erase an entire bank of the memory, and flash memory may support only a limited number of erase cycles (10^4 to 10^8).
• It has found popularity as a replacement for disks for storing small volumes of data (5-10 megabytes).

Magnetic-disk storage: primary medium for long-term storage. The primary medium for the
long-term online storage of data is the magnetic disk drive, which is also referred to as the
hard disk drive (HDD). Magnetic disk, like flash memory, is non-volatile: that is, magnetic disk
storage survives power failures and system crashes. Disks may sometimes fail and destroy
data, but such failures are quite rare compared to system crashes or power failures. To
access data stored on magnetic disk, the system must first move the data from disk to main
memory, from where they can be accessed. After the system has performed the designated
operations, the data that have been modified must be written to disk.

• Typically, the entire database is stored on disk.
• Data must be moved from disk to main memory in order for the data to be operated on.
• After operations are performed, data must be copied back to disk if any changes were made.
• Disk storage is called direct-access storage, as it is possible to read data on the disk in any order (unlike sequential access).
• Disk storage usually survives power failures and system crashes.

Optical storage: CD-ROM (compact disk read-only memory), WORM (write-once, read-many) disks (for archival storage of data), and jukeboxes (containing a few drives and numerous disks loaded on demand). The digital video disk (DVD) is an optical storage medium, with data written and read back using a laser light source. The Blu-ray DVD format has a capacity of 27 gigabytes to 128 gigabytes, depending on the number of layers supported. Although the original (and still main) use of DVDs was to store video data, they are capable of storing any type of digital data, including backups of database contents. DVDs are not suitable for storing active database data, since the time required to access a given piece of data can be quite long compared to the time taken by a magnetic disk.

Tape Storage: used primarily for backup and archival data. Archival data refers to data that must be stored safely for a long
period of time, often for legal reasons. Magnetic tape is cheaper than disks and can safely
store data for many years. However, access to data is much slower because the tape must
be accessed sequentially from the beginning of the tape; tapes can be very long, requiring
tens to hundreds of seconds to access data. For this reason, tape storage is referred to as
sequential-access storage. In contrast, magnetic disk and SSD storage are referred to as
direct-access storage because it is possible to read data from any location on disk. Tapes
have a high capacity (1 to 12 terabyte capacities are currently available), and can be
removed from the tape drive. Tape drives tend to be expensive, but individual tapes are
usually significantly cheaper than magnetic disks of the same capacity.

• Cheaper, but much slower access, since tape must be read sequentially from the
beginning.
• Used as protection from disk failures!
In the storage hierarchy described above, the storage systems from main memory up (cache and main memory) are volatile, whereas the storage systems from flash memory down (flash, magnetic disk, optical, and tape storage) are non-volatile. Data must be written to non-volatile storage for safekeeping.
RAID (redundant array of independent disks) is a way of storing the same data in different
places on multiple hard disks or solid-state drives (SSDs) to protect data in the case of a
drive failure.
Physical Characteristics of Disks

A disk drive consists of the following components.

Disk Structure:

• A disk drive consists of multiple disk platters, each having a flat, circular shape.
• Each platter's two surfaces are covered with a magnetic material where information is
recorded.
• Platters are made from rigid metal or glass materials.
• A drive motor spins the disk at a constant high speed, typically ranging from 5400 to
10,000 revolutions per minute (RPM), depending on the model.
Data Organization:

• The disk surface is logically divided into tracks, which are further subdivided into
sectors.
• Sectors are the smallest units of information that can be read from or written to the
disk, typically having a size of 512 bytes.
• Modern disks can have between 2 billion and 24 billion sectors.
• The inner tracks (closer to the spindle) are shorter in length compared to the outer
tracks, and outer tracks contain more sectors.

Head-Disk Assembly:

• Each side of a platter has a read-write head that moves across the platter to access
different tracks.
• Multiple platters are typically mounted on a spindle, and the read-write heads are
mounted on a single assembly called a disk arm.
• This combination of platters and heads is collectively known as the head-disk
assembly.
• All heads move together, and when the head on one platter is on a specific track, all
other heads are on the corresponding track of their respective platters.

Data Integrity and Reliability:

• The read-write heads are kept very close to the disk surface to increase recording
density.
• To maintain this close proximity, the head "floats" just microns above the disk surface
due to the breeze created by the spinning disk.
• Careful machining of platters is required to ensure they are flat, as head crashes,
where the head contacts the disk surface, can result in data loss.
• In older-generation disks, head crashes could lead to the failure of the entire disk,
while current-generation disks are less susceptible to complete failure but can still
experience sector-level failures.

Data Storage and Access:

• Data is stored on a sector magnetically as reversals of the direction of magnetization of the magnetic material.
• Disk assemblies contain the physical data storage, and head assemblies contain the heads for accessing data.
• Tracks are grouped into cylinders, which are collections of tracks equidistant from the center on all surfaces.
• A disk block or page, which is the smallest logical unit of data storage, can be equal in size to a disk sector or a small number of sectors (e.g., 2, 3, 4).

Disk Latency:

• Disk latency refers to the time between issuing a disk access command and the delivery of the data from the disk to main memory.
• Disk latency comprises several components, including disk-controller processing time (typically a fraction of a millisecond).
• Disk controllers interface between the computer system and the hardware of the disk drive; they accept high-level commands and perform actions such as moving the disk arm, reading or writing data, and handling checksums.
• Seek time is the time it takes to move the disk head to the correct cylinder (usually 10-40 milliseconds).
• Rotational latency is the time spent waiting for the desired sector to rotate under the head; on average this is half a rotation, and a full rotation takes approximately 10 milliseconds at typical speeds.
• Transfer time is the time needed to read all the sectors of a block, determined by the transfer rate (approximately 10 megabytes per second).
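
Putting these components together, the time to read one block is roughly: access time ≈ controller overhead + seek time + rotational latency + transfer time. As a worked example using assumed round figures rather than values from these notes (a 4-kilobyte block, 0.2 ms controller overhead, a 4 ms seek, a 7200-RPM disk, and a 100 MB/s transfer rate):

    access time ≈ 0.2 ms + 4 ms + 0.5 × (60 s / 7200) + (4 KB / 100 MB/s)
                ≈ 0.2 ms + 4 ms + 4.17 ms + 0.04 ms ≈ 8.4 ms

The seek and rotational components dominate, which is why sequential access (which avoids repeated seeks) is so much faster than random access on a magnetic disk.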
Factors to measure Performance of Disks-

The performance of disks, whether they are hard disk drives (HDDs) or solid-state drives
(SSDs), is measured using several key factors and metrics. These factors help assess how well
a disk can read and write data and how it performs in various scenarios. The primary factors
that measure the performance of disks include:

1. Throughput:

- Throughput measures the rate at which data can be read from or written to the disk. It is
typically expressed in megabytes per second (MB/s) or gigabytes per second (GB/s).

- Higher throughput indicates faster data transfer capabilities and is crucial for tasks
involving large data files or streaming.

2. IOPS (Input/Output Operations Per Second):

- IOPS measures the number of input/output operations that a disk can perform in one
second. It's a critical metric for assessing a disk's ability to handle random read and write
operations.

- IOPS is particularly important in database workloads and environments with multiple users
or concurrent tasks.

3. Latency:
- Latency represents the time it takes for a disk to respond to an I/O request. Lower latency
indicates faster response times.

- Low-latency disks are essential for applications that require rapid data access, such as
real-time data processing and high-performance databases.

4. Seek Time (HDDs):

- Seek time measures the time it takes for the disk's read/write head to move to the correct
track or location to access data. Lower seek times are better for reducing access times,
especially in HDDs.

- Seek time is particularly relevant in HDDs with rotating platters.

Typical seek times range from 2 to 20 milliseconds depending on how far the track is from the
initial arm position. Smaller disks tend to have lower seek times since the head has to travel a
smaller distance. The average seek time is the average of the seek times, measured over a
sequence of (uniformly distributed) random requests. If all tracks have the same number of
sectors, and we disregard the time required for the head to start moving and to stop moving,
we can show that the average seek time is one-third the worst-case seek time. Taking these
factors into account, the average seek time is around one-half of the maximum seek time.
Average seek times currently range between 4 and 10 milliseconds, depending on the disk
model.
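
The one-third figure can be justified with a short calculation (assuming, as above, that seek time is proportional to the distance travelled and that the current and target cylinder positions are independent and uniformly distributed across the platter): modelling the two positions as uniform random variables x and y on [0, 1], the expected distance is

    E[|x − y|] = ∫₀¹ ∫₀¹ |x − y| dx dy = 1/3,

so the average seek is one-third of the full-stroke (worst-case) seek; accounting for head acceleration and settling raises this toward one-half, as noted above.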

5. Access Time (HDDs):

- Access time combines seek time and rotational delay (the time it takes for the desired
data to rotate under the read/write head). It is another metric for evaluating the time it takes
to retrieve data from different locations on the disk.

- Access time is more relevant for HDDs than SSDs.

6. Queue Depth:

- Queue depth refers to the number of I/O requests that can be queued for execution on the
disk simultaneously. A higher queue depth can help improve disk performance, especially in
multi-threaded or multi-user environments.

7. Burst Transfer Rate:

- Burst transfer rate measures the highest data transfer rate that a disk can achieve when
data is read or written sequentially. It is useful for tasks involving large sequential data
transfers, such as media streaming.
8. Sequential vs. Random Performance:

- Disk performance can differ between sequential I/O operations (e.g., reading large files)
and random I/O operations (e.g., accessing small, scattered pieces of data). Understanding
the performance characteristics for both types of operations is important.

9. Cache Performance:

- Disk caches, including read and write caches, can significantly impact performance.
Monitoring cache hit rates and assessing how effectively the cache is used is crucial in
understanding disk performance.

10. Reliability and Durability:

- While not direct performance metrics, the reliability and durability of a disk are essential for overall performance and availability. Disk failure rates and the mean time between failures (MTBF) are important factors to consider. The mean time to failure (MTTF) of a disk (or of any other system) is the amount of time that, on average, we can expect the system to run continuously without any failure. According to vendors' claims, the mean time to failure of disks today ranges from 500,000 to 1,200,000 hours, about 57 to 136 years. In practice, the claimed mean time to failure is computed from the probability of failure when the disk is new; the figure means that, given 1000 relatively new disks with an MTTF of 1,200,000 hours, on average one of them will fail every 1200 hours. A mean time to failure of 1,200,000 hours does not imply that the disk can be expected to function for 136 years! Most disks have an expected life span of about 5 years and have significantly higher rates of failure once they become more than a few years old.
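
To make the 1200-hour figure explicit, a worked calculation using the numbers quoted above: a single disk with an MTTF of 1,200,000 hours fails at a rate of 1/1,200,000 per hour, so a population of 1000 such disks fails at an aggregate rate of 1000/1,200,000 = 1/1200 per hour; that is, one failure is expected roughly every 1200 hours (about 50 days).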

11. Cost-Effectiveness:

- The cost of the disk and its performance relative to the cost are also important
considerations. Selecting a storage solution should balance performance and budget
constraints.

Measuring and understanding these performance factors is critical when choosing disks for
specific use cases, such as database servers, file servers, workstations, and more. Different
applications and workloads may prioritize different performance characteristics, so selecting
the right disk is essential for optimal system performance.
Chapter 14
Indexing is a technique for improving database performance by reducing the number of disk accesses needed when a query is run. An index is a data structure used to quickly locate and access the data in a database table.

Each index entry contains a search-key value and a data reference: a pointer (or set of pointers) holding the address of the disk block where the records with that particular key value can be found.

An attribute or set of attributes used to look up records in a file is called a search key.

There are two basic kinds of indices:

• Ordered indices. Based on a sorted ordering of the values.

• Hash indices. Based on a uniform distribution of values across a range of buckets. The
bucket to which a value is assigned is determined by a function, called a hash function.
We shall consider several techniques for ordered indexing. No one technique is the best.
Rather, each technique is best suited to particular database applications. Each technique
must be evaluated on the basis of these factors:

• Access types: The types of access that are supported efficiently. Access types can include finding records with a specified attribute value and finding records whose attribute values fall in a specified range.

• Access time: The time it takes to find a particular data item, or set of items, using the technique in question.

• Insertion time: The time it takes to insert a new data item. This value includes the time it takes to find the correct place to insert the new data item, as well as the time it takes to update the index structure.

• Deletion time: The time it takes to delete a data item. This value includes the time it takes to find the item to be deleted, as well as the time it takes to update the index structure.

• Space overhead: The additional space occupied by an index structure. Provided that the amount of additional space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.

Ordered indices

The indices are usually sorted to make searching faster. The indices which are sorted are
known as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. The IDs start from 1, 2, 3, ... and so on, and we have to search for the employee with ID 543.

• In a database with no index, we have to scan the file from the beginning until we reach ID 543; the DBMS will find the record after reading 543 * 10 = 5430 bytes.
• With an index whose entries are, say, 2 bytes each, the DBMS will find the record after reading only 542 * 2 = 1084 bytes, which is far less than in the previous case.

There are two types of ordered indices that we can use:

Dense index- In a dense index, an index entry appears for every search-key value in the file. In
a dense clustering index, the index record contains the search-key value and a pointer to the
first data record with that search-key value. The rest of the records with the same search-key
value would be stored sequentially after the first record, since, because the index is a clustering
one, records are sorted on the same search key. In a dense non clustering index, the index must
store a list of pointers to all records with the same search-key value.

o The dense index contains an index record for every search key value in the data file. It
makes searching faster.

o In this, the number of records in the index table is same as the number of records in the
main table.

o It needs more space to store index record itself. The index records have the search key
and a pointer to the actual record on the disk.
Sparse index- In a sparse index, an index entry appears for only some of the search key values.
Sparse indices can be used only if the relation is stored in sorted order of the search key; that
is, if the index is a clustering index. As is true in dense indices, each index entry contains a
search-key value and a pointer to the first data record with that search-key value. To locate a
record, we find the index entry with the largest search-key value that is less than or equal to
the search-key value for which we are looking. We start at the record pointed to by that index
entry and follow the pointers in the file until we find the desired record.

o In the data file, index record appears only for a few items. Each item points to a block.

o In this, instead of pointing to each record in the main table, the index points to the
records in the main table in a gap.
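
Whether an index is kept dense or sparse is an internal storage decision made by the DBMS; at the SQL level, an ordered index on a search key is simply declared. A minimal sketch (the instructor table, its columns, and the index name are assumptions for illustration):

    -- Declare an ordered index on the search key dept_name
    CREATE INDEX instructor_dept_idx ON instructor (dept_name);

    -- Equality and range queries on the search key can now use the index
    -- instead of scanning the whole file
    SELECT ID, name
    FROM instructor
    WHERE dept_name = 'Physics';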
Clustering Index

A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary-key columns, which may not be unique for each record.

In this case, to identify the records faster, we group two or more columns together to obtain a unique value and create an index out of them. This method is called a clustering index.

The records which have similar characteristics are grouped together, and indexes are created for these groups. Example: suppose a company has several employees in each department. If we use a clustering index, all employees who belong to the same Dept_ID are considered to be within a single cluster, and the index pointers point to the cluster as a whole. Here Dept_Id is a non-unique key.

Primary Index

o If the index is created on the basis of the primary key of the table, then it is known as primary indexing. These primary keys are unique to each record and have a 1:1 relationship with the records.
o As primary keys are stored in sorted order, the performance of the searching operation
is quite efficient.

o The primary index can be classified into two types: Dense index and Sparse index.

Secondary Index

In sparse indexing, as the size of the table grows, the size of the mapping also grows. These mappings are usually kept in primary memory so that address fetches are faster; the secondary memory is then searched for the actual data based on the address obtained from the mapping. If the mapping size grows too large, fetching the address itself becomes slow, and the sparse index is no longer efficient. To overcome this problem, secondary indexing is introduced.

In secondary indexing, to reduce the size of mapping, another level of indexing is introduced.
In this method, the huge range for the columns is selected initially so that the mapping size of
the first level becomes small. Then each range is further divided into smaller ranges. The
mapping of the first level is stored in the primary memory, so that address fetch is faster. The
mapping of the second level and actual data are stored in the secondary memory (hard disk).
For example:

o Suppose we want to find the record with roll number 111. The search first looks in the first-level index for the largest entry that is smaller than or equal to 111 and finds, say, 100.

o Then, in the second-level index block for that range, it again finds the largest entry less than or equal to 111, say 110. Using the address associated with 110, it goes to the data block and scans the records sequentially until it reaches 111.

o This is how a search is performed in this method. Insertion, update, and deletion are handled in the same manner.

Multilevel indexes refer to a hierarchical structure of indexes. Here, each level of the index
provides a more detailed reference to the data. It allows faster data retrieval, reduces disk
access, and improves query performance. Multilevel indexes are essential in large databases
where traditional indexes are not efficient.
A multi-level index can be created for any first-level index (primary, secondary, or clustering)
that has more than one disk block. It's like a search tree, but adding or removing new index
entries is challenging because every level of the index is an ordered file.

To solve this problem, most multi-level indexes use B-tree or B+ tree data structures. These
structures leave some space in each tree node (disk block) to allow for new index entries. A
multi-level index treats the index file like an ordered file with a unique entry for each K(i).
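
To see why multiple levels pay off, consider a back-of-the-envelope example with assumed figures (not taken from these notes): if a first-level index occupies 10,000 disk blocks and each index block holds 100 entries, a second-level index over it needs only 10,000 / 100 = 100 blocks, and a third level needs just one block. A lookup then reads one block per level plus the data block, about 4 block reads in total, compared with roughly 13-14 reads for a binary search over the 10,000 first-level index blocks (log2 10,000 ≈ 13.3).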

Chapter 17
A transaction is a unit of program execution that accesses and possibly updates various
data items. Usually, a transaction is initiated by a user program written in a high-level data-
manipulation language (typically SQL), or programming language (e.g., C++ or Java), with
embedded database accesses in JDBC or ODBC. A transaction is delimited by statements (or
function calls) of the form begin transaction and end transaction. The transaction consists of
all operations executed between the begin transaction and end transaction.
read(X), which transfers the data item X from the database to a variable, also called X, in a
buffer in main memory belonging to the transaction that executed the read operation.

write(X), which transfers the value in the variable X in the main-memory buffer of the
transaction that executed the write to the data item X in the database.

E.g., Let Ti be a transaction that transfers $50 from account A to account B. This transaction
can be defined as:
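
Following the read(X)/write(X) notation above (the original figure is not included in these notes, so the definition is reconstructed here):

    Ti: read(A);
        A := A − 50;
        write(A);
        read(B);
        B := B + 50;
        write(B)

In SQL, a hedged sketch of the same transfer might look as follows (the account table and its columns are assumptions for illustration, and the BEGIN/START TRANSACTION keyword varies by DBMS):

    BEGIN TRANSACTION;
    UPDATE account SET balance = balance - 50 WHERE account_id = 'A';
    UPDATE account SET balance = balance + 50 WHERE account_id = 'B';
    COMMIT;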

ACID Properties

A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as ACID properties − in order to ensure accuracy,
completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none. There must be no state in a
database where a transaction is left partially completed. States should be defined
either before the execution of the transaction or after the execution/abortion/failure of
the transaction.
Suppose that, just before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000, respectively. Now suppose that, during the execution of transaction Ti, a failure occurs that prevents Ti from completing its execution successfully. Further, suppose that the failure happened after the write(A) operation but before the write(B) operation. In this case, the values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B is no longer preserved. Thus, because of the failure, the state of the system no longer reflects a real state of the world that the database is supposed to capture. We term such a state an inconsistent state.
• Consistency − The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database. If
the database was in a consistent state before the execution of a transaction, it must
remain consistent after the execution of the transaction as well.
The consistency requirement here is that the sum of A and B be unchanged by the
execution of the transaction. Without the consistency requirement, money could be
created or destroyed by the transaction! It can be verified easily that, if the database
is consistent before an execution of the transaction, the database remains consistent
after the execution of the transaction. Ensuring consistency for an individual
transaction is the responsibility of the application programmer who codes the
transaction.
• Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out and executed as if it were the only transaction in the system. No transaction will affect the existence of any other transaction.

Even if the consistency and atomicity properties are ensured for each transaction, if
several transactions are executed concurrently, their operations may interleave in
some undesirable way, resulting in an inconsistent state.
For example, as we saw earlier, the database is temporarily inconsistent while the
transaction to transfer funds from A to B is executing, with the deducted total written
to A and the increased total yet to be written to B. If a second concurrently running
transaction reads A and B at this intermediate point and computes A + B, it will
observe an inconsistent value. Furthermore, if this second transaction then performs
updates on A and B based on the inconsistent values that it read, the database may
be left in an inconsistent state even after both transactions have completed.
• Durability − The database should be durable enough to hold all its latest updates
even if the system fails or restarts. If a transaction updates a chunk of data in a
database and commits, then the database will hold the modified data. If a
transaction commits but the system fails before the data could be written on to the
disk, then that data will be updated once the system springs back into action.

Once the execution of the transaction completes successfully, and the user who
initiated the transaction has been notified that the transfer of funds has taken place, it
must be the case that no system failure can result in a loss of data corresponding to
this transfer of funds. The durability property guarantees that, once a transaction
completes successfully, all the updates that it carried out on the database persist,
even if there is a system failure after the transaction completes execution. We assume
for now that a failure of the computer system may result in loss of data in main
memory, but data written to disk are never lost. Protection against loss of data on disk
is discussed in Chapter 19. We can guarantee durability by ensuring that either:
1. The updates carried out by the transaction have been written to disk before the
transaction completes.
2. Information about the updates carried out by the transaction is written to disk, and
such information is sufficient to enable the database to reconstruct the updates when
the database system is restarted after the failure.

Transaction Atomicity and Durability


A transaction may not always complete its execution successfully. Such a transaction is
termed aborted. If we are to ensure the atomicity property, an aborted transaction must have
no effect on the state of the database. Thus, any changes that the aborted transaction made
to the database must be undone. Once the changes caused by an aborted transaction have
been undone, we say that the transaction has been rolled back. It is part of the responsibility
of the recovery scheme to manage transaction aborts. This is done typically by maintaining a
log. Each database modification made by a transaction is first recorded in the log. We record
the identifier of the transaction performing the modification, the identifier of the data item
being modified, and both the old value (prior to modification) and the new value (after
modification) of the data item. Only then is the database itself modified. Maintaining a log
provides the possibility of redoing a modification to ensure atomicity and durability as well as
the possibility of undoing a modification to ensure atomicity in case of a failure during
transaction execution.

A transaction that completes its execution successfully is said to be
committed. A committed transaction that has performed updates transforms the database
into a new consistent state, which must persist even if there is a system failure. Once a
transaction has committed, we cannot undo its effects by aborting it. The only way to undo
the effects of a committed transaction is to execute a compensating transaction. For
instance, if a transaction added $20 to an account, the compensating transaction would
subtract $20 from the account. However, it is not always possible to create such a
compensating transaction. Therefore, the responsibility of writing and executing a
compensating transaction is left to the user and is not handled by the database system.
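
For illustration, here is a hedged sketch of the log records the recovery scheme might write for the $50 transfer above, using the common <transaction-id, data-item, old-value, new-value> notation (the exact log format is system-specific and assumed here):

    <Ti start>
    <Ti, A, 1000, 950>
    <Ti, B, 2000, 2050>
    <Ti commit>

Because the old values are recorded, the modifications can be undone if Ti aborts; because the new values are recorded, they can be redone if the system crashes after Ti commits.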
We need to be more precise about what we mean by successful completion of a transaction.
We therefore establish a simple abstract transaction model. A transaction must be in one of
the following states:
• Active, the initial state; the transaction stays in this state while it is executing.
• Partially committed, after the final statement has been executed.
• Failed, after the discovery that normal execution can no longer proceed.
• Aborted, after the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction.
• Committed, after successful completion.
The state diagram corresponding to a transaction appears in Figure 17.1. We say that a
transaction has committed only if it has entered the committed state. Similarly, we say that a
transaction has aborted only if it has entered the aborted state. A transaction is said to have
terminated if it has either committed or aborted. A transaction starts in the active state. When
it finishes its final statement, it enters the partially committed state. At this point, the
transaction has completed its execution, but it is still possible that it may have to be aborted,
since the actual output may still be temporarily residing in main memory, and thus a
hardware failure may preclude its successful completion. The database system then writes
out enough information to disk that, even in the event of a failure, the updates performed by
the transaction can be re-created when the system restarts after the failure. When the last of
this information is written out, the transaction enters the committed state. As mentioned
earlier, we assume for now that failures do not result in loss of data on disk. A transaction
enters the failed state after the system determines that the transaction can no longer
proceed with its normal execution (e.g., because of hardware or logical errors). Such a
transaction must be rolled back. Then, it enters the aborted state. At this point, the system
has two options:

• It can restart the transaction, but only if the transaction was aborted as a result of
some hardware or software error that was not created through the internal logic
of the transaction. A restarted transaction is considered to be a new transaction.
• It can kill the transaction. It usually does so because of some internal logical error
that can be corrected only by rewriting the application program, or because the
input was bad, or because the desired data were not found in the database.

We must be cautious when dealing with observable external writes, such as writes to a user’s
screen, or sending email. Once such a write has occurred, it cannot be erased, since it may
have been seen external to the database system. Most systems allow such writes to take
place only after the transaction has entered the committed state. One way to implement
such a scheme is for the database system to store any value associated with such external
writes temporarily in a special relation in the database, and to perform the actual writes only
after the transaction enters the committed state. If the system should fail after the
transaction has entered the committed state, but before it could complete the external
writes; the database system will carry out the external writes (using the data in non-volatile
storage) when the system is restarted. Handling external writes can be more complicated in
some situations. For example, suppose the external action is that of dispensing cash at an
automated teller machine, and the system fails just before the cash is actually dispensed (we
assume that cash can be dispensed atomically). It makes no sense to dispense cash when
the system is restarted, since the user may have left the machine. In such a case a
compensating transaction, such as depositing the cash back into the user’s account, needs
to be executed when the system is restarted. As another example, consider a user making a
booking over the web. It is possible that the database system or the application server
crashes just after the booking transaction commits. It is also possible that the network
connection to the user is lost just after the booking transaction commits. In either case, even
though the transaction has committed, the external write has not taken place. To handle
such situations, the application must be designed such that when the user connects to the
web application again, she will be able to see whether her transaction had succeeded or not.
For certain applications, it may be desirable to allow active transactions to display
data to users, particularly for long-duration transactions that run for minutes or hours.
Unfortunately, we cannot allow such output of observable data unless we are willing to
compromise transaction atomicity.
