KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
20CS402 - DATABASE MANAGEMENT SYSTEMS
UNIT 1 – RELATIONAL DATABASES
Database Applications Examples
 Enterprise Information
• Sales: customers, products, purchases
• Accounting: payments, receipts, assets
• Human Resources: Information about employees, salaries, payroll
taxes.
 Manufacturing: management of production, inventory, orders, supply
chain.
 Banking and finance
• customer information, accounts, loans, and banking transactions.
• Credit card transactions
• Finance: sales and purchases of financial instruments (e.g., stocks
and bonds); storing real-time market data
 Universities: registration, grades
Database Applications Examples (Cont.)
 Airlines: reservations, schedules
 Telecommunication: records of calls, texts, and data usage, generating
monthly bills, maintaining balances on prepaid calling cards
 Web-based services
• Online retailers: order tracking, customized recommendations
• Online advertisements
 Document databases
 Navigation systems: for maintaining the locations of various places of
interest along with the exact routes of roads, train systems, buses, etc.
Purpose of Database Systems
 Data redundancy and inconsistency: data is stored in multiple file
formats, resulting in duplication of information in different files
 Difficulty in accessing data
• Need to write a new program to carry out each new task
 Data isolation
• Multiple files and formats
 Integrity problems
• Integrity constraints (e.g., account balance > 0) become “buried”
in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones
In the early days, database applications were built directly on top of file
systems, which led to the problems listed above.
Purpose of Database Systems (Cont.)
 Atomicity of updates
• Failures may leave database in an inconsistent state with partial
updates carried out
• Example: Transfer of funds from one account to another should either
complete or not happen at all
 Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
 Ex: Two people reading a balance (say 100) and updating it by
withdrawing money (say 50 each) at the same time
 Security problems
• Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
Data Models
 A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
 Relational model
 Entity-Relationship data model (mainly for database design)
 Object-based data models (Object-oriented and Object-relational)
 Semi-structured data model (XML)
 Other older models:
• Network model
• Hierarchical model
Relational Model
 All the data is stored in various tables.
 Example of tabular data in the relational model
[Figure: a relation shown as a table, with columns and rows labeled; Ted Codd, Turing Award 1981]
A Sample Relational Database
View of Data
An architecture for a database system
Instances and Schemas
 Similar to types and variables in programming languages
 Logical Schema – the overall logical structure of the database
• Example: The database consists of information about a set of
customers and accounts in a bank and the relationship between them
 Analogous to type information of a variable in a program
 Physical schema – the overall physical structure of the database
 Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable
Physical Data Independence
 Physical Data Independence – the ability to modify the physical
schema without changing the logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and
components should be well defined so that changes in some parts
do not seriously influence others.
Data Definition Language (DDL)
 Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
 DDL compiler generates a set of table templates stored in a data
dictionary
 Data dictionary contains metadata (i.e., data about data)
• Database schema
• Integrity constraints
 Primary key (ID uniquely identifies instructors)
• Authorization
 Who can access what
Data Manipulation Language (DML)
 Language for accessing and updating the data organized by the
appropriate data model
• DML also known as query language
 There are basically two types of data-manipulation language
• Procedural DML -- require a user to specify what data are needed
and how to get those data.
• Declarative DML -- require a user to specify what data are needed
without specifying how to get those data.
 Declarative DMLs are usually easier to learn and use than are procedural
DMLs.
 Declarative DMLs are also referred to as non-procedural DMLs
 The portion of a DML that involves information retrieval is called a query
language.
SQL Query Language
 SQL query language is nonprocedural. A query takes as input several
tables (possibly only one) and always returns a single table.
 Example to find all instructors in Comp. Sci. dept
select name
from instructor
where dept_name = 'Comp. Sci.'
 SQL is NOT a Turing machine equivalent language
 To be able to compute complex functions SQL is usually embedded in
some higher-level language
 Application programs generally access databases through one of
• Language extensions to allow embedded SQL
• Application program interface (e.g., ODBC/JDBC) which allow SQL
queries to be sent to a database
Database Access from Application Program
 Non-procedural query languages such as SQL are not as powerful as a
universal Turing machine.
 SQL does not support actions such as input from users, output to
displays, or communication over the network.
 Such computations and actions must be written in a host language, such
as C/C++, Java or Python, with embedded SQL queries that access the
data in the database.
 Application programs -- are programs that are used to interact with the
database in this fashion.
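A minimal sketch of this pattern, assuming Python with its built-in sqlite3 module as the host language; the table and sample rows below are created inline only so the example runs on its own and are not part of the university database used elsewhere in these slides:

import sqlite3

# An in-memory database standing in for the university database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""create table instructor (
    ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))""")
conn.execute("insert into instructor values ('22222', 'Einstein', 'Physics', 95000)")
conn.execute("insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

# The declarative SQL query is embedded in the host program; the host language
# handles input and output, while the database system evaluates the query.
for (name,) in conn.execute(
        "select name from instructor where dept_name = ?", ("Physics",)):
    print(name)        # prints: Einstein
conn.close()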
Database Design
 Logical Design – Deciding on the database schema. Database design
requires that we find a “good” collection of relation schemas.
• Business decision – What attributes should we record in the
database?
• Computer Science decision – What relation schemas should we
have and how should the attributes be distributed among the
various relation schemas?
 Physical Design – Deciding on the physical layout of the database
The process of designing the general structure of the database:
Database Engine
 A database system is partitioned into modules that deal with each of the
responsibilities of the overall system.
 The functional components of a database system can be divided into
• The storage manager,
• The query processor component,
• The transaction management component.
Storage Manager
 A program module that provides the interface between the low-level data
stored in the database and the application programs and queries
submitted to the system.
 The storage manager is responsible for the following tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
 The storage manager components include:
• Authorization and integrity manager
• Transaction manager
• File manager
• Buffer manager
Storage Manager (Cont.)
 The storage manager implements several data structures as part of the
physical system implementation:
• Data files -- store the database itself
• Data dictionary -- stores metadata about the structure of the
database, in particular the schema of the database.
• Indices -- can provide fast access to data items. A database index
provides pointers to those data items that hold a particular value.
Query Processor
 The query processor components include:
• DDL interpreter -- interprets DDL statements and records the
definitions in the data dictionary.
• DML compiler -- translates DML statements in a query language into
an evaluation plan consisting of low-level instructions that the query
evaluation engine understands.
 The DML compiler performs query optimization; that is, it picks
the lowest cost evaluation plan from among the various
alternatives.
• Query evaluation engine -- executes low-level instructions generated
by the DML compiler.
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Transaction Management
 A transaction is a collection of operations that performs a single logical
function in a database application
 Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g.,
power failures and operating system crashes) and transaction failures.
 Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
Database Architecture
 Centralized databases
• One to a few cores, shared memory
 Client-server,
• One server machine executes work on behalf of multiple client
machines.
 Parallel databases
• Many core shared memory
• Shared disk
• Shared nothing
 Distributed databases
• Geographical distribution
• Schema/data heterogeneity
Database Architecture
(Centralized/Shared-Memory)
Database Applications
 Two-tier architecture -- the application resides at the client machine,
where it invokes database system functionality at the server machine
 Three-tier architecture -- the client machine acts as a front end and
does not contain any direct database calls.
• The client end communicates with an application server, usually
through a forms interface.
• The application server in turn communicates with a database
system to access data.
Database applications are usually partitioned into two or three parts
Two-tier and three-tier architectures
Database Users
Database Administrator
 Schema definition
 Storage structure and access-method definition
 Schema and physical-organization modification
 Granting of authorization for data access
 Routine maintenance
 Periodically backing up the database
 Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required
 Monitoring jobs running on the database
A person who has central control over the system is called a database
administrator (DBA). Functions of a DBA include:
History of Database Systems
 1950s and early 1960s:
• Data processing using magnetic tapes for storage
 Tapes provided only sequential access
• Punched cards for input
 Late 1960s and 1970s:
• Hard disks allowed direct access to data
• Network and hierarchical data models in widespread use
• Ted Codd defines the relational data model
 Would win the ACM Turing Award for this work
 IBM Research begins System R prototype
 UC Berkeley (Michael Stonebraker) begins Ingres prototype
 Oracle releases first commercial relational database
• High-performance (for the era) transaction processing
History of Database Systems (Cont.)
 1980s:
• Research relational prototypes evolve into commercial systems
 SQL becomes industrial standard
• Parallel and distributed database systems
 Wisconsin, IBM, Teradata
• Object-oriented database systems
 1990s:
• Large decision support and data-mining applications
• Large multi-terabyte data warehouses
• Emergence of Web commerce
History of Database Systems (Cont.)
 2000s
• Big data storage systems
 Google BigTable, Yahoo PNUTS, Amazon, and others
 “NoSQL” systems.
• Big data analysis: beyond SQL
 MapReduce and friends
 2010s
• SQL reloaded
 SQL front ends to MapReduce systems
 Massively parallel database systems
 Multi-core main-memory databases
Outline
 Structure of Relational Databases
 Database Schema
 Keys
 Schema Diagrams
 Relational Query Languages
 The Relational Algebra
Example of an Instructor Relation
[Figure: the instructor relation, with attributes (columns) and tuples (rows) labeled]
Relation Schema and Instance
 A1, A2, …, An are attributes
 R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
 A relation instance r defined over schema R is denoted by r (R).
 The current values of a relation are specified by a table
 An element t of relation r is called a tuple and is represented by
a row in a table
Attributes
 The set of allowed values for each attribute is called the domain of the
attribute
 Attribute values are (normally) required to be atomic; that is, indivisible
 The special value null is a member of every domain; it indicates that the
value is “unknown”
 The null value causes complications in the definition of many operations
Relations are Unordered
 Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
 Example: instructor relation with unordered tuples
Database Schema
 Database schema -- is the logical structure of the database.
 Database instance -- is a snapshot of the data in the database at a given
instant in time.
 Example:
• schema: instructor (ID, name, dept_name, salary)
• Instance:
Keys
 Let K ⊆ R
 K is a superkey of R if values for K are sufficient to identify a unique tuple
of each possible relation r(R)
• Example: {ID} and {ID,name} are both superkeys of instructor.
 Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
 One of the candidate keys is selected to be the primary key.
• Which one?
 Foreign key constraint: Value in one relation must appear in another
• Referencing relation
• Referenced relation
• Example: dept_name in instructor is a foreign key from instructor
referencing department
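The superkey condition can be checked mechanically on a relation instance: no two tuples may agree on all attributes of K. A small sketch (the two-tuple instance shown is illustrative, not the full instructor relation):

def is_superkey(rows, K):
    # K is a superkey on this instance if no two tuples agree on all attributes in K.
    seen = set()
    for t in rows:
        values = tuple(t[a] for a in K)
        if values in seen:
            return False
        seen.add(values)
    return True

instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "22222", "name": "Einstein",   "dept_name": "Physics",    "salary": 95000},
]
print(is_superkey(instructor, ["ID"]))           # True: {ID} identifies each tuple
print(is_superkey(instructor, ["ID", "name"]))   # True: any superset of a superkey is a superkey

Note that such a test can only refute a superkey claim for a particular instance; being a superkey is a property of the schema, not of one instance.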
Schema Diagram for University Database
Relational Query Languages
 Procedural versus non-procedural, or declarative
 “Pure” languages:
• Relational algebra
• Tuple relational calculus
• Domain relational calculus
 The above 3 pure languages are equivalent in computing power
 We will concentrate in this chapter on relational algebra
• Not Turing-machine equivalent
• Consists of 6 basic operations
Relational Algebra
 A procedural language consisting of a set of operations that take one or
two relations as input and produce a new relation as their result.
 Six basic operators
• select: σ
• project: Π
• union: ∪
• set difference: –
• Cartesian product: ×
• rename: ρ
Select Operation
 The select operation selects tuples that satisfy a given predicate.
 Notation: σp(r)
 p is called the selection predicate
 Example: select those tuples of the instructor relation where the
instructor is in the “Physics” department.
• Query
σdept_name=“Physics” (instructor)
• Result
Select Operation (Cont.)
 We allow comparisons using
=, ≠, >, ≥, <, ≤
in the selection predicate.
 We can combine several predicates into a larger predicate by using the
connectives:
∧ (and), ∨ (or), ¬ (not)
 Example: Find the instructors in Physics with a salary greater than $90,000.
We write:
σdept_name=“Physics” ∧ salary>90,000 (instructor)
 The select predicate may include comparisons between two attributes.
• Example, find all departments whose name is the same as their
building name:
• σdept_name=building (department)
Project Operation
 A unary operation that returns its argument relation, with certain attributes
left out.
 Notation:
ΠA1, A2, …, Ak (r)
where A1, A2, …, Ak are attribute names and r is a relation name.
 The result is defined as the relation of k columns obtained by erasing the
columns that are not listed
 Duplicate rows removed from result, since relations are sets
Project Operation Example
 Example: eliminate the dept_name attribute of instructor
 Query:
ΠID, name, salary (instructor)
 Result:
Composition of Relational Operations
 The result of a relational-algebra operation is a relation, and therefore
relational-algebra operations can be composed together into a
relational-algebra expression.
 Consider the query -- Find the names of all instructors in the Physics
department.
Πname(σdept_name=“Physics” (instructor))
 Instead of giving the name of a relation as the argument of the projection
operation, we give an expression that evaluates to a relation.
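A sketch of how select, project, and their composition behave, modeling a relation as a list of attribute–value dictionaries (the three instructor tuples shown are illustrative):

instructor = [
    {"ID": "22222", "name": "Einstein",   "dept_name": "Physics",    "salary": 95000},
    {"ID": "33456", "name": "Gold",       "dept_name": "Physics",    "salary": 87000},
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
]

def select(predicate, r):                     # sigma_p(r): keep tuples satisfying p
    return [t for t in r if predicate(t)]

def project(attrs, r):                        # Pi_attrs(r): drop columns, remove duplicates
    distinct = {tuple(t[a] for a in attrs) for t in r}
    return [dict(zip(attrs, values)) for values in distinct]

# Pi_name( sigma_{dept_name = 'Physics'}(instructor) )
print(project(["name"], select(lambda t: t["dept_name"] == "Physics", instructor)))
# -> [{'name': 'Einstein'}, {'name': 'Gold'}]   (in some order)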
Cartesian-Product Operation
 The Cartesian-product operation (denoted by X) allows us to combine
information from any two relations.
 Example: the Cartesian product of the relations instructor and teaches is
written as:
instructor X teaches
 We construct a tuple of the result out of each possible pair of tuples: one
from the instructor relation and one from the teaches relation (see next
slide)
 Since the instructor ID appears in both relations, we distinguish between
these attributes by attaching to the attribute the name of the relation from
which the attribute originally came.
• instructor.ID
• teaches.ID
The instructor X teaches table
Join Operation
 The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches.
• Most of the resulting rows have information about instructors who did
NOT teach a particular course.
 To get only those tuples of “instructor X teaches “ that pertain to
instructors and the courses that they taught, we write:
σinstructor.id = teaches.id (instructor × teaches)
• We get only those tuples of “instructor X teaches” that pertain to
instructors and the courses that they taught.
 The result of this expression is shown in the next slide.
Join Operation (Cont.)
 The table corresponding to:
σinstructor.id = teaches.id (instructor × teaches)
Join Operation (Cont.)
 The join operation allows us to combine a select operation and a
Cartesian-Product operation into a single operation.
 Consider relations r (R) and s (S)
 Let θ be a predicate on attributes in the schema R ∪ S. The
join operation r ⋈θ s is defined as follows:
r ⋈θ s = σθ (r × s)
 Thus
σinstructor.id = teaches.id (instructor × teaches)
 Can equivalently be written as
instructor ⋈instructor.id = teaches.id teaches
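A sketch of the Cartesian product and the theta join on two tiny relations; attribute names shared by both relations are prefixed with the relation name, as described above (the data is illustrative):

instructor = [{"ID": "10101", "name": "Srinivasan"}, {"ID": "22222", "name": "Einstein"}]
teaches    = [{"ID": "10101", "course_id": "CS-101"}]

def cartesian_product(r, s, rname, sname):
    common = {a for t in r for a in t} & {a for t in s for a in t}
    rows = []
    for tr in r:
        for ts in s:
            row = {(f"{rname}.{a}" if a in common else a): v for a, v in tr.items()}
            row.update({(f"{sname}.{a}" if a in common else a): v for a, v in ts.items()})
            rows.append(row)
    return rows

def theta_join(r, s, theta, rname, sname):    # r join_theta s  =  sigma_theta(r x s)
    return [t for t in cartesian_product(r, s, rname, sname) if theta(t)]

# sigma_{instructor.ID = teaches.ID}(instructor x teaches)
print(theta_join(instructor, teaches,
                 lambda t: t["instructor.ID"] == t["teaches.ID"],
                 "instructor", "teaches"))
# -> [{'instructor.ID': '10101', 'name': 'Srinivasan', 'teaches.ID': '10101', 'course_id': 'CS-101'}]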
Union Operation
 The union operation allows us to combine two relations
 Notation: r ∪ s
 For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd
column of r deals with the same type of values as does the
2nd column of s)
 Example: to find all courses taught in the Fall 2017 semester, or in the
Spring 2018 semester, or in both
Πcourse_id (σsemester=“Fall” ∧ year=2017 (section)) ∪
Πcourse_id (σsemester=“Spring” ∧ year=2018 (section))
Union Operation (Cont.)
 Result of:
Πcourse_id (σsemester=“Fall” ∧ year=2017 (section)) ∪
Πcourse_id (σsemester=“Spring” ∧ year=2018 (section))
Set-Intersection Operation
 The set-intersection operation allows us to find tuples that are in both
the input relations.
 Notation: r ∩ s
 Assume:
• r, s have the same arity
• attributes of r and s are compatible
 Example: Find the set of all courses taught in both the Fall 2017 and the
Spring 2018 semesters.
Πcourse_id (σsemester=“Fall” ∧ year=2017 (section)) ∩
Πcourse_id (σsemester=“Spring” ∧ year=2018 (section))
• Result
Set Difference Operation
 The set-difference operation allows us to find tuples that are in one relation
but are not in another.
 Notation r – s
 Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
 Example: to find all courses taught in the Fall 2017 semester, but not in the
Spring 2018 semester
Πcourse_id (σsemester=“Fall” ∧ year=2017 (section)) −
Πcourse_id (σsemester=“Spring” ∧ year=2018 (section))
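Since relations are sets, the union, intersection, and set-difference queries above can be mirrored directly with Python set operations on the projected course_id values (the course identifiers shown are illustrative):

fall_2017   = {"CS-101", "CS-347", "PHY-101"}   # Pi_course_id(sigma_{semester='Fall'  and year=2017}(section))
spring_2018 = {"CS-101", "CS-315", "FIN-201"}   # Pi_course_id(sigma_{semester='Spring' and year=2018}(section))

print(fall_2017 | spring_2018)   # union: taught in Fall 2017, in Spring 2018, or in both
print(fall_2017 & spring_2018)   # intersection: taught in both semesters -> {'CS-101'}
print(fall_2017 - spring_2018)   # difference: taught in Fall 2017 but not in Spring 2018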
The Assignment Operation
 It is convenient at times to write a relational-algebra expression by
assigning parts of it to temporary relation variables.
 The assignment operation is denoted by ← and works like assignment in
a programming language.
 Example: Find all instructors in the “Physics” and “Music” departments.
Physics ← σdept_name=“Physics” (instructor)
Music ← σdept_name=“Music” (instructor)
Physics ∪ Music
 With the assignment operation, a query can be written as a sequential
program consisting of a series of assignments followed by an expression
whose value is displayed as the result of the query.
The Rename Operation
 The results of relational-algebra expressions do not have a name that we
can use to refer to them. The rename operator, ρ, is provided for that
purpose
 The expression:
ρx (E)
returns the result of expression E under the name x
 Another form of the rename operation:
ρx(A1, A2, …, An) (E)
Equivalent Queries
 There is more than one way to write a query in relational algebra.
 Example: Find the instructors in the Physics department with salary greater
than $90,000
 Query 1
σdept_name=“Physics” ∧ salary>90,000 (instructor)
 Query 2
σdept_name=“Physics” (σsalary>90,000 (instructor))
 The two queries are not identical; they are, however, equivalent -- they
give the same result on any database.
Equivalent Queries
 There is more than one way to write a query in relational algebra.
 Example: Find information about courses taught by instructors in the
Physics department
 Query 1
σdept_name=“Physics” (instructor ⋈instructor.ID = teaches.ID teaches)
 Query 2
(σdept_name=“Physics” (instructor)) ⋈instructor.ID = teaches.ID teaches
 The two queries are not identical; they are, however, equivalent -- they
give the same result on any database.
Entity Sets
 An entity is an object that exists and is distinguishable from other
objects.
• Example: specific person, company, event, plant
 An entity set is a set of entities of the same type that share the same
properties.
• Example: set of all persons, companies, trees, holidays
 An entity is represented by a set of attributes; i.e., descriptive properties
possessed by all members of an entity set.
• Example:
instructor = (ID, name, salary )
course= (course_id, title, credits)
 A subset of the attributes forms a primary key of the entity set, i.e., it
uniquely identifies each member of the set.
Representing Entity sets in ER Diagram
 Entity sets can be represented graphically as follows:
• Rectangles represent entity sets.
• Attributes listed inside entity rectangle
• Underline indicates primary key attributes
Relationship Sets
 A relationship is an association among several entities
Example:
44553 (Peltier) advisor 22222 (Einstein)
student entity relationship set instructor entity
 A relationship set is a mathematical relation among n ≥ 2 entities, each
taken from entity sets
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship
• Example:
(44553, 22222) ∈ advisor
Relationship Sets (Cont.)
 Example: we define the relationship set advisor to denote the
associations between students and the instructors who act as their
advisors.
 Pictorially, we draw a line between related entities.
Representing Relationship Sets via ER Diagrams
 Diamonds represent relationship sets.
Relationship Sets (Cont.)
 An attribute can also be associated with a relationship set.
 For instance, the advisor relationship set between entity sets instructor
and student may have the attribute date which tracks when the student
started being associated with the advisor
[Figure: instructor entities (ID, name) and student entities (ID, name) connected by the advisor relationship set, with each instructor–student pair labeled by the date the advising began]
Relationship Sets with Attributes
Roles
 Entity sets of a relationship need not be distinct
• Each occurrence of an entity set plays a “role” in the relationship
 The labels “course_id” and “prereq_id” are called roles.
Degree of a Relationship Set
 Binary relationship
• involve two entity sets (or degree two).
• most relationship sets in a database system are binary.
 Relationships between more than two entity sets are rare. Most
relationships are binary. (More on this later.)
• Example: students work on research projects under the guidance of
an instructor.
• relationship proj_guide is a ternary relationship between instructor,
student, and project
Non-binary Relationship Sets
 Most relationship sets are binary
 There are occasions when it is more convenient to represent
relationships as non-binary.
 E-R Diagram with a Ternary Relationship
Complex Attributes
 Attribute types:
• Simple and composite attributes.
• Single-valued and multivalued attributes
 Example: multivalued attribute: phone_numbers
• Derived attributes
 Can be computed from other attributes
 Example: age, given date_of_birth
 Domain – the set of permitted values for each attribute
Composite Attributes
 Composite attributes allow us to divide attributes into subparts (other
attributes).
[Figure: composite attribute name with component attributes first_name, middle_initial, last_name; composite attribute address with components street, city, state, postal_code; street is further divided into street_number, street_name, apartment_number]
Representing Complex Attributes in ER Diagram
Mapping Cardinality Constraints
 Express the number of entities to which another entity can be associated
via a relationship set.
 Most useful in describing binary relationship sets.
 For a binary relationship set the mapping cardinality must be one of the
following types:
• One to one
• One to many
• Many to one
• Many to many
Mapping Cardinalities
One to one One to many
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities
Many to one Many to many
Note: Some elements in A and B may not be mapped to any
elements in the other set
Representing Cardinality Constraints in ER Diagram
 We express cardinality constraints by drawing either a directed line (→),
signifying “one,” or an undirected line (—), signifying “many,” between the
relationship set and the entity set.
 One-to-one relationship between an instructor and a student :
• A student is associated with at most one instructor via the relationship
advisor
• A student is associated with at most one department via stud_dept
One-to-Many Relationship
 one-to-many relationship between an instructor and a student
• an instructor is associated with several (including 0) students via
advisor
• a student is associated with at most one instructor via advisor,
Many-to-One Relationships
 In a many-to-one relationship between an instructor and a student,
• an instructor is associated with at most one student via advisor,
• and a student is associated with several (including 0) instructors via
advisor
Many-to-Many Relationship
 An instructor is associated with several (possibly 0) students via advisor
 A student is associated with several (possibly 0) instructors via advisor
Total and Partial Participation
 Total participation (indicated by double line): every entity in the entity set
participates in at least one relationship in the relationship set
participation of student in advisor relation is total
 every student must have an associated instructor
 Partial participation: some entities may not participate in any relationship
in the relationship set
• Example: participation of instructor in advisor is partial
Notation for Expressing More Complex Constraints
 A line may have an associated minimum and maximum cardinality, shown
in the form l..h, where l is the minimum and h the maximum cardinality
• A minimum value of 1 indicates total participation.
• A maximum value of 1 indicates that the entity participates in at most
one relationship
• A maximum value of * indicates no limit.
 Example
• Instructor can advise 0 or more students. A student must have 1
advisor; cannot have multiple advisors
Cardinality Constraints on Ternary Relationship
 We allow at most one arrow out of a ternary (or greater degree)
relationship to indicate a cardinality constraint
 For example, an arrow from proj_guide to instructor indicates each
student has at most one guide for a project
 If there is more than one arrow, there are two ways of defining the
meaning.
• For example, a ternary relationship R between A, B and C with
arrows to B and C could mean
1. Each A entity is associated with a unique entity from B
and C or
2. Each pair of entities from (A, B) is associated with a
unique C entity, and each pair (A, C) is associated
with a unique B
• Each alternative has been used in different formalisms
• To avoid confusion we outlaw more than one arrow
Primary Key
 Primary keys provide a way to specify how entities and relations are
distinguished. We will consider:
• Entity sets
• Relationship sets.
• Weak entity sets
Primary key for Entity Sets
 By definition, individual entities are distinct.
 From database perspective, the differences among them must be
expressed in terms of their attributes.
 The attribute values of an entity must be such that they can
uniquely identify the entity.
• No two entities in an entity set are allowed to have exactly the same
value for all attributes.
 A key for an entity is a set of attributes that suffice to distinguish entities
from each other
Primary Key for Relationship Sets
 To distinguish among the various relationships of a relationship set we use
the individual primary keys of the entities in the relationship set.
• Let R be a relationship set involving entity sets E1, E2, .. En
• The primary key for R consists of the union of the primary keys of
entity sets E1, E2, ..En
• If the relationship set R has attributes a1, a2, .., am associated with it,
then the primary key of R also includes the attributes a1, a2, .., am
 Example: relationship set “advisor”.
• The primary key consists of instructor.ID and student.ID
 The choice of the primary key for a relationship set depends on the
mapping cardinality of the relationship set.
Choice of Primary key for Binary Relationship
 Many-to-Many relationships. The preceding union of the primary keys is a
minimal superkey and is chosen as the primary key.
 One-to-Many relationships . The primary key of the “Many” side is a
minimal superkey and is used as the primary key.
 Many-to-one relationships. The primary key of the “Many” side is a minimal
superkey and is used as the primary key.
 One-to-one relationships. The primary key of either one of the participating
entity sets forms a minimal superkey, and either one can be chosen as the
primary key.
Weak Entity Sets
 Consider a section entity, which is uniquely identified by a course_id,
semester, year, and sec_id.
 Clearly, section entities are related to course entities. Suppose we create
a relationship set sec_course between entity sets section and course.
 Note that the information in sec_course is redundant, since section
already has an attribute course_id, which identifies the course with which
the section is related.
 One option to deal with this redundancy is to get rid of the relationship
sec_course; however, by doing so the relationship between section and
course becomes implicit in an attribute, which is not desirable.
Weak Entity Sets (Cont.)
 An alternative way to deal with this redundancy is to not store the attribute
course_id in the section entity and to only store the remaining attributes
sec_id, year, and semester.
• However, the entity set section then does not have enough attributes
to identify a particular section entity uniquely
 To deal with this problem, we treat the relationship sec_course as a
special relationship that provides extra information, in this case, the
course_id, required to identify section entities uniquely.
 A weak entity set is one whose existence is dependent on another entity,
called its identifying entity
 Instead of associating a primary key with a weak entity, we use the
identifying entity, along with extra attributes called discriminator to
uniquely identify a weak entity.
Weak Entity Sets (Cont.)
 An entity set that is not a weak entity set is termed a strong entity set.
 Every weak entity must be associated with an identifying entity; that is,
the weak entity set is said to be existence dependent on the identifying
entity set.
 The identifying entity set is said to own the weak entity set that it
identifies.
 The relationship associating the weak entity set with the identifying entity
set is called the identifying relationship.
 Note that the relational schema we eventually create from the entity set
section does have the attribute course_id, for reasons that will become
clear later, even though we have dropped the attribute course_id from
the entity set section.
Expressing Weak Entity Sets
 In E-R diagrams, a weak entity set is depicted via a double rectangle.
 We underline the discriminator of a weak entity set with a dashed line.
 The relationship set connecting the weak entity set to the identifying
strong entity set is depicted by a double diamond.
 Primary key for section – (course_id, sec_id, semester, year)
Redundant Attributes
 Suppose we have entity sets:
• student, with attributes: ID, name, tot_cred, dept_name
• department, with attributes: dept_name, building, budget
 We model the fact that each student has an associated department using
a relationship set stud_dept
 The attribute dept_name in student below replicates information present
in the relationship and is therefore redundant
• and needs to be removed.
 BUT: when converting back to tables, in some cases the attribute gets
reintroduced, as we will see later.
E-R Diagram for a University Enterprise
Reduction to Relation Schemas
 Entity sets and relationship sets can be expressed uniformly as relation
schemas that represent the contents of the database.
 A database which conforms to an E-R diagram can be represented by a
collection of schemas.
 For each entity set and relationship set there is a unique schema that is
assigned the name of the corresponding entity set or relationship set.
 Each schema has a number of columns (generally corresponding to
attributes), which have unique names.
Representing Entity Sets
 A strong entity set reduces to a schema with the same attributes
student(ID, name, tot_cred)
 A weak entity set becomes a table that includes a column for the primary
key of the identifying strong entity set
section ( course_id, sec_id, sem, year )
 Example
Representation of Entity Sets with Composite Attributes
 Composite attributes are flattened out by creating a
separate attribute for each component attribute
• Example: given entity set instructor with composite
attribute name with component attributes first_name
and last_name the schema corresponding to the
entity set has two attributes name_first_name and
name_last_name
 Prefix omitted if there is no ambiguity
(name_first_name could be first_name)
 Ignoring multivalued attributes, extended instructor
schema is
• instructor(ID,
first_name, middle_initial, last_name,
street_number, street_name,
apt_number, city, state, zip_code,
date_of_birth)
Representation of Entity Sets with Multivalued Attributes
 A multivalued attribute M of an entity E is represented by a separate
schema EM
 Schema EM has attributes corresponding to the primary key of E and an
attribute corresponding to multivalued attribute M
 Example: Multivalued attribute phone_number of instructor is
represented by a schema:
inst_phone= ( ID, phone_number)
 Each value of the multivalued attribute maps to a separate tuple of the
relation on schema EM
• For example, an instructor entity with primary key 22222 and phone
numbers 456-7890 and 123-4567 maps to two tuples:
(22222, 456-7890) and (22222, 123-4567)
Representing Relationship Sets
 A many-to-many relationship set is represented as a schema with
attributes for the primary keys of the two participating entity sets, and
any descriptive attributes of the relationship set.
 Example: schema for relationship set advisor
advisor = (s_id, i_id)
Redundancy of Schemas
 Many-to-one and one-to-many relationship sets that are total on the many-
side can be represented by adding an extra attribute to the “many” side,
containing the primary key of the “one” side
 Example: Instead of creating a schema for relationship set inst_dept, add
an attribute dept_name to the schema arising from entity set instructor
 Example
Redundancy of Schemas (Cont.)
 For one-to-one relationship sets, either side can be chosen to act as the
“many” side
• That is, an extra attribute can be added to either of the tables
corresponding to the two entity sets
 If participation is partial on the “many” side, replacing a schema by an
extra attribute in the schema corresponding to the “many” side could
result in null values
Redundancy of Schemas (Cont.)
 The schema corresponding to a relationship set linking a weak entity set
to its identifying strong entity set is redundant.
 Example: The section schema already contains the attributes that would
appear in the sec_course schema
Specialization
 Top-down design process; we designate sub-groupings within an entity set
that are distinctive from other entities in the set.
 These sub-groupings become lower-level entity sets that have attributes or
participate in relationships that do not apply to the higher-level entity set.
 Depicted by a triangle component labeled ISA (e.g., instructor “is a”
person).
 Attribute inheritance – a lower-level entity set inherits all the attributes
and relationship participation of the higher-level entity set to which it is
linked.
Specialization Example
 Overlapping – employee and student
 Disjoint – instructor and secretary
 Total and partial
Representing Specialization via Schemas
 Method 1:
• Form a schema for the higher-level entity
• Form a schema for each lower-level entity set, include primary key
of higher-level entity set and local attributes
• Drawback: getting information about an employee requires
accessing two relations, the one corresponding to the low-level
schema and the one corresponding to the high-level schema
Representing Specialization as Schemas (Cont.)
 Method 2:
• Form a schema for each entity set with all local and inherited
attributes
• Drawback: name, street and city may be stored redundantly for
people who are both students and employees
Generalization
 A bottom-up design process – combine a number of entity sets that
share the same features into a higher-level entity set.
 Specialization and generalization are simple inversions of each other;
they are represented in an E-R diagram in the same way.
 The terms specialization and generalization are used interchangeably.
Completeness constraint
 Completeness constraint -- specifies whether or not an entity in the
higher-level entity set must belong to at least one of the lower-level
entity sets within a generalization.
• total: an entity must belong to one of the lower-level entity sets
• partial: an entity need not belong to one of the lower-level entity
sets
Completeness constraint (Cont.)
 Partial generalization is the default.
 We can specify total generalization in an ER diagram by adding the
keyword total in the diagram and drawing a dashed line from the
keyword to the corresponding hollow arrow-head to which it applies (for
a total generalization), or to the set of hollow arrow-heads to which it
applies (for an overlapping generalization).
 The student generalization is total: All student entities must be either
graduate or undergraduate. Because the higher-level entity set arrived
at through generalization is generally composed of only those entities
in the lower-level entity sets, the completeness constraint for a
generalized higher-level entity set is usually total
Aggregation
 Consider the ternary relationship proj_guide, which we saw earlier
 Suppose we want to record evaluations of a student by a guide on a
project
Aggregation (Cont.)
 Relationship sets eval_for and proj_guide represent overlapping
information
• Every eval_for relationship corresponds to a proj_guide relationship
• However, some proj_guide relationships may not correspond to any
eval_for relationships
 So we can’t discard the proj_guide relationship
 Eliminate this redundancy via aggregation
• Treat relationship as an abstract entity
• Allows relationships between relationships
• Abstraction of relationship into new entity
Aggregation (Cont.)
 Eliminate this redundancy via aggregation without introducing
redundancy, the following diagram represents:
• A student is guided by a particular instructor on a particular project
• A student, instructor, project combination may have an associated
evaluation
Entities vs. Attributes
 Use of entity sets vs. attributes
 Use of phone as an entity allows extra information about phone numbers
(plus multiple phone numbers)
Entities vs. Relationship sets
 Use of entity sets vs. relationship sets
A possible guideline is to designate a relationship set to describe
an action that occurs between entities
 Placement of relationship attributes
For example, attribute date as attribute of advisor or as attribute
of student
Summary of Symbols Used in E-R Notation
Symbols Used in E-R Notation (Cont.)
Outline
 Features of Good Relational Design
 Functional Dependencies
 Decomposition Using Functional Dependencies
 Normal Forms
 Functional Dependency Theory
 Algorithms for Decomposition using Functional Dependencies
 Decomposition Using Multivalued Dependencies
 More Normal Form
 Atomic Domains and First Normal Form
 Database-Design Process
 Modeling Temporal Data
Features of Good Relational Designs
 Suppose we combine instructor and department into in_dep, which
represents the natural join on the relations instructor and department
 There is repetition of information
 Need to use null values (if we add a new department with no instructors)
Decomposition
 The only way to avoid the repetition-of-information problem in the in_dep
schema is to decompose it into two schemas – instructor and department
schemas.
 Not all decompositions are good. Suppose we decompose
employee(ID, name, street, city, salary)
into
employee1 (ID, name)
employee2 (name, street, city, salary)
The problem arises when we have two employees with the same name
 The next slide shows how we lose information -- we cannot reconstruct
the original employee relation -- and so, this is a lossy decomposition.
A Lossy Decomposition
Lossless Decomposition
 Let R be a relation schema and let R1 and R2 form a decomposition of R.
That is, R = R1 ∪ R2
 We say that the decomposition is a lossless decomposition if there is
no loss of information by replacing R with the two relation schemas R1
and R2
 Formally, the decomposition is lossless if
ΠR1 (r) ⋈ ΠR2 (r) = r
 And, conversely, a decomposition is lossy if
r ⊂ ΠR1 (r) ⋈ ΠR2 (r)
Example of Lossless Decomposition
 Decomposition of R = (A, B, C)
R1 = (A, B) R2 = (B, C)
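The lossless/lossy distinction can be observed directly by projecting and re-joining a small instance. A sketch, using a simplified employee relation (ID, name, city) in place of the five-attribute schema above; the data is made up:

def project(attrs, r):
    # Pi_attrs(r); a relation is a set of tuples, each tuple a frozenset of (attribute, value) pairs
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def natural_join(r, s):
    out = set()
    for tr in r:
        for ts in s:
            dr, ds = dict(tr), dict(ts)
            if all(dr[a] == ds[a] for a in set(dr) & set(ds)):
                out.add(frozenset({**dr, **ds}.items()))
    return out

employee = {frozenset(row.items()) for row in [
    {"ID": "123", "name": "Kim", "city": "Boston"},
    {"ID": "456", "name": "Kim", "city": "Seattle"},   # two employees share a name
]}

# Lossy: decompose on name -> the join manufactures spurious tuples
r1, r2 = project(["ID", "name"], employee), project(["name", "city"], employee)
print(natural_join(r1, r2) == employee)   # False

# Lossless: decompose on the key ID -> the join reconstructs the original instance
r1, r2 = project(["ID", "name"], employee), project(["ID", "city"], employee)
print(natural_join(r1, r2) == employee)   # True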
Normalization Theory
 Decide whether a particular relation R is in “good” form.
 In the case that a relation R is not in “good” form, decompose it into a set
of relations {R1, R2, ..., Rn} such that
• Each relation is in good form
• The decomposition is a lossless decomposition
 Our theory is based on:
• Functional dependencies
• Multivalued dependencies
Functional Dependencies
 There are usually a variety of constraints (rules) on the data in the real
world.
 For example, some of the constraints that are expected to hold in a
university database are:
• Students and instructors are uniquely identified by their ID.
• Each student and instructor has only one name.
• Each instructor and student is (primarily) associated with only one
department.
• Each department has only one value for its budget, and only one
associated building.
Functional Dependencies (Cont.)
 An instance of a relation that satisfies all such real-world constraints is
called a legal instance of the relation;
 A legal instance of a database is one where all the relation instances are
legal instances
 Functional dependencies are constraints on the set of legal relations.
 They require that the value for a certain set of attributes uniquely determines
the value for another set of attributes.
 A functional dependency is a generalization of the notion of a key.
Functional Dependencies Definition
 Let R be a relation schema
 α ⊆ R and β ⊆ R
 The functional dependency
α → β
holds on R if and only if for any legal relations r(R), whenever any two
tuples t1 and t2 of r agree on the attributes α, they also agree on the
attributes β. That is,
t1[α] = t2[α] ⟹ t1[β] = t2[β]
 Example: Consider r(A, B) with the following instance of r:
A | B
1 | 4
1 | 5
3 | 7
 On this instance, B → A holds; A → B does NOT hold.
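The definition can be checked directly on an instance; a sketch using the r(A, B) instance above:

def satisfies_fd(rows, alpha, beta):
    # True iff every pair of tuples that agrees on alpha also agrees on beta.
    image = {}
    for t in rows:
        a_val = tuple(t[x] for x in alpha)
        b_val = tuple(t[x] for x in beta)
        if image.setdefault(a_val, b_val) != b_val:
            return False
    return True

r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(satisfies_fd(r, ["B"], ["A"]))   # True : B -> A holds on this instance
print(satisfies_fd(r, ["A"], ["B"]))   # False: A -> B does not (A = 1 maps to both 4 and 5)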
Closure of a Set of Functional Dependencies
 Given a set F set of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• If A → B and B → C, then we can infer that A → C
• etc.
 The set of all functional dependencies logically implied by F is the
closure of F.
 We denote the closure of F by F+.
Keys and Functional Dependencies
 K is a superkey for relation schema R if and only if K → R
 K is a candidate key for R if and only if
• K → R, and
• for no α ⊂ K, α → R
 Functional dependencies allow us to express constraints that cannot be
expressed using superkeys. Consider the schema:
in_dep (ID, name, salary, dept_name, building, budget ).
We expect these functional dependencies to hold:
dept_name → building
ID → building
but would not expect the following to hold:
dept_name → salary
Use of Functional Dependencies
 We use functional dependencies to:
• To test relations to see if they are legal under a given set of
functional dependencies.
 If a relation r is legal under a set F of functional dependencies,
we say that r satisfies F.
• To specify constraints on the set of legal relations
 We say that F holds on R if all legal relations on R satisfy the set
of functional dependencies F.
 Note: A specific instance of a relation schema may satisfy a functional
dependency even if the functional dependency does not hold on all legal
instances.
• For example, a specific instance of instructor may, by chance, satisfy
name → ID.
Trivial Functional Dependencies
 A functional dependency is trivial if it is satisfied by all instances of a
relation
 Example:
• ID, name → ID
• name → name
 In general, α → β is trivial if β ⊆ α
Lossless Decomposition
 We can use functional dependencies to show when certain
decompositions are lossless.
 For the case of R = (R1, R2), we require that for all possible relations r on
schema R
r = ΠR1 (r) ⋈ ΠR2 (r)
 A decomposition of R into R1 and R2 is lossless decomposition if at least
one of the following dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
 The above functional dependencies are a sufficient condition for lossless
join decomposition; the dependencies are a necessary condition only if all
constraints are functional dependencies
Example
 R = (A, B, C)
F = {A → B, B → C}
 R1 = (A, B), R2 = (B, C)
• Lossless decomposition:
R1 ∩ R2 = {B} and B → BC
 R1 = (A, B), R2 = (A, C)
• Lossless decomposition:
R1 ∩ R2 = {A} and A → AB
 Note:
• B → BC
is a shorthand notation for
• B → {B, C}
Dependency Preservation
 Testing functional dependency constraints each time the database is
updated can be costly
 It is useful to design the database in a way that constraints can be
tested efficiently.
 If testing a functional dependency can be done by considering just one
relation, then the cost of testing this constraint is low
 When a relation is decomposed, it may no longer be possible to do the
testing without having to compute a Cartesian product or join of the
decomposed relations.
 A decomposition that makes it computationally hard to enforce
functional dependencies is said to be NOT dependency preserving.
Dependency Preservation Example
 Consider a schema:
dept_advisor(s_ID, i_ID, department_name)
 With functional dependencies:
i_ID → dept_name
s_ID, dept_name → i_ID
 In the above design we are forced to repeat the department name once
for each time an instructor participates in a dept_advisor relationship.
 To fix this, we need to decompose dept_advisor
 Any decomposition will not include all the attributes in
s_ID, dept_name → i_ID
 Thus, the decomposition is NOT dependency preserving
Boyce-Codd Normal Form
 A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is trivial (i.e., β ⊆ α)
• α is a superkey for R
Boyce-Codd Normal Form (Cont.)
 Example schema that is not in BCNF:
in_dep (ID, name, salary, dept_name, building, budget )
because:
• dept_name → building, budget
 holds on in_dep
 but
• dept_name is not a superkey
 When we decompose in_dep into instructor and department:
• instructor is in BCNF
• department is in BCNF
Example
 R = (A, B, C)
F = {A → B, B → C}
 R1 = (A, B), R2 = (B, C)
• Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
• Dependency preserving
 R1 = (A, B), R2 = (A, C)
• Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
• Not dependency preserving
(cannot check B → C without computing R1 ⋈ R2)
BCNF and Dependency Preservation
 It is not always possible to achieve both BCNF and dependency
preservation
 Consider a schema:
dept_advisor(s_ID, i_ID, department_name)
 With functional dependencies:
i_ID → dept_name
s_ID, dept_name → i_ID
 dept_advisor is not in BCNF
• i_ID is not a superkey.
 Any decomposition of dept_advisor will not include all the attributes in
s_ID, dept_name → i_ID
 Thus, the decomposition is NOT dependency preserving
Third Normal Form
 A relation schema R is in third normal form (3NF) if for all:
α → β in F+
at least one of the following holds:
• α → β is trivial (i.e., β ⊆ α)
• α is a superkey for R
• Each attribute A in β – α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
 If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
 Third condition is a minimal relaxation of BCNF to ensure dependency
preservation (will see why later).
3NF Example
 Consider a schema:
dept_advisor(s_ID, i_ID, dept_name)
 With functional dependencies:
i_ID → dept_name
s_ID, dept_name → i_ID
 Two candidate keys = {s_ID, dept_name}, {s_ID, i_ID }
 We have seen before that dept_advisor is not in BCNF
 R, however, is in 3NF
• s_ID, dept_name is a superkey
• i_ID → dept_name and i_ID is NOT a superkey, but:
 {dept_name} – {i_ID} = {dept_name} and
 dept_name is contained in a candidate key
Comparison of BCNF and 3NF
 Advantages to 3NF over BCNF. It is always possible to obtain a 3NF
design without sacrificing losslessness or dependency preservation.
 Disadvantages to 3NF.
• We may have to use null values to represent some of the possible
meaningful relationships among data items.
• There is the problem of repetition of information.
 It is better to decompose inst_info into:
• inst_child (ID, child_name)
• inst_phone (ID, phone_number)
 This suggests the need for higher normal forms, such as Fourth
Normal Form (4NF), which we shall see later
Higher Normal Forms
Closure of a Set of Functional Dependencies
 Given a set F set of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• If A → B and B → C, then we can infer that A → C
• etc.
 The set of all functional dependencies logically implied by F is the closure
of F.
 We denote the closure of F by F+.
Closure of a Set of Functional Dependencies
 We can compute F+, the closure of F, by repeatedly applying Armstrong’s
Axioms:
• Reflexive rule: if β ⊆ α, then α → β
• Augmentation rule: if α → β, then γα → γβ
• Transitivity rule: if α → β and β → γ, then α → γ
 These rules are
• Sound -- generate only functional dependencies that actually hold,
and
• Complete -- generate all functional dependencies that hold.
Example of F+
 R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H }
 Some members of F+
• A → H
 by transitivity from A → B and B → H
• AG → I
 by augmenting A → C with G, to get AG → CG,
and then transitivity with CG → I
• CG → HI
 by augmenting CG → I to infer CG → CGI,
and augmenting CG → H to infer CGI → HI,
and then transitivity
Closure of Attribute Sets
 Given a set of attributes α, define the closure of α under F (denoted by
α+) as the set of attributes that are functionally determined by α under F
 Algorithm to compute α+, the closure of α under F:
result := α;
while (changes to result) do
for each β → γ in F do
begin
if β ⊆ result then result := result ∪ γ
end
Example of Attribute Set Closure
 R = (A, B, C, G, H, I)
 F = {A → B
A → C
CG → H
CG → I
B → H}
 (AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
 Is AG a candidate key?
1. Is AG a super key?
1. Does AG → R? == Is R ⊆ (AG)+ ?
2. Is any subset of AG a superkey?
1. Does A → R? == Is R ⊆ (A)+ ?
2. Does G → R? == Is R ⊆ (G)+ ?
3. In general: check for each subset of size n-1
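The algorithm and the (AG)+ computation above translate almost line for line into code; a sketch (single-letter attribute names, each FD given as a (left side, right side) pair):

def attribute_closure(alpha, fds):
    # Compute alpha+ under F by repeatedly applying FDs whose left side is already covered.
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(attribute_closure("AG", F) >= R)   # True : AG is a superkey (its closure covers all of R)
print(attribute_closure("A", F) >= R)    # False: A alone is not a superkey
print(attribute_closure("G", F) >= R)    # False: G alone is not a superkey
# Since no proper subset of AG is a superkey, AG is a candidate key.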
Canonical Cover
 Suppose that we have a set of functional dependencies F on a relation
schema. Whenever a user performs an update on the relation, the
database system must ensure that the update does not violate any
functional dependencies; that is, all the functional dependencies in F are
satisfied in the new database state.
 If an update violates any functional dependencies in the set F, the system
must roll back the update.
 We can reduce the effort spent in checking for violations by testing a
simplified set of functional dependencies that has the same closure as the
given set.
 This simplified set is termed the canonical cover
 To define canonical cover we must first define extraneous attributes.
• An attribute of a functional dependency in F is extraneous if we can
remove it without changing F +
Dependency Preservation (Cont.)
 Let F be the set of dependencies on schema R and let R1, R2 , .., Rn be
a decomposition of R.
 The restriction of F to Ri is the set Fi of all functional dependencies in F +
that include only attributes of Ri .
 Since all functional dependencies in a restriction involve attributes of only
one relation schema, it is possible to test such a dependency for
satisfaction by checking only one relation.
 Note that the definition of restriction uses all dependencies in F+, not
just those in F.
 The set of restrictions F1, F2 , .. , Fn is the set of functional dependencies
that can be checked efficiently.
Testing for BCNF
 To check if a non-trivial dependency α → β causes a violation of BCNF
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
 Simplified test: To check if a relation schema R is in BCNF, it suffices to
check only the dependencies in the given set F for violation of BCNF,
rather than checking all dependencies in F+.
• If none of the dependencies in F causes a violation of BCNF, then
none of the dependencies in F+ will cause a violation of BCNF either.
 However, simplified test using only F is incorrect when testing a relation
in a decomposition of R
• Consider R = (A, B, C, D, E), with F = { A → B, BC → D}
 Decompose R into R1 = (A,B) and R2 = (A,C,D, E)
 Neither of the dependencies in F contain only attributes from
(A,C,D,E), so we might be misled into thinking R2 satisfies BCNF.
 In fact, dependency AC → D in F+ shows R2 is not in BCNF.
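A sketch of both tests on this decomposition (R2 = (A, C, D, E), F = {A → B, BC → D}); the attribute-closure helper is repeated so the fragment runs on its own:

def closure(alpha, fds):
    result = set(alpha)
    while True:
        new = set(result)
        for beta, gamma in fds:
            if set(beta) <= new:
                new |= set(gamma)
        if new == result:
            return result
        result = new

def violates_bcnf(Ri, alpha, beta, fds):
    # A non-trivial alpha -> beta violates BCNF on Ri iff alpha is not a superkey of Ri.
    return not set(beta) <= set(alpha) and not closure(alpha, fds) >= set(Ri)

F  = [("A", "B"), ("BC", "D")]
R2 = "ACDE"

# Testing only the dependencies in F finds nothing, since neither FD fits inside R2 ...
print(any(violates_bcnf(R2, a, b, F)
          for a, b in F if set(a) | set(b) <= set(R2)))         # False
# ... but AC -> D (which is in F+) does violate BCNF on R2.
print(violates_bcnf(R2, "AC", "D", F))                           # True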
Testing Decomposition for BCNF
 Either test Ri for BCNF with respect to the restriction of F+ to Ri (that
is, all FDs in F+ that contain only attributes from Ri)
 Or use the original set of dependencies F that hold on R, but with the
following test:
 for every set of attributes α ⊆ Ri, check that α+ (the attribute
closure of α) either includes no attribute of Ri − α, or includes all
attributes of Ri.
• If the condition is violated by some α → β in F+, the dependency
α → (α+ − α) ∩ Ri
can be shown to hold on Ri, and Ri violates BCNF.
• We use the above dependency to decompose Ri
To check if a relation Ri in a decomposition of R is in BCNF
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
if (there is a schema Ri in result that is not in BCNF)
then begin
let α → β be a nontrivial functional dependency that
holds on Ri such that α → Ri is not in F+,
and α ∩ β = ∅;
result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
end
else done := true;
Note: each Ri is in BCNF, and decomposition is lossless-join.
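A compact sketch of the algorithm: it searches each fragment for a non-trivial violating FD via attribute closure (taking β = α+ − α, so α ∩ β = ∅ automatically) and splits the schema, repeating until every fragment is in BCNF. The closure helper is repeated so the fragment runs on its own.

from itertools import combinations

def closure(alpha, fds):
    result = set(alpha)
    while True:
        new = set(result)
        for beta, gamma in fds:
            if set(beta) <= new:
                new |= set(gamma)
        if new == result:
            return result
        result = new

def bcnf_violation(Ri, fds):
    # Return (alpha, beta) for some non-trivial alpha -> beta violating BCNF on Ri, or None.
    for k in range(1, len(Ri)):
        for alpha in combinations(sorted(Ri), k):
            determined = closure(alpha, fds) & set(Ri)
            beta = determined - set(alpha)
            if beta and not determined >= set(Ri):   # alpha determines something but is not a superkey of Ri
                return set(alpha), beta
    return None

def bcnf_decompose(R, fds):
    result = [set(R)]
    while True:
        for Ri in result:
            v = bcnf_violation(Ri, fds)
            if v:
                alpha, beta = v
                result.remove(Ri)
                result += [Ri - beta, alpha | beta]
                break
        else:
            return result

print(bcnf_decompose("ABCDE", [("A", "B"), ("BC", "D")]))
# e.g. [{'A', 'B'}, {'A', 'C', 'E'}, {'A', 'C', 'D'}]   (fragment order may vary)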
BCNF Decomposition (Cont.)
 course is in BCNF
• How do we know this?
 building, room_number→capacity holds on class-1
• but {building, room_number} is not a superkey for class-1.
• We replace class-1 by:
 classroom (building, room_number, capacity)
 section (course_id, sec_id, semester, year, building,
room_number, time_slot_id)
 classroom and section are in BCNF.
Third Normal Form
 There are some situations where
• BCNF is not dependency preserving, and
• efficient checking for FD violation on updates is important
 Solution: define a weaker normal form, called Third Normal Form (3NF)
• Allows some redundancy (with resultant problems; we will see
examples later)
• But functional dependencies can be checked on individual relations
without computing a join.
• There is always a lossless-join, dependency-preserving
decomposition into 3NF.
3NF Example -- Relation dept_advisor
 dept_advisor (s_ID, i_ID, dept_name)
F = {s_ID, dept_name → i_ID, i_ID → dept_name}
 Two candidate keys: s_ID, dept_name, and i_ID, s_ID
 R is in 3NF
• s_ID, dept_name → i_ID: here {s_ID, dept_name} is a superkey
• i_ID → dept_name: dept_name is contained in a candidate key
3NF Decomposition Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency    in Fc do
if none of the schemas Rj, 1 ≤ j ≤ i contains α β
then begin
i := i + 1;
Ri := α β
end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
i := i + 1;
Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
repeat
if any schema Rj is contained in another schema Rk
then /* delete Rj */
Rj := Ri;
i := i – 1;
until no more Rj's can be deleted;
return (R1, R2, ..., Ri)
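A compact Python sketch of the synthesis loop above, assuming the canonical cover Fc and a candidate key have already been computed and are supplied by the caller (computing them is omitted); the names are illustrative.

def synthesize_3nf(Fc, candidate_key):
    """3NF synthesis: one schema per FD in the canonical cover Fc,
    plus a candidate key if no schema already contains one."""
    schemas = []
    for lhs, rhs in Fc:
        attrs = lhs | rhs
        if not any(attrs <= s for s in schemas):      # skip if already contained
            schemas.append(set(attrs))
    if not any(candidate_key <= s for s in schemas):
        schemas.append(set(candidate_key))
    # optionally remove schemas contained in another schema
    schemas = [s for i, s in enumerate(schemas)
               if not any(i != j and s <= t for j, t in enumerate(schemas))]
    return schemas

# dept_advisor example: Fc = {s_ID, dept_name -> i_ID ; i_ID -> dept_name}
Fc = [(frozenset({'s_ID', 'dept_name'}), frozenset({'i_ID'})),
      (frozenset({'i_ID'}), frozenset({'dept_name'}))]
print(synthesize_3nf(Fc, frozenset({'s_ID', 'dept_name'})))
# -> a single schema {s_ID, dept_name, i_ID}; the relation is already in 3NF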
3NF Decomposition Algorithm (Cont.)
The above algorithm ensures that:
 Each relation schema Ri is in 3NF
 Decomposition is dependency preserving and lossless-join
 Proof of correctness is given at the end of this presentation
Comparison of BCNF and 3NF
 It is always possible to decompose a relation into a set of relations that
are in 3NF such that:
• The decomposition is lossless
• The dependencies are preserved
 It is always possible to decompose a relation into a set of relations that
are in BCNF such that:
• The decomposition is lossless
• It may not be possible to preserve dependencies.
Multivalued Dependencies (MVDs)
 Suppose we record names of children, and phone numbers for
instructors:
• inst_child(ID, child_name)
• inst_phone(ID, phone_number)
 If we were to combine these schemas to get
• inst_info(ID, child_name, phone_number)
• Example data:
(99999, David, 512-555-1234)
(99999, David, 512-555-4321)
(99999, William, 512-555-1234)
(99999, William, 512-555-4321)
 This relation is in BCNF
• Why?
Multivalued Dependencies
 Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued
dependency
α →→ β
holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r
such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
Fourth Normal Form
 A relation schema R is in 4NF with respect to a set D of functional and
multivalued dependencies if for all multivalued dependencies in D+ of the
form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following hold:
• α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
• α is a superkey for schema R
 If a relation is in 4NF it is in BCNF
4NF Decomposition Algorithm
result: = {R};
done := false;
compute D+;
Let Di denote the restriction of D+ to Ri
while (not done)
if (there is a schema Ri in result that is not in 4NF) then
begin
let α →→ β be a nontrivial multivalued dependency that holds
on Ri such that α →→ Ri is not in Di, and α ∩ β = ∅;
result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
end
else done:= true;
Note: each Ri is in 4NF, and decomposition is lossless-join
Example
 R =(A, B, C, G, H, I)
F = { A →→ B
B →→ HI
CG →→ H }
 R is not in 4NF since A →→ B and A is not a superkey for R
 Decomposition
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF, decompose into R3 and R4)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF, decompose into R5 and R6)
• A →→ B and B →→ HI imply A →→ HI (MVD transitivity), and
• hence A →→ I (MVD restriction to R4)
e) R5 = (A, I) (R5 is in 4NF)
f)R6 = (A, C, G) (R6 is in 4NF)
First Normal Form
 Domain is atomic if its elements are considered to be indivisible units
• Examples of non-atomic domains:
 Set of names, composite attributes
 Identification numbers like CS101 that can be broken up into parts
 A relational schema R is in first normal form if the domains of all attributes
of R are atomic
 Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
• Example: Set of accounts stored with each customer, and set of
owners stored with each account
• We assume all relations are in first normal form (and revisit this in
Chapter 22: Object Based Databases)
First Normal Form (Cont.)
 Atomicity is actually a property of how the elements of the domain are
used.
• Example: Strings would normally be considered indivisible
• Suppose that students are given roll numbers which are strings of the
form CS0012 or EE1127
• If the first two characters are extracted to find the department, the
domain of roll numbers is not atomic.
• Doing so is a bad idea: leads to encoding of information in application
program rather than in the database.
UNIT 3 – DATA STORAGE AND QUERY
PROCESSING
Classification of Physical Storage Media
 Can differentiate storage into:
• volatile storage: loses contents when power is switched off
• non-volatile storage:
 Contents persist even when power is switched off.
 Includes secondary and tertiary storage, as well as battery-backed-up
main memory.
 Factors affecting choice of storage media include
• Speed with which data can be accessed
• Cost per unit of data
• Reliability
Storage Hierarchy
Storage Hierarchy (Cont.)
 primary storage: Fastest media but volatile (cache, main memory).
 secondary storage: next level in hierarchy, non-volatile, moderately fast
access time
• Also called on-line storage
• E.g., flash memory, magnetic disks
 tertiary storage: lowest level in hierarchy, non-volatile, slow access time
• also called off-line storage and used for archival storage
• e.g., magnetic tape, optical storage
• Magnetic tape
 Sequential access, 1 to 12 TB capacity
 A few drives with many tapes
 Juke boxes with petabytes (1000’s of TB) of storage
Storage Interfaces
 Disk interface standards families
• SATA (Serial ATA)
 SATA 3 supports data transfer speeds of up to 6 gigabits/sec
• SAS (Serial Attached SCSI)
 SAS Version 3 supports 12 gigabits/sec
• NVMe (Non-Volatile Memory Express) interface
 Works with PCIe connectors to support lower latency and higher
transfer rates
 Supports data transfer rates of up to 24 gigabits/sec
 Disks usually connected directly to computer system
 In Storage Area Networks (SAN), a large number of disks are connected
by a high-speed network to a number of servers
 In Network Attached Storage (NAS) networked storage provides a file
system interface using networked file system protocol, instead of
providing a disk system interface
Magnetic Hard Disk Mechanism
Schematic diagram of magnetic disk drive Photo of magnetic disk drive
Magnetic Disks
 Read-write head
 Surface of platter divided into circular tracks
• Over 50K-100K tracks per platter on typical hard disks
 Each track is divided into sectors.
• A sector is the smallest unit of data that can be read or written.
• Sector size typically 512 bytes
• Typical sectors per track: 500 to 1000 (on inner tracks) to 1000 to
2000 (on outer tracks)
 To read/write a sector
• disk arm swings to position head on right track
• platter spins continually; data is read/written as sector passes under
head
 Head-disk assemblies
• multiple disk platters on a single spindle (1 to 5 usually)
• one head per platter, mounted on a common arm.
 Cylinder i consists of ith track of all the platters
Magnetic Disks (Cont.)
 Disk controller – interfaces between the computer system and the disk
drive hardware.
• accepts high-level commands to read or write a sector
• initiates actions such as moving the disk arm to the right track and
actually reading or writing the data
• Computes and attaches checksums to each sector to verify that
data is read back correctly
 If data is corrupted, with very high probability stored checksum
won’t match recomputed checksum
• Ensures successful writing by reading back sector after writing it
• Performs remapping of bad sectors
Performance Measures of Disks
 Access time – the time it takes from when a read or write request is
issued to when data transfer begins. Consists of:
• Seek time – time it takes to reposition the arm over the correct track.
 Average seek time is 1/2 the worst case seek time.
• Would be 1/3 if all tracks had the same number of sectors, and
we ignore the time to start and stop arm movement
 4 to 10 milliseconds on typical disks
• Rotational latency – time it takes for the sector to be accessed to
appear under the head.
 4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.)
 Average latency is 1/2 of the above latency.
• Overall latency is 5 to 20 msec depending on disk model
 Data-transfer rate – the rate at which data can be retrieved from or stored
to the disk.
• 25 to 200 MB per second max rate, lower for inner tracks
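A quick back-of-the-envelope calculation in Python, using illustrative numbers within the ranges quoted above, of the time to read one random 4 KB block.

# Illustrative access-time estimate for a single random 4 KB block read.
avg_seek_ms        = 6.0                       # within the 4-10 ms range above
rpm                = 10000
avg_rot_latency_ms = 0.5 * (60000 / rpm)       # half a rotation = 3 ms
transfer_MB_per_s  = 100
transfer_ms        = (4 / 1024) / transfer_MB_per_s * 1000   # ~0.04 ms

total_ms = avg_seek_ms + avg_rot_latency_ms + transfer_ms
print(f"approx. access time: {total_ms:.2f} ms")   # ~9 ms, i.e. roughly 100 IOPS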
Performance Measures (Cont.)
 Disk block is a logical unit for storage allocation and retrieval
• 4 to 16 kilobytes typically
 Smaller blocks: more transfers from disk
 Larger blocks: more space wasted due to partially filled blocks
 Sequential access pattern
• Successive requests are for successive disk blocks
• Disk seek required only for first block
 Random access pattern
• Successive requests are for blocks that can be anywhere on disk
• Each access requires a seek
• Transfer rates are low since a lot of time is wasted in seeks
 I/O operations per second (IOPS)
• Number of random block reads that a disk can support per second
• 50 to 200 IOPS on current generation magnetic disks
Performance Measures (Cont.)
 Mean time to failure (MTTF) – the average time the disk is expected to
run continuously without any failure.
• Typically 3 to 5 years
• Probability of failure of new disks is quite low, corresponding to a
“theoretical MTTF” of 500,000 to 1,200,000 hours for a new disk
 E.g., an MTTF of 1,200,000 hours for a new disk means that given
1000 relatively new disks, on an average one will fail every 1200
hours
• MTTF decreases as disk ages
Flash Storage
 NOR flash vs NAND flash
 NAND flash
• used widely for storage, cheaper than NOR flash
• requires page-at-a-time read (page: 512 bytes to 4 KB)
 20 to 100 microseconds for a page read
 Not much difference between sequential and random read
• Page can only be written once
 Must be erased to allow rewrite
 Solid state disks
• Use standard block-oriented disk interfaces, but store data on multiple
flash storage devices internally
• Transfer rate of up to 500 MB/sec using SATA, and
up to 3 GB/sec using NVMe PCIe
Flash Storage (Cont.)
 Erase happens in units of erase block
• Takes 2 to 5 millisecs
• Erase block typically 256 KB to 1 MB (128 to 256 pages)
 Remapping of logical page addresses to physical page addresses avoids
waiting for erase
 Flash translation table tracks mapping
• also stored in a label field of flash page
• remapping carried out by flash translation layer
 After 100,000 to 1,000,000 erases, erase block becomes unreliable and
cannot be used
• wear leveling
SSD Performance Metrics
 Random reads/writes per second
• Typical 4 KB reads: 10,000 reads per second (10,000 IOPS)
• Typical 4KB writes: 40,000 IOPS
• SSDs support parallel reads
 Typical 4KB reads:
• 100,000 IOPS with 32 requests in parallel (QD-32) on SATA
• 350,000 IOPS with QD-32 on NVMe PCIe
 Typical 4KB writes:
• 100,000 IOPS with QD-32, even higher on some models
 Data transfer rate for sequential reads/writes
• 400 MB/sec for SATA3, 2 to 3 GB/sec using NVMe PCIe
 Hybrid disks: combine small amount of flash cache with larger magnetic
disk
Storage Class Memory
 3D-XPoint memory technology pioneered by Intel
 Available as Intel Optane
• SSD interface shipped from 2017
 Allows lower latency than flash SSDs
• Non-volatile memory interface announced in 2018
 Supports direct access to words, at speeds comparable to main-
memory speeds
RAID
 RAID: Redundant Arrays of Independent Disks
• disk organization techniques that manage a large number of disks,
providing a view of a single disk of
 high capacity and high speed by using multiple disks in parallel,
 high reliability by storing data redundantly, so that data can be
recovered even if a disk fails
 The chance that some disk out of a set of N disks will fail is much higher
than the chance that a specific single disk will fail.
• E.g., a system with 100 disks, each with MTTF of 100,000 hours
(approx. 11 years), will have a system MTTF of 1000 hours (approx.
41 days)
• Techniques for using redundancy to avoid data loss are critical with
large numbers of disks
Improvement of Reliability via Redundancy
 Redundancy – store extra information that can be used to rebuild
information lost in a disk failure
 E.g., Mirroring (or shadowing)
• Duplicate every disk. Logical disk consists of two physical disks.
• Every write is carried out on both disks
 Reads can take place from either disk
• If one disk in a pair fails, data still available in the other
 Data loss would occur only if a disk fails, and its mirror disk also
fails before the system is repaired
• Probability of combined event is very small
 Except for dependent failure modes such as fire or building
collapse or electrical power surges
 Mean time to data loss depends on mean time to failure,
and mean time to repair
• E.g., MTTF of 100,000 hours, mean time to repair of 10 hours gives
mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a
mirrored pair of disks (ignoring dependent failure modes)
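The figure above can be reproduced with the standard approximation MTTDL ≈ MTTF² / (2 × MTTR) for an independent mirrored pair; a small check in Python:

MTTF_hours = 100_000
MTTR_hours = 10
# standard approximation for an independent mirrored pair
mttdl_hours = MTTF_hours ** 2 / (2 * MTTR_hours)
print(mttdl_hours)                    # 500,000,000 hours = 500 x 10^6
print(mttdl_hours / (24 * 365))       # ~57,000 years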
Improvement in Performance via Parallelism
 Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.
 Improve transfer rate by striping data across multiple disks.
 Bit-level striping – split the bits of each byte across multiple disks
• In an array of eight disks, write bit i of each byte to disk i.
• Each access can read data at eight times the rate of a single disk.
• But seek/access time worse than for a single disk
 Bit level striping is not used much any more
 Block-level striping – with n disks, block i of a file goes to disk (i mod n)
+ 1
• Requests for different blocks can run in parallel if the blocks reside on
different disks
• A request for a long sequence of blocks can utilize all disks in parallel
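A one-liner illustrating the block-to-disk mapping described above (disks numbered 1..n); the numbers are illustrative.

n = 4   # number of disks
for block in range(8):
    print(f"block {block} -> disk {(block % n) + 1}")
# block 0 -> disk 1, block 1 -> disk 2, ..., block 4 -> disk 1, ...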
RAID Levels
 Schemes to provide redundancy at lower cost by using disk striping
combined with parity bits
• Different RAID organizations, or RAID levels, have differing cost,
performance and reliability characteristics
 RAID Level 0: Block striping; non-redundant.
• Used in high-performance applications where data loss is not critical.
 RAID Level 1: Mirrored disks with block striping
• Offers best write performance.
• Popular for applications such as storing log files in a database system.
RAID Levels (Cont.)
 Parity blocks: Parity block j stores XOR of bits from block j of each disk
• When writing data to a block j, parity block j must also be computed
and written to disk
 Can be done by using old parity block, old value of current block
and new value of current block (2 block reads + 2 block writes)
 Or by recomputing the parity value using the new values of blocks
corresponding to the parity block
• More efficient for writing large amounts of data sequentially
• To recover data for a block, compute XOR of bits from all other
blocks in the set including the parity block
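A tiny Python sketch of parity maintenance with XOR: the parity block is the XOR of the data blocks, a write updates it from the old and new values, and a lost block is recovered from the others. Block contents are made up for illustration.

from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data = [bytes([i] * 4) for i in (1, 2, 3, 4)]          # 4 data blocks
parity = reduce(xor_blocks, data)                      # parity = XOR of all blocks

# update block 2: new parity = old parity XOR old block XOR new block
new_block = bytes([9] * 4)
parity = xor_blocks(xor_blocks(parity, data[2]), new_block)
data[2] = new_block

# recover block 2 after a failure: XOR of parity and all surviving blocks
recovered = reduce(xor_blocks, [parity] + data[:2] + data[3:])
print(recovered == new_block)    # True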
RAID Levels (Cont.)
 RAID Level 5: Block-Interleaved Distributed Parity; partitions data and
parity among all N + 1 disks, rather than storing data in N disks and parity
in 1 disk.
• E.g., with 5 disks, parity block for nth set of blocks is stored on disk
(n mod 5) + 1, with the data blocks stored on the other 4 disks.
RAID Levels (Cont.)
 RAID Level 5 (Cont.)
• Block writes occur in parallel if the blocks and their parity blocks are
on different disks.
 RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores
two error correction blocks (P, Q) instead of single parity block to guard
against multiple disk failures.
• Better reliability than Level 5 at a higher cost
 Becoming more important as storage sizes increase
RAID Levels (Cont.)
 Other levels (not used in practice):
• RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit
striping.
• RAID Level 3: Bit-Interleaved Parity
• RAID Level 4: Block-Interleaved Parity; uses block-level striping,
and keeps a parity block on a separate parity disk for corresponding
blocks from N other disks.
 RAID 5 is better than RAID 4, since with RAID 4 with random
writes, parity disk gets much higher write load than other disks
and becomes a bottleneck
Choice of RAID Level
 Factors in choosing RAID level
• Monetary cost
• Performance: Number of I/O operations per second, and bandwidth
during normal operation
• Performance during failure
• Performance during rebuild of failed disk
 Including time taken to rebuild failed disk
 RAID 0 is used only when data safety is not important
• E.g., data can be recovered quickly from other sources
Choice of RAID Level (Cont.)
 Level 1 provides much better write performance than level 5
• Level 5 requires at least 2 block reads and 2 block writes to write a
single block, whereas Level 1 only requires 2 block writes
 Level 1 has a higher storage cost than level 5
 Level 5 is preferred for applications where writes are sequential and large
(many blocks), and need large amounts of data storage
 RAID 1 is preferred for applications with many random/small updates
 Level 6 gives better data protection than RAID 5 since it can tolerate two
disk (or disk block) failures
• Increasing in importance since latent block failures on one disk,
coupled with a failure of another disk can result in data loss with RAID
1 and RAID 5.
Hardware Issues
 Software RAID: RAID implementations done entirely in software, with
no special hardware support
 Hardware RAID: RAID implementations with special hardware
• Use non-volatile RAM to record writes that are being executed
• Beware: power failure during write can result in corrupted disk
 E.g., failure after writing one block but before writing the second
in a mirrored system
 Such corrupted data must be detected when power is restored
• Recovery from corruption is similar to recovery from failed
disk
• NV-RAM helps to efficiently detect potentially corrupted
blocks
 Otherwise all blocks of disk must be read and compared
with mirror/parity block
Hardware Issues (Cont.)
 Latent failures: data successfully written earlier gets damaged
• can result in data loss even if only one disk fails
 Data scrubbing:
• continually scan for latent failures, and recover from copy/parity
 Hot swapping: replacement of disk while system is running, without power
down
• Supported by some hardware RAID systems,
• reduces time to recovery, and improves availability greatly
 Many systems maintain spare disks which are kept online, and used as
replacements for failed disks immediately on detection of failure
• Reduces time to recovery greatly
 Many hardware RAID systems ensure that a single point of failure will not
stop the functioning of the system by using
• Redundant power supplies with battery backup
• Multiple controllers and multiple interconnections to guard against
controller/interconnection failures
Optimization of Disk-Block Access
 Buffering: in-memory buffer to cache disk blocks
 Read-ahead: Read extra blocks from a track in anticipation that they will
be requested soon
 Disk-arm-scheduling algorithms re-order block requests so that disk arm
movement is minimized
• elevator algorithm
Magnetic Tapes
 Hold large volumes of data and provide high transfer rates
• Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT
(Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB
with Ampex helical scan format
• Transfer rates from few to 10s of MB/s
 Tapes are cheap, but cost of drives is very high
 Very slow access time in comparison to magnetic and optical disks
• limited to sequential access.
• Some formats (Accelis) provide faster seek (10s of seconds) at cost of
lower capacity
 Used mainly for backup, for storage of infrequently used information, and
as an off-line medium for transferring information from one system to
another.
 Tape jukeboxes used for very large capacity storage
• Multiple petabytes (10^15 bytes)
File Organization
 The database is stored as a collection of files. Each file is a sequence of
records. A record is a sequence of fields.
 One approach
• Assume record size is fixed
• Each file has records of one particular type only
• Different files are used for different relations
This case is easiest to implement; will consider variable length records
later
 We assume that records are smaller than a disk block.
Fixed-Length Records
 Simple approach:
• Store record i starting from byte n × (i – 1), where n is the size of
each record.
• Record access is simple but records may cross blocks
 Modification: do not allow records to cross block boundaries
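A small Python sketch of the byte-offset formula for fixed-length records, using the struct module with an assumed record layout (ID, name, salary); the layout and values are illustrative.

import struct

fmt = '5s20sf'                        # assumed layout: ID, name, salary
n = struct.calcsize(fmt)              # fixed record size in bytes

def record_offset(i):
    """Record i (1-based) starts at byte n * (i - 1)."""
    return n * (i - 1)

buf = bytearray(n * 3)                # room for 3 records
rec = struct.pack(fmt, b'10101', b'Srinivasan'.ljust(20), 65000.0)
buf[record_offset(2):record_offset(2) + n] = rec       # store as record 2
print(struct.unpack(fmt, bytes(buf[record_offset(2):record_offset(2) + n])))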
Fixed-Length Records
 Deletion of record i: alternatives:
• move records i + 1, . . ., n to i, . . . , n – 1
• move record n to i
• do not move records, but link all free records on a free list
Record 3 deleted
Fixed-Length Records
 Deletion of record i: alternatives:
• move records i + 1, . . ., n to i, . . . , n – 1
• move record n to i
• do not move records, but link all free records on a free list
Record 3 deleted and replaced by record 11
Fixed-Length Records
 Deletion of record i: alternatives:
• move records i + 1, . . ., n to i, . . . , n – 1
• move record n to i
• do not move records, but link all free records on a free list
Variable-Length Records
 Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields such
as strings (varchar)
• Record types that allow repeating fields (used in some older data
models).
 Attributes are stored in order
 Variable length attributes represented by fixed size (offset, length), with
actual data stored after all fixed length attributes
 Null values represented by null-value bitmap
Variable-Length Records: Slotted Page Structure
 Slotted page header contains:
• number of record entries
• end of free space in the block
• location and size of each record
 Records can be moved around within a page to keep them contiguous
with no empty space between them; entry in the header must be
updated.
 Pointers should not point directly to record — instead they should point
to the entry for the record in header.
Storing Large Objects
 E.g., blob/clob types
 Records must be smaller than pages
 Alternatives:
• Store as files in file systems
• Store as files managed by database
• Break into pieces and store in multiple tuples in separate relation
 PostgreSQL TOAST
Organization of Records in Files
 Heap – record can be placed anywhere in the file where there is space
 Sequential – store records in sequential order, based on the value of the
search key of each record
 In a multitable clustering file organization records of several different
relations can be stored in the same file
• Motivation: store related records on the same block to minimize I/O
 B+-tree file organization
• Ordered storage even with inserts/deletes
• More on this in Chapter 14
 Hashing – a hash function computed on search key; the result specifies in
which block of the file the record should be placed
• More on this in Chapter 14
Heap File Organization
 Records can be placed anywhere in the file where there is free space
 Records usually do not move once allocated
 Important to be able to efficiently find free space within file
 Free-space map
• Array with 1 entry per block. Each entry is a few bits to a byte, and
records fraction of block that is free
• In example below, 3 bits per block, value divided by 8 indicates
fraction of block that is free
• Can have second-level free-space map
• In example below, each entry stores maximum from 4 entries of first-
level free-space map
 Free space map written to disk periodically, OK to have wrong (old) values
for some entries (will be detected and fixed)
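A tiny illustration of the 3-bit free-space map described above: each entry approximates the free fraction of a block in eighths, and a second-level entry stores the maximum over a group of first-level entries. The block fractions are made up.

def fsm_entry(free_fraction):
    """Encode the free fraction of a block in 3 bits (value / 8 is the fraction)."""
    return min(7, int(free_fraction * 8))

blocks_free = [0.95, 0.10, 0.50, 0.0]
fsm = [fsm_entry(f) for f in blocks_free]
print(fsm)                                 # [7, 0, 4, 0]
# second-level map: maximum over groups of 4 first-level entries
second_level = [max(fsm[i:i + 4]) for i in range(0, len(fsm), 4)]
print(second_level)                        # [7]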
Sequential File Organization
 Suitable for applications that require sequential processing of
the entire file
 The records in the file are ordered by a search-key
Sequential File Organization (Cont.)
 Deletion – use pointer chains
 Insertion –locate the position where the record is to be inserted
• if there is free space insert there
• if no free space, insert the record in an overflow block
• In either case, pointer chain must be updated
 Need to reorganize the file
from time to time to restore
sequential order
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering
file organization
department
instructor
multitable clustering
of department and
instructor
Multitable Clustering File Organization (cont.)
 good for queries involving department ⨝ instructor, and for queries
involving one single department and its instructors
 bad for queries involving only department
 results in variable size records
 Can add pointer chains to link records of a particular relation
Partitioning
 Table partitioning: Records in a relation can be partitioned into smaller
relations that are stored separately
 E.g., transaction relation may be partitioned into
transaction_2018, transaction_2019, etc.
 Queries written on transaction must access records in all partitions
• Unless query has a selection such as year=2019, in which case only
one partition is needed
 Partitioning
• Reduces costs of some operations such as free space management
• Allows different partitions to be stored on different storage devices
 E.g., transaction partition for current year on SSD, for older years
on magnetic disk
Data Dictionary Storage
 Information about relations
• names of relations
• names, types and lengths of attributes of each relation
• names and definitions of views
• integrity constraints
 User and accounting information, including passwords
 Statistical and descriptive data
• number of tuples in each relation
 Physical file organization information
• How relation is stored (sequential/hash/…)
• Physical location of relation
 Information about indices (Chapter 14)
The Data dictionary (also called system catalog) stores
metadata; that is, data about data, such as
Relational Representation of System Metadata
 Relational
representation on
disk
 Specialized data
structures designed
for efficient access,
in memory
Storage Access
 Blocks are units of both storage allocation and data transfer.
 Database system seeks to minimize the number of block transfers
between the disk and memory. We can reduce the number of disk
accesses by keeping as many blocks as possible in main memory.
 Buffer – portion of main memory available to store copies of disk blocks.
 Buffer manager – subsystem responsible for allocating buffer space in
main memory.
Buffer Manager
 Programs call on the buffer manager when they need a block from disk.
• If the block is already in the buffer, buffer manager returns the
address of the block in main memory
• If the block is not in the buffer, the buffer manager
 Allocates space in the buffer for the block
• Replacing (throwing out) some other block, if required, to make
space for the new block.
• Replaced block written back to disk only if it was modified
since the most recent time that it was written to/fetched from
the disk.
 Reads the block from the disk to the buffer, and returns the
address of the block in main memory to requester.
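A minimal sketch, under assumed names, of the buffer-manager logic just described: return a cached block if present, otherwise evict an unpinned victim (LRU here), writing it back only if it is dirty. This is an illustration, not the buffer manager of any particular system.

from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                        # object with read_block / write_block
        self.frames = OrderedDict()             # block_id -> {data, dirty, pins}

    def get_block(self, block_id):
        if block_id in self.frames:             # already buffered
            self.frames.move_to_end(block_id)   # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            data = self.disk.read_block(block_id)
            self.frames[block_id] = {'data': data, 'dirty': False, 'pins': 0}
        frame = self.frames[block_id]
        frame['pins'] += 1                      # pin before returning the block
        return frame

    def unpin(self, block_id, dirty=False):
        frame = self.frames[block_id]
        frame['pins'] -= 1
        frame['dirty'] = frame['dirty'] or dirty

    def _evict(self):
        for victim, frame in self.frames.items():     # LRU order
            if frame['pins'] == 0:                    # pinned blocks cannot be evicted
                if frame['dirty']:                    # write back only if modified
                    self.disk.write_block(victim, frame['data'])
                del self.frames[victim]
                return
        raise RuntimeError('all buffer blocks are pinned')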
Buffer Manager
 Buffer replacement strategy (details coming up!)
 Pinned block: memory block that is not allowed to be written back to disk
• Pin done before reading/writing data from a block
• Unpin done when read /write is complete
• Multiple concurrent pin/unpin operations possible
 Keep a pin count, buffer block can be evicted only if pin count = 0
 Shared and exclusive locks on buffer
• Needed to prevent concurrent operations from reading page contents
as they are moved/reorganized, and to ensure only one
move/reorganize at a time
• Readers get shared lock, updates to a block require exclusive lock
• Locking rules:
 Only one process can get exclusive lock at a time
 A shared lock cannot be held concurrently with an exclusive lock
 Multiple processes may be given shared lock concurrently
Buffer-Replacement Policies
 Most operating systems replace the block least recently used (LRU
strategy)
• Idea behind LRU – use past pattern of block references as a
predictor of future references
• LRU can be bad for some queries
 Queries have well-defined access patterns (such as sequential scans),
and a database system can use the information in a user’s query to
predict future references
 Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
 Example of bad access pattern for LRU: when computing the join of 2
relations r and s by a nested loops
for each tuple tr of r do
for each tuple ts of s do
if the tuples tr and ts match …
Buffer-Replacement Policies (Cont.)
 Toss-immediate strategy – frees the space occupied by a block as soon
as the final tuple of that block has been processed
 Most recently used (MRU) strategy – system must pin the block
currently being processed. After the final tuple of that block has been
processed, the block is unpinned, and it becomes the most recently used
block.
 Buffer manager can use statistical information regarding the probability
that a request will reference a particular relation
• E.g., the data dictionary is frequently accessed. Heuristic: keep
data-dictionary blocks in main memory buffer
 Operating system or buffer manager may reorder writes
• Can lead to corruption of data structures on disk
 E.g., linked list of blocks with missing block on disk
 File systems perform consistency check to detect such situations
• Careful ordering of writes can avoid many such problems
Optimization of Disk Block Access (Cont.)
 Buffer managers support forced output of blocks for the purpose of recovery
(more in Chapter 19)
 Nonvolatile write buffers speed up disk writes by writing blocks to a non-
volatile RAM or flash buffer immediately
• Writes can be reordered to minimize disk arm movement
 Log disk – a disk devoted to writing a sequential log of block updates
• Used exactly like nonvolatile RAM
 Write to log disk is very fast since no seeks are required
 Journaling file systems write data in-order to NV-RAM or log disk
• Reordering without journaling: risk of corruption of file system data
Column-Oriented Storage
 Also known as columnar representation
 Store each attribute of a relation separately
 Example
Columnar Representation
 Benefits:
• Reduced IO if only some attributes are accessed
• Improved CPU cache performance
• Improved compression
• Vector processing on modern CPU architectures
 Drawbacks
• Cost of tuple reconstruction from columnar representation
• Cost of tuple deletion and update
• Cost of decompression
 Columnar representation found to be more efficient for decision support than
row-oriented representation
 Traditional row-oriented representation preferable for transaction processing
 Some databases support both representations
• Called hybrid row/column stores
Columnar File Representation
 ORC and Parquet: file
formats with columnar
storage inside file
 Very popular for big-data
applications
 Orc file format shown on
right:
Storage Organization in Main-Memory Databases
 Can store records directly in
memory without a buffer manager
 Column-oriented storage can be
used in-memory for decision
support applications
• Compression reduces
memory requirement
Outline
 Basic Concepts
 Ordered Indices
 B+-Tree Index Files
 B-Tree Index Files
 Hashing
 Static Hashing
 Dynamic Hashing
Basic Concepts
 Indexing mechanisms used to speed up access to desired data.
• E.g., author catalog in library
 Search Key - attribute or set of attributes used to look up records in a
file.
 An index file consists of records (called index entries) of the form
<search-key, pointer>
 Index files are typically much smaller than the original file
 Two basic kinds of indices:
• Ordered indices: search keys are stored in sorted order
• Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
Index Evaluation Metrics
 Access types supported efficiently. E.g.,
• Records with a specified value in the attribute
• Records with an attribute value falling in a specified range of values.
 Access time
 Insertion time
 Deletion time
 Space overhead
Ordered Indices
 In an ordered index, index entries are stored sorted on the search key
value.
 Clustering index: in a sequentially ordered file, the index whose search
key specifies the sequential order of the file.
• Also called primary index
• The search key of a primary index is usually but not necessarily the
primary key.
 Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
nonclustering index.
 Index-sequential file: sequential file ordered on a search key, with a
clustering index on the search key.
Dense Index Files
 Dense index — Index record appears for every search-key value in the
file.
 E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
 Dense index on dept_name, with instructor file sorted on dept_name
Sparse Index Files
 Sparse Index: contains index records for only some search-key
values.
• Applicable when records are sequentially ordered on search-key
 To locate a record with search-key value K we:
• Find index record with largest search-key value < K
• Search file sequentially starting at the record to which the index
record points
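A small sketch of the sparse-index lookup rule above: locate the index entry with the largest search-key value not above K (bisect), then scan the file sequentially from the pointed-to block. The index and file contents are illustrative.

import bisect

# sparse index: one (search_key, block_no) entry per block, sorted by key
index_keys   = ['Brandt', 'Einstein', 'Katz', 'Singh']
index_blocks = [0, 1, 2, 3]
file_blocks  = [['Brandt', 'Califieri', 'Crick'],
                ['Einstein', 'El Said', 'Gold'],
                ['Katz', 'Kim', 'Mozart'],
                ['Singh', 'Srinivasan', 'Wu']]

def lookup(key):
    i = bisect.bisect_right(index_keys, key) - 1    # largest indexed key not above `key`
    if i < 0:
        return None
    for b in range(index_blocks[i], len(file_blocks)):   # sequential scan from there
        for rec in file_blocks[b]:
            if rec == key:
                return rec
            if rec > key:
                return None
    return None

print(lookup('Gold'))      # found in block 1
print(lookup('Bach'))      # None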
Sparse Index Files (Cont.)
 Compared to dense indices:
• Less space and less maintenance overhead for insertions and deletions.
• Generally slower than dense index for locating records.
 Good tradeoff:
• for clustered index: sparse index with an index entry for every block in file,
corresponding to least search-key value in the block.
• For unclustered index: sparse index on top of dense index (multilevel index)
Secondary Indices Example
 Secondary index on salary field of instructor
 Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
 Secondary indices have to be dense
Multilevel Index
 If index does not fit in memory, access becomes expensive.
 Solution: treat index kept on disk as a sequential file and construct a
sparse index on it.
• outer index – a sparse index of the basic index
• inner index – the basic index file
 If even outer index is too large to fit in main memory, yet another level of
index can be created, and so on.
 Indices at all levels must be updated on insertion or deletion from the file.
Multilevel Index (Cont.)
Indices on Multiple Keys
 Composite search key
• E.g., index on instructor relation on attributes (name, ID)
• Values are sorted lexicographically
 E.g. (John, 12121) < (John, 13514) and
(John, 13514) < (Peter, 11223)
• Can query on just name, or on (name, ID)
Example of B+-Tree
B+-Tree Index Files (Cont.)
 All paths from root to leaf are of the same length
 Each node that is not a root or a leaf has between ⌈n/2⌉ and n
children.
 A leaf node has between ⌈(n–1)/2⌉ and n–1 values
 Special cases:
• If the root is not a leaf, it has at least 2 children.
• If the root is a leaf (that is, there are no other nodes in the tree), it
can have between 0 and (n–1) values.
A B+-tree is a rooted tree satisfying the following properties:
B+-Tree Node Structure
 Typical node
• Ki are the search-key values
• Pi are pointers to children (for non-leaf nodes) or pointers to records or
buckets of records (for leaf nodes).
 The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(Initially assume no duplicate keys, address duplicates later)
Leaf Nodes in B+-Trees
 For i = 1, 2, . . ., n–1, pointer Pi points to a file record with search-key value
Ki,
 If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than or equal
to Lj’s search-key values
 Pn points to next leaf node in search-key order
Properties of a leaf node:
Non-Leaf Nodes in B+-Trees
 Non leaf nodes form a multi-level sparse index on the leaf nodes. For a
non-leaf node with m pointers:
• All the search-keys in the subtree to which P1 points are less than K1
• For 2 ≤ i ≤ n – 1, all the search-keys in the subtree to which Pi points
have values greater than or equal to Ki–1 and less than Ki
• All the search-keys in the subtree to which Pn points have values
greater than or equal to Kn–1
• General structure
Example of B+-tree
 B+-tree for instructor file (n = 6)
 Leaf nodes must have between 3 and 5 values
(⌈(n–1)/2⌉ and n–1, with n = 6).
 Non-leaf nodes other than root must have between 3 and 6
children (⌈n/2⌉ and n, with n = 6).
 Root must have at least 2 children.
Observations about B+-trees
 Since the inter-node connections are done by pointers, “logically” close
blocks need not be “physically” close.
 The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
 The B+-tree contains a relatively small number of levels
 Level below root has at least 2⌈n/2⌉ values
 Next level has at least 2⌈n/2⌉ * ⌈n/2⌉ values
 .. etc.
• If there are K search-key values in the file, the tree height is no more
than ⌈log⌈n/2⌉(K)⌉
• thus searches can be conducted efficiently.
 Insertions and deletions to the main file can be handled efficiently, as the
index can be restructured in logarithmic time (as we shall see).
Queries on B+-Trees
function find(v)
1. C=root
2. while (C is not a leaf node)
1. Let i be the least number s.t. v ≤ C.Ki.
2. if there is no such number i then
3. Set C = last non-null pointer in C
4. else if (v = C.Ki) Set C = C.P(i+1)
5. else set C = C.Pi
3. if for some i, C.Ki = v then return C.Pi
4. else return null /* no record with search-key value v exists. */
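An executable Python sketch of the same lookup on a toy in-memory B+-tree; the node layout (keys plus child/record pointers, last leaf pointer unused here) is an assumption for illustration.

import bisect

class Node:
    def __init__(self, keys, pointers, leaf):
        self.keys = keys            # K1 < K2 < ...
        self.pointers = pointers    # children (non-leaf) or record pointers (leaf) + next-leaf slot
        self.leaf = leaf

def find(root, v):
    C = root
    while not C.leaf:
        i = bisect.bisect_left(C.keys, v)      # least i with v <= K(i+1)
        if i == len(C.keys):
            C = C.pointers[-1]                 # last non-null pointer
        elif v == C.keys[i]:
            C = C.pointers[i + 1]
        else:
            C = C.pointers[i]
    if v in C.keys:
        return C.pointers[C.keys.index(v)]
    return None                                # no record with search-key value v

leaf1 = Node(['Brandt', 'Califieri'], ['rec-B', 'rec-C', None], leaf=True)
leaf2 = Node(['Einstein', 'Gold'],    ['rec-E', 'rec-G', None], leaf=True)
root  = Node(['Einstein'], [leaf1, leaf2], leaf=False)
print(find(root, 'Gold'))       # rec-G
print(find(root, 'Mozart'))     # None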
Queries on B+-Trees (Cont.)
 Range queries find all records with search key values in a given range
• See book for details of function findRange(lb, ub) which returns set
of all such records
• Real implementations usually provide an iterator interface to fetch
matching records one at a time, using a next() function
Queries on B+-Trees (Cont.)
 If there are K search-key values in the file, the height of the tree is no
more than logn/2(K).
 A node is generally the same size as a disk block, typically 4 kilobytes
• and n is typically around 100 (40 bytes per index entry).
 With 1 million search key values and n = 100
• at most ⌈log50(1,000,000)⌉ = 4 nodes are accessed in a lookup
traversal from root to leaf.
 Contrast this with a balanced binary tree with 1 million search key values
— around 20 nodes are accessed in a lookup
• above difference is significant since every node access may need a
disk I/O, costing around 20 milliseconds
Updates on B+-Trees: Insertion (Cont.)
 Splitting a leaf node:
• take the n (search-key value, pointer) pairs (including the one being
inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and
the rest in a new node.
• let the new node be p, and let k be the least key value in p. Insert
(k,p) in the parent of the node being split.
• If the parent is full, split it and propagate the split further up.
 Splitting of nodes proceeds upwards till a node that is not full is found.
• In the worst case the root node may be split increasing the height of
the tree by 1.
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri, pointer-to-new-node) into parent
B+-Tree Insertion
B+-Tree before and after insertion of “Adams”
Affected nodes
B+-Tree Insertion
B+-Tree before and after insertion of “Lamport”
Affected nodes
Affected nodes
Examples of B+-Tree Deletion
 Deleting “Srinivasan” causes merging of under-full leaves
Before and after deleting “Srinivasan”
Affected nodes
Examples of B+-Tree Deletion (Cont.)
 Leaf containing Singh and Wu became underfull, and borrowed a value
Kim from its left sibling
 Search-key value in the parent changes as a result
Before and after deleting “Singh” and “Wu”
Affected nodes
Example of B+-tree Deletion (Cont.)
 Node with Gold and Katz became underfull, and was merged with its sibling
 Parent node becomes underfull, and is merged with its sibling
• Value separating two nodes (at the parent) is pulled down when merging
 Root node then has only one child, and is deleted
Before and after deletion of “Gold”
B+-Tree File Organization
 B+-Tree File Organization:
• Leaf nodes in a B+-tree file organization store records, instead of
pointers
• Helps keep data records clustered even when there are
insertions/deletions/updates
 Leaf nodes are still required to be half full
• Since records are larger than pointers, the maximum number of
records that can be stored in a leaf node is less than the number of
pointers in a nonleaf node.
 Insertion and deletion are handled in the same way as insertion and
deletion of entries in a B+-tree index.
B+-Tree File Organization (Cont.)
 Example of B+-tree File Organization
 Good space utilization important since records use more space than
pointers.
 To improve space utilization, involve more sibling nodes in redistribution
during splits and merges
• Involving 2 siblings in redistribution (to avoid split / merge where
possible) results in each node having at least ⌊2n/3⌋ entries
Static Hashing
 A bucket is a unit of storage containing one or more entries (a bucket
is typically a disk block).
• we obtain the bucket of an entry from its search-key value using a
hash function
 Hash function h is a function from the set of all search-key values K to
the set of all bucket addresses B.
 Hash function is used to locate entries for access, insertion as well as
deletion.
 Entries with different search-key values may be mapped to the same
bucket; thus entire bucket has to be searched sequentially to locate an
entry.
 In a hash index, buckets store entries with pointers to records
 In a hash file-organization buckets store records
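A toy static hash index in Python, assuming a fixed number of buckets and Python's built-in hash as the hash function; each bucket holds (key, record-pointer) entries and is scanned sequentially on lookup. The data is illustrative.

NBUCKETS = 8
buckets = [[] for _ in range(NBUCKETS)]        # each bucket: list of (key, pointer)

def h(key):
    return hash(key) % NBUCKETS

def insert(key, pointer):
    buckets[h(key)].append((key, pointer))

def lookup(key):
    # the entire bucket must be scanned, since different keys may hash to it
    return [ptr for k, ptr in buckets[h(key)] if k == key]

insert('Physics', 'rec-22222')
insert('Music',   'rec-15151')
insert('Physics', 'rec-33456')
print(lookup('Physics'))        # ['rec-22222', 'rec-33456']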
Handling of Bucket Overflows
 Bucket overflow can occur because of
• Insufficient buckets
• Skew in distribution of records. This can occur due to two reasons:
 multiple records have same search-key value
 chosen hash function produces non-uniform distribution of key
values
 Although the probability of bucket overflow can be reduced, it cannot be
eliminated; it is handled by using overflow buckets.
Handling of Bucket Overflows (Cont.)
 Overflow chaining – the overflow buckets of a given bucket are chained
together in a linked list.
 Above scheme is called closed addressing (also called closed hashing
or open hashing depending on the book you use)
• An alternative, called
open addressing
(also called
open hashing or
closed hashing
depending on the
book you use) which
does not use over-
flow buckets, is not
suitable for database
applications.
Example of Hash File Organization
Hash file organization of instructor file, using dept_name as key.
Dynamic Hashing
 Periodic rehashing
• If number of entries in a hash table becomes (say) 1.5 times size of
hash table,
 create new hash table of size (say) 2 times the size of the
previous hash table
 Rehash all entries to new table
 Linear Hashing
• Do rehashing in an incremental manner
 Extendable Hashing
• Tailored to disk based hashing, with buckets shared by multiple hash
values
• Doubling of # of entries in hash table, without doubling # of buckets
Comparison of Ordered Indexing and Hashing
 Cost of periodic re-organization
 Relative frequency of insertions and deletions
 Is it desirable to optimize average access time at the expense of worst-
case access time?
 Expected type of queries:
• Hashing is generally better at retrieving records having a specified
value of the key.
• If range queries are common, ordered indices are to be preferred
 In practice:
• PostgreSQL supports hash indices, but discourages use due to poor
performance
• Oracle supports static hash organization, but not hash indices
• SQLServer supports only B+-trees
UNIT 4 – TRANSACTION MANAGEMENT
Outline
 Transaction Concept
 Transaction State
 Concurrent Executions
 Serializability
 Recoverability
 Implementation of Isolation
 Transaction Definition in SQL
 Testing for Serializability.
Transaction Concept
 A transaction is a unit of program execution that accesses and possibly
updates various data items.
 E.g., transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Two main issues to deal with:
• Failures of various kinds, such as hardware failures and system
crashes
• Concurrent execution of multiple transactions
Example of Fund Transfer
 Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Atomicity requirement
• If the transaction fails after step 3 and before step 6, money will be “lost”
leading to an inconsistent database state
 Failure could be due to software or hardware
• The system should ensure that updates of a partially executed transaction
are not reflected in the database
 Durability requirement — once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the updates to the
database by the transaction must persist even if there are software or hardware
failures.
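A minimal illustration of the atomicity requirement using SQLite from Python: either both updates are committed together or, on any failure, both are rolled back. The table and account names are assumptions for the example.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL)')
conn.executemany('INSERT INTO account VALUES (?, ?)', [('A', 100.0), ('B', 200.0)])
conn.commit()

try:
    # sqlite3 implicitly begins a transaction before the first update
    conn.execute('UPDATE account SET balance = balance - 50 WHERE id = ?', ('A',))
    conn.execute('UPDATE account SET balance = balance + 50 WHERE id = ?', ('B',))
    conn.commit()                     # both updates become durable together
except Exception:
    conn.rollback()                   # a failure leaves neither update in place

print(conn.execute('SELECT id, balance FROM account ORDER BY id').fetchall())
# [('A', 50.0), ('B', 250.0)]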
Example of Fund Transfer (Cont.)
 Consistency requirement in above example:
• The sum of A and B is unchanged by the execution of the transaction
 In general, consistency requirements include
• Explicitly specified integrity constraints such as primary keys and foreign
keys
• Implicit integrity constraints
 e.g., sum of balances of all accounts, minus sum of loan amounts must
equal value of cash-in-hand
• A transaction must see a consistent database.
• During transaction execution the database may be temporarily
inconsistent.
• When the transaction completes successfully the database must be
consistent
 Erroneous transaction logic can lead to inconsistency
Example of Fund Transfer (Cont.)
 Isolation requirement — if between steps 3 and 6, another transaction T2
is allowed to access the partially updated database, it will see an
inconsistent database (the sum A + B will be less than it should be).
T1 T2
1. read(A)
2. A := A – 50
3. write(A)
read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
 Isolation can be ensured trivially by running transactions serially
• That is, one after the other.
 However, executing multiple transactions concurrently has significant
benefits, as we will see later.
ACID Properties
 Atomicity. Either all operations of the transaction are properly reflected in
the database or none are.
 Consistency. Execution of a transaction in isolation preserves the
consistency of the database.
 Isolation. Although multiple transactions may execute concurrently, each
transaction must be unaware of other concurrently executing transactions.
Intermediate transaction results must be hidden from other concurrently
executed transactions.
• That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj finished execution before Ti started, or Tj started execution
after Ti finished.
 Durability. After a transaction completes successfully, the changes it has
made to the database persist, even if there are system failures.
A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database
system must ensure:
Transaction State
 Active – the initial state; the transaction stays in this state while it is
executing
 Partially committed – after the final statement has been executed.
 Failed -- after the discovery that normal execution can no longer proceed.
 Aborted – after the transaction has been rolled back and the database
restored to its state prior to the start of the transaction. Two options after it
has been aborted:
• Restart the transaction
 Can be done only if no internal logical error
• Kill the transaction
 Committed – after successful completion.
Transaction State (Cont.)
Concurrent Executions
 Multiple transactions are allowed to run concurrently in the system.
Advantages are:
• Increased processor and disk utilization, leading to better
transaction throughput
 E.g., one transaction can be using the CPU while another is
reading from or writing to the disk
• Reduced average response time for transactions: short transactions
need not wait behind long ones.
 Concurrency control schemes – mechanisms to achieve isolation
• That is, to control the interaction among the concurrent transactions in
order to prevent them from destroying the consistency of the database
 Will study in Chapter 15, after studying notion of correctness of
concurrent executions.
Schedules
 Schedule – a sequence of instructions that specify the chronological order
in which instructions of concurrent transactions are executed
• A schedule for a set of transactions must consist of all instructions of
those transactions
• Must preserve the order in which the instructions appear in each
individual transaction.
 A transaction that successfully completes its execution will have a commit
instruction as the last statement
• By default transaction assumed to execute commit instruction as its last
step
 A transaction that fails to successfully complete its execution will have an
abort instruction as the last statement
Schedule 1
 Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from
A to B.
 A serial schedule in which T1 is followed by T2 :
Schedule 2
 A serial schedule where T2 is followed by T1
Schedule 3
 Let T1 and T2 be the transactions defined previously. The following
schedule is not a serial schedule, but it is equivalent to Schedule 1
 In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedule 4
 The following concurrent schedule does not preserve the value of (A + B ).
Serializability
 Basic Assumption – Each transaction preserves database consistency.
 Thus, serial execution of a set of transactions preserves database
consistency.
 A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule. Different forms of schedule equivalence give rise to the notions of:
1. Conflict serializability
2. View serializability
Conflicting Instructions
 Instructions li and lj of transactions Ti and Tj respectively, conflict if and
only if there exists some item Q accessed by both li and lj, and at least one
of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
 Intuitively, a conflict between li and lj forces a (logical) temporal order
between them.
 If li and lj are consecutive in a schedule and they do not conflict, their
results would remain the same even if they had been interchanged in the
schedule.
Conflict Serializability
 If a schedule S can be transformed into a schedule S’ by a series of swaps
of non-conflicting instructions, we say that S and S’ are conflict
equivalent.
 We say that a schedule S is conflict serializable if it is conflict equivalent
to a serial schedule
Conflict Serializability (Cont.)
 Schedule 3 can be transformed into Schedule 6, a serial schedule where T2
follows T1, by series of swaps of non-conflicting instructions. Therefore
Schedule 3 is conflict serializable.
Schedule 3 Schedule 6
Conflict Serializability (Cont.)
 Example of a schedule that is not conflict serializable:
 We are unable to swap instructions in the above schedule to obtain either
the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
View Serializability
 Let S and S’ be two schedules with the same set of transactions. S and S’
are view equivalent if the following three conditions are met, for each data
item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in
schedule S’ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was
produced by transaction Tj (if any), then in schedule S’ also
transaction Ti must read the value of Q that was produced by the
same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in
schedule S must also perform the final write(Q) operation in schedule S’.
 As can be seen, view equivalence is also based purely on reads and writes
alone.
View Serializability (Cont.)
 A schedule S is view serializable if it is view equivalent to a serial
schedule.
 Every conflict serializable schedule is also view serializable.
 Below is a schedule which is view-serializable but not conflict serializable.
 What serial schedule is above equivalent to?
 Every view serializable schedule that is not conflict serializable has blind
writes.
Other Notions of Serializability
 The schedule below produces same outcome as the serial schedule
< T1, T5 >, yet is not conflict equivalent or view equivalent to it.
 Determining such equivalence requires analysis of operations other
than read and write.
Testing for Serializability
 Consider some schedule of a set of transactions T1, T2, ..., Tn
 Precedence graph — a directed graph where the vertices are the
transactions (names).
 We draw an arc from Ti to Tj if the two transactions conflict, and Ti
accessed the data item on which the conflict arose earlier.
 We may label the arc by the item that was accessed.
 Example of a precedence graph
Test for Conflict Serializability
 A schedule is conflict serializable if and only if
its precedence graph is acyclic.
 Cycle-detection algorithms exist which take
order n2 time, where n is the number of
vertices in the graph.
• (Better algorithms take order n + e where
e is the number of edges.)
 If precedence graph is acyclic, the
serializability order can be obtained by a
topological sorting of the graph.
• This is a linear order consistent with the
partial order of the graph.
• For example, a serializability order for
Schedule A would be
T5 → T1 → T3 → T2 → T4
 Are there others?
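A short Python sketch that builds the precedence graph from a schedule and tests it for a cycle (i.e., for conflict serializability); the schedule encoding and the example interleaving (similar in spirit to Schedule 4) are assumptions.

def precedence_graph(schedule):
    """schedule: list of (txn, op, item) with op in {'r', 'w'}, in chronological order."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and 'w' in (op_i, op_j):
                edges.add((ti, tj))            # Ti accessed x before Tj, and they conflict
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visited, stack = set(), set()
    def dfs(u):
        visited.add(u); stack.add(u)
        for v in graph.get(u, ()):
            if v in stack or (v not in visited and dfs(v)):
                return True
        stack.discard(u)
        return False
    return any(dfs(u) for u in list(graph) if u not in visited)

s = [('T1', 'r', 'A'), ('T2', 'r', 'A'), ('T1', 'w', 'A'), ('T2', 'w', 'A'),
     ('T2', 'r', 'B'), ('T1', 'r', 'B'), ('T2', 'w', 'B'), ('T1', 'w', 'B')]
edges = precedence_graph(s)
print(edges)                   # contains both (T1, T2) and (T2, T1)
print(has_cycle(edges))        # True -> not conflict serializable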
Recoverable Schedules
 Recoverable schedule — if a transaction Tj reads a data item previously
written by a transaction Ti , then the commit operation of Ti appears before
the commit operation of Tj.
 The following schedule (Schedule 11) is not recoverable
 If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence, database must ensure that schedules
are recoverable.
Need to address the effect of transaction failures on concurrently
running transactions.
Cascading Rollbacks
 Cascading rollback – a single transaction failure leads to a series of
transaction rollbacks. Consider the following schedule where none of the
transactions has yet committed (so the schedule is recoverable)
If T10 fails, T11 and T12 must also be rolled back.
 Can lead to the undoing of a significant amount of work
Cascadeless Schedules
 Cascadeless schedules — cascading rollbacks cannot occur;
• For each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
 Every Cascadeless schedule is also recoverable
 It is desirable to restrict the schedules to those that are cascadeless
Concurrency Control
 A database must provide a mechanism that will ensure that all possible
schedules are
• either conflict or view serializable, and
• are recoverable and preferably cascadeless
 A policy in which only one transaction can execute at a time generates
serial schedules, but provides a poor degree of concurrency
• Are serial schedules recoverable/cascadeless?
 Testing a schedule for serializability after it has executed is a little too late!
 Goal – to develop concurrency control protocols that will assure
serializability.
Concurrency Control (Cont.)
 Schedules must be conflict or view serializable, and recoverable, for the
sake of database consistency, and preferably cascadeless.
 A policy in which only one transaction can execute at a time generates
serial schedules, but provides a poor degree of concurrency.
 Concurrency-control schemes involve a tradeoff between the amount of
concurrency they allow and the amount of overhead that they incur.
 Some schemes allow only conflict-serializable schedules to be generated,
while others allow view-serializable schedules that are not conflict-
serializable.
Outline
 Lock-Based Protocols
 Timestamp-Based Protocols
 Validation-Based Protocols
 Multiple Granularity
 Multiversion Schemes
 Insert and Delete Operations
 Concurrency in Index Structures
Lock-Based Protocols
 A lock is a mechanism to control concurrent access to a data item
 Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read and
written. X-lock is requested using the lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is
requested using lock-S instruction.
 Lock requests are made to concurrency-control manager. Transaction can
proceed only after request is granted.
Lock-Based Protocols (Cont.)
 Lock-compatibility matrix
            S       X
     S      true    false
     X      false   false
 A transaction may be granted a lock on an item if the requested lock is
compatible with locks already held on the item by other transactions
 Any number of transactions can hold shared locks on an item,
 But if any transaction holds an exclusive lock on the item, no other
transaction may hold any lock on the item.
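A toy compatibility check in Python (not any particular DBMS's lock manager) showing how the matrix drives the grant decision:

COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False,
              ('X', 'S'): False, ('X', 'X'): False}

def can_grant(requester, requested_mode, held_locks):
    # held_locks: list of (txn, mode) locks currently granted on the item.
    # Grant only if the request is compatible with every lock held by some
    # other transaction (the requester's own locks, e.g. for an upgrade,
    # are ignored in this simplified sketch).
    return all(COMPATIBLE[(mode, requested_mode)]
               for txn, mode in held_locks if txn != requester)

print(can_grant('T3', 'S', [('T1', 'S'), ('T2', 'S')]))   # True
print(can_grant('T3', 'X', [('T1', 'S')]))                # False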
Schedule With Lock Grants
 Grants omitted in rest of chapter
• Assume the grant happens just before the next instruction following the
lock request
 This schedule is not serializable (why?)
 A locking protocol is a set of rules followed by all transactions while
requesting and releasing locks.
 Locking protocols enforce serializability by restricting the set of possible
schedules.
Deadlock
 Consider the partial schedule
 Neither T3 nor T4 can make progress — executing lock-S(B) causes T4
to wait for T3 to release its lock on B, while executing lock-X(A) causes
T3 to wait for T4 to release its lock on A.
 Such a situation is called a deadlock.
• To handle a deadlock one of T3 or T4 must be rolled back
and its locks released.
Deadlock (Cont.)
 The potential for deadlock exists in most locking protocols. Deadlocks are
a necessary evil.
 Starvation is also possible if concurrency control manager is badly
designed. For example:
• A transaction may be waiting for an X-lock on an item, while a
sequence of other transactions request and are granted an S-lock on
the same item.
• The same transaction is repeatedly rolled back due to deadlocks.
 Concurrency control manager can be designed to prevent starvation.
The Two-Phase Locking Protocol
 A protocol which ensures conflict-serializable schedules.
 Phase 1: Growing Phase
• Transaction may obtain locks
• Transaction may not release locks
 Phase 2: Shrinking Phase
• Transaction may release locks
• Transaction may not obtain locks
 The protocol assures serializability. It can be proved that the transactions
can be serialized in the order of their lock points (i.e., the point where a
transaction acquired its final lock).
(Figure: number of locks held over time, growing during the growing phase
and released during the shrinking phase.)
The Two-Phase Locking Protocol (Cont.)
 Two-phase locking does not ensure freedom from deadlocks
 Extensions to basic two-phase locking are needed to ensure recoverability
and freedom from cascading roll-back
• Strict two-phase locking: a transaction must hold all its exclusive
locks till it commits/aborts.
 Ensures recoverability and avoids cascading roll-backs
• Rigorous two-phase locking: a transaction must hold all locks till
commit/abort.
 Transactions can be serialized in the order in which they commit.
 Most databases implement rigorous two-phase locking, but refer to it as
simply two-phase locking
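A minimal sketch of how rigorous two-phase locking can be enforced around a lock manager; lock_manager with acquire()/release() is an assumed interface, and blocking and deadlocks are ignored here:

class TwoPhaseTransaction:
    # Enforces the two-phase rule: once the first lock is released,
    # no further lock may be acquired. commit() releases everything,
    # which is the rigorous variant most systems implement.
    def __init__(self, lock_manager, name):
        self.lm, self.name = lock_manager, name
        self.held = set()
        self.shrinking = False

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.lm.acquire(self.name, item, mode)   # may wait for the grant
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True                    # shrinking phase begins
        self.lm.release(self.name, item)
        self.held.discard(item)

    def commit(self):
        for item in list(self.held):             # rigorous 2PL: all locks are
            self.unlock(item)                    # held until commit/abort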
The Two-Phase Locking Protocol (Cont.)
 Two-phase locking is not a necessary condition for serializability
• There are conflict serializable schedules that cannot be obtained if the
two-phase locking protocol is used.
 In the absence of extra information (e.g., ordering of access to data),
two-phase locking is necessary for conflict serializability in the following
sense:
• Given a transaction Ti that does not follow two-phase locking, we can
find a transaction Tj that uses two-phase locking, and a schedule for Ti
and Tj that is not conflict serializable.
Locking Protocols
 Given a locking protocol (such as 2PL)
• A schedule S is legal under a locking protocol if it can be generated
by a set of transactions that follow the protocol
• A protocol ensures serializability if all legal schedules under that
protocol are serializable
Lock Conversions
 Two-phase locking protocol with lock conversions:
– Growing Phase:
• can acquire a lock-S on item
• can acquire a lock-X on item
• can convert a lock-S to a lock-X (upgrade)
– Shrinking Phase:
• can release a lock-S
• can release a lock-X
• can convert a lock-X to a lock-S (downgrade)
 This protocol ensures serializability
Deadlock Handling
 System is deadlocked if there is a set of transactions such that every
transaction in the set is waiting for another transaction in the set.
Deadlock Handling
 Deadlock prevention protocols ensure that the system will never enter
into a deadlock state. Some prevention strategies:
• Require that each transaction locks all its data items before it begins
execution (pre-declaration).
• Impose partial ordering of all data items and require that a
transaction can lock data items only in the order specified by the
partial order (graph-based protocol).
More Deadlock Prevention Strategies
 wait-die scheme — non-preemptive
• Older transaction may wait for younger one to release data item.
• Younger transactions never wait for older ones; they are rolled back
instead.
• A transaction may die several times before acquiring a lock
 wound-wait scheme — preemptive
• Older transaction wounds (forces rollback of) the younger transaction
instead of waiting for it.
• Younger transactions may wait for older ones.
• Fewer rollbacks than wait-die scheme.
 In both schemes, a rolled-back transaction is restarted with its original
timestamp.
• Ensures that older transactions have precedence over newer ones,
and starvation is thus avoided.
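Both rules reduce to a comparison of timestamps; a small sketch (ts is the original start timestamp, so a smaller value means an older transaction):

def wait_die(requester_ts, holder_ts):
    # Non-preemptive: an older requester waits, a younger requester dies.
    return 'wait' if requester_ts < holder_ts else 'roll back requester'

def wound_wait(requester_ts, holder_ts):
    # Preemptive: an older requester wounds (rolls back) the younger holder;
    # a younger requester simply waits.
    return 'roll back holder' if requester_ts < holder_ts else 'wait'

# T1 (ts=5, older) requests an item currently locked by T2 (ts=9, younger):
print(wait_die(5, 9))     # 'wait'
print(wound_wait(5, 9))   # 'roll back holder'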
Deadlock prevention (Cont.)
 Timeout-Based Schemes:
• A transaction waits for a lock only for a specified amount of time. After
that, the wait times out and the transaction is rolled back.
• Ensures that deadlocks get resolved by timeout if they occur
• Simple to implement
• But may roll back transaction unnecessarily in absence of deadlock
 Difficult to determine good value of the timeout interval.
• Starvation is also possible
Deadlock Detection
 Wait-for graph
• Vertices: transactions
• Edge from Ti Tj. : if Ti is waiting for a lock held in conflicting mode
byTj
 The system is in a deadlock state if and only if the wait-for graph has a
cycle.
 Invoke a deadlock-detection algorithm periodically to look for cycles.
(Figures: a wait-for graph without a cycle and a wait-for graph with a cycle.)
Deadlock Recovery
 When deadlock is detected :
• Some transaction will have to be rolled back (made a victim) to break the
deadlock cycle.
 Select that transaction as victim that will incur minimum cost
• Rollback -- determine how far to roll back transaction
 Total rollback: Abort the transaction and then restart it.
 Partial rollback: Roll back victim transaction only as far as
necessary to release locks that another transaction in cycle is
waiting for
 Starvation can happen (why?)
• One solution: oldest transaction in the deadlock set is never chosen
as victim
Multiple Granularity
 Allow data items to be of various sizes and define a hierarchy of data
granularities, where the small granularities are nested within larger ones
 Can be represented graphically as a tree (but don't confuse with tree-
locking protocol)
 When a transaction locks a node in the tree explicitly, it implicitly locks all
the node's descendants in the same mode.
 Granularity of locking (level in tree where locking is done):
• Fine granularity (lower in tree): high concurrency, high locking
overhead
• Coarse granularity (higher in tree): low locking overhead, low
concurrency
Example of Granularity Hierarchy
 The levels, starting from the coarsest (top) level are
• database
• area
• file
• record
 The corresponding tree
Intention Lock Modes
 In addition to S and X lock modes, there are three additional lock modes
with multiple granularity:
• intention-shared (IS): indicates explicit locking at a lower level of the
tree but only with shared locks.
• intention-exclusive (IX): indicates explicit locking at a lower level with
exclusive or shared locks
• shared and intention-exclusive (SIX): the subtree rooted by that
node is locked explicitly in shared mode and explicit locking is being
done at a lower level with exclusive-mode locks.
 Intention locks allow a higher level node to be locked in S or X mode
without having to check all descendent nodes.
Compatibility Matrix with Intention Lock Modes
 The compatibility matrix for all lock modes is:
          IS     IX     S      SIX    X
   IS     yes    yes    yes    yes    no
   IX     yes    yes    no     no     no
   S      yes    no     yes    no     no
   SIX    yes    no     no     no     no
   X      no     no     no     no     no
Outline
 Failure Classification
 Storage Structure
 Recovery and Atomicity
 Log-Based Recovery
 Remote Backup Systems
Failure Classification
 Transaction failure :
• Logical errors: transaction cannot complete due to some internal
error condition
• System errors: the database system must terminate an active
transaction due to an error condition (e.g., deadlock)
 System crash: a power failure or other hardware or software failure
causes the system to crash.
• Fail-stop assumption: non-volatile storage contents are assumed to
not be corrupted by system crash
 Database systems have numerous integrity checks to prevent
corruption of disk data
 Disk failure: a head crash or similar disk failure destroys all or part of disk
storage
• Destruction is assumed to be detectable: disk drives use checksums to
detect failures
Recovery Algorithms
 Suppose transaction Ti transfers $50 from account A to account B
• Two updates: subtract 50 from A and add 50 to B
 Transaction Ti requires updates to A and B to be output to the database.
• A failure may occur after one of these modifications has been made
but before both of them are made.
• Modifying the database without ensuring that the transaction will
commit may leave the database in an inconsistent state
• Not modifying the database may result in lost updates if failure occurs
just after transaction commits
 Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure enough
information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state
that ensures atomicity, consistency and durability
Storage Structure
 Volatile storage:
• Does not survive system crashes
• Examples: main memory, cache memory
 Nonvolatile storage:
• Survives system crashes
• Examples: disk, tape, flash memory, non-volatile RAM
• But may still fail, losing data
 Stable storage:
• A mythical form of storage that survives all failures
• Approximated by maintaining multiple copies on distinct nonvolatile
media
• See book for more details on how to implement stable storage
Stable-Storage Implementation
 Maintain multiple copies of each block on separate disks
• copies can be at remote sites to protect against disasters such as fire
or flooding.
 Failure during data transfer can still result in inconsistent copies: Block
transfer can result in
• Successful completion
• Partial failure: destination block has incorrect information
• Total failure: destination block was never updated
 Protecting storage media from failure during data transfer (one solution):
• Execute output operation as follows (assuming two copies of each
block):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same
information onto the second physical block.
3. The output is completed only after the second write successfully
completes.
Protecting storage media from failure (Cont.)
 Copies of a block may differ due to failure during output operation.
 To recover from failure:
1. First find inconsistent blocks:
1. Expensive solution: Compare the two copies of every disk block.
2. Better solution:
• Record in-progress disk writes on non-volatile storage (Flash,
Non-volatile RAM or special area of disk).
• Use this information during recovery to find blocks that may
be inconsistent, and only compare copies of these.
• Used in hardware RAID systems
2. If either copy of an inconsistent block is detected to have an error
(bad checksum), overwrite it by the other copy. If both have no error,
but are different, overwrite the second block by the first block.
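An illustrative sketch of the two-copy write and the corresponding recovery check; plain dicts stand in for the two disks, and is_bad() is a caller-supplied stand-in for a checksum test (these names are assumptions for the example, not real APIs):

def stable_write(block_id, data, copy1, copy2, in_progress):
    # Record the write as in-progress so recovery knows which blocks may
    # have inconsistent copies, then write the two copies in order.
    in_progress.add(block_id)
    copy1[block_id] = data             # 1. write the first physical block
    copy2[block_id] = data             # 2. then, and only then, the second
    in_progress.discard(block_id)      # 3. the output is now complete

def recover_block(block_id, copy1, copy2, is_bad):
    b1, b2 = copy1.get(block_id), copy2.get(block_id)
    if is_bad(b1):
        copy1[block_id] = b2           # first copy bad: take the second
    elif is_bad(b2) or b1 != b2:
        copy2[block_id] = b1           # second bad, or copies differ:
                                       # overwrite the second with the first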
Data Access
 Physical blocks are those blocks residing on the disk.
 Buffer blocks are the blocks residing temporarily in main memory.
 Block movements between disk and main memory are initiated through
the following two operations:
• input (B) transfers the physical block B to main memory.
• output (B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
 We assume, for simplicity, that each data item fits in, and is stored inside,
a single block.
Data Access (Cont.)
 Each transaction Ti has its private work-area in which local copies of all
data items accessed and updated by it are kept.
• Ti 's local copy of a data item X is called xi.
 Transferring data items between system buffer blocks and its private work-
area done by:
• read(X) assigns the value of data item X to the local variable xi.
• write(X) assigns the value of local variable xi to data item X in the
buffer block.
• Note: output(BX) need not immediately follow write(X). System can
perform the output operation when it deems fit.
 Transactions
• Must perform read(X) before accessing X for the first time (subsequent
reads can be from local copy)
• write(X) can be executed at any time before the transaction commits
Example of Data Access
Recovery and Atomicity
 To ensure atomicity despite failures, we first output information describing
the modifications to stable storage without modifying the database itself.
 We study log-based recovery mechanisms in detail
• We first present key concepts
• And then present the actual recovery algorithm
 Less used alternative: shadow-copy and shadow-paging (brief details in
book)
Log-Based Recovery
 A log is a sequence of log records. The records keep information about
update activities on the database.
• The log is kept on stable storage
 When transaction Ti starts, it registers itself by writing a
<Ti start> log record
 Before Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old
value), and V2 is the value to be written to X (the new value).
 When Ti finishes its last statement, the log record <Ti commit> is written.
 Two approaches using logs
• Immediate database modification
• Deferred database modification.
Immediate Database Modification
 The immediate-modification scheme allows updates of an
uncommitted transaction to be made to the buffer, or the disk itself,
before the transaction commits
 Update log record must be written before database item is written
• We assume that the log record is output directly to stable storage
• (We will see later how to postpone log record output to some
extent)
 Output of updated blocks to disk can take place at any time before or
after transaction commit
 Order in which blocks are output can be different from the order in which
they are written.
 The deferred-modification scheme performs updates to buffer/disk only
at the time of transaction commit
• Simplifies some aspects of recovery
• But has overhead of storing local copy
Transaction Commit
 A transaction is said to have committed when its commit log record is
output to stable storage
• All previous log records of the transaction must have been output
already
 Writes performed by a transaction may still be in the buffer when the
transaction commits, and may be output later
Immediate Database Modification Example
Log                       Write              Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                          A = 950
                          B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                          C = 600
                                             BB, BC
<T1 commit>
                                             BA
 Note: BX denotes the block containing X.
 BC is output before T1 commits; BA is output after T0 commits.
Concurrency Control and Recovery
 With concurrent transactions, all transactions share a single disk buffer and
a single log
• A buffer block can have data items updated by one or more
transactions
 We assume that if a transaction Ti has modified an item, no other
transaction can modify the same item until Ti has committed or aborted
• i.e., the updates of uncommitted transactions should not be visible to
other transactions
 Otherwise, how to perform undo if T1 updates A, then T2 updates A
and commits, and finally T1 has to abort?
• Can be ensured by obtaining exclusive locks on updated items and
holding the locks till end of transaction (strict two-phase locking)
 Log records of different transactions may be interspersed in the log.
Undo and Redo Operations
 Undo and Redo of Transactions
• undo(Ti) -- restores the value of all data items updated by Ti to their
old values, going backwards from the last log record for Ti
 Each time a data item X is restored to its old value V a special log
record <Ti , X, V> is written out
 When undo of a transaction is complete, a log record
<Ti abort> is written out.
• redo(Ti) -- sets the value of all data items updated by Ti to the new
values, going forward from the first log record for Ti
 No logging is done in this case
Recovering from Failure
 When recovering after failure:
• Transaction Ti needs to be undone if the log
 Contains the record <Ti start>,
 But does not contain either the record <Ti commit> or <Ti abort>.
• Transaction Ti needs to be redone if the log
 Contains the records <Ti start>
 And contains the record <Ti commit> or <Ti abort>
Recovering from Failure (Cont.)
 Suppose that transaction Ti was undone earlier and the <Ti abort> record
was written to the log, and then a failure occurs,
 On recovery from failure transaction Ti is redone
• Such a redo redoes all the original actions of transaction Ti including
the steps that restored old values
 Known as repeating history
 Seems wasteful, but simplifies recovery greatly
Checkpoints
 Redoing/undoing all transactions recorded in the log can be very slow
• Processing the entire log is time-consuming if the system has run for a
long time
• We might unnecessarily redo transactions which have already output
their updates to the database.
 Streamline recovery procedure by periodically performing checkpointing
1. Output all log records currently residing in main memory onto stable
storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint L> onto stable storage where L is a
list of all transactions active at the time of checkpoint.
4. All updates are stopped while doing checkpointing
Checkpoints (Cont.)
 During recovery we need to consider only the most recent transaction Ti
that started before the checkpoint, and transactions that started after Ti.
• Scan backwards from end of log to find the most recent <checkpoint
L> record
• Only transactions that are in L or started after the checkpoint need to
be redone or undone
• Transactions that committed or aborted before the checkpoint
already have all their updates output to stable storage.
 Some earlier part of the log may be needed for undo operations
• Continue scanning backwards till a record <Ti start> is found for
every transaction Ti in L.
• Parts of log prior to earliest <Ti start> record above are not needed
for recovery, and can be erased whenever desired.
Example of Checkpoints
 T1 can be ignored (updates already output to disk due to
checkpoint)
 T2 and T3 redone.
 T4 undone
Recovery Algorithm
 Logging (during normal operation):
• <Ti start> at transaction start
• <Ti, Xj, V1, V2> for each update, and
• <Ti commit> at transaction end
 Transaction rollback (during normal operation)
• Let Ti be the transaction to be rolled back
• Scan log backwards from the end, and for each log record of Ti of the
form <Ti, Xj, V1, V2>
 Perform the undo by writing V1 to Xj,
 Write a log record <Ti , Xj, V1>
• such log records are called compensation log records
• Once the record <Ti start> is found stop the scan and write the log
record <Ti abort>
Recovery Algorithm (Cont.)
 Recovery from failure: Two phases
• Redo phase: replay updates of all transactions, whether they
committed, aborted, or are incomplete
• Undo phase: undo all incomplete transactions
 Redo phase:
1. Find last <checkpoint L> record, and set undo-list to L.
2. Scan forward from above <checkpoint L> record
1. Whenever a record <Ti, Xj, V1, V2> or <Ti, Xj, V2> is found, redo
it by writing V2 to Xj
2. Whenever a log record <Ti start> is found, add Ti to undo-list
3. Whenever a log record <Ti commit> or <Ti abort> is found,
remove Ti from undo-list
Recovery Algorithm (Cont.)
 Undo phase:
1. Scan log backwards from end
1. Whenever a log record <Ti, Xj, V1, V2> is found where Ti is in
undo-list perform same actions as for transaction rollback:
1. perform undo by writing V1 to Xj.
2. write a log record <Ti , Xj, V1>
2. Whenever a log record <Ti start> is found where Ti is in undo-list,
1. Write a log record <Ti abort>
2. Remove Ti from undo-list
3. Stop when undo-list is empty
1. i.e., <Ti start> has been found for every transaction in undo-list
 After undo phase completes, normal transaction processing can commence
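A compact sketch of the two passes over an already-parsed log (tuples in the record formats used above; db is a plain dict standing in for the database). This follows the slide's simplified algorithm, not any particular system:

def recover(log, db):
    # log records: ('start', T), ('update', T, X, V1, V2), ('clr', T, X, V),
    # ('commit', T), ('abort', T), ('checkpoint', [active transactions])
    ckpts = [i for i, r in enumerate(log) if r[0] == 'checkpoint']
    start = ckpts[-1] if ckpts else -1
    undo_list = set(log[start][1]) if ckpts else set()

    # Redo phase: repeat history forward from the last checkpoint
    for rec in log[start + 1:]:
        if rec[0] == 'update':
            _, t, x, v_old, v_new = rec
            db[x] = v_new                       # redo by writing the new value
        elif rec[0] == 'clr':
            db[rec[2]] = rec[3]                 # redo compensation records too
        elif rec[0] == 'start':
            undo_list.add(rec[1])
        elif rec[0] in ('commit', 'abort'):
            undo_list.discard(rec[1])

    # Undo phase: roll back incomplete transactions, scanning backwards
    for rec in list(reversed(log)):
        if not undo_list:
            break
        if rec[0] == 'update' and rec[1] in undo_list:
            _, t, x, v_old, v_new = rec
            db[x] = v_old                       # undo by restoring the old value
            log.append(('clr', t, x, v_old))    # write a compensation log record
        elif rec[0] == 'start' and rec[1] in undo_list:
            log.append(('abort', rec[1]))
            undo_list.discard(rec[1])
    return db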
Example of Recovery
Database Buffering
 Database maintains an in-memory buffer of data blocks
• When a new block is needed, if buffer is full an existing block needs to
be removed from buffer
• If the block chosen for removal has been updated, it must be output to
disk
 The recovery algorithm supports the no-force policy: i.e., updated blocks
need not be written to disk when transaction commits
• force policy: requires updated blocks to be written at commit
 More expensive commit
 The recovery algorithm supports the steal policy: i.e., blocks containing
updates of uncommitted transactions can be written to disk, even before the
transaction commits
Database Buffering (Cont.)
 If a block with uncommitted updates is output to disk, log records with
undo information for the updates are output to the log on stable storage
first
• (Write ahead logging)
 No updates should be in progress on a block when it is output to disk. Can
be ensured as follows.
• Before writing a data item, transaction acquires exclusive lock on
block containing the data item
• Lock can be released once the write is completed.
 Such locks held for short duration are called latches.
 To output a block to disk
1. First acquire an exclusive latch on the block
 Ensures no update can be in progress on the block
2. Then perform a log flush
3. Then output the block to disk
4. Finally release the latch on the block
Failure with Loss of Nonvolatile Storage
 So far we assumed no loss of non-volatile storage
 Technique similar to checkpointing used to deal with loss of non-volatile
storage
• Periodically dump the entire content of the database to stable
storage
• No transaction may be active during the dump procedure; a
procedure similar to checkpointing must take place
 Output all log records currently residing in main memory onto
stable storage.
 Output all buffer blocks onto the disk.
 Copy the contents of the database to stable storage.
 Output a record <dump> to log on stable storage.
Recovering from Failure of Non-Volatile Storage
 To recover from disk failure
• restore database from most recent dump.
• Consult the log and redo all transactions that committed after the dump
 Can be extended to allow transactions to be active during dump;
known as fuzzy dump or online dump
• Similar to fuzzy checkpointing
ARIES
 ARIES is a state-of-the-art recovery method
• Incorporates numerous optimizations to reduce overheads during
normal processing and to speed up recovery
• The recovery algorithm we studied earlier is modeled after ARIES, but
greatly simplified by removing optimizations
 Unlike the recovery algorithm described earlier, ARIES
1. Uses log sequence number (LSN) to identify log records
 Stores LSNs in pages to identify what updates have already been
applied to a database page
2. Physiological redo
3. Dirty page table to avoid unnecessary redos during recovery
4. Fuzzy checkpointing that only records information about dirty pages,
and does not require dirty pages to be written out at checkpoint time
 More coming up on each of the above …
ARIES Data Structures: Log Record
 Each log record contains LSN of previous log record of the same
transaction
• LSN in log record may be implicit
 Special redo-only log record called compensation log record (CLR) used
to log actions taken during recovery that never need to be undone
• Serves the role of operation-abort log records used in earlier recovery
algorithm
• Has a field UndoNextLSN to note next (earlier) record to be undone
 Records in between would have already been undone
 Required to avoid repeated undo of already undone actions
 Update log record fields: LSN | TransID | PrevLSN | RedoInfo | UndoInfo
 Compensation log record (CLR) fields: LSN | TransID | UndoNextLSN | RedoInfo
ARIES Data Structures
ARIES Recovery Algorithm
ARIES recovery involves three passes
 Analysis pass: Determines
• Which transactions to undo
• Which pages were dirty (disk version not up to date) at time of crash
• RedoLSN: LSN from which redo should start
 Redo pass:
• Repeats history, redoing all actions from RedoLSN
 RecLSN and PageLSNs are used to avoid redoing actions already
reflected on page
 Undo pass:
• Rolls back all incomplete transactions
 Transactions whose abort was complete earlier are not undone
• Key idea: no need to undo these transactions: earlier undo
actions were logged, and are redone as required
UNIT V – DATABASE APPLICATIONS
Centralized Database Systems
 Run on a single computer system
 Single-user system
 Multi-user systems also known as server systems.
• Service requests received from client systems
• Multi-core systems with coarse-grained parallelism
 Typically, a few to tens of processor cores
 In contrast, fine-grained parallelism uses very large number
of computers
Speed-Up and Scale-Up
 Speedup: a fixed-sized problem executing on a small system is given to a
system which is N-times larger.
• Measured by:
      speedup = (small system elapsed time) / (large system elapsed time)
• Speedup is linear if the ratio equals N.
 Scaleup: increase the size of both the problem and the system
• N-times larger system used to perform N-times larger job
• Measured by:
      scaleup = (small system, small problem elapsed time) / (big system, big problem elapsed time)
• Scaleup is linear if the ratio equals 1.
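For instance (illustrative numbers): if a query takes 100 seconds on the small system and 25 seconds on a system 4 times larger, speedup = 100 / 25 = 4 = N, which is linear speedup; if a job 4 times larger then takes 100 seconds on that 4-times-larger system, scaleup = 100 / 100 = 1, which is linear scaleup.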
(Figures: speedup curve; scaleup curve.)
Distributed Systems
 Data spread over multiple machines (also referred to as sites or nodes).
 Local-area networks (LANs)
 Wide-area networks (WANs)
• Higher latency
(Figure: sites A, B, and C communicating via a network.)
Distributed Databases
 Homogeneous distributed databases
• Same software/schema on all sites, data may be partitioned among
sites
• Goal: provide a view of a single database, hiding details of distribution
 Heterogeneous distributed databases
• Different software/schema on different sites
• Goal: integrate existing databases to provide useful functionality
 Differentiate between local transactions and global transactions
• A local transaction accesses data in the single site at which the
transaction was initiated.
• A global transaction either accesses data in a site different from the
one at which the transaction was initiated or accesses data in several
different sites.
Data Integration and Distributed Databases
 Data integration between multiple distributed databases
 Benefits:
• Sharing data – users at one site able to access the data residing at
some other sites.
• Autonomy – each site is able to retain a degree of control over data
stored locally.
Availability
 Network partitioning
 Availability of system
• If all nodes are required for system to function, failure of even one
node stops system functioning.
• Higher system availability through redundancy
 data can be replicated at remote sites, and system can function
even if a site fails.
Implementation Issues for Distributed
Databases
 Atomicity needed even for transactions that update data at multiple sites
 The two-phase commit protocol (2PC) is used to ensure atomicity
• Basic idea: each site executes the transaction until just before commit, and
then leaves the final decision to a coordinator
• Each site must follow the decision of the coordinator, even if there is a failure
while waiting for the coordinator's decision
 2PC is not always appropriate: other transaction models based on
persistent messaging, and workflows, are also used
 Distributed concurrency control (and deadlock detection) required
 Data items may be replicated to improve data availability
 Details of all above in Chapter 24
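A highly simplified sketch of the coordinator's side of 2PC; participant objects with prepare()/commit()/abort() are an assumed interface, and failures, timeouts, and logging are omitted:

def two_phase_commit(participants):
    # Phase 1: every site executes up to just before commit and votes
    votes = [p.prepare() for p in participants]    # True means "ready to commit"
    decision = all(votes)
    # Phase 2: broadcast the decision; every site must obey it
    for p in participants:
        p.commit() if decision else p.abort()
    return 'committed' if decision else 'aborted'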
Cloud Based Services
 Cloud computing widely adopted today
• On-demand provisioning and elasticity
 ability to scale up at short notice and to release unused
resources for use by others
 Infrastructure as a service
• Virtual machines/real machines
 Platform as a service
• Storage, databases, application server
 Software as a service
• Enterprise applications, email, shared documents, etc.
 Potential drawbacks
• Security
• Network bandwidth
Cloud Service Models
Application Deployment Alternatives
Individual machines; virtual machines (e.g., VMware, KVM); containers (e.g., Docker)
Application Deployment Architectures
 Services
 Microservice Architecture
• Application uses a variety of services
• Service can add or remove instances as required
 Kubernetes supports containers, and microservices
Outline
 Complex Data Types and Object Orientation
 Structured Data Types and Inheritance in SQL
 Table Inheritance
 Array and Multiset Types in SQL
 Object Identity and Reference Types in SQL
 Implementing O-R Features
 Persistent Programming Languages
 Comparison of Object-Oriented and Object-Relational Databases
Object-Relational Data Models
 Extend the relational data model by including object orientation and
constructs to deal with added data types.
 Allow attributes of tuples to have complex types, including non-atomic
values such as nested relations.
 Preserve relational foundations, in particular the declarative access to
data, while extending modeling power.
 Upward compatibility with existing relational languages.
Complex Data Types
 Motivation:
• Permit non-atomic domains (atomic = indivisible)
• Example of non-atomic domain: set of integers, or set of tuples
• Allows more intuitive modeling for applications with complex data
 Intuitive definition:
• Allow relations whenever we allow atomic (scalar) values — relations
within relations
• Retains mathematical foundation of relational model
• Violates first normal form.
Example of a Nested Relation
 Example: library information system
 Each book has
• Title,
• A list (array) of authors,
• Publisher, with subfields name and branch, and
• A set of keywords
 Non-1NF relation books
Structured Types and Inheritance in SQL
 Structured types (a.k.a. user-defined types) can be declared and used in
SQL
create type Name as
(firstname varchar(20),
lastname varchar(20))
final
create type Address as
(street varchar(20),
city varchar(20),
zipcode varchar(20))
not final
• Note: final and not final indicate whether subtypes can be created
 Structured types can be used to create tables with composite attributes
create table person (
name Name,
address Address,
dateOfBirth date)
 Dot notation used to reference components: name.firstname
Structured Types (cont.)
 User-defined row types
create type CustomerType as (
name Name,
address Address,
dateOfBirth date)
not final
 Can then create a table whose rows are a user-defined type
create table customer of CustomerType
 Alternative using unnamed row types.
create table person_r(
name row(firstname varchar(20),
lastname varchar(20)),
address row(street varchar(20),
city varchar(20),
zipcode varchar(20)),
dateOfBirth date)
Methods
 Can add a method declaration with a structured type.
method ageOnDate (onDate date)
returns interval year
 Method body is given separately.
create instance method ageOnDate (onDate date)
returns interval year
for CustomerType
begin
return onDate - self.dateOfBirth;
end
 We can now find the age of each customer:
select name.lastname, ageOnDate (current_date)
from customer
Object-Identity and Reference Types
 Define a type Department with a field name and a field head which is a
reference to the type Person, with table people as scope:
create type Department (
name varchar (20),
head ref (Person) scope people)
 We can then create a table departments as follows
create table departments of Department
 We can omit the declaration scope people from the type declaration and
instead make an addition to the create table statement:
create table departments of Department
(head with options scope people)
 Referenced table must have an attribute that stores the identifier, called
the self-referential attribute
create table people of Person
ref is person_id system generated;
Path Expressions
 Find the names and addresses of the heads of all departments:
select head->name, head->address
from departments
 An expression such as “head->name” is called a path expression
 Path expressions help avoid explicit joins
• If department head were not a reference, a join of departments with
people would be required to get at the address
• Makes expressing the query much easier for the user
Implementing O-R Features
 Similar to how E-R features are mapped onto relation schemas
 Subtable implementation
• Each table stores primary key and those attributes defined in that
table
or,
• Each table stores both locally defined and inherited attributes
Persistent Programming Languages
 Languages extended with constructs to handle persistent data
 Programmer can manipulate persistent data directly
• no need to fetch it into memory and store it back to disk (unlike
embedded SQL)
 Persistent objects:
• Persistence by class - explicit declaration of persistence
• Persistence by creation - special syntax to create persistent objects
• Persistence by marking - make objects persistent after creation
• Persistence by reachability - object is persistent if it is declared
explicitly to be so or is reachable from a persistent object
Comparison of O-O and O-R Databases
 Relational systems
• simple data types, powerful query languages, high protection.
 Persistent-programming-language-based OODBs
• complex data types, integration with programming language, high
performance.
 Object-relational systems
• complex data types, powerful query languages, high protection.
 Object-relational mapping systems
• complex data types integrated with programming language, but built as
a layer on top of a relational database system
 Note: Many real systems blur these boundaries
• E.g., persistent programming language built as a wrapper on a
relational database offers first two benefits, but may have poor
performance.
Outline
 Structure of XML Data
 XML Document Schema
 Querying and Transformation
 Application Program Interfaces to XML
 Storage of XML Data
 XML Applications
Introduction
 XML: Extensible Markup Language
 Defined by the WWW Consortium (W3C)
 Derived from SGML (Standard Generalized Markup Language), but
simpler to use than SGML
 Documents have tags giving extra information about sections of the
document
• E.g., <title> XML </title> <slide> Introduction …</slide>
 Extensible, unlike HTML
• Users can add new tags, and separately specify how the tag should
be handled for display
XML Introduction (Cont.)
 The ability to specify new tags, and to create nested tag structures make
XML a great way to exchange data, not just documents.
• Much of the use of XML has been in data exchange applications, not
as a replacement for HTML
 Tags make data (relatively) self-documenting
• E.g.,
<university>
<department>
<dept_name> Comp. Sci. </dept_name>
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<course>
<course_id> CS-101 </course_id>
<title> Intro. to Computer Science </title>
<dept_name> Comp. Sci </dept_name>
<credits> 4 </credits>
</course>
</university>
Comparison with Relational Data
 Inefficient: tags, which in effect represent schema information, are
repeated
 Better than relational tuples as a data-exchange format
• Unlike relational tuples, XML data is self-documenting due to presence
of tags
• Non-rigid format: tags can be added
• Allows nested structures
• Wide acceptance, not only in database systems, but also in browsers,
tools, and applications
Structure of XML Data
 Tag: label for a section of data
 Element: section of data beginning with <tagname> and ending with
matching </tagname>
 Elements must be properly nested
• Proper nesting
 <course> … <title> …. </title> </course>
• Improper nesting
 <course> … <title> …. </course> </title>
• Formally: every start tag must have a unique matching end tag, that is
in the context of the same parent element.
 Every document must have a single top-level element
Example of Nested Elements
<purchase_order>
<identifier> P-101 </identifier>
<purchaser> …. </purchaser>
<itemlist>
<item>
<identifier> RS1 </identifier>
<description> Atom powered rocket sled </description>
<quantity> 2 </quantity>
<price> 199.95 </price>
</item>
<item>
<identifier> SG2 </identifier>
<description> Superb glue </description>
<quantity> 1 </quantity>
<unit-of-measure> liter </unit-of-measure>
<price> 29.95 </price>
</item>
</itemlist>
</purchase_order>
Structure of XML Data (Cont.)
 Mixture of text with sub-elements is legal in XML.
• Example:
<course>
This course is being offered for the first time in 2009.
<course_id> BIO-399 </course_id>
<title> Computational Biology </title>
<dept_name> Biology </dept_name>
<credits> 3 </credits>
</course>
• Useful for document markup, but discouraged for data representation
Attributes
 Elements can have attributes
<course course_id= “CS-101”>
<title> Intro. to Computer Science</title>
<dept_name> Comp. Sci. </dept_name>
<credits> 4 </credits>
</course>
 Attributes are specified by name=value pairs inside the starting tag of an
element
 An element may have several attributes, but each attribute name can only
occur once
<course course_id = “CS-101” credits=“4”>
Attributes vs. Subelements
 Distinction between subelement and attribute
• In the context of documents, attributes are part of markup, while
subelement contents are part of the basic document contents
• In the context of data representation, the difference is unclear and may
be confusing
 Same information can be represented in two ways
• <course course_id= “CS-101”> … </course>
• <course>
<course_id>CS-101</course_id> …
</course>
• Suggestion: use attributes for identifiers of elements, and use
subelements for contents
Namespaces
 XML data has to be exchanged between organizations
 Same tag name may have different meaning in different organizations,
causing confusion on exchanged documents
 Specifying a unique string as an element name avoids confusion
 Better solution: use unique-name:element-name
 Avoid using long unique names all over document by using XML
Namespaces
<university xmlns:yale=“http://www.yale.edu”>
…
<yale:course>
<yale:course_id> CS-101 </yale:course_id>
<yale:title> Intro. to Computer Science</yale:title>
<yale:dept_name> Comp. Sci. </yale:dept_name>
<yale:credits> 4 </yale:credits>
</yale:course>
…
</university>
XML Document Schema
 Database schemas constrain what information can be stored, and the data
types of stored values
 XML documents are not required to have an associated schema
 However, schemas are very important for XML data exchange
• Otherwise, a site cannot automatically interpret data received from
another site
 Two mechanisms for specifying XML schema
• Document Type Definition (DTD)
 Widely used
• XML Schema
 Newer, increasing use
Document Type Definition (DTD)
 The type of an XML document can be specified using a DTD
 DTD constraints structure of XML data
• What elements can occur
• What attributes can/must an element have
• What subelements can/must occur inside each element, and how
many times.
 DTD does not constrain data types
• All values represented as strings in XML
 DTD syntax
• <!ELEMENT element (subelements-specification) >
• <!ATTLIST element (attributes) >
Element Specification in DTD
 Subelements can be specified as
• names of elements, or
• #PCDATA (parsed character data), i.e., character strings
• EMPTY (no subelements) or ANY (anything can be a subelement)
 Example
<!ELEMENT department (dept_name, building, budget)>
<!ELEMENT dept_name (#PCDATA)>
<!ELEMENT budget (#PCDATA)>
 Subelement specification may have regular expressions
<!ELEMENT university ( ( department | course | instructor | teaches )+)>
 Notation:
• “|” - alternatives
• “+” - 1 or more occurrences
• “*” - 0 or more occurrences
XML data with ID and IDREF attributes
<university-3>
<department dept_name=“Comp. Sci.”>
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<department dept_name=“Biology”>
<building> Watson </building>
<budget> 90000 </budget>
</department>
<course course_id=“CS-101” dept_name=“Comp. Sci”
instructors=“10101 83821”>
<title> Intro. to Computer Science </title>
<credits> 4 </credits>
</course>
….
<instructor IID=“10101” dept_name=“Comp. Sci.”>
<name> Srinivasan </name>
<salary> 65000 </salary>
</instructor>
….
</university-3>
Limitations of DTDs
 No typing of text elements and attributes
• All values are strings, no integers, reals, etc.
 Difficult to specify unordered sets of subelements
• Order is usually irrelevant in databases (unlike in the document-layout
environment from which XML evolved)
• (A | B)* allows specification of an unordered set, but
 Cannot ensure that each of A and B occurs only once
 IDs and IDREFs are untyped
• The instructors attribute of a course may contain a reference to
another course, which is meaningless
 instructors attribute should ideally be constrained to refer to
instructor elements
XML Schema
 XML Schema is a more sophisticated schema language which addresses
the drawbacks of DTDs. Supports
• Typing of values
 E.g., integer, string, etc
 Also, constraints on min/max values
• User-defined, complex types
• Many more features, including
 uniqueness and foreign key constraints, inheritance
 XML Schema is itself specified in XML syntax, unlike DTDs
• More-standard representation, but verbose
 XML Schema is integrated with namespaces
 BUT: XML Schema is significantly more complicated than DTDs.
Querying and Transforming XML Data
 Translation of information from one XML schema to another
 Querying on XML data
 Above two are closely related, and handled by the same tools
 Standard XML querying/translation languages
• XPath
 Simple language consisting of path expressions
• XSLT
 Simple language designed for translation from XML to XML and
XML to HTML
• XQuery
 An XML query language with a rich set of features
Tree Model of XML Data
 Query and transformation languages are based on a tree model of XML
data
 An XML document is modeled as a tree, with nodes corresponding to
elements and attributes
• Element nodes have child nodes, which can be attributes or
subelements
• Text in an element is modeled as a text node child of the element
• Children of a node are ordered according to their order in the XML
document
• Element and attribute nodes (except for the root node) have a single
parent, which is an element node
• The root node has a single child, which is the root element of the
document
XPath
 XPath is used to address (select) parts of documents using
path expressions
 A path expression is a sequence of steps separated by “/”
• Think of file names in a directory hierarchy
 Result of path expression: set of values that along with their containing
elements/attributes match the specified path
 E.g., /university-3/instructor/name evaluated on the university-3 data
we saw earlier returns
<name>Srinivasan</name>
<name>Brandt</name>
 E.g., /university-3/instructor/name/text( )
returns the same names, but without the enclosing tags
XPath (Cont.)
 The initial “/” denotes root of the document (above the top-level tag)
 Path expressions are evaluated left to right
• Each step operates on the set of instances produced by the previous
step
 Selection predicates may follow any step in a path, in [ ]
• E.g., /university-3/course[credits >= 4]
 returns course elements with a credits value greater than or equal to 4
 /university-3/course[credits] returns course elements containing
a credits subelement
 Attributes are accessed using “@”
• E.g., /university-3/course[credits >= 4]/@course_id
 returns the course identifiers of courses with credits >= 4
• IDREF attributes are not dereferenced automatically (more on this
later)
Functions in XPath
 XPath provides several functions
• The function count() at the end of a path counts the number of
elements in the set generated by the path
 E.g., /university-2/instructor[count(./teaches/course)> 2]
• Returns instructors teaching more than 2 courses (on
university-2 schema)
• Also function for testing position (1, 2, ..) of node w.r.t. siblings
 Boolean connectives and and or and function not() can be used in
predicates
 IDREFs can be referenced using function id()
• id() can also be applied to sets of references such as IDREFS and
even to strings containing multiple references separated by blanks
• E.g., /university-3/course/id(@dept_name)
 returns all department elements referred to from the dept_name
attribute of course elements.
Sorting in XQuery
 The order by clause can be used at the end of any expression.
E.g., to return instructors sorted by name
for $i in /university/instructor
order by $i/name
return <instructor> { $i/* } </instructor>
 Use order by $i/name descending to sort in descending order
 Can sort at multiple levels of nesting (sort departments by dept_name,
and courses sorted by course_id within each department)
<university-1> {
for $d in /university/department
order by $d/dept_name
return
<department>
{ $d/* }
{ for $c in /university/course[dept_name = $d/dept_name]
order by $c/course_id
return <course> { $c/* } </course> }
</department>
} </university-1>
Storage of XML Data
 XML data can be stored in
• Non-relational data stores
 Flat files
• Natural for storing XML
• But has all problems discussed in Chapter 1 (no concurrency,
no recovery, …)
 XML database
• Database built specifically for storing XML data, supporting
DOM model and declarative querying
• Currently no commercial-grade systems
• Relational databases
 Data must be translated into relational form
 Advantage: mature database systems
 Disadvantages: overhead of translating data and queries
XML Applications
 Storing and exchanging data with complex structures
• E.g., Open Document Format (ODF) format standard for storing Open
Office and Office Open XML (OOXML) format standard for storing
Microsoft Office documents
• Numerous other standards for a variety of applications
 ChemML, MathML
 Standard for data exchange for Web services
• remote method invocation over HTTP protocol
• More in next slide
 Data mediation
• Common data representation format to bridge different systems
Outline
 Relevance Ranking Using Terms
 Relevance Using Hyperlinks
 Synonyms, Homonyms, and Ontologies
 Indexing of Documents
 Measuring Retrieval Effectiveness
 Web Search Engines
 Information Retrieval and Structured Data
 Directories
Information Retrieval Systems
 Information retrieval (IR) systems use a simpler data model than
database systems
• Information organized as a collection of documents
• Documents are unstructured, no schema
 Information retrieval locates relevant documents, on the basis of user
input such as keywords or example documents
• e.g., find documents containing the words “database systems”
 Can be used even on textual descriptions provided with non-textual data
such as images
 Web search engines are the most familiar example of IR systems
Information Retrieval Systems (Cont.)
 Differences from database systems
• IR systems don’t deal with transactional updates (including
concurrency control and recovery)
• Database systems deal with structured data, with schemas that define
the data organization
• IR systems deal with some querying issues not generally addressed
by database systems
 Approximate searching by keywords
 Ranking of retrieved answers by estimated degree of relevance
Keyword Search
 In full text retrieval, all the words in each document are considered to be
keywords.
• We use the word term to refer to the words in a document
 Information-retrieval systems typically allow query expressions formed using
keywords and the logical connectives and, or, and not
• Ands are implicit, even if not explicitly specified
 Ranking of documents on the basis of estimated relevance to a query is critical
• Relevance ranking is based on factors such as
 Term frequency
– Frequency of occurrence of query keyword in document
 Inverse document frequency
– How many documents the query keyword occurs in
» Fewer → give more importance to keyword
 Hyperlinks to documents
– More links to a document → document is more important
Relevance Ranking Using Terms
 TF-IDF (Term frequency/Inverse Document frequency) ranking:
• Let n(d) = number of terms in the document d
• n(d, t) = number of occurrences of term t in the document d.
• Relevance of a document d to a term t:
      TF(d, t) = log( 1 + n(d, t) / n(d) )
 The log factor is to avoid excessive weight to frequent terms
• Relevance of a document d to a query Q, where n(t) is the number of
documents that contain term t:
      r(d, Q) = Σ t∈Q  TF(d, t) / n(t)
Relevance Ranking Using Terms (Cont.)
 Most systems add to the above model
• Words that occur in title, author list, section headings, etc. are given
greater importance
• Words whose first occurrence is late in the document are given lower
importance
• Very common words such as “a”, “an”, “the”, “it” etc. are eliminated
 Called stop words
• Proximity: if keywords in query occur close together in the document,
the document has higher importance than if they occur far apart
 Documents are returned in decreasing order of relevance score
• Usually only top few documents are returned, not all
Similarity Based Retrieval
 Similarity based retrieval - retrieve documents similar to a given document
• Similarity may be defined on the basis of common words
 E.g., find the k terms in the given document with the highest
TF (d, t ) / n (t ) and use these terms to find the relevance of other documents.
 Relevance feedback: Similarity can be used to refine answer set to
keyword query
• User selects a few relevant documents from those retrieved by
keyword query, and system finds other documents similar to these
 Vector space model: define an n-dimensional space, where n is the
number of words in the document set.
• Vector for document d goes from origin to a point whose i th coordinate
is TF (d,t ) / n (t )
• The cosine of the angle between the vectors of two documents is used
as a measure of their similarity.
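A sketch of the cosine measure over such term vectors (dicts mapping term to coordinate, e.g. TF(d, t)/n(t)); illustrative only:

import math

def cosine_similarity(vec_a, vec_b):
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

d1 = {"database": 0.5, "systems": 0.3}
d2 = {"database": 0.4, "recovery": 0.2}
print(cosine_similarity(d1, d2))   # closer to 1 means more similar documents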
Relevance Using Hyperlinks
 Number of documents relevant to a query can be enormous if only term
frequencies are taken into account
 Using term frequencies makes “spamming” easy
 E.g., a travel agency can add many occurrences of the words
“travel” to its page to make its rank very high
 Most of the time people are looking for pages from popular sites
 Idea: use popularity of Web site (e.g., how many people visit it) to rank site
pages that match given keywords
 Problem: hard to find actual popularity of site
• Solution: next slide
Relevance Using Hyperlinks (Cont.)
 Solution: use number of hyperlinks to a site as a measure of the
popularity or prestige of the site
• Count only one hyperlink from each site (why? - see previous slide)
• Popularity measure is for site, not for individual page
 But, most hyperlinks are to root of site
 Also, concept of “site” difficult to define since a URL prefix like
cs.yale.edu contains many unrelated pages of varying popularity
 Refinements
• When computing prestige based on links to a site, give more weight
to links from sites that themselves have higher prestige
 Definition is circular
 Set up and solve system of simultaneous linear equations
• Above idea is basis of the Google PageRank ranking mechanism
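The circular definition can be solved iteratively. Below is a tiny power-iteration sketch of the idea behind PageRank (illustrative only; the damping factor 0.85 is a conventional choice, not taken from these slides):

def pagerank(links, iterations=50, d=0.85):
    # links: dict mapping each site to the list of sites it links to;
    # every site must appear as a key.
    sites = list(links)
    rank = {s: 1.0 / len(sites) for s in sites}
    for _ in range(iterations):
        new = {s: (1 - d) / len(sites) for s in sites}
        for s in sites:
            targets = links[s] or sites          # dangling site: spread evenly
            for t in targets:
                new[t] += d * rank[s] / len(targets)
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(links))   # "c" ends up with the highest prestige here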
Relevance Using Hyperlinks (Cont.)
 Connections to social networking theories that ranked prestige of
people
• E.g., the president of the U.S.A has a high prestige since many
people know him
• Someone known by multiple prestigious people has high prestige
 Hub and authority based ranking
• A hub is a page that stores links to many pages (on a topic)
• An authority is a page that contains actual information on a topic
• Each page gets a hub prestige based on prestige of authorities that
it points to
• Each page gets an authority prestige based on prestige of hubs
that point to it
• Again, prestige definitions are cyclic, and can be obtained by
solving linear equations
• Use authority prestige when ranking answers to a query
Synonyms and Homonyms
 Synonyms
• E.g., document: “motorcycle repair”, query: “motorcycle
maintenance”
 Need to realize that “maintenance” and “repair” are synonyms
• System can extend query as “motorcycle and (repair or
maintenance)”
 Homonyms
• E.g., “object” has different meanings as noun/verb
• Can disambiguate meanings (to some extent) from the context
 Extending queries automatically using synonyms can be problematic
• Need to understand intended meaning in order to infer synonyms
 Or verify synonyms with user
• Synonyms may have other meanings as well
Concept-Based Querying
 Approach
• For each word, determine the concept it represents from context
• Use one or more ontologies:
 Hierarchical structure showing relationship between concepts
 E.g., the ISA relationship that we saw in the E-R model
 This approach can be used to standardize terminology in a specific field
 Ontologies can link multiple languages
 Foundation of the Semantic Web (not covered here)
Indexing of Documents
 An inverted index maps each keyword Ki to a set of documents Si that
contain the keyword
• Documents identified by identifiers
 Inverted index may record
• Keyword locations within document to allow proximity based ranking
• Counts of number of occurrences of keyword to compute TF
 and operation: finds documents that contain all of K1, K2, ..., Kn.
• Intersection: S1 ∩ S2 ∩ ... ∩ Sn
 or operation: documents that contain at least one of K1, K2, …, Kn
• Union: S1 ∪ S2 ∪ ... ∪ Sn
 Each Si is kept sorted to allow efficient intersection/union by merging
• “not” can also be efficiently implemented by merging of sorted lists
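A minimal inverted-index sketch with sorted posting lists and a merge-based and operation (plain Python, not a production IR engine):

from collections import defaultdict

def build_index(docs):
    # docs: dict doc_id -> list of terms; posting lists stay sorted
    # because doc ids are visited in increasing order.
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            index[term].append(doc_id)
    return index

def and_query(index, terms):
    # Intersect the sorted posting lists by merging them pairwise.
    if not terms:
        return []
    result = index.get(terms[0], [])
    for term in terms[1:]:
        other, merged, i, j = index.get(term, []), [], 0, 0
        while i < len(result) and j < len(other):
            if result[i] == other[j]:
                merged.append(result[i]); i += 1; j += 1
            elif result[i] < other[j]:
                i += 1
            else:
                j += 1
        result = merged
    return result

idx = build_index({1: ["database", "systems"], 2: ["database"], 3: ["systems"]})
print(and_query(idx, ["database", "systems"]))   # [1]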
Measuring Retrieval Effectiveness
 Information-retrieval systems save space by using index structures that
support only approximate retrieval. May result in:
• false negative (false drop) - some relevant documents may not be
retrieved.
• false positive - some irrelevant documents may be retrieved.
• For many applications a good index should not permit any false
drops, but may permit a few false positives.
 Relevant performance metrics:
• precision - what percentage of the retrieved documents are relevant to the query
• recall - what percentage of the documents relevant to the query were retrieved (both are computed in the sketch below)
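A tiny worked example of both metrics, using made-up document sets:

relevant  = {1, 2, 5, 7}           # documents actually relevant to the query
retrieved = {1, 2, 3, 7, 9}        # documents the system returned

true_positives = relevant & retrieved
precision = len(true_positives) / len(retrieved)   # 3/5 = 0.6
recall    = len(true_positives) / len(relevant)    # 3/4 = 0.75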
Measuring Retrieval Effectiveness (Cont.)
 Recall vs. precision tradeoff:
 Can increase recall by retrieving many documents (down to a low
level of relevance ranking), but many irrelevant documents would
be fetched, reducing precision
 Measures of retrieval effectiveness:
• Recall as a function of number of documents fetched, or
• Precision as a function of recall
 Equivalently, as a function of number of documents fetched
• E.g., “precision of 75% at recall of 50%, and 60% at a recall of 75%” (the sketch below computes such values from a ranked result list)
 Problem: which documents are actually relevant, and which are not
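To make the trade-off concrete, the sketch below scans a made-up ranked result list and reports precision at each recall level reached; the ranking and the relevance judgements are invented for illustration.

ranked = [1, 3, 2, 8, 9, 7]        # result list, best match first
relevant = {1, 2, 7}               # ground-truth relevance judgements

found = 0
for k, doc in enumerate(ranked, start=1):
    if doc in relevant:
        found += 1
        print(f"recall {found / len(relevant):.2f} -> precision {found / k:.2f}")
# recall 0.33 -> precision 1.00
# recall 0.67 -> precision 0.67
# recall 1.00 -> precision 0.50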
Web Search Engines
 Web crawlers are programs that locate and gather information on the
Web
• Recursively follow hyperlinks present in known documents, to find
other documents
 Starting from a seed set of documents
• Fetched documents
 Handed over to an indexing system
 Can be discarded after indexing, or stored as a cached copy
 Crawling the entire Web would take a very large amount of time
• Search engines typically cover only a part of the Web, not all of it
• Take months to perform a single crawl
Web Crawling (Cont.)
 Crawling is done by multiple processes on multiple machines, running in
parallel
• The set of links to be crawled is stored in a database
• New links found in crawled pages are added to this set, to be crawled later (a single-process sketch of this loop appears below)
 Indexing process also runs on multiple machines
• Creates a new copy of index instead of modifying old index
• Old index is used to answer queries
• After a crawl is “completed” new index becomes “old” index
 Multiple machines used to answer queries
• Indices may be kept in memory
• Queries may be routed to different machines for load balancing
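A single-process sketch of the crawl loop described above; fetch() and extract_links() are hypothetical stand-ins for real HTTP and HTML-parsing code, and a production crawler would keep the frontier in a database shared by many crawler processes.

from collections import deque

def crawl(seed_urls, fetch, extract_links, limit=1000):
    frontier = deque(seed_urls)        # links still to be crawled
    seen = set(seed_urls)
    pages = {}                         # url -> document text, handed to the indexer
    while frontier and len(pages) < limit:
        url = frontier.popleft()
        text = fetch(url)              # download the page
        pages[url] = text
        for link in extract_links(text):
            if link not in seen:       # remember newly discovered links for later crawling
                seen.add(link)
                frontier.append(link)
    return pages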
Information Retrieval and Structured Data
 Information retrieval systems originally treated documents as a collection
of words
 Information extraction systems infer structure from documents, e.g.:
• Extraction of house attributes (size, address, number of bedrooms,
etc.) from a text advertisement
• Extraction of topic and people named from a news article
 Relations or XML structures used to store extracted data
• System seeks connections among data to answer queries
• Question answering systems
Directories
 Storing related documents together in a library facilitates browsing
• Users can see not only requested document but also related ones.
 Browsing is facilitated by classification system that organizes logically
related documents together.
 Organization is hierarchical: classification hierarchy
A Classification Hierarchy For A Library System
A Classification DAG For a Library Information Retrieval System
Web Directories
 A Web directory is just a classification directory on Web pages
• E.g., Yahoo! Directory, Open Directory project
• Issues:
 What should the directory hierarchy be?
 Given a document, which nodes of the directory are categories
relevant to the document
• Often done manually
 Classification of documents into a hierarchy may be done based
on term similarity
  • 11. Physical Data Independence  Physical Data Independence – the ability to modify the physical schema without changing the logical schema • Applications depend on the logical schema • In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
  • 12. Data Definition Language (DDL)  Specification notation for defining the database schema Example: create table instructor ( ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))  DDL compiler generates a set of table templates stored in a data dictionary  Data dictionary contains metadata (i.e., data about data) • Database schema • Integrity constraints  Primary key (ID uniquely identifies instructors) • Authorization  Who can access what
  • 13. Data Manipulation Language (DML)  Language for accessing and updating the data organized by the appropriate data model • DML also known as query language  There are basically two types of data-manipulation language • Procedural DML -- require a user to specify what data are needed and how to get those data. • Declarative DML -- require a user to specify what data are needed without specifying how to get those data.  Declarative DMLs are usually easier to learn and use than are procedural DMLs.  Declarative DMLs are also referred to as non-procedural DMLs  The portion of a DML that involves information retrieval is called a query language.
  • 14. SQL Query Language  SQL query language is nonprocedural. A query takes as input several tables (possibly only one) and always returns a single table.  Example to find all instructors in Comp. Sci. dept select name from instructor where dept_name = 'Comp. Sci.'  SQL is NOT a Turing machine equivalent language  To be able to compute complex functions SQL is usually embedded in some higher-level language  Application programs generally access databases through one of • Language extensions to allow embedded SQL • Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database
  • 15. Database Access from Application Program  Non-procedural query languages such as SQL are not as powerful as a universal Turing machine.  SQL does not support actions such as input from users, output to displays, or communication over the network.  Such computations and actions must be written in a host language, such as C/C++, Java or Python, with embedded SQL queries that access the data in the database.  Application programs -- are programs that are used to interact with the database in this fashion.
  • 16. Database Design  Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas. • Business decision – What attributes should we record in the database? • Computer Science decision – What relation schemas should we have and how should the attributes be distributed among the various relation schemas?  Physical Design – Deciding on the physical layout of the database The process of designing the general structure of the database:
  • 17. Database Engine  A database system is partitioned into modules that deal with each of the responsibilities of the overall system.  The functional components of a database system can be divided into • The storage manager, • The query processor component, • The transaction management component.
  • 18. Storage Manager  A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.  The storage manager is responsible to the following tasks: • Interaction with the OS file manager • Efficient storing, retrieving and updating of data  The storage manager components include: • Authorization and integrity manager • Transaction manager • File manager • Buffer manager
  • 19. Storage Manager (Cont.)  The storage manager implements several data structures as part of the physical system implementation: • Data files -- store the database itself • Data dictionary -- stores metadata about the structure of the database, in particular the schema of the database. • Indices -- can provide fast access to data items. A database index provides pointers to those data items that hold a particular value.
  • 20. Query Processor  The query processor components include: • DDL interpreter -- interprets DDL statements and records the definitions in the data dictionary. • DML compiler -- translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.  The DML compiler performs query optimization; that is, it picks the lowest cost evaluation plan from among the various alternatives. • Query evaluation engine -- executes low-level instructions generated by the DML compiler.
  • 21. Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation
  • 22. Transaction Management  A transaction is a collection of operations that performs a single logical function in a database application  Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.  Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
  • 23. Database Architecture  Centralized databases • One to a few cores, shared memory  Client-server, • One server machine executes work on behalf of multiple client machines.  Parallel databases • Many core shared memory • Shared disk • Shared nothing  Distributed databases • Geographical distribution • Schema/data heterogeneity
  • 25. Database Applications  Two-tier architecture -- the application resides at the client machine, where it invokes database system functionality at the server machine  Three-tier architecture -- the client machine acts as a front end and does not contain any direct database calls. • The client end communicates with an application server, usually through a forms interface. • The application server in turn communicates with a database system to access data. Database applications are usually partitioned into two or three parts
  • 26. Two-tier and three-tier architectures
  • 28. Database Administrator  Schema definition  Storage structure and access-method definition  Schema and physical-organization modification  Granting of authorization for data access  Routine maintenance  Periodically backing up the database  Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required  Monitoring jobs running on the database A person who has central control over the system is called a database administrator (DBA). Functions of a DBA include:
  • 29. History of Database Systems  1950s and early 1960s: • Data processing using magnetic tapes for storage  Tapes provided only sequential access • Punched cards for input  Late 1960s and 1970s: • Hard disks allowed direct access to data • Network and hierarchical data models in widespread use • Ted Codd defines the relational data model  Would win the ACM Turing Award for this work  IBM Research begins System R prototype  UC Berkeley (Michael Stonebraker) begins Ingres prototype  Oracle releases first commercial relational database • High-performance (for the era) transaction processing
  • 30. History of Database Systems (Cont.)  1980s: • Research relational prototypes evolve into commercial systems  SQL becomes industrial standard • Parallel and distributed database systems  Wisconsin, IBM, Teradata • Object-oriented database systems  1990s: • Large decision support and data-mining applications • Large multi-terabyte data warehouses • Emergence of Web commerce
  • 31. History of Database Systems (Cont.)  2000s • Big data storage systems  Google BigTable, Yahoo PNuts, Amazon,  “NoSQL” systems. • Big data analysis: beyond SQL  Map reduce and friends  2010s • SQL reloaded  SQL front end to Map Reduce systems  Massively parallel database systems  Multi-core main-memory databases
  • 32. Outline  Structure of Relational Databases  Database Schema  Keys  Schema Diagrams  Relational Query Languages  The Relational Algebra
  • 33. Example of a Instructor Relation attributes (or columns) tuples (or rows)
  • 34. Relation Schema and Instance  A1, A2, …, An are attributes  R = (A1, A2, …, An ) is a relation schema Example: instructor = (ID, name, dept_name, salary)  A relation instance r defined over schema R is denoted by r (R).  The current values a relation are specified by a table  An element t of relation r is called a tuple and is represented by a row in a table
  • 35. Attributes  The set of allowed values for each attribute is called the domain of the attribute  Attribute values are (normally) required to be atomic; that is, indivisible  The special value null is a member of every domain. Indicated that the value is “unknown”  The null value causes complications in the definition of many operations
  • 36. Relations are Unordered  Order of tuples is irrelevant (tuples may be stored in an arbitrary order)  Example: instructor relation with unordered tuples
  • 37. Database Schema  Database schema -- is the logical structure of the database.  Database instance -- is a snapshot of the data in the database at a given instant in time.  Example: • schema: instructor (ID, name, dept_name, salary) • Instance:
  • 38. Keys  Let K  R  K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R) • Example: {ID} and {ID,name} are both superkeys of instructor.  Superkey K is a candidate key if K is minimal Example: {ID} is a candidate key for Instructor  One of the candidate keys is selected to be the primary key. • Which one?  Foreign key constraint: Value in one relation must appear in another • Referencing relation • Referenced relation • Example: dept_name in instructor is a foreign key from instructor referencing department
  • 39. Schema Diagram for University Database
  • 40. Relational Query Languages  Procedural versus non-procedural, or declarative  “Pure” languages: • Relational algebra • Tuple relational calculus • Domain relational calculus  The above 3 pure languages are equivalent in computing power  We will concentrate in this chapter on relational algebra • Not Turing-machine equivalent • Consists of 6 basic operations
  • 41. Relational Algebra  A procedural language consisting of a set of operations that take one or two relations as input and produce a new relation as their result.  Six basic operators • select:  • project:  • union:  • set difference: – • Cartesian product: x • rename: 
  • 42. Select Operation  The select operation selects tuples that satisfy a given predicate.  Notation:  p (r)  p is called the selection predicate  Example: select those tuples of the instructor relation where the instructor is in the “Physics” department. • Query  dept_name=“Physics” (instructor) • Result
  • 43. Select Operation (Cont.)  We allow comparisons using =, , >, . <.  in the selection predicate.  We can combine several predicates into a larger predicate by using the connectives:  (and),  (or),  (not)  Example: Find the instructors in Physics with a salary greater $90,000, we write:  dept_name=“Physics”  salary > 90,000 (instructor)  The select predicate may include comparisons between two attributes. • Example, find all departments whose name is the same as their building name: •  dept_name=building (department)
  • 44. Project Operation  A unary operation that returns its argument relation, with certain attributes left out.  Notation:  A1,A2,A3 ….Ak (r) where A1, A2, …, Ak are attribute names and r is a relation name.  The result is defined as the relation of k columns obtained by erasing the columns that are not listed  Duplicate rows removed from result, since relations are sets
  • 45. Project Operation Example  Example: eliminate the dept_name attribute of instructor  Query: ID, name, salary (instructor)  Result:
  • 46. Composition of Relational Operations  The result of a relational-algebra operation is relation and therefore of relational-algebra operations can be composed together into a relational-algebra expression.  Consider the query -- Find the names of all instructors in the Physics department. name( dept_name =“Physics” (instructor))  Instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
  • 47. Cartesian-Product Operation  The Cartesian-product operation (denoted by X) allows us to combine information from any two relations.  Example: the Cartesian product of the relations instructor and teaches is written as: instructor X teaches  We construct a tuple of the result out of each possible pair of tuples: one from the instructor relation and one from the teaches relation (see next slide)  Since the instructor ID appears in both relations we distinguish between these attribute by attaching to the attribute the name of the relation from which the attribute originally came. • instructor.ID • teaches.ID
  • 48. The instructor X teaches table
  • 49. Join Operation  The Cartesian-Product instructor X teaches associates every tuple of instructor with every tuple of teaches. • Most of the resulting rows have information about instructors who did NOT teach a particular course.  To get only those tuples of “instructor X teaches “ that pertain to instructors and the courses that they taught, we write:  instructor.id = teaches.id (instructor x teaches )) • We get only those tuples of “instructor X teaches” that pertain to instructors and the courses that they taught.  The result of this expression, shown in the next slide
  • 50. Join Operation (Cont.)  The table corresponding to:  instructor.id = teaches.id (instructor x teaches))
  • 51. Join Operation (Cont.)  The join operation allows us to combine a select operation and a Cartesian-Product operation into a single operation.  Consider relations r (R) and s (S)  Let “theta” be a predicate on attributes in the schema R “union” S. The join operation r ⋈𝜃 s is defined as follows: 𝑟 ⋈𝜃 𝑠 = 𝜎𝜃 (𝑟 × 𝑠)  Thus  instructor.id = teaches.id (instructor x teaches ))  Can equivalently be written as instructor ⋈ Instructor.id = teaches.id teaches.
  • 52. Union Operation  The union operation allows us to combine two relations  Notation: r  s  For r  s to be valid. 1. r, s must have the same arity (same number of attributes) 2. The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)  Example: to find all courses taught in the Fall 2017 semester, or in the Spring 2018 semester, or in both course_id ( semester=“Fall” Λ year=2017 (section))  course_id ( semester=“Spring” Λ year=2018 (section))
  • 53. Union Operation (Cont.)  Result of: course_id ( semester=“Fall” Λ year=2017 (section))  course_id ( semester=“Spring” Λ year=2018 (section))
  • 54. Set-Intersection Operation  The set-intersection operation allows us to find tuples that are in both the input relations.  Notation: r  s  Assume: • r, s have the same arity • attributes of r and s are compatible  Example: Find the set of all courses taught in both the Fall 2017 and the Spring 2018 semesters. course_id ( semester=“Fall” Λ year=2017 (section))  course_id ( semester=“Spring” Λ year=2018 (section)) • Result
  • 55. Set Difference Operation  The set-difference operation allows us to find tuples that are in one relation but are not in another.  Notation r – s  Set differences must be taken between compatible relations. • r and s must have the same arity • attribute domains of r and s must be compatible  Example: to find all courses taught in the Fall 2017 semester, but not in the Spring 2018 semester course_id ( semester=“Fall” Λ year=2017 (section)) − course_id ( semester=“Spring” Λ year=2018 (section))
  • 56. The Assignment Operation  It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables.  The assignment operation is denoted by  and works like assignment in a programming language.  Example: Find all instructor in the “Physics” and Music department. Physics   dept_name=“Physics” (instructor) Music   dept_name=“Music” (instructor) Physics  Music  With the assignment operation, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as the result of the query.
  • 57. The Rename Operation  The results of relational-algebra expressions do not have a name that we can use to refer to them. The rename operator,  , is provided for that purpose  The expression: x (E) returns the result of expression E under the name x  Another form of the rename operation: x(A1,A2, .. An) (E)
  • 58. Equivalent Queries  There is more than one way to write a query in relational algebra.  Example: Find information about courses taught by instructors in the Physics department with salary greater than 90,000  Query 1  dept_name=“Physics”  salary > 90,000 (instructor)  Query 2  dept_name=“Physics” ( salary > 90.000 (instructor))  The two queries are not identical; they are, however, equivalent -- they give the same result on any database.
  • 59. Equivalent Queries  There is more than one way to write a query in relational algebra.  Example: Find information about courses taught by instructors in the Physics department  Query 1 dept_name=“Physics” (instructor ⋈ instructor.ID = teaches.ID teaches)  Query 2 (dept_name=“Physics” (instructor)) ⋈ instructor.ID = teaches.ID teaches  The two queries are not identical; they are, however, equivalent -- they give the same result on any database.
  • 60. Entity Sets  An entity is an object that exists and is distinguishable from other objects. • Example: specific person, company, event, plant  An entity set is a set of entities of the same type that share the same properties. • Example: set of all persons, companies, trees, holidays  An entity is represented by a set of attributes; i.e., descriptive properties possessed by all members of an entity set. • Example: instructor = (ID, name, salary ) course= (course_id, title, credits)  A subset of the attributes form a primary key of the entity set; i.e., uniquely identifying each member of the set.
  • 61. Representing Entity sets in ER Diagram  Entity sets can be represented graphically as follows: • Rectangles represent entity sets. • Attributes listed inside entity rectangle • Underline indicates primary key attributes
  • 62. Relationship Sets  A relationship is an association among several entities Example: 44553 (Peltier) advisor 22222 (Einstein) student entity relationship set instructor entity  A relationship set is a mathematical relation among n  2 entities, each taken from entity sets {(e1, e2, … en) | e1  E1, e2  E2, …, en  En} where (e1, e2, …, en) is a relationship • Example: (44553,22222)  advisor
  • 63. Relationship Sets (Cont.)  Example: we define the relationship set advisor to denote the associations between students and the instructors who act as their advisors.  Pictorially, we draw a line between related entities.
  • 64. Representing Relationship Sets via ER Diagrams  Diamonds represent relationship sets.
  • 65. Relationship Sets (Cont.)  An attribute can also be associated with a relationship set.  For instance, the advisor relationship set between entity sets instructor and student may have the attribute date which tracks when the student started being associated with the advisor instructor student 76766 Crick Katz Srinivasan Kim Singh Einstein 45565 10101 98345 76543 22222 98988 12345 00128 76543 44553 Tanaka Shankar Zhang Brown Aoi Chavez Peltier 3 May 2008 10 June 2007 12 June 2006 6 June 2009 30 June 2007 31 May 2007 4 May 2006 76653 23121
  • 67. Roles  Entity sets of a relationship need not be distinct • Each occurrence of an entity set plays a “role” in the relationship  The labels “course_id” and “prereq_id” are called roles.
  • 68. Degree of a Relationship Set  Binary relationship • involve two entity sets (or degree two). • most relationship sets in a database system are binary.  Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.) • Example: students work on research projects under the guidance of an instructor. • relationship proj_guide is a ternary relationship between instructor, student, and project
  • 69. Non-binary Relationship Sets  Most relationship sets are binary  There are occasions when it is more convenient to represent relationships as non-binary.  E-R Diagram with a Ternary Relationship
  • 70. Complex Attributes  Attribute types: • Simple and composite attributes. • Single-valued and multivalued attributes  Example: multivalued attribute: phone_numbers • Derived attributes  Can be computed from other attributes  Example: age, given date_of_birth  Domain – the set of permitted values for each attribute
  • 71. Composite Attributes  Composite attributes allow us to divided attributes into subparts (other attributes). name address first_name middle_initial last_name street city state postal_code street_number street_name apartment_number composite attributes component attributes
  • 73. Mapping Cardinality Constraints  Express the number of entities to which another entity can be associated via a relationship set.  Most useful in describing binary relationship sets.  For a binary relationship set the mapping cardinality must be one of the following types: • One to one • One to many • Many to one • Many to many
  • 74. Mapping Cardinalities One to one One to many Note: Some elements in A and B may not be mapped to any elements in the other set
  • 75. Mapping Cardinalities Many to one Many to many Note: Some elements in A and B may not be mapped to any elements in the other set
  • 76. Representing Cardinality Constraints in ER Diagram  We express cardinality constraints by drawing either a directed line (), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.  One-to-one relationship between an instructor and a student : • A student is associated with at most one instructor via the relationship advisor • A student is associated with at most one department via stud_dept
  • 77. One-to-Many Relationship  one-to-many relationship between an instructor and a student • an instructor is associated with several (including 0) students via advisor • a student is associated with at most one instructor via advisor,
  • 78. Many-to-One Relationships  In a many-to-one relationship between an instructor and a student, • an instructor is associated with at most one student via advisor, • and a student is associated with several (including 0) instructors via advisor
  • 79. Many-to-Many Relationship  An instructor is associated with several (possibly 0) students via advisor  A student is associated with several (possibly 0) instructors via advisor
  • 80. Total and Partial Participation  Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set participation of student in advisor relation is total  every student must have an associated instructor  Partial participation: some entities may not participate in any relationship in the relationship set • Example: participation of instructor in advisor is partial
  • 81. Notation for Expressing More Complex Constraints  A line may have an associated minimum and maximum cardinality, shown in the form l..h, where l is the minimum and h the maximum cardinality • A minimum value of 1 indicates total participation. • A maximum value of 1 indicates that the entity participates in at most one relationship • A maximum value of * indicates no limit.  Example • Instructor can advise 0 or more students. A student must have 1 advisor; cannot have multiple advisors
  • 82. Cardinality Constraints on Ternary Relationship  We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint  For example, an arrow from proj_guide to instructor indicates each student has at most one guide for a project  If there is more than one arrow, there are two ways of defining the meaning. • For example, a ternary relationship R between A, B and C with arrows to B and C could mean 1. Each A entity is associated with a unique entity from B and C or 2. Each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B • Each alternative has been used in different formalisms • To avoid confusion we outlaw more than one arrow
  • 83. Primary Key  Primary keys provide a way to specify how entities and relations are distinguished. We will consider: • Entity sets • Relationship sets. • Weak entity sets
  • 84. Primary key for Entity Sets  By definition, individual entities are distinct.  From database perspective, the differences among them must be expressed in terms of their attributes.  The values of the attribute values of an entity must be such that they can uniquely identify the entity. • No two entities in an entity set are allowed to have exactly the same value for all attributes.  A key for an entity is a set of attributes that suffice to distinguish entities from each other
  • 85. Primary Key for Relationship Sets  To distinguish among the various relationships of a relationship set we use the individual primary keys of the entities in the relationship set. • Let R be a relationship set involving entity sets E1, E2, .. En • The primary key for R is consists of the union of the primary keys of entity sets E1, E2, ..En • If the relationship set R has attributes a1, a2, .., am associated with it, then the primary key of R also includes the attributes a1, a2, .., am  Example: relationship set “advisor”. • The primary key consists of instructor.ID and student.ID  The choice of the primary key for a relationship set depends on the mapping cardinality of the relationship set.
  • 86. Choice of Primary key for Binary Relationship  Many-to-Many relationships. The preceding union of the primary keys is a minimal superkey and is chosen as the primary key.  One-to-Many relationships . The primary key of the “Many” side is a minimal superkey and is used as the primary key.  Many-to-one relationships. The primary key of the “Many” side is a minimal superkey and is used as the primary key.  One-to-one relationships. The primary key of either one of the participating entity sets forms a minimal superkey, and either one can be chosen as the primary key.
  • 87. Weak Entity Sets  Consider a section entity, which is uniquely identified by a course_id, semester, year, and sec_id.  Clearly, section entities are related to course entities. Suppose we create a relationship set sec_course between entity sets section and course.  Note that the information in sec_course is redundant, since section already has an attribute course_id, which identifies the course with which the section is related.  One option to deal with this redundancy is to get rid of the relationship sec_course; however, by doing so the relationship between section and course becomes implicit in an attribute, which is not desirable.
  • 88. Weak Entity Sets (Cont.)  An alternative way to deal with this redundancy is to not store the attribute course_id in the section entity and to only store the remaining attributes section_id, year, and semester. • However, the entity set section then does not have enough attributes to identify a particular section entity uniquely  To deal with this problem, we treat the relationship sec_course as a special relationship that provides extra information, in this case, the course_id, required to identify section entities uniquely.  A weak entity set is one whose existence is dependent on another entity, called its identifying entity  Instead of associating a primary key with a weak entity, we use the identifying entity, along with extra attributes called discriminator to uniquely identify a weak entity.
  • 89. Weak Entity Sets (Cont.)  An entity set that is not a weak entity set is termed a strong entity set.  Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set.  The identifying entity set is said to own the weak entity set that it identifies.  The relationship associating the weak entity set with the identifying entity set is called the identifying relationship.  Note that the relational schema we eventually create from the entity set section does have the attribute course_id, for reasons that will become clear later, even though we have dropped the attribute course_id from the entity set section.
  • 90. Expressing Weak Entity Sets  In E-R diagrams, a weak entity set is depicted via a double rectangle.  We underline the discriminator of a weak entity set with a dashed line.  The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond.  Primary key for section – (course_id, sec_id, semester, year)
  • 91. Redundant Attributes  Suppose we have entity sets: • student, with attributes: ID, name, tot_cred, dept_name • department, with attributes: dept_name, building, budget  We model the fact that each student has an associated department using a relationship set stud_dept  The attribute dept_name in student below replicates information present in the relationship and is therefore redundant • and needs to be removed.  BUT: when converting back to tables, in some cases the attribute gets reintroduced, as we will see later.
  • 92. E-R Diagram for a University Enterprise
  • 93. Reduction to Relation Schemas  Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the database.  A database which conforms to an E-R diagram can be represented by a collection of schemas.  For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding entity set or relationship set.  Each schema has a number of columns (generally corresponding to attributes), which have unique names.
  • 94. Representing Entity Sets  A strong entity set reduces to a schema with the same attributes student(ID, name, tot_cred)  A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set section ( course_id, sec_id, sem, year )  Example
  • 95. Representation of Entity Sets with Composite Attributes  Composite attributes are flattened out by creating a separate attribute for each component attribute • Example: given entity set instructor with composite attribute name with component attributes first_name and last_name the schema corresponding to the entity set has two attributes name_first_name and name_last_name  Prefix omitted if there is no ambiguity (name_first_name could be first_name)  Ignoring multivalued attributes, extended instructor schema is • instructor(ID, first_name, middle_initial, last_name, street_number, street_name, apt_number, city, state, zip_code, date_of_birth)
  • 96. Representation of Entity Sets with Multivalued Attributes  A multivalued attribute M of an entity E is represented by a separate schema EM  Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M  Example: Multivalued attribute phone_number of instructor is represented by a schema: inst_phone= ( ID, phone_number)  Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM • For example, an instructor entity with primary key 22222 and phone numbers 456-7890 and 123-4567 maps to two tuples: (22222, 456-7890) and (22222, 123-4567)
  • 97. Representing Relationship Sets  A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.  Example: schema for relationship set advisor advisor = (s_id, i_id)
  • 98. Redundancy of Schemas  Many-to-one and one-to-many relationship sets that are total on the many- side can be represented by adding an extra attribute to the “many” side, containing the primary key of the “one” side  Example: Instead of creating a schema for relationship set inst_dept, add an attribute dept_name to the schema arising from entity set instructor  Example
  • 99. Redundancy of Schemas (Cont.)  For one-to-one relationship sets, either side can be chosen to act as the “many” side • That is, an extra attribute can be added to either of the tables corresponding to the two entity sets  If participation is partial on the “many” side, replacing a schema by an extra attribute in the schema corresponding to the “many” side could result in null values
  • 100. Redundancy of Schemas (Cont.)  The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.  Example: The section schema already contains the attributes that would appear in the sec_course schema
  • 101. Specialization  Top-down design process; we designate sub-groupings within an entity set that are distinctive from other entities in the set.  These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.  Depicted by a triangle component labeled ISA (e.g., instructor “is a” person).  Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.
  • 102. Specialization Example  Overlapping – employee and student  Disjoint – instructor and secretary  Total and partial
  • 103. Representing Specialization via Schemas  Method 1: • Form a schema for the higher-level entity • Form a schema for each lower-level entity set, include primary key of higher-level entity set and local attributes • Drawback: getting information about an employee requires accessing two relations, the one corresponding to the low-level schema and the one corresponding to the high-level schema
  • 104. Representing Specialization as Schemas (Cont.)  Method 2: • Form a schema for each entity set with all local and inherited attributes • Drawback: name, street and city may be stored redundantly for people who are both students and employees
  • 105. Generalization  A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.  Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.  The terms specialization and generalization are used interchangeably.
  • 106. Completeness constraint  Completeness constraint -- specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization. • total: an entity must belong to one of the lower-level entity sets • partial: an entity need not belong to one of the lower-level entity sets
  • 107. Completeness constraint (Cont.)  Partial generalization is the default.  We can specify total generalization in an ER diagram by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total generalization), or to the set of hollow arrow-heads to which it applies (for an overlapping generalization).  The student generalization is total: All student entities must be either graduate or undergraduate. Because the higher-level entity set arrived at through generalization is generally composed of only those entities in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total
  • 108. Aggregation  Consider the ternary relationship proj_guide, which we saw earlier  Suppose we want to record evaluations of a student by a guide on a project
  • 109. Aggregation (Cont.)  Relationship sets eval_for and proj_guide represent overlapping information • Every eval_for relationship corresponds to a proj_guide relationship • However, some proj_guide relationships may not correspond to any eval_for relationships  So we can’t discard the proj_guide relationship  Eliminate this redundancy via aggregation • Treat relationship as an abstract entity • Allows relationships between relationships • Abstraction of relationship into new entity
  • 110. Aggregation (Cont.)  Eliminate this redundancy via aggregation without introducing redundancy, the following diagram represents: • A student is guided by a particular instructor on a particular project • A student, instructor, project combination may have an associated evaluation
  • 111. Entities vs. Attributes  Use of entity sets vs. attributes  Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)
  • 112. Entities vs. Relationship sets  Use of entity sets vs. relationship sets Possible guideline is to designate a relationship set to describe an action that occurs between entities  Placement of relationship attributes For example, attribute date as attribute of advisor or as attribute of student
  • 113. Summary of Symbols Used in E-R Notation
  • 114. Symbols Used in E-R Notation (Cont.)
  • 115. Outline  Features of Good Relational Design  Functional Dependencies  Decomposition Using Functional Dependencies  Normal Forms  Functional Dependency Theory  Algorithms for Decomposition using Functional Dependencies  Decomposition Using Multivalued Dependencies  More Normal Forms  Atomic Domains and First Normal Form  Database-Design Process  Modeling Temporal Data
  • 116. Features of Good Relational Designs  Suppose we combine instructor and department into in_dep, which represents the natural join on the relations instructor and department  There is repetition of information  Need to use null values (if we add a new department with no instructors)
  • 117. Decomposition  The only way to avoid the repetition-of-information problem in the in_dep schema is to decompose it into two schemas – instructor and department schemas.  Not all decompositions are good. Suppose we decompose employee(ID, name, street, city, salary) into employee1 (ID, name) employee2 (name, street, city, salary) The problem arises when we have two employees with the same name  The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.
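A minimal sketch of the lossy decomposition above, using made-up tuples for two employees who share a name; the Python code is purely illustrative and not part of the slides:

```python
# Hypothetical data: two employees both named "Kim".
employee = {
    (101, "Kim", "Main St", "Perryridge", 65000),
    (102, "Kim", "North St", "Downtown", 72000),
}

employee1 = {(ID, name) for (ID, name, street, city, salary) in employee}
employee2 = {(name, street, city, salary) for (ID, name, street, city, salary) in employee}

# Natural join of employee1 and employee2 on name.
rejoined = {
    (ID, name, street, city, salary)
    for (ID, name) in employee1
    for (n, street, city, salary) in employee2
    if n == name
}

print(len(employee), len(rejoined))   # 2 vs 4: spurious tuples appear
assert rejoined != employee           # the original relation cannot be reconstructed
```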
  • 119. Lossless Decomposition  Let R be a relation schema and let R1 and R2 form a decomposition of R. That is, R = R1 ∪ R2  We say that the decomposition is a lossless decomposition if there is no loss of information by replacing R with the two relation schemas R1 and R2  Formally, Π R1 (r) ⋈ Π R2 (r) = r  And, conversely, a decomposition is lossy if r ⊂ Π R1 (r) ⋈ Π R2 (r)
  • 120. Example of Lossless Decomposition  Decomposition of R = (A, B, C) R1 = (A, B) R2 = (B, C)
  • 121. Normalization Theory  Decide whether a particular relation R is in “good” form.  In the case that a relation R is not in “good” form, decompose it into set of relations {R1, R2, ..., Rn} such that • Each relation is in good form • The decomposition is a lossless decomposition  Our theory is based on: • Functional dependencies • Multivalued dependencies
  • 122. Functional Dependencies  There are usually a variety of constraints (rules) on the data in the real world.  For example, some of the constraints that are expected to hold in a university database are: • Students and instructors are uniquely identified by their ID. • Each student and instructor has only one name. • Each instructor and student is (primarily) associated with only one department. • Each department has only one value for its budget, and only one associated building.
  • 123. Functional Dependencies (Cont.)  An instance of a relation that satisfies all such real-world constraints is called a legal instance of the relation;  A legal instance of a database is one where all the relation instances are legal instances  Constraints on the set of legal relations.  Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.  A functional dependency is a generalization of the notion of a key.
  • 124. Functional Dependencies Definition  Let R be a relation schema with α ⊆ R and β ⊆ R  The functional dependency α → β holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⟹ t1[β] = t2[β]  Example: Consider r(A, B) with the following instance of r: {(1, 4), (1, 5), (3, 7)}  On this instance, B → A holds; A → B does NOT hold
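As an illustration of this definition, the sketch below (a hypothetical helper, not from the slides) checks whether a functional dependency is satisfied by a single relation instance, using the r(A, B) instance shown above:

```python
def satisfies_fd(rows, alpha, beta):
    """rows: list of tuples; alpha, beta: tuples of column indexes.
    Returns True if every pair of tuples agreeing on alpha also agrees on beta."""
    seen = {}
    for t in rows:
        a = tuple(t[i] for i in alpha)
        b = tuple(t[i] for i in beta)
        if a in seen and seen[a] != b:
            return False          # two tuples agree on alpha but differ on beta
        seen[a] = b
    return True

r = [(1, 4), (1, 5), (3, 7)]      # the instance of r(A, B) from the slide
print(satisfies_fd(r, alpha=(0,), beta=(1,)))  # A -> B: False
print(satisfies_fd(r, alpha=(1,), beta=(0,)))  # B -> A: True
```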
  • 125. Closure of a Set of Functional Dependencies  Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. • If A → B and B → C, then we can infer that A → C • etc.  The set of all functional dependencies logically implied by F is the closure of F.  We denote the closure of F by F+.
  • 126. Keys and Functional Dependencies  K is a superkey for relation schema R if and only if K → R  K is a candidate key for R if and only if • K → R, and • for no α ⊂ K, α → R  Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: in_dep (ID, name, salary, dept_name, building, budget ). We expect these functional dependencies to hold: dept_name → building ID → building but would not expect the following to hold: dept_name → salary
  • 127. Use of Functional Dependencies  We use functional dependencies to: • To test relations to see if they are legal under a given set of functional dependencies.  If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. • To specify constraints on the set of legal relations  We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.  Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. • For example, a specific instance of instructor may, by chance, satisfy name  ID.
  • 128. Trivial Functional Dependencies  A functional dependency is trivial if it is satisfied by all instances of a relation  Example: • ID, name → ID • name → name  In general, α → β is trivial if β ⊆ α
  • 129. Lossless Decomposition  We can use functional dependencies to show when certain decompositions are lossless.  For the case of R = (R1, R2), we require that for all possible relations r on schema R r = Π R1 (r) ⋈ Π R2 (r)  A decomposition of R into R1 and R2 is a lossless decomposition if at least one of the following dependencies is in F+: • R1 ∩ R2 → R1 • R1 ∩ R2 → R2  The above functional dependencies are a sufficient condition for lossless join decomposition; the dependencies are a necessary condition only if all constraints are functional dependencies
  • 130. Example  R = (A, B, C) F = {A → B, B → C}  R1 = (A, B), R2 = (B, C) • Lossless decomposition: R1 ∩ R2 = {B} and B → BC  R1 = (A, B), R2 = (A, C) • Lossless decomposition: R1 ∩ R2 = {A} and A → AB  Note: • B → BC is a shorthand notation for • B → {B, C}
  • 131. Dependency Preservation  Testing functional dependency constraints each time the database is updated can be costly  It is useful to design the database in a way that constraints can be tested efficiently.  If testing a functional dependency can be done by considering just one relation, then the cost of testing this constraint is low  When we decompose a relation, it may no longer be possible to do the testing without having to perform a Cartesian product.  A decomposition that makes it computationally hard to enforce functional dependencies is said to be NOT dependency preserving.
  • 132. Dependency Preservation Example  Consider a schema: dept_advisor(s_ID, i_ID, department_name)  With functional dependencies: i_ID → dept_name s_ID, dept_name → i_ID  In the above design we are forced to repeat the department name once for each time an instructor participates in a dept_advisor relationship.  To fix this, we need to decompose dept_advisor  Any decomposition will not include all the attributes in s_ID, dept_name → i_ID  Thus, the decomposition is NOT dependency preserving
  • 133. Boyce-Codd Normal Form  A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds: • α → β is trivial (i.e., β ⊆ α) • α is a superkey for R
  • 134. Boyce-Codd Normal Form (Cont.)  Example schema that is not in BCNF: in_dep (ID, name, salary, dept_name, building, budget ) because: • dept_name → building, budget holds on in_dep  but • dept_name is not a superkey  When we decompose in_dep into instructor and department • instructor is in BCNF • department is in BCNF
  • 135. Example  R = (A, B, C) F = {A → B, B → C}  R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC • Dependency preserving  R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB • Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
  • 136. BCNF and Dependency Preservation  It is not always possible to achieve both BCNF and dependency preservation  Consider a schema: dept_advisor(s_ID, i_ID, department_name)  With functional dependencies: i_ID → dept_name s_ID, dept_name → i_ID  dept_advisor is not in BCNF • i_ID is not a superkey.  Any decomposition of dept_advisor will not include all the attributes in s_ID, dept_name → i_ID  Thus, the decomposition is NOT dependency preserving
  • 137. Third Normal Form  A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds: • α → β is trivial (i.e., β ⊆ α) • α is a superkey for R • Each attribute A in β – α is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)  If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).  Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).
  • 138. 3NF Example  Consider a schema: dept_advisor(s_ID, i_ID, dept_name)  With functional dependencies: i_ID → dept_name s_ID, dept_name → i_ID  Two candidate keys = {s_ID, dept_name}, {s_ID, i_ID}  We have seen before that dept_advisor is not in BCNF  R, however, is in 3NF • s_ID, dept_name → i_ID: {s_ID, dept_name} is a superkey • i_ID → dept_name and i_ID is NOT a superkey, but:  {dept_name} – {i_ID} = {dept_name} and  dept_name is contained in a candidate key
  • 139. Comparison of BCNF and 3NF  Advantages to 3NF over BCNF. It is always possible to obtain a 3NF design without sacrificing losslessness or dependency preservation.  Disadvantages to 3NF. • We may have to use null values to represent some of the possible meaningful relationships among data items. • There is the problem of repetition of information.
  • 140.  It is better to decompose inst_info into: • inst_child: • inst_phone:  This suggests the need for higher normal forms, such as Fourth Normal Form (4NF), which we shall see later Higher Normal Forms
  • 141. Closure of a Set of Functional Dependencies  Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. • If A → B and B → C, then we can infer that A → C • etc.  The set of all functional dependencies logically implied by F is the closure of F.  We denote the closure of F by F+.
  • 142. Closure of a Set of Functional Dependencies  We can compute F+, the closure of F, by repeatedly applying Armstrong’s Axioms: • Reflexive rule: if β ⊆ α, then α → β • Augmentation rule: if α → β, then γα → γβ • Transitivity rule: if α → β, and β → γ, then α → γ  These rules are • Sound -- generate only functional dependencies that actually hold, and • Complete -- generate all functional dependencies that hold.
  • 143. Example of F+  R = (A, B, C, G, H, I) F = { A → B, A → C, CG → H, CG → I, B → H }  Some members of F+ • A → H  by transitivity from A → B and B → H • AG → I  by augmenting A → C with G, to get AG → CG, and then transitivity with CG → I • CG → HI  by augmenting CG → I to infer CG → CGI, augmenting CG → H to infer CGI → HI, and then transitivity
  • 144. Closure of Attribute Sets  Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F  Algorithm to compute α+, the closure of α under F: result := α; while (changes to result) do for each β → γ in F do begin if β ⊆ result then result := result ∪ γ end
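A direct Python rendering of the attribute-closure algorithm above may help; the relation and FDs come from the next slide's example, and the function name is only illustrative:

```python
def attribute_closure(attrs, fds):
    """fds: list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs          # add everything the matched FD determines
                changed = True
    return result

F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C", "G"}, {"H"}),
     ({"C", "G"}, {"I"}), ({"B"}, {"H"})]
R = {"A", "B", "C", "G", "H", "I"}

print(attribute_closure({"A", "G"}, F))        # {'A','B','C','G','H','I'}
print(attribute_closure({"A", "G"}, F) >= R)   # True: AG is a superkey
print(attribute_closure({"A"}, F) >= R)        # False
print(attribute_closure({"G"}, F) >= R)        # False: so AG is a candidate key
```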
  • 145. Example of Attribute Set Closure  R = (A, B, C, G, H, I)  F = {A → B, A → C, CG → H, CG → I, B → H}  (AG)+ 1. result = AG 2. result = ABCG (A → C and A → B) 3. result = ABCGH (CG → H and CG ⊆ AGBC) 4. result = ABCGHI (CG → I and CG ⊆ AGBCH)  Is AG a candidate key? 1. Is AG a superkey? 1. Does AG → R? == Is R ⊆ (AG)+ 2. Is any subset of AG a superkey? 1. Does A → R? == Is R ⊆ (A)+ 2. Does G → R? == Is R ⊆ (G)+ 3. In general: check for each subset of size n-1
  • 146. Canonical Cover  Suppose that we have a set of functional dependencies F on a relation schema. Whenever a user performs an update on the relation, the database system must ensure that the update does not violate any functional dependencies; that is, all the functional dependencies in F are satisfied in the new database state.  If an update violates any functional dependencies in the set F, the system must roll back the update.  We can reduce the effort spent in checking for violations by testing a simplified set of functional dependencies that has the same closure as the given set.  This simplified set is termed the canonical cover  To define canonical cover we must first define extraneous attributes. • An attribute of a functional dependency in F is extraneous if we can remove it without changing F +
  • 147. Dependency Preservation (Cont.)  Let F be the set of dependencies on schema R and let R1, R2 , .., Rn be a decomposition of R.  The restriction of F to Ri is the set Fi of all functional dependencies in F + that include only attributes of Ri .  Since all functional dependencies in a restriction involve attributes of only one relation schema, it is possible to test such a dependency for satisfaction by checking only one relation.  Note that the definition of restriction uses all dependencies in F +, not just those in F.  The set of restrictions F1, F2 , .. , Fn is the set of functional dependencies that can be checked efficiently.
  • 148. Testing for BCNF  To check if a non-trivial dependency α → β causes a violation of BCNF 1. compute α+ (the attribute closure of α), and 2. verify that it includes all attributes of R, that is, it is a superkey of R.  Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+. • If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.  However, the simplified test using only F is incorrect when testing a relation in a decomposition of R • Consider R = (A, B, C, D, E), with F = { A → B, BC → D}  Decompose R into R1 = (A, B) and R2 = (A, C, D, E)  Neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF.  In fact, dependency AC → D in F+ shows R2 is not in BCNF.
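The following sketch (illustrative Python, with a small closure helper defined inline) reproduces the caution above: testing R2 = (A, C, D, E) against F alone finds nothing, while the implied dependency AC → D exposes the BCNF violation:

```python
def closure(attrs, fds):
    result = set(attrs)
    while True:
        extra = {x for lhs, rhs in fds if lhs <= result for x in rhs}
        if extra <= result:
            return result
        result |= extra

def violates_bcnf(lhs, rhs, schema, fds):
    lhs, rhs = set(lhs), set(rhs)
    trivial = rhs <= lhs
    superkey = closure(lhs, fds) >= set(schema)
    return not trivial and not superkey

F = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]
R2 = {"A", "C", "D", "E"}

# No FD in F lies entirely within R2, so a naive test finds nothing...
print(any(lhs | rhs <= R2 for lhs, rhs in F))          # False
# ...but AC -> D is in F+ (D is in the closure of AC) and AC is not a key of R2.
print(violates_bcnf({"A", "C"}, {"D"}, R2, F))         # True
```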
  • 149. Testing Decomposition for BCNF  Either test Ri for BCNF with respect to the restriction of F+ to Ri (that is, all FDs in F+ that contain only attributes from Ri)  Or use the original set of dependencies F that hold on R, but with the following test:  for every set of attributes α ⊆ Ri, check that α+ (the attribute closure of α) either includes no attribute of Ri – α, or includes all attributes of Ri. • If the condition is violated by some α → β in F+, the dependency α → (α+ – α) ∩ Ri can be shown to hold on Ri, and Ri violates BCNF. • We use above dependency to decompose Ri To check if a relation Ri in a decomposition of R is in BCNF
  • 150. BCNF Decomposition Algorithm result := {R}; done := false; compute F+; while (not done) do if (there is a schema Ri in result that is not in BCNF) then begin let α → β be a nontrivial functional dependency that holds on Ri such that α → Ri is not in F+, and α ∩ β = ∅; result := (result – Ri) ∪ (Ri – β) ∪ (α, β); end else done := true; Note: each Ri is in BCNF, and decomposition is lossless-join.
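A rough Python sketch of this decomposition loop is shown below. It is not the book's exact algorithm: instead of consulting F+ directly, it brute-forces candidate left-hand sides α and applies the closure-based test from the previous slides, which is adequate for small examples; all names and the sample schema are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    while True:
        extra = {x for lhs, rhs in fds if lhs <= result for x in rhs}
        if extra <= result:
            return result
        result |= extra

def bcnf_violation(Ri, fds):
    """Return (alpha, beta) for some FD that violates BCNF on Ri, else None."""
    for k in range(1, len(Ri)):
        for alpha in combinations(sorted(Ri), k):
            alpha = set(alpha)
            beta = (closure(alpha, fds) & Ri) - alpha      # nontrivial part on Ri
            if beta and not (closure(alpha, fds) >= Ri):   # alpha not a superkey of Ri
                return alpha, beta
    return None

def bcnf_decompose(R, fds):
    result = [set(R)]
    done = False
    while not done:
        done = True
        for i, Ri in enumerate(result):
            v = bcnf_violation(Ri, fds)
            if v:
                alpha, beta = v
                # replace Ri by (alpha union beta) and (Ri - beta)
                result[i:i+1] = [alpha | beta, Ri - beta]
                done = False
                break
    return result

F = [({"dept_name"}, {"building", "budget"}),
     ({"ID"}, {"name", "salary", "dept_name"})]
in_dep = {"ID", "name", "salary", "dept_name", "building", "budget"}
print(bcnf_decompose(in_dep, F))
# e.g. [{'dept_name','building','budget'}, {'ID','name','salary','dept_name'}]
```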
  • 151. BCNF Decomposition (Cont.)  course is in BCNF • How do we know this?  building, room_number→capacity holds on class-1 • but {building, room_number} is not a superkey for class-1. • We replace class-1 by:  classroom (building, room_number, capacity)  section (course_id, sec_id, semester, year, building, room_number, time_slot_id)  classroom and section are in BCNF.
  • 152. Third Normal Form  There are some situations where • BCNF is not dependency preserving, and • efficient checking for FD violation on updates is important  Solution: define a weaker normal form, called Third Normal Form (3NF) • Allows some redundancy (with resultant problems; we will see examples later) • But functional dependencies can be checked on individual relations without computing a join. • There is always a lossless-join, dependency-preserving decomposition into 3NF.
  • 153. 3NF Example -- Relation dept_advisor  dept_advisor (s_ID, i_ID, dept_name) F = {s_ID, dept_name → i_ID, i_ID → dept_name}  Two candidate keys: {s_ID, dept_name} and {i_ID, s_ID}  R is in 3NF • s_ID, dept_name → i_ID: {s_ID, dept_name} is a superkey • i_ID → dept_name:  dept_name is contained in a candidate key
  • 154. 3NF Decomposition Algorithm Let Fc be a canonical cover for F; i := 0; for each functional dependency α → β in Fc do if none of the schemas Rj, 1 ≤ j ≤ i contains αβ then begin i := i + 1; Ri := αβ end; if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R then begin i := i + 1; Ri := any candidate key for R; end /* Optionally, remove redundant relations */ repeat if any schema Rj is contained in another schema Rk then begin /* delete Rj */ Rj := Ri; i := i – 1; end until no more Rj can be deleted; return (R1, R2, ..., Ri)
  • 155. 3NF Decomposition Algorithm (Cont.)  Each relation schema Ri is in 3NF  Decomposition is dependency preserving and lossless-join  Proof of correctness is at the end of this presentation Above algorithm ensures
  • 156. Comparison of BCNF and 3NF  It is always possible to decompose a relation into a set of relations that are in 3NF such that: • The decomposition is lossless • The dependencies are preserved  It is always possible to decompose a relation into a set of relations that are in BCNF such that: • The decomposition is lossless • It may not be possible to preserve dependencies.
  • 157. Multivalued Dependencies (MVDs)  Suppose we record names of children, and phone numbers for instructors: • inst_child(ID, child_name) • inst_phone(ID, phone_number)  If we were to combine these schemas to get • inst_info(ID, child_name, phone_number) • Example data: (99999, David, 512-555-1234) (99999, David, 512-555-4321) (99999, William, 512-555-1234) (99999, William, 512-555-4321)  This relation is in BCNF • Why?
  • 158. Multivalued Dependencies  Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that: t1[α] = t2[α] = t3[α] = t4[α] t3[β] = t1[β] t3[R – β] = t2[R – β] t4[β] = t2[β] t4[R – β] = t1[R – β]
  • 159. Fourth Normal Form  A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds: • α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R) • α is a superkey for schema R  If a relation is in 4NF it is in BCNF
  • 160. 4NF Decomposition Algorithm result := {R}; done := false; compute D+; Let Di denote the restriction of D+ to Ri while (not done) if (there is a schema Ri in result that is not in 4NF) then begin let α →→ β be a nontrivial multivalued dependency that holds on Ri such that α → Ri is not in Di, and α ∩ β = ∅; result := (result – Ri) ∪ (Ri – β) ∪ (α, β); end else done := true; Note: each Ri is in 4NF, and decomposition is lossless-join
  • 161. Example  R = (A, B, C, G, H, I) F = { A →→ B, B →→ HI, CG →→ H }  R is not in 4NF since A →→ B and A is not a superkey for R  Decomposition a) R1 = (A, B) (R1 is in 4NF) b) R2 = (A, C, G, H, I) (R2 is not in 4NF, decompose into R3 and R4) c) R3 = (C, G, H) (R3 is in 4NF) d) R4 = (A, C, G, I) (R4 is not in 4NF, decompose into R5 and R6) • A →→ B and B →→ HI imply A →→ HI (MVD transitivity), and • hence A →→ I (restriction of the MVD to R4) e) R5 = (A, I) (R5 is in 4NF) f) R6 = (A, C, G) (R6 is in 4NF)
  • 162. First Normal Form  Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains:  Set of names, composite attributes  Identification numbers like CS101 that can be broken up into parts  A relational schema R is in first normal form if the domains of all attributes of R are atomic  Non-atomic values complicate storage and encourage redundant (repeated) storage of data • Example: Set of accounts stored with each customer, and set of owners stored with each account • We assume all relations are in first normal form (and revisit this in Chapter 22: Object Based Databases)
  • 163. First Normal Form (Cont.)  Atomicity is actually a property of how the elements of the domain are used. • Example: Strings would normally be considered indivisible • Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 • If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. • Doing so is a bad idea: leads to encoding of information in application program rather than in the database.
  • 164. UNIT 3 – DATA STORAGE AND QUERY PROCESSING
  • 165. Classification of Physical Storage Media  Can differentiate storage into: • volatile storage: loses contents when power is switched off • non-volatile storage:  Contents persist even when power is switched off.  Includes secondary and tertiary storage, as well as battery-backed-up main memory.  Factors affecting choice of storage media include • Speed with which data can be accessed • Cost per unit of data • Reliability
  • 167. Storage Hierarchy (Cont.)  primary storage: Fastest media but volatile (cache, main memory).  secondary storage: next level in hierarchy, non-volatile, moderately fast access time • Also called on-line storage • E.g., flash memory, magnetic disks  tertiary storage: lowest level in hierarchy, non-volatile, slow access time • also called off-line storage and used for archival storage • e.g., magnetic tape, optical storage • Magnetic tape  Sequential access, 1 to 12 TB capacity  A few drives with many tapes  Juke boxes with petabytes (1000’s of TB) of storage
  • 168. Storage Interfaces  Disk interface standards families • SATA (Serial ATA)  SATA 3 supports data transfer speeds of up to 6 gigabits/sec • SAS (Serial Attached SCSI)  SAS Version 3 supports 12 gigabits/sec • NVMe (Non-Volatile Memory Express) interface  Works with PCIe connectors to support lower latency and higher transfer rates  Supports data transfer rates of up to 24 gigabits/sec  Disks usually connected directly to computer system  In Storage Area Networks (SAN), a large number of disks are connected by a high-speed network to a number of servers  In Network Attached Storage (NAS) networked storage provides a file system interface using networked file system protocol, instead of providing a disk system interface
  • 169. Magnetic Hard Disk Mechanism Schematic diagram of magnetic disk drive Photo of magnetic disk drive
  • 170. Magnetic Disks  Read-write head  Surface of platter divided into circular tracks • Over 50K-100K tracks per platter on typical hard disks  Each track is divided into sectors. • A sector is the smallest unit of data that can be read or written. • Sector size typically 512 bytes • Typical sectors per track: 500 to 1000 (on inner tracks) to 1000 to 2000 (on outer tracks)  To read/write a sector • disk arm swings to position head on right track • platter spins continually; data is read/written as sector passes under head  Head-disk assemblies • multiple disk platters on a single spindle (1 to 5 usually) • one head per platter, mounted on a common arm.  Cylinder i consists of ith track of all the platters
  • 171. Magnetic Disks (Cont.)  Disk controller – interfaces between the computer system and the disk drive hardware. • accepts high-level commands to read or write a sector • initiates actions such as moving the disk arm to the right track and actually reading or writing the data • Computes and attaches checksums to each sector to verify that data is read back correctly  If data is corrupted, with very high probability stored checksum won’t match recomputed checksum • Ensures successful writing by reading back sector after writing it • Performs remapping of bad sectors
  • 172. Performance Measures of Disks  Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of: • Seek time – time it takes to reposition the arm over the correct track.  Average seek time is 1/2 the worst case seek time. • Would be 1/3 if all tracks had the same number of sectors, and we ignore the time to start and stop arm movement  4 to 10 milliseconds on typical disks • Rotational latency – time it takes for the sector to be accessed to appear under the head.  4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.)  Average latency is 1/2 of the above latency. • Overall latency is 5 to 20 msec depending on disk model  Data-transfer rate – the rate at which data can be retrieved from or stored to the disk. • 25 to 200 MB per second max rate, lower for inner tracks
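A back-of-the-envelope calculation, with illustrative numbers chosen from the ranges quoted above, shows how seek time and rotational latency dominate the cost of a single random block read:

```python
# Illustrative figures only (picked from the ranges on this slide).
avg_seek_ms        = 5.0                    # within the 4-10 ms range
rpm                = 10_000
avg_rotation_ms    = 0.5 * (60_000 / rpm)   # half a revolution = 3 ms
transfer_rate_mb_s = 100                    # within 25-200 MB/s
block_kb           = 4

transfer_ms = block_kb / 1024 / transfer_rate_mb_s * 1000   # ~0.04 ms
total_ms = avg_seek_ms + avg_rotation_ms + transfer_ms
print(round(total_ms, 2))       # ~8.04 ms per random block read
print(round(1000 / total_ms))   # ~124 random reads per second
```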
  • 173. Performance Measures (Cont.)  Disk block is a logical unit for storage allocation and retrieval • 4 to 16 kilobytes typically  Smaller blocks: more transfers from disk  Larger blocks: more space wasted due to partially filled blocks  Sequential access pattern • Successive requests are for successive disk blocks • Disk seek required only for first block  Random access pattern • Successive requests are for blocks that can be anywhere on disk • Each access requires a seek • Transfer rates are low since a lot of time is wasted in seeks  I/O operations per second (IOPS) • Number of random block reads that a disk can support per second • 50 to 200 IOPS on current generation magnetic disks
  • 174. Performance Measures (Cont.)  Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any failure. • Typically 3 to 5 years • Probability of failure of new disks is quite low, corresponding to a “theoretical MTTF” of 500,000 to 1,200,000 hours for a new disk  E.g., an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on an average one will fail every 1200 hours • MTTF decreases as disk ages
  • 175. Flash Storage  NOR flash vs NAND flash  NAND flash • used widely for storage, cheaper than NOR flash • requires page-at-a-time read (page: 512 bytes to 4 KB)  20 to 100 microseconds for a page read  Not much difference between sequential and random read • Page can only be written once  Must be erased to allow rewrite  Solid state disks • Use standard block-oriented disk interfaces, but store data on multiple flash storage devices internally • Transfer rate of up to 500 MB/sec using SATA, and up to 3 GB/sec using NVMe PCIe
  • 176. Flash Storage (Cont.)  Erase happens in units of erase block • Takes 2 to 5 millisecs • Erase block typically 256 KB to 1 MB (128 to 256 pages)  Remapping of logical page addresses to physical page addresses avoids waiting for erase  Flash translation table tracks mapping • also stored in a label field of flash page • remapping carried out by flash translation layer  After 100,000 to 1,000,000 erases, erase block becomes unreliable and cannot be used • wear leveling
  • 177. SSD Performance Metrics  Random reads/writes per second • Typical 4 KB reads: 10,000 reads per second (10,000 IOPS) • Typical 4KB writes: 40,000 IOPS • SSDs support parallel reads  Typical 4KB reads: • 100,000 IOPS with 32 requests in parallel (QD-32) on SATA • 350,000 IOPS with QD-32 on NVMe PCIe  Typical 4KB writes: • 100,000 IOPS with QD-32, even higher on some models  Data transfer rate for sequential reads/writes • 400 MB/sec for SATA3, 2 to 3 GB/sec using NVMe PCIe  Hybrid disks: combine small amount of flash cache with larger magnetic disk
  • 178. Storage Class Memory  3D-XPoint memory technology pioneered by Intel  Available as Intel Optane • SSD interface shipped from 2017  Allows lower latency than flash SSDs • Non-volatile memory interface announced in 2018  Supports direct access to words, at speeds comparable to main- memory speeds
  • 179. RAID  RAID: Redundant Arrays of Independent Disks • disk organization techniques that manage a large number of disks, providing a view of a single disk of  high capacity and high speed by using multiple disks in parallel,  high reliability by storing data redundantly, so that data can be recovered even if a disk fails  The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail. • E.g., a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of 1000 hours (approx. 41 days) • Techniques for using redundancy to avoid data loss are critical with large numbers of disks
  • 180. Improvement of Reliability via Redundancy  Redundancy – store extra information that can be used to rebuild information lost in a disk failure  E.g., Mirroring (or shadowing) • Duplicate every disk. Logical disk consists of two physical disks. • Every write is carried out on both disks  Reads can take place from either disk • If one disk in a pair fails, data still available in the other  Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired • Probability of combined event is very small  Except for dependent failure modes such as fire or building collapse or electrical power surges  Mean time to data loss depends on mean time to failure, and mean time to repair • E.g., MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes)
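The mean-time-to-data-loss figure quoted above follows from the standard approximation MTTF² / (2 × MTTR) for a mirrored pair, ignoring dependent failure modes; a quick check:

```python
mttf_hours = 100_000
mttr_hours = 10

mtdl_hours = mttf_hours ** 2 / (2 * mttr_hours)
print(mtdl_hours)                        # 500,000,000 hours
print(round(mtdl_hours / (24 * 365)))    # ~57,000 years
```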
  • 181. Improvement in Performance via Parallelism  Two main goals of parallelism in a disk system: 1. Load balance multiple small accesses to increase throughput 2. Parallelize large accesses to reduce response time.  Improve transfer rate by striping data across multiple disks.  Bit-level striping – split the bits of each byte across multiple disks • In an array of eight disks, write bit i of each byte to disk i. • Each access can read data at eight times the rate of a single disk. • But seek/access time worse than for a single disk  Bit level striping is not used much any more  Block-level striping – with n disks, block i of a file goes to disk (i mod n) + 1 • Requests for different blocks can run in parallel if the blocks reside on different disks • A request for a long sequence of blocks can utilize all disks in parallel
  • 182. RAID Levels  Schemes to provide redundancy at lower cost by using disk striping combined with parity bits • Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics  RAID Level 0: Block striping; non-redundant. • Used in high-performance applications where data loss is not critical.  RAID Level 1: Mirrored disks with block striping • Offers best write performance. • Popular for applications such as storing log files in a database system.
  • 183. RAID Levels (Cont.)  Parity blocks: Parity block j stores XOR of bits from block j of each disk • When writing data to a block j, parity block j must also be computed and written to disk  Can be done by using old parity block, old value of current block and new value of current block (2 block reads + 2 block writes)  Or by recomputing the parity value using the new values of blocks corresponding to the parity block • More efficient for writing large amounts of data sequentially • To recover data for a block, compute XOR of bits from all other blocks in the set including the parity block
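A tiny sketch of block-level parity with XOR, using made-up 4-byte blocks: the parity block is the byte-wise XOR of the data blocks, any single lost block can be rebuilt from the survivors, and a small write can update parity from the old parity, the old data, and the new data:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]          # blocks on three data disks
parity = xor_blocks(data)                   # stored on the parity disk

# Disk holding data[1] fails: rebuild its block from the others plus parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])                 # True

# Small-write optimization from the slide:
# new parity = old parity XOR old data XOR new data (2 reads + 2 writes).
new_block = b"DDDD"
new_parity = xor_blocks([parity, data[1], new_block])
data[1] = new_block
print(new_parity == xor_blocks(data))       # True
```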
  • 184. RAID Levels (Cont.)  RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, rather than storing data in N disks and parity in 1 disk. • E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks.
  • 185. RAID Levels (Cont.)  RAID Level 5 (Cont.) • Block writes occur in parallel if the blocks and their parity blocks are on different disks.  RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores two error correction blocks (P, Q) instead of single parity block to guard against multiple disk failures. • Better reliability than Level 5 at a higher cost  Becoming more important as storage sizes increase
  • 186. RAID Levels (Cont.)  Other levels (not used in practice): • RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping. • RAID Level 3: Bit-Interleaved Parity • RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity block on a separate parity disk for corresponding blocks from N other disks.  RAID 5 is better than RAID 4, since with RAID 4 with random writes, parity disk gets much higher write load than other disks and becomes a bottleneck
  • 187. Choice of RAID Level  Factors in choosing RAID level • Monetary cost • Performance: Number of I/O operations per second, and bandwidth during normal operation • Performance during failure • Performance during rebuild of failed disk  Including time taken to rebuild failed disk  RAID 0 is used only when data safety is not important • E.g., data can be recovered quickly from other sources
  • 188. Choice of RAID Level (Cont.)  Level 1 provides much better write performance than level 5 • Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes  Level 1 has higher storage cost than level 5  Level 5 is preferred for applications where writes are sequential and large (many blocks), and need large amounts of data storage  RAID 1 is preferred for applications with many random/small updates  Level 6 gives better data protection than RAID 5 since it can tolerate two disk (or disk block) failures • Increasing in importance since latent block failures on one disk, coupled with a failure of another disk can result in data loss with RAID 1 and RAID 5.
  • 189. Hardware Issues  Software RAID: RAID implementations done entirely in software, with no special hardware support  Hardware RAID: RAID implementations with special hardware • Use non-volatile RAM to record writes that are being executed • Beware: power failure during write can result in corrupted disk  E.g., failure after writing one block but before writing the second in a mirrored system  Such corrupted data must be detected when power is restored • Recovery from corruption is similar to recovery from failed disk • NV-RAM helps to efficiently detect potentially corrupted blocks  Otherwise all blocks of disk must be read and compared with mirror/parity block
  • 190. Hardware Issues (Cont.)  Latent failures: data successfully written earlier gets damaged • can result in data loss even if only one disk fails  Data scrubbing: • continually scan for latent failures, and recover from copy/parity  Hot swapping: replacement of disk while system is running, without power down • Supported by some hardware RAID systems, • reduces time to recovery, and improves availability greatly  Many systems maintain spare disks which are kept online, and used as replacements for failed disks immediately on detection of failure • Reduces time to recovery greatly  Many hardware RAID systems ensure that a single point of failure will not stop the functioning of the system by using • Redundant power supplies with battery backup • Multiple controllers and multiple interconnections to guard against controller/interconnection failures
  • 191. Optimization of Disk-Block Access  Buffering: in-memory buffer to cache disk blocks  Read-ahead: Read extra blocks from a track in anticipation that they will be requested soon  Disk-arm-scheduling algorithms re-order block requests so that disk arm movement is minimized • elevator algorithm
  • 192. Optimization of Disk-Block Access  Buffering: in-memory buffer to cache disk blocks  Read-ahead: Read extra blocks from a track in anticipation that they will be requested soon  Disk-arm-scheduling algorithms re-order block requests so that disk arm movement is minimized • elevator algorithm  (Figure: example request queue R1–R6 spread between the inner and outer tracks)
  • 193. Magnetic Tapes  Hold large volumes of data and provide high transfer rates • Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT (Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB with Ampex helical scan format • Transfer rates from few to 10s of MB/s  Tapes are cheap, but cost of drives is very high  Very slow access time in comparison to magnetic and optical disks • limited to sequential access. • Some formats (Accelis) provide faster seek (10s of seconds) at cost of lower capacity  Used mainly for backup, for storage of infrequently used information, and as an off-line medium for transferring information from one system to another.  Tape jukeboxes used for very large capacity storage • Multiple petabytes (10^15 bytes)
  • 194. File Organization  The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.  One approach • Assume record size is fixed • Each file has records of one particular type only • Different files are used for different relations This case is easiest to implement; will consider variable length records later  We assume that records are smaller than a disk block .
  • 195. Fixed-Length Records  Simple approach: • Store record i starting from byte n × (i – 1), where n is the size of each record. • Record access is simple but records may cross blocks  Modification: do not allow records to cross block boundaries
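A small illustration of the byte-offset arithmetic, with a made-up record size, also showing why records are normally kept from crossing block boundaries:

```python
RECORD_SIZE = 53          # hypothetical record size in bytes
BLOCK_SIZE = 4096

def record_offset(i):
    """Byte offset of record i (1-based) in a fixed-length file."""
    return RECORD_SIZE * (i - 1)

print(record_offset(1), record_offset(2))    # 0 and 53

start, end = record_offset(78), record_offset(78) + RECORD_SIZE
print(start, end)   # 4081 and 4134: record 78 would straddle the 4096-byte
                    # block boundary, which is why the modification above
                    # stores only floor(BLOCK_SIZE / RECORD_SIZE) records per block.
```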
  • 196. Fixed-Length Records  Deletion of record i: alternatives: • move records i + 1, . . ., n to i, . . . , n – 1 • move record n to i • do not move records, but link all free records on a free list Record 3 deleted
  • 197. Fixed-Length Records  Deletion of record i: alternatives: • move records i + 1, . . ., n to i, . . . , n – 1 • move record n to i • do not move records, but link all free records on a free list Record 3 deleted and replaced by record 11
  • 198. Fixed-Length Records  Deletion of record i: alternatives: • move records i + 1, . . ., n to i, . . . , n – 1 • move record n to i • do not move records, but link all free records on a free list
  • 199. Variable-Length Records  Variable-length records arise in database systems in several ways: • Storage of multiple record types in a file. • Record types that allow variable lengths for one or more fields such as strings (varchar) • Record types that allow repeating fields (used in some older data models).  Attributes are stored in order  Variable length attributes represented by fixed size (offset, length), with actual data stored after all fixed length attributes  Null values represented by null-value bitmap
  • 200. Variable-Length Records: Slotted Page Structure  Slotted page header contains: • number of record entries • end of free space in the block • location and size of each record  Records can be moved around within a page to keep them contiguous with no empty space between them; entry in the header must be updated.  Pointers should not point directly to record — instead they should point to the entry for the record in header.
  • 201. Storing Large Objects  E.g., blob/clob types  Records must be smaller than pages  Alternatives: • Store as files in file systems • Store as files managed by database • Break into pieces and store in multiple tuples in separate relation  PostgreSQL TOAST
  • 202. Organization of Records in Files  Heap – record can be placed anywhere in the file where there is space  Sequential – store records in sequential order, based on the value of the search key of each record  In a multitable clustering file organization records of several different relations can be stored in the same file • Motivation: store related records on the same block to minimize I/O  B+-tree file organization • Ordered storage even with inserts/deletes • More on this in Chapter 14  Hashing – a hash function computed on search key; the result specifies in which block of the file the record should be placed • More on this in Chapter 14
  • 203. Heap File Organization  Records can be placed anywhere in the file where there is free space  Records usually do not move once allocated  Important to be able to efficiently find free space within file  Free-space map • Array with 1 entry per block. Each entry is a few bits to a byte, and records fraction of block that is free • In example below, 3 bits per block, value divided by 8 indicates fraction of block that is free • Can have second-level free-space map • In example below, each entry stores maximum from 4 entries of first- level free-space map  Free space map written to disk periodically, OK to have wrong (old) values for some entries (will be detected and fixed)
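A sketch of how such a free-space map might be consulted; the 3-bit entry values and the group size of 4 follow the description above, while the data and function names are made up:

```python
# One 3-bit entry per block; entry/8 approximates the free fraction of the block.
free_map = [4, 2, 1, 4, 7, 0, 3, 5]

# Second-level map: maximum of each group of 4 first-level entries.
second_level = [max(free_map[i:i+4]) for i in range(0, len(free_map), 4)]
print(second_level)                                  # [4, 7]

def find_block(needed_fraction):
    """Return the first block with at least the requested free fraction."""
    for g, max_free in enumerate(second_level):
        if max_free / 8 >= needed_fraction:          # this group may have room
            for b in range(g * 4, g * 4 + 4):
                if free_map[b] / 8 >= needed_fraction:
                    return b
    return None                                      # no block has enough room

print(find_block(0.6))   # 4: first block whose entry (7) indicates >= 60% free
```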
  • 204. Sequential File Organization  Suitable for applications that require sequential processing of the entire file  The records in the file are ordered by a search-key
  • 205. Sequential File Organization (Cont.)  Deletion – use pointer chains  Insertion –locate the position where the record is to be inserted • if there is free space insert there • if no free space, insert the record in an overflow block • In either case, pointer chain must be updated  Need to reorganize the file from time to time to restore sequential order
  • 206. Multitable Clustering File Organization Store several relations in one file using a multitable clustering file organization department instructor multitable clustering of department and instructor
  • 207. Multitable Clustering File Organization (cont.)  good for queries involving department ⨝ instructor, and for queries involving one single department and its instructors  bad for queries involving only department  results in variable size records  Can add pointer chains to link records of a particular relation
  • 208. Partitioning  Table partitioning: Records in a relation can be partitioned into smaller relations that are stored separately  E.g., transaction relation may be partitioned into transaction_2018, transaction_2019, etc.  Queries written on transaction must access records in all partitions • Unless query has a selection such as year=2019, in which case only one partition is needed  Partitioning • Reduces costs of some operations such as free space management • Allows different partitions to be stored on different storage devices  E.g., transaction partition for current year on SSD, for older years on magnetic disk
  • 209. Data Dictionary Storage  Information about relations • names of relations • names, types and lengths of attributes of each relation • names and definitions of views • integrity constraints  User and accounting information, including passwords  Statistical and descriptive data • number of tuples in each relation  Physical file organization information • How relation is stored (sequential/hash/…) • Physical location of relation  Information about indices (Chapter 14) The Data dictionary (also called system catalog) stores metadata; that is, data about data, such as
  • 210. Relational Representation of System Metadata  Relational representation on disk  Specialized data structures designed for efficient access, in memory
  • 211. Storage Access  Blocks are units of both storage allocation and data transfer.  Database system seeks to minimize the number of block transfers between the disk and memory. We can reduce the number of disk accesses by keeping as many blocks as possible in main memory.  Buffer – portion of main memory available to store copies of disk blocks.  Buffer manager – subsystem responsible for allocating buffer space in main memory.
  • 212. Buffer Manager  Programs call on the buffer manager when they need a block from disk. • If the block is already in the buffer, buffer manager returns the address of the block in main memory • If the block is not in the buffer, the buffer manager  Allocates space in the buffer for the block • Replacing (throwing out) some other block, if required, to make space for the new block. • Replaced block written back to disk only if it was modified since the most recent time that it was written to/fetched from the disk.  Reads the block from the disk to the buffer, and returns the address of the block in main memory to requester.
  • 213. Buffer Manager  Buffer replacement strategy (details coming up!)  Pinned block: memory block that is not allowed to be written back to disk • Pin done before reading/writing data from a block • Unpin done when read/write is complete • Multiple concurrent pin/unpin operations possible  Keep a pin count, buffer block can be evicted only if pin count = 0  Shared and exclusive locks on buffer • Needed to prevent concurrent operations from reading page contents as they are moved/reorganized, and to ensure only one move/reorganize at a time • Readers get shared lock, updates to a block require exclusive lock • Locking rules:  Only one process can get exclusive lock at a time  A shared lock cannot be held concurrently with an exclusive lock  Multiple processes may be given shared lock concurrently
  • 214. Buffer-Replacement Policies  Most operating systems replace the block least recently used (LRU strategy) • Idea behind LRU – use past pattern of block references as a predictor of future references • LRU can be bad for some queries  Queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user’s query to predict future references  Mixed strategy with hints on replacement strategy provided by the query optimizer is preferable  Example of bad access pattern for LRU: when computing the join of 2 relations r and s by a nested loop: for each tuple tr of r do for each tuple ts of s do if the tuples tr and ts match …
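For concreteness, here is a minimal LRU buffer-pool sketch (illustrative Python, not the actual buffer manager of any system) that also carries the pin counts from the previous slide:

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()        # block_id -> [data, pin_count]

    def get(self, block_id, read_from_disk):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)          # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[block_id] = [read_from_disk(block_id), 0]
        return self.frames[block_id][0]

    def _evict(self):
        for bid, (_, pins) in self.frames.items():     # oldest (least recent) first
            if pins == 0:
                del self.frames[bid]                   # write-back of dirty blocks
                return                                 # omitted in this sketch
        raise RuntimeError("all buffer blocks are pinned")

pool = BufferPool(capacity=2)
for bid in [1, 2, 1, 3]:                   # LRU evicts block 2, not block 1
    pool.get(bid, read_from_disk=lambda b: f"block-{b}")
print(list(pool.frames))                   # [1, 3]
```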
  • 215. Buffer-Replacement Policies (Cont.)  Toss-immediate strategy – frees the space occupied by a block as soon as the final tuple of that block has been processed  Most recently used (MRU) strategy – system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block.  Buffer manager can use statistical information regarding the probability that a request will reference a particular relation • E.g., the data dictionary is frequently accessed. Heuristic: keep data-dictionary blocks in main memory buffer  Operating system or buffer manager may reorder writes • Can lead to corruption of data structures on disk  E.g., linked list of blocks with missing block on disk  File systems perform consistency check to detect such situations • Careful ordering of writes can avoid many such problems
  • 216. Optimization of Disk Block Access (Cont.)  Buffer managers support forced output of blocks for the purpose of recovery (more in Chapter 19)  Nonvolatile write buffers speed up disk writes by writing blocks to a non- volatile RAM or flash buffer immediately • Writes can be reordered to minimize disk arm movement  Log disk – a disk devoted to writing a sequential log of block updates • Used exactly like nonvolatile RAM  Write to log disk is very fast since no seeks are required  Journaling file systems write data in-order to NV-RAM or log disk • Reordering without journaling: risk of corruption of file system data
  • 217. Column-Oriented Storage  Also known as columnar representation  Store each attribute of a relation separately  Example
  • 218. Columnar Representation  Benefits: • Reduced IO if only some attributes are accessed • Improved CPU cache performance • Improved compression • Vector processing on modern CPU architectures  Drawbacks • Cost of tuple reconstruction from columnar representation • Cost of tuple deletion and update • Cost of decompression  Columnar representation found to be more efficient for decision support than row-oriented representation  Traditional row-oriented representation preferable for transaction processing  Some databases support both representations • Called hybrid row/column stores
  • 219. Columnar File Representation  ORC and Parquet: file formats with columnar storage inside file  Very popular for big-data applications  Orc file format shown on right:
  • 220. Storage Organization in Main-Memory Databases  Can store records directly in memory without a buffer manager  Column-oriented storage can be used in-memory for decision support applications • Compression reduces memory requirement
  • 221. Outline  Basic Concepts  Ordered Indices  B+-Tree Index Files  B-Tree Index Files  Hashing  Static Hashing  Dynamic Hashing
  • 222. Basic Concepts  Indexing mechanisms used to speed up access to desired data. • E.g., author catalog in library  Search Key – attribute or set of attributes used to look up records in a file.  An index file consists of records (called index entries) of the form (search-key, pointer)  Index files are typically much smaller than the original file  Two basic kinds of indices: • Ordered indices: search keys are stored in sorted order • Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
  • 223. Index Evaluation Metrics  Access types supported efficiently. E.g., • Records with a specified value in the attribute • Records with an attribute value falling in a specified range of values.  Access time  Insertion time  Deletion time  Space overhead
  • 224. Ordered Indices  In an ordered index, index entries are stored sorted on the search key value.  Clustering index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. • Also called primary index • The search key of a primary index is usually but not necessarily the primary key.  Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called nonclustering index.  Index-sequential file: sequential file ordered on a search key, with a clustering index on the search key.
  • 225. Dense Index Files  Dense index — Index record appears for every search-key value in the file.  E.g. index on ID attribute of instructor relation
  • 226. Dense Index Files (Cont.)  Dense index on dept_name, with instructor file sorted on dept_name
  • 227. Sparse Index Files  Sparse Index: contains index records for only some search-key values. • Applicable when records are sequentially ordered on search-key  To locate a record with search-key value K we: • Find index record with largest search-key value < K • Search file sequentially starting at the record to which the index record points
  • 228. Sparse Index Files (Cont.)  Compared to dense indices: • Less space and less maintenance overhead for insertions and deletions. • Generally slower than dense index for locating records.  Good tradeoff: • for clustered index: sparse index with an index entry for every block in file, corresponding to least search-key value in the block. • For unclustered index: sparse index on top of dense index (multilevel index)
  • 229. Secondary Indices Example  Secondary index on salary field of instructor  Index record points to a bucket that contains pointers to all the actual records with that particular search-key value.  Secondary indices have to be dense
  • 230. Multilevel Index  If index does not fit in memory, access becomes expensive.  Solution: treat index kept on disk as a sequential file and construct a sparse index on it. • outer index – a sparse index of the basic index • inner index – the basic index file  If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.  Indices at all levels must be updated on insertion or deletion from the file.
  • 232. Indices on Multiple Keys  Composite search key • E.g., index on instructor relation on attributes (name, ID) • Values are sorted lexicographically  E.g. (John, 12121) < (John, 13514) and (John, 13514) < (Peter, 11223) • Can query on just name, or on (name, ID)
  • 234. B+-Tree Index Files (Cont.)  All paths from root to leaf are of the same length  Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.  A leaf node has between ⌈(n–1)/2⌉ and n–1 values  Special cases: • If the root is not a leaf, it has at least 2 children. • If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values. A B+-tree is a rooted tree satisfying the following properties:
  • 235. B+-Tree Node Structure  Typical node • Ki are the search-key values • Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).  The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn–1 (Initially assume no duplicate keys, address duplicates later)
  • 236. Leaf Nodes in B+-Trees Properties of a leaf node:  For i = 1, 2, . . ., n–1, pointer Pi points to a file record with search-key value Ki  If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than or equal to Lj’s search-key values  Pn points to the next leaf node in search-key order
  • 237. Non-Leaf Nodes in B+-Trees  Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers: • All the search-keys in the subtree to which P1 points are less than K1 • For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki–1 and less than Ki • All the search-keys in the subtree to which Pm points have values greater than or equal to Km–1 • General structure
  • 238. Example of B+-tree  B+-tree for instructor file (n = 6)  Leaf nodes must have between 3 and 5 values (⌈(n–1)/2⌉ and n–1, with n = 6).  Non-leaf nodes other than root must have between 3 and 6 children (⌈n/2⌉ and n, with n = 6).  Root must have at least 2 children.
  • 239. Observations about B+-trees  Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically” close.  The non-leaf levels of the B+-tree form a hierarchy of sparse indices.  The B+-tree contains a relatively small number of levels  Level below root has at least 2 * ⌈n/2⌉ values  Next level has at least 2 * ⌈n/2⌉ * ⌈n/2⌉ values  .. etc. • If there are K search-key values in the file, the tree height is no more than ⌈log⌈n/2⌉(K)⌉ • thus searches can be conducted efficiently.  Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall see).
  • 240. Queries on B+-Trees
function find(v)
1. C = root
2. while (C is not a leaf node)
   1. Let i be the least number such that v ≤ Ki
   2. if there is no such number i then
   3.    Set C = last non-null pointer in C
   4. else if (v = C.Ki) Set C = C.P(i+1)
   5. else Set C = C.Pi
3. if for some i, Ki = v then return C.Pi
4. else return null /* no record with search-key value v exists */
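A minimal in-memory sketch of the find procedure above, in Python. The Node class, its field names, and the use of bisect are illustrative assumptions, not the textbook's data structures.

  from bisect import bisect_left

  class Node:
      def __init__(self, keys, pointers, leaf=False):
          self.keys = keys          # K1 < K2 < ... (search-key values)
          self.pointers = pointers  # children, or record pointers for leaves
          self.leaf = leaf

  def find(root, v):
      """Descend to the leaf following the pseudocode, then look for v."""
      c = root
      while not c.leaf:
          i = bisect_left(c.keys, v)        # least i such that v <= keys[i]
          if i == len(c.keys):              # no such key: take last pointer
              c = c.pointers[-1]
          elif v == c.keys[i]:              # v == Ki: follow P(i+1)
              c = c.pointers[i + 1]
          else:                             # v < Ki: follow Pi
              c = c.pointers[i]
      if v in c.keys:
          return c.pointers[c.keys.index(v)]   # pointer/record for v
      return None                              # no record with key v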
  • 241. Queries on B+-Trees (Cont.)  Range queries find all records with search key values in a given range • See book for details of function findRange(lb, ub) which returns set of all such records • Real implementations usually provide an iterator interface to fetch matching records one at a time, using a next() function
  • 242. Queries on B+-Trees (Cont.)  If there are K search-key values in the file, the height of the tree is no more than logn/2(K).  A node is generally the same size as a disk block, typically 4 kilobytes • and n is typically around 100 (40 bytes per index entry).  With 1 million search key values and n = 100 • at most log50(1,000,000) = 4 nodes are accessed in a lookup traversal from root to leaf.  Contrast this with a balanced binary tree with 1 million search key values — around 20 nodes are accessed in a lookup • above difference is significant since every node access may need a disk I/O, costing around 20 milliseconds
  • 243. Updates on B+-Trees: Insertion (Cont.)  Splitting a leaf node: • take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node. • let the new node be p, and let k be the least key value in p. Insert (k,p) in the parent of the node being split. • If the parent is full, split it and propagate the split further up.  Splitting of nodes proceeds upwards till a node that is not full is found. • In the worst case the root node may be split, increasing the height of the tree by 1. Result of splitting the node containing Brandt, Califieri and Crick on inserting Adams. Next step: insert entry with (Califieri, pointer-to-new-node) into the parent
  • 244. B+-Tree Insertion B+-Tree before and after insertion of “Adams” Affected nodes
  • 245. B+-Tree Insertion B+-Tree before and after insertion of “Lamport” Affected nodes Affected nodes
  • 246. Examples of B+-Tree Deletion  Deleting “Srinivasan” causes merging of under-full leaves Before and after deleting “Srinivasan” Affected nodes
  • 247. Examples of B+-Tree Deletion (Cont.)  Leaf containing Singh and Wu became underfull, and borrowed a value Kim from its left sibling  Search-key value in the parent changes as a result Before and after deleting “Singh” and “Wu” Affected nodes
  • 248. Example of B+-tree Deletion (Cont.)  Node with Gold and Katz became underfull, and was merged with its sibling  Parent node becomes underfull, and is merged with its sibling • Value separating two nodes (at the parent) is pulled down when merging  Root node then has only one child, and is deleted Before and after deletion of “Gold”
  • 249. B+-Tree File Organization  B+-Tree File Organization: • Leaf nodes in a B+-tree file organization store records, instead of pointers • Helps keep data records clustered even when there are insertions/deletions/updates  Leaf nodes are still required to be half full • Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a nonleaf node.  Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.
  • 250. B+-Tree File Organization (Cont.)  Example of B+-tree File Organization  Good space utilization is important since records use more space than pointers.  To improve space utilization, involve more sibling nodes in redistribution during splits and merges • Involving 2 siblings in redistribution (to avoid split / merge where possible) results in each node having at least ⌊2n/3⌋ entries
  • 251. Static Hashing  A bucket is a unit of storage containing one or more entries (a bucket is typically a disk block). • we obtain the bucket of an entry from its search-key value using a hash function  Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.  Hash function is used to locate entries for access, insertion as well as deletion.  Entries with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate an entry.  In a hash index, buckets store entries with pointers to records  In a hash file-organization buckets store records
  • 252. Handling of Bucket Overflows  Bucket overflow can occur because of • Insufficient buckets • Skew in distribution of records. This can occur due to two reasons:  multiple records have same search-key value  chosen hash function produces non-uniform distribution of key values  Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
  • 253. Handling of Bucket Overflows (Cont.)  Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list.  The above scheme is called closed addressing (also called closed hashing or open hashing depending on the book you use) • An alternative, called open addressing (also called open hashing or closed hashing depending on the book you use), which does not use overflow buckets, is not suitable for database applications.
  • 254. Example of Hash File Organization Hash file organization of instructor file, using dept_name as key.
  • 255. Dynamic Hashing  Periodic rehashing • If number of entries in a hash table becomes (say) 1.5 times size of hash table,  create new hash table of size (say) 2 times the size of the previous hash table  Rehash all entries to new table  Linear Hashing • Do rehashing in an incremental manner  Extendable Hashing • Tailored to disk based hashing, with buckets shared by multiple hash values • Doubling of # of entries in hash table, without doubling # of buckets
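A minimal Python sketch of periodic rehashing, assuming an in-memory table and a load-factor threshold of 1.5; extendable and linear hashing avoid the full rehash shown here.

  class RehashingTable:
      def __init__(self, n_buckets=4):
          self.buckets = [[] for _ in range(n_buckets)]
          self.count = 0

      def _bucket(self, key):
          return self.buckets[hash(key) % len(self.buckets)]

      def insert(self, key, record):
          self._bucket(key).append((key, record))
          self.count += 1
          if self.count > 1.5 * len(self.buckets):   # load factor exceeded
              self._rehash(2 * len(self.buckets))    # double the table size

      def _rehash(self, new_size):
          old = [entry for bucket in self.buckets for entry in bucket]
          self.buckets = [[] for _ in range(new_size)]
          for key, record in old:                    # re-insert every entry
              self._bucket(key).append((key, record))

      def lookup(self, key):
          return [r for k, r in self._bucket(key) if k == key]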
  • 256. Comparison of Ordered Indexing and Hashing  Cost of periodic re-organization  Relative frequency of insertions and deletions  Is it desirable to optimize average access time at the expense of worst- case access time?  Expected type of queries: • Hashing is generally better at retrieving records having a specified value of the key. • If range queries are common, ordered indices are to be preferred  In practice: • PostgreSQL supports hash indices, but discourages use due to poor performance • Oracle supports static hash organization, but not hash indices • SQLServer supports only B+-trees
  • 257. UNIT 4 – TRANSACTION MANAGEMENT
  • 258. Outline  Transaction Concept  Transaction State  Concurrent Executions  Serializability  Recoverability  Implementation of Isolation  Transaction Definition in SQL  Testing for Serializability.
  • 259. Transaction Concept  A transaction is a unit of program execution that accesses and possibly updates various data items.  E.g., transaction to transfer $50 from account A to account B: 1. read(A) 2. A := A – 50 3. write(A) 4. read(B) 5. B := B + 50 6. write(B)  Two main issues to deal with: • Failures of various kinds, such as hardware failures and system crashes • Concurrent execution of multiple transactions
  • 260. Example of Fund Transfer  Transaction to transfer $50 from account A to account B: 1. read(A) 2. A := A – 50 3. write(A) 4. read(B) 5. B := B + 50 6. write(B)  Atomicity requirement • If the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent database state  Failure could be due to software or hardware • The system should ensure that updates of a partially executed transaction are not reflected in the database  Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.
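A hedged illustration of the atomicity requirement using Python's sqlite3 module: if a failure is simulated between the two updates, rollback ensures the partial transfer is not reflected. The table and account names are made up for the example.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("create table account (id text primary key, balance integer)")
  conn.executemany("insert into account values (?, ?)",
                   [("A", 1000), ("B", 2000)])
  conn.commit()

  def transfer(conn, src, dst, amount):
      try:
          conn.execute("update account set balance = balance - ? where id = ?",
                       (amount, src))
          if amount > 100:                 # simulate a crash between step 3 and 6
              raise RuntimeError("failure before the credit to B")
          conn.execute("update account set balance = balance + ? where id = ?",
                       (amount, dst))
          conn.commit()                    # both updates become durable together
      except Exception:
          conn.rollback()                  # neither update is reflected

  transfer(conn, "A", "B", 500)            # fails mid-way: A stays at 1000
  print(conn.execute("select * from account").fetchall())
  # [('A', 1000), ('B', 2000)] -- the debit from A was rolled back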
  • 261. Example of Fund Transfer (Cont.)  Consistency requirement in above example: • The sum of A and B is unchanged by the execution of the transaction  In general, consistency requirements include • Explicitly specified integrity constraints such as primary keys and foreign keys • Implicit integrity constraints  e.g., sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand • A transaction must see a consistent database. • During transaction execution the database may be temporarily inconsistent. • When the transaction completes successfully the database must be consistent  Erroneous transaction logic can lead to inconsistency
  • 262. Example of Fund Transfer (Cont.)  Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be). T1 T2 1. read(A) 2. A := A – 50 3. write(A) read(A), read(B), print(A+B) 4. read(B) 5. B := B + 50 6. write(B)  Isolation can be ensured trivially by running transactions serially • That is, one after the other.  However, executing multiple transactions concurrently has significant benefits, as we will see later.
  • 263. ACID Properties A transaction is a unit of program execution that accesses and possibly updates various data items. To preserve the integrity of data, the database system must ensure:  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.  Consistency. Execution of a transaction in isolation preserves the consistency of the database.  Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions. • That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.  Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
  • 264. Transaction State  Active – the initial state; the transaction stays in this state while it is executing  Partially committed – after the final statement has been executed.  Failed -- after the discovery that normal execution can no longer proceed.  Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: • Restart the transaction  Can be done only if no internal logical error • Kill the transaction  Committed – after successful completion.
  • 266. Concurrent Executions  Multiple transactions are allowed to run concurrently in the system. Advantages are: • Increased processor and disk utilization, leading to better transaction throughput  E.g., one transaction can be using the CPU while another is reading from or writing to the disk • Reduced average response time for transactions: short transactions need not wait behind long ones.  Concurrency control schemes – mechanisms to achieve isolation • That is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database  Will study in Chapter 15, after studying notion of correctness of concurrent executions.
  • 267. Schedules  Schedule – a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed • A schedule for a set of transactions must consist of all instructions of those transactions • Must preserve the order in which the instructions appear in each individual transaction.  A transaction that successfully completes its execution will have a commit instruction as the last statement • By default a transaction is assumed to execute a commit instruction as its last step  A transaction that fails to successfully complete its execution will have an abort instruction as the last statement
  • 268. Schedule 1  Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.  A serial schedule in which T1 is followed by T2 :
  • 269. Schedule 2  A serial schedule where T2 is followed by T1
  • 270. Schedule 3  Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1  In Schedules 1, 2 and 3, the sum A + B is preserved.
  • 271. Schedule 4  The following concurrent schedule does not preserve the value of (A + B ).
  • 272. Serializability  Basic Assumption – Each transaction preserves database consistency.  Thus, serial execution of a set of transactions preserves database consistency.  A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of: 1. Conflict serializability 2. View serializability
  • 273. Conflicting Instructions  Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q. 1. li = read(Q), lj = read(Q). li and lj don’t conflict. 2. li = read(Q), lj = write(Q). They conflict. 3. li = write(Q), lj = read(Q). They conflict 4. li = write(Q), lj = write(Q). They conflict  Intuitively, a conflict between li and lj forces a (logical) temporal order between them.  If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
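The four cases reduce to a one-line rule, sketched here in Python; the operation encoding is an assumption made for illustration.

  def conflicts(op1, op2):
      """op = (transaction, action, item), e.g. ('T1', 'write', 'Q')."""
      t1, a1, q1 = op1
      t2, a2, q2 = op2
      # Different transactions, same item, and at least one write.
      return t1 != t2 and q1 == q2 and ('write' in (a1, a2))

  assert not conflicts(('T1', 'read', 'Q'), ('T2', 'read', 'Q'))   # case 1
  assert conflicts(('T1', 'read', 'Q'), ('T2', 'write', 'Q'))      # case 2
  assert conflicts(('T1', 'write', 'Q'), ('T2', 'read', 'Q'))      # case 3
  assert conflicts(('T1', 'write', 'Q'), ('T2', 'write', 'Q'))     # case 4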
  • 274. Conflict Serializability  If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting instructions, we say that S and S’ are conflict equivalent.  We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule
  • 275. Conflict Serializability (Cont.)  Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable. Schedule 3 Schedule 6
  • 276. Conflict Serializability (Cont.)  Example of a schedule that is not conflict serializable:  We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
  • 277. View Serializability  Let S and S’ be two schedules with the same set of transactions. S and S’ are view equivalent if the following three conditions are met, for each data item Q, 1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction Ti must read the initial value of Q. 2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S’ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj . 3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S’.  As can be seen, view equivalence is also based purely on reads and writes alone.
  • 278. View Serializability (Cont.)  A schedule S is view serializable if it is view equivalent to a serial schedule.  Every conflict serializable schedule is also view serializable.  Below is a schedule which is view-serializable but not conflict serializable.  What serial schedule is above equivalent to?  Every view serializable schedule that is not conflict serializable has blind writes.
  • 279. Other Notions of Serializability  The schedule below produces same outcome as the serial schedule < T1, T5 >, yet is not conflict equivalent or view equivalent to it.  Determining such equivalence requires analysis of operations other than read and write.
  • 280. Testing for Serializability  Consider some schedule of a set of transactions T1, T2, ..., Tn  Precedence graph — a directed graph where the vertices are the transactions (names).  We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item on which the conflict arose earlier.  We may label the arc by the item that was accessed.  Example of a precedence graph
  • 281. Test for Conflict Serializability  A schedule is conflict serializable if and only if its precedence graph is acyclic.  Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph. • (Better algorithms take order n + e where e is the number of edges.)  If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph. • This is a linear order consistent with the partial order of the graph. • For example, a serializability order for Schedule A would be T5 → T1 → T3 → T2 → T4  Are there others?
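A small Python sketch that builds a precedence graph from a schedule and derives a serializability order by topological sort, returning None when the graph has a cycle. The schedule encoding is an illustrative assumption.

  from collections import defaultdict

  def precedence_graph(schedule):
      """schedule: list of (transaction, action, item) in chronological order."""
      edges = defaultdict(set)
      for i, (ti, ai, qi) in enumerate(schedule):
          for tj, aj, qj in schedule[i + 1:]:
              if ti != tj and qi == qj and 'write' in (ai, aj):
                  edges[ti].add(tj)          # Ti accessed the item first
      return edges

  def serial_order(edges, transactions):
      """Topological sort; returns None if the graph has a cycle."""
      indeg = {t: 0 for t in transactions}
      for t in edges:
          for u in edges[t]:
              indeg[u] += 1
      ready = [t for t in transactions if indeg[t] == 0]
      order = []
      while ready:
          t = ready.pop()
          order.append(t)
          for u in edges[t]:
              indeg[u] -= 1
              if indeg[u] == 0:
                  ready.append(u)
      return order if len(order) == len(transactions) else None

  s = [('T1', 'read', 'A'), ('T1', 'write', 'A'),
       ('T2', 'read', 'A'), ('T2', 'write', 'A')]
  print(serial_order(precedence_graph(s), {'T1', 'T2'}))   # ['T1', 'T2']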
  • 282. Recoverable Schedules  Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti , then the commit operation of Ti appears before the commit operation of Tj.  The following schedule (Schedule 11) is not recoverable  If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, database must ensure that schedules are recoverable. Need to address the effect of transaction failures on concurrently running transactions.
  • 283. Cascading Rollbacks  Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable) If T10 fails, T11 and T12 must also be rolled back.  Can lead to the undoing of a significant amount of work
  • 284. Cascadeless Schedules  Cascadeless schedules — cascading rollbacks cannot occur; • For each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.  Every cascadeless schedule is also recoverable  It is desirable to restrict the schedules to those that are cascadeless
  • 285. Concurrency Control  A database must provide a mechanism that will ensure that all possible schedules are • either conflict or view serializable, and • are recoverable and preferably cascadeless  A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency • Are serial schedules recoverable/cascadeless?  Testing a schedule for serializability after it has executed is a little too late!  Goal – to develop concurrency control protocols that will assure serializability.
  • 286. Concurrency Control (Cont.)  Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.  A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.  Concurrency-control schemes tradeoff between the amount of concurrency they allow and the amount of overhead that they incur.  Some schemes allow only conflict-serializable schedules to be generated, while others allow view-serializable schedules that are not conflict- serializable.
  • 287. Outline  Lock-Based Protocols  Timestamp-Based Protocols  Validation-Based Protocols  Multiple Granularity  Multiversion Schemes  Insert and Delete Operations  Concurrency in Index Structures
  • 288. Lock-Based Protocols  A lock is a mechanism to control concurrent access to a data item  Data items can be locked in two modes : 1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction. 2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.  Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.
  • 289. Lock-Based Protocols (Cont.)  Lock-compatibility matrix  A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions  Any number of transactions can hold shared locks on an item,  But if any transaction holds an exclusive on the item no other transaction may hold any lock on the item.
  • 290. Schedule With Lock Grants  Grants omitted in rest of chapter • Assume grant happens just before the next instruction following lock request  This schedule is not serializable (why?)  A locking protocol is a set of rules followed by all transactions while requesting and releasing locks.  Locking protocols enforce serializability by restricting the set of possible schedules.
  • 291. Deadlock  Consider the partial schedule  Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.  Such a situation is called a deadlock. • To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
  • 292. Deadlock (Cont.)  The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.  Starvation is also possible if concurrency control manager is badly designed. For example: • A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item. • The same transaction is repeatedly rolled back due to deadlocks.  Concurrency control manager can be designed to prevent starvation.
  • 293. The Two-Phase Locking Protocol  A protocol which ensures conflict-serializable schedules.  Phase 1: Growing Phase • Transaction may obtain locks • Transaction may not release locks  Phase 2: Shrinking Phase • Transaction may release locks • Transaction may not obtain locks  The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquired its final lock). (Figure: number of locks held vs. time, growing then shrinking.)
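A minimal Python check of the two-phase property for a single transaction's lock/unlock sequence; the operation encoding is an assumption. Once any lock is released, no further lock may be obtained.

  def is_two_phase(ops):
      """ops: list of ('lock-S' | 'lock-X' | 'unlock', item) for one transaction."""
      shrinking = False
      for action, _item in ops:
          if action == 'unlock':
              shrinking = True              # shrinking phase has begun
          elif shrinking:
              return False                  # a lock was requested after a release
      return True

  assert is_two_phase([('lock-X', 'A'), ('lock-S', 'B'),
                       ('unlock', 'A'), ('unlock', 'B')])
  assert not is_two_phase([('lock-X', 'A'), ('unlock', 'A'), ('lock-S', 'B')])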
  • 294. The Two-Phase Locking Protocol (Cont.)  Two-phase locking does not ensure freedom from deadlocks  Extensions to basic two-phase locking are needed to ensure recoverability and freedom from cascading roll-back • Strict two-phase locking: a transaction must hold all its exclusive locks till it commits/aborts.  Ensures recoverability and avoids cascading roll-backs • Rigorous two-phase locking: a transaction must hold all locks till commit/abort.  Transactions can be serialized in the order in which they commit.  Most databases implement rigorous two-phase locking, but refer to it as simply two-phase locking
  • 295. The Two-Phase Locking Protocol (Cont.)  Two-phase locking is not a necessary condition for serializability • There are conflict serializable schedules that cannot be obtained if the two-phase locking protocol is used.  In the absence of extra information (e.g., ordering of access to data), two- phase locking is necessary for conflict serializability in the following sense: • Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
  • 296. Locking Protocols  Given a locking protocol (such as 2PL) • A schedule S is legal under a locking protocol if it can be generated by a set of transactions that follow the protocol • A protocol ensures serializability if all legal schedules under that protocol are serializable
  • 297. Lock Conversions  Two-phase locking protocol with lock conversions: – Growing Phase: • can acquire a lock-S on item • can acquire a lock-X on item • can convert a lock-S to a lock-X (upgrade) – Shrinking Phase: • can release a lock-S • can release a lock-X • can convert a lock-X to a lock-S (downgrade)  This protocol ensures serializability
  • 298. Deadlock Handling  System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
  • 299. Deadlock Handling  Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies: • Require that each transaction locks all its data items before it begins execution (pre-declaration). • Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).
  • 300. More Deadlock Prevention Strategies  wait-die scheme — non-preemptive • Older transaction may wait for younger one to release data item. • Younger transactions never wait for older ones; they are rolled back instead. • A transaction may die several times before acquiring a lock  wound-wait scheme — preemptive • Older transaction wounds (forces rollback of) younger transaction instead of waiting for it. • Younger transactions may wait for older ones. • Fewer rollbacks than wait-die scheme.  In both schemes, a rolled-back transaction is restarted with its original timestamp. • Ensures that older transactions have precedence over newer ones, and starvation is thus avoided.
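The two schemes can be summarized as small decision functions over transaction timestamps (smaller timestamp = older); this is an illustrative sketch, not a full lock manager.

  def wait_die(requester_ts, holder_ts):
      # Non-preemptive: only an older requester may wait; a younger one dies.
      return 'wait' if requester_ts < holder_ts else 'rollback requester'

  def wound_wait(requester_ts, holder_ts):
      # Preemptive: an older requester wounds (rolls back) the younger holder.
      return 'rollback holder' if requester_ts < holder_ts else 'wait'

  print(wait_die(5, 10))     # older requester  -> 'wait'
  print(wait_die(10, 5))     # younger requester -> 'rollback requester'
  print(wound_wait(5, 10))   # older requester  -> 'rollback holder'
  print(wound_wait(10, 5))   # younger requester -> 'wait'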
  • 301. Deadlock prevention (Cont.)  Timeout-Based Schemes: • A transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back. • Ensures that deadlocks get resolved by timeout if they occur • Simple to implement • But may roll back transaction unnecessarily in absence of deadlock  Difficult to determine good value of the timeout interval. • Starvation is also possible
  • 302. Deadlock Detection  Wait-for graph • Vertices: transactions • Edge from Ti → Tj: if Ti is waiting for a lock held in a conflicting mode by Tj  The system is in a deadlock state if and only if the wait-for graph has a cycle.  Invoke a deadlock-detection algorithm periodically to look for cycles. Wait-for graph without a cycle Wait-for graph with a cycle
  • 303. Deadlock Recovery  When a deadlock is detected: • Some transaction will have to be rolled back (made a victim) to break the deadlock cycle.  Select as victim the transaction that will incur minimum cost • Rollback -- determine how far to roll back the transaction  Total rollback: Abort the transaction and then restart it.  Partial rollback: Roll back the victim transaction only as far as necessary to release locks that another transaction in the cycle is waiting for  Starvation can happen (why?) • One solution: the oldest transaction in the deadlock set is never chosen as victim
  • 304. Multiple Granularity  Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones  Can be represented graphically as a tree (but don't confuse with tree- locking protocol)  When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.  Granularity of locking (level in tree where locking is done): • Fine granularity (lower in tree): high concurrency, high locking overhead • Coarse granularity (higher in tree): low locking overhead, low concurrency
  • 306. Example of Granularity Hierarchy  The levels, starting from the coarsest (top) level are • database • area • file • record  The corresponding tree
  • 307. Intention Lock Modes  In addition to S and X lock modes, there are three additional lock modes with multiple granularity: • intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks. • intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks • shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks.  Intention locks allow a higher level node to be locked in S or X mode without having to check all descendent nodes.
  • 308. Compatibility Matrix with Intention Lock Modes  The compatibility matrix for all lock modes is:
  • 309. Outline  Failure Classification  Storage Structure  Recovery and Atomicity  Log-Based Recovery  Remote Backup Systems
  • 310. Failure Classification  Transaction failure : • Logical errors: transaction cannot complete due to some internal error condition • System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)  System crash: a power failure or other hardware or software failure causes the system to crash. • Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by system crash  Database systems have numerous integrity checks to prevent corruption of disk data  Disk failure: a head crash or similar disk failure destroys all or part of disk storage • Destruction is assumed to be detectable: disk drives use checksums to detect failures
  • 311. Recovery Algorithms  Suppose transaction Ti transfers $50 from account A to account B • Two updates: subtract 50 from A and add 50 to B  Transaction Ti requires updates to A and B to be output to the database. • A failure may occur after one of these modifications has been made but before both of them are made. • Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state • Not modifying the database may result in lost updates if the failure occurs just after the transaction commits  Recovery algorithms have two parts 1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures 2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
  • 312. Storage Structure  Volatile storage: • Does not survive system crashes • Examples: main memory, cache memory  Nonvolatile storage: • Survives system crashes • Examples: disk, tape, flash memory, non-volatile RAM • But may still fail, losing data  Stable storage: • A mythical form of storage that survives all failures • Approximated by maintaining multiple copies on distinct nonvolatile media • See book for more details on how to implement stable storage
  • 313. Stable-Storage Implementation  Maintain multiple copies of each block on separate disks • copies can be at remote sites to protect against disasters such as fire or flooding.  Failure during data transfer can still result in inconsistent copies: Block transfer can result in • Successful completion • Partial failure: destination block has incorrect information • Total failure: destination block was never updated  Protecting storage media from failure during data transfer (one solution): • Execute output operation as follows (assuming two copies of each block): 1. Write the information onto the first physical block. 2. When the first write successfully completes, write the same information onto the second physical block. 3. The output is completed only after the second write successfully completes.
  • 314. Protecting storage media from failure (Cont.)  Copies of a block may differ due to failure during output operation.  To recover from failure: 1. First find inconsistent blocks: 1. Expensive solution: Compare the two copies of every disk block. 2. Better solution: • Record in-progress disk writes on non-volatile storage (Flash, Non-volatile RAM or special area of disk). • Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these. • Used in hardware RAID systems 2. If either copy of an inconsistent block is detected to have an error (bad checksum), overwrite it by the other copy. If both have no error, but are different, overwrite the second block by the first block.
  • 315. Data Access  Physical blocks are those blocks residing on the disk.  Buffer blocks are the blocks residing temporarily in main memory.  Block movements between disk and main memory are initiated through the following two operations: • input (B) transfers the physical block B to main memory. • output (B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.  We assume, for simplicity, that each data item fits in, and is stored inside, a single block.
  • 316. Data Access (Cont.)  Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept. • Ti's local copy of a data item X is called xi.  Transferring data items between system buffer blocks and its private work-area is done by: • read(X) assigns the value of data item X to the local variable xi. • write(X) assigns the value of local variable xi to data item X in the buffer block. • Note: output(BX) need not immediately follow write(X). The system can perform the output operation when it deems fit.  Transactions • Must perform read(X) before accessing X for the first time (subsequent reads can be from the local copy) • write(X) can be executed at any time before the transaction commits
  • 317. Example of Data Access
  • 318. Recovery and Atomicity  To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database itself.  We study log-based recovery mechanisms in detail • We first present key concepts • And then present the actual recovery algorithm  Less used alternative: shadow-copy and shadow-paging (brief details in book)
  • 319. Log-Based Recovery  A log is a sequence of log records. The records keep information about update activities on the database. • The log is kept on stable storage  When transaction Ti starts, it registers itself by writing a <Ti start> log record  Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write (the old value), and V2 is the value to be written to X (the new value).  When Ti finishes its last statement, the log record <Ti commit> is written.  Two approaches using logs • Immediate database modification • Deferred database modification
  • 320. Immediate Database Modification  The immediate-modification scheme allows updates of an uncommitted transaction to be made to the buffer, or the disk itself, before the transaction commits  Update log record must be written before the database item is written • We assume that the log record is output directly to stable storage • (We will see later how to postpone log record output to some extent)  Output of updated blocks to disk can take place at any time before or after transaction commit  Order in which blocks are output can be different from the order in which they are written.  The deferred-modification scheme performs updates to buffer/disk only at the time of transaction commit • Simplifies some aspects of recovery • But has overhead of storing local copy
  • 321. Transaction Commit  A transaction is said to have committed when its commit log record is output to stable storage • All previous log records of the transaction must have been output already  Writes performed by a transaction may still be in the buffer when the transaction commits, and may be output later
  • 322. Immediate Database Modification Example
Log                     Write       Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                        A = 950
                        B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                        C = 600
                                    BB, BC
<T1 commit>
                                    BA
 Note: BX denotes the block containing X. BC is output before T1 commits; BA is output after T0 commits.
  • 323. Concurrency Control and Recovery  With concurrent transactions, all transactions share a single disk buffer and a single log • A buffer block can have data items updated by one or more transactions  We assume that if a transaction Ti has modified an item, no other transaction can modify the same item until Ti has committed or aborted • i.e., the updates of uncommitted transactions should not be visible to other transactions  Otherwise, how to perform undo if T1 updates A, then T2 updates A and commits, and finally T1 has to abort? • Can be ensured by obtaining exclusive locks on updated items and holding the locks till end of transaction (strict two-phase locking)  Log records of different transactions may be interspersed in the log.
  • 324. Undo and Redo Operations  Undo and Redo of Transactions • undo(Ti) -- restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti  Each time a data item X is restored to its old value V a special log record <Ti , X, V> is written out  When undo of a transaction is complete, a log record <Ti abort> is written out. • redo(Ti) -- sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti  No logging is done in this case
  • 325. Recovering from Failure  When recovering after failure: • Transaction Ti needs to be undone if the log  Contains the record <Ti start>,  But does not contain either the record <Ti commit> or <Ti abort>. • Transaction Ti needs to be redone if the log  Contains the records <Ti start>  And contains the record <Ti commit> or <Ti abort>
  • 326. Recovering from Failure (Cont.)  Suppose that transaction Ti was undone earlier and the <Ti abort> record was written to the log, and then a failure occurs,  On recovery from failure transaction Ti is redone • Such a redo redoes all the original actions of transaction Ti including the steps that restored old values  Known as repeating history  Seems wasteful, but simplifies recovery greatly
  • 327. Checkpoints  Redoing/undoing all transactions recorded in the log can be very slow • Processing the entire log is time-consuming if the system has run for a long time • We might unnecessarily redo transactions which have already output their updates to the database.  Streamline recovery procedure by periodically performing checkpointing 1. Output all log records currently residing in main memory onto stable storage. 2. Output all modified buffer blocks to the disk. 3. Write a log record < checkpoint L> onto stable storage where L is a list of all transactions active at the time of checkpoint. 4. All updates are stopped while doing checkpointing
  • 328. Checkpoints (Cont.)  During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti. • Scan backwards from end of log to find the most recent <checkpoint L> record • Only transactions that are in L or started after the checkpoint need to be redone or undone • Transactions that committed or aborted before the checkpoint already have all their updates output to stable storage.  Some earlier part of the log may be needed for undo operations • Continue scanning backwards till a record <Ti start> is found for every transaction Ti in L. • Parts of log prior to earliest <Ti start> record above are not needed for recovery, and can be erased whenever desired.
  • 329. Example of Checkpoints  T1 can be ignored (updates already output to disk due to checkpoint)  T2 and T3 redone.  T4 undone
  • 330. Recovery Algorithm  Logging (during normal operation): • <Ti start> at transaction start • <Ti, Xj, V1, V2> for each update, and • <Ti commit> at transaction end  Transaction rollback (during normal operation) • Let Ti be the transaction to be rolled back • Scan log backwards from the end, and for each log record of Ti of the form <Ti, Xj, V1, V2>  Perform the undo by writing V1 to Xj,  Write a log record <Ti , Xj, V1> • such log records are called compensation log records • Once the record <Ti start> is found stop the scan and write the log record <Ti abort>
  • 331. Recovery Algorithm (Cont.)  Recovery from failure: Two phases • Redo phase: replay updates of all transactions, whether they committed, aborted, or are incomplete • Undo phase: undo all incomplete transactions  Redo phase: 1. Find last <checkpoint L> record, and set undo-list to L. 2. Scan forward from above <checkpoint L> record 1. Whenever a record <Ti, Xj, V1, V2> or <Ti, Xj, V2> is found, redo it by writing V2 to Xj 2. Whenever a log record <Ti start> is found, add Ti to undo-list 3. Whenever a log record <Ti commit> or <Ti abort> is found, remove Ti from undo-list
  • 332. Recovery Algorithm (Cont.)  Undo phase: 1. Scan log backwards from end 1. Whenever a log record <Ti, Xj, V1, V2> is found where Ti is in undo-list perform same actions as for transaction rollback: 1. perform undo by writing V1 to Xj. 2. write a log record <Ti , Xj, V1> 2. Whenever a log record <Ti start> is found where Ti is in undo-list, 1. Write a log record <Ti abort> 2. Remove Ti from undo-list 3. Stop when undo-list is empty 1. i.e., <Ti start> has been found for every transaction in undo-list  After undo phase completes, normal transaction processing can commence
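A simplified Python sketch of the two recovery phases over an in-memory log; the log-record encoding and the db dictionary are assumptions made for illustration, and compensation log records are indicated only as comments.

  def recover(log, db):
      """log: list of tuples such as ('start', T), ('update', T, X, v_old, v_new),
      ('commit', T), ('abort', T), ('checkpoint', [active transactions])."""
      # Redo phase: replay history forward from the last checkpoint.
      start = 0
      undo_list = set()
      for i, rec in enumerate(log):
          if rec[0] == 'checkpoint':
              start, undo_list = i, set(rec[1])
      for rec in log[start:]:
          if rec[0] == 'update':
              _, t, x, v_old, v_new = rec
              db[x] = v_new
          elif rec[0] == 'start':
              undo_list.add(rec[1])
          elif rec[0] in ('commit', 'abort'):
              undo_list.discard(rec[1])
      # Undo phase: roll back incomplete transactions, scanning backwards.
      for rec in reversed(log):
          if not undo_list:
              break
          if rec[0] == 'update' and rec[1] in undo_list:
              _, t, x, v_old, v_new = rec
              db[x] = v_old                 # a CLR <t, x, v_old> would be logged here
          elif rec[0] == 'start' and rec[1] in undo_list:
              undo_list.discard(rec[1])     # a <t abort> record would be logged here
      return db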
  • 334. Database Buffering  Database maintains an in-memory buffer of data blocks • When a new block is needed, if buffer is full an existing block needs to be removed from buffer • If the block chosen for removal has been updated, it must be output to disk  The recovery algorithm supports the no-force policy: i.e., updated blocks need not be written to disk when transaction commits • force policy: requires updated blocks to be written at commit  More expensive commit  The recovery algorithm supports the steal policy: i.e., blocks containing updates of uncommitted transactions can be written to disk, even before the transaction commits
  • 335. Database Buffering (Cont.)  If a block with uncommitted updates is output to disk, log records with undo information for the updates are output to the log on stable storage first • (Write ahead logging)  No updates should be in progress on a block when it is output to disk. Can be ensured as follows. • Before writing a data item, transaction acquires exclusive lock on block containing the data item • Lock can be released once the write is completed.  Such locks held for short duration are called latches.  To output a block to disk 1. First acquire an exclusive latch on the block  Ensures no update can be in progress on the block 2. Then perform a log flush 3. Then output the block to disk 4. Finally release the latch on the block
  • 336. Failure with Loss of Nonvolatile Storage  So far we assumed no loss of non-volatile storage  Technique similar to checkpointing used to deal with loss of non-volatile storage • Periodically dump the entire content of the database to stable storage • No transaction may be active during the dump procedure; a procedure similar to checkpointing must take place  Output all log records currently residing in main memory onto stable storage.  Output all buffer blocks onto the disk.  Copy the contents of the database to stable storage.  Output a record <dump> to log on stable storage.
  • 337. Recovering from Failure of Non-Volatile Storage  To recover from disk failure • restore database from most recent dump. • Consult the log and redo all transactions that committed after the dump  Can be extended to allow transactions to be active during dump; known as fuzzy dump or online dump • Similar to fuzzy checkpointing
  • 338. ARIES  ARIES is a state of the art recovery method • Incorporates numerous optimizations to reduce overheads during normal processing and to speed up recovery • The recovery algorithm we studied earlier is modeled after ARIES, but greatly simplified by removing optimizations  Unlike the recovery algorithm described earlier, ARIES 1. Uses log sequence number (LSN) to identify log records  Stores LSNs in pages to identify what updates have already been applied to a database page 2. Physiological redo 3. Dirty page table to avoid unnecessary redos during recovery 4. Fuzzy checkpointing that only records information about dirty pages, and does not require dirty pages to be written out at checkpoint time  More coming up on each of the above …
  • 339. ARIES Data Structures: Log Record  Each log record contains the LSN of the previous log record of the same transaction • LSN in log record may be implicit  Special redo-only log record called compensation log record (CLR) used to log actions taken during recovery that never need to be undone • Serves the role of operation-abort log records used in earlier recovery algorithm • Has a field UndoNextLSN to note the next (earlier) record to be undone  Records in between would have already been undone  Required to avoid repeated undo of already undone actions  Log record fields: LSN, TransID, PrevLSN, RedoInfo, UndoInfo; CLR fields: LSN, TransID, UndoNextLSN, RedoInfo
  • 341. ARIES Recovery Algorithm ARIES recovery involves three passes  Analysis pass: Determines • Which transactions to undo • Which pages were dirty (disk version not up to date) at time of crash • RedoLSN: LSN from which redo should start  Redo pass: • Repeats history, redoing all actions from RedoLSN  RecLSN and PageLSNs are used to avoid redoing actions already reflected on page  Undo pass: • Rolls back all incomplete transactions  Transactions whose abort was complete earlier are not undone • Key idea: no need to undo these transactions: earlier undo actions were logged, and are redone as required
  • 342. UNIT V – DATABASE APPLICATIONS
  • 343. Centralized Database Systems  Run on a single computer system  Single-user system  Multi-user systems also known as server systems. • Service requests received from client systems • Multi-core systems with coarse-grained parallelism  Typically, a few to tens of processor cores  In contrast, fine-grained parallelism uses very large number of computers
  • 344. Speed-Up and Scale-Up  Speedup: a fixed-sized problem executing on a small system is given to a system which is N-times larger. • Measured by: speedup = (small system elapsed time) / (large system elapsed time) • Speedup is linear if the ratio equals N.  Scaleup: increase the size of both the problem and the system • N-times larger system used to perform N-times larger job • Measured by: scaleup = (small system, small problem elapsed time) / (big system, big problem elapsed time) • Scaleup is linear if the ratio equals 1.
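The two ratios, computed here for hypothetical timings of a 4-node system versus a single node:

  def speedup(small_sys_time, large_sys_time):
      return small_sys_time / large_sys_time           # linear if it equals N

  def scaleup(small_sys_small_prob, large_sys_large_prob):
      return small_sys_small_prob / large_sys_large_prob   # linear if it equals 1

  print(speedup(100.0, 25.0))    # 4.0 -> linear speedup on 4 nodes
  print(scaleup(100.0, 100.0))   # 1.0 -> linear scaleup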
  • 347. Distributed Systems  Data spread over multiple machines (also referred to as sites or nodes).  Local-area networks (LANs)  Wide-area networks (WANs) • Higher latency  Sites communicate with one another via the network
  • 348. Distributed Databases  Homogeneous distributed databases • Same software/schema on all sites, data may be partitioned among sites • Goal: provide a view of a single database, hiding details of distribution  Heterogeneous distributed databases • Different software/schema on different sites • Goal: integrate existing databases to provide useful functionality  Differentiate between local transactions and global transactions • A local transaction accesses data in the single site at which the transaction was initiated. • A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.
  • 349. Data Integration and Distributed Databases  Data integration between multiple distributed databases  Benefits: • Sharing data – users at one site able to access the data residing at some other sites. • Autonomy – each site is able to retain a degree of control over data stored locally.
  • 350. Availability  Network partitioning  Availability of system • If all nodes are required for system to function, failure of even one node stops system functioning. • Higher system availability through redundancy  data can be replicated at remote sites, and system can function even if a site fails.
  • 351. Implementation Issues for Distributed Databases  Atomicity needed even for transactions that update data at multiple sites  The two-phase commit protocol (2PC) is used to ensure atomicity • Basic idea: each site executes the transaction until just before commit, and then leaves the final decision to a coordinator • Each site must follow the decision of the coordinator, even if there is a failure while waiting for the coordinator's decision  2PC is not always appropriate: other transaction models based on persistent messaging, and workflows, are also used  Distributed concurrency control (and deadlock detection) required  Data items may be replicated to improve data availability  Details of all above in Chapter 24
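A highly simplified Python sketch of the coordinator's decision rule in 2PC; real implementations also log the decision and handle timeouts and site failures. The participant interface here is an assumption made for illustration.

  def two_phase_commit(participants):
      """participants: mapping of site name -> prepare() callable returning
      True (ready to commit) or False (must abort)."""
      # Phase 1: the coordinator asks every site to prepare and collects votes.
      votes = {site: prepare() for site, prepare in participants.items()}
      # Phase 2: commit only if every site voted yes; otherwise abort everywhere.
      decision = 'commit' if all(votes.values()) else 'abort'
      return decision, votes

  sites = {'A': lambda: True, 'B': lambda: True, 'C': lambda: False}
  print(two_phase_commit(sites))   # ('abort', {'A': True, 'B': True, 'C': False})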
  • 352. Cloud Based Services  Cloud computing widely adopted today • On-demand provisioning and elasticity  ability to scale up at short notice and to release unused resources for use by others  Infrastructure as a service • Virtual machines/real machines  Platform as a service • Storage, databases, application server  Software as a service • Enterprise applications, emails, shared documents, etc.  Potential drawbacks • Security • Network bandwidth
  • 354. Application Deployment Alternatives  Individual machines  Virtual machines (e.g., VMware, KVM)  Containers (e.g., Docker)
  • 355. Application Deployment Architectures  Services  Microservice Architecture • Application uses a variety of services • Service can add or remove instances as required  Kubernetes supports containers, and microservices
  • 356. Outline  Complex Data Types and Object Orientation  Structured Data Types and Inheritance in SQL  Table Inheritance  Array and Multiset Types in SQL  Object Identity and Reference Types in SQL  Implementing O-R Features  Persistent Programming Languages  Comparison of Object-Oriented and Object-Relational Databases
  • 357. Object-Relational Data Models  Extend the relational data model by including object orientation and constructs to deal with added data types.  Allow attributes of tuples to have complex types, including non-atomic values such as nested relations.  Preserve relational foundations, in particular the declarative access to data, while extending modeling power.  Upward compatibility with existing relational languages.
  • 358. Complex Data Types  Motivation: • Permit non-atomic domains (atomic ≡ indivisible) • Example of non-atomic domain: set of integers, or set of tuples • Allows more intuitive modeling for applications with complex data  Intuitive definition: • Allow relations whenever we allow atomic (scalar) values — relations within relations • Retains mathematical foundation of relational model • Violates first normal form.
  • 359. Example of a Nested Relation  Example: library information system  Each book has • Title, • A list (array) of authors, • Publisher, with subfields name and branch, and • A set of keywords  Non-1NF relation books
  • 360. Structured Types and Inheritance in SQL  Structured types (a.k.a. user-defined types) can be declared and used in SQL
create type Name as
  (firstname varchar(20),
   lastname varchar(20))
  final
create type Address as
  (street varchar(20),
   city varchar(20),
   zipcode varchar(20))
  not final
• Note: final and not final indicate whether subtypes can be created  Structured types can be used to create tables with composite attributes
create table person (
  name Name,
  address Address,
  dateOfBirth date)
 Dot notation used to reference components: name.firstname
  • 361. Structured Types (cont.)  User-defined row types
create type CustomerType as (
  name Name,
  address Address,
  dateOfBirth date)
  not final
 Can then create a table whose rows are a user-defined type
create table customer of CustomerType
 Alternative using unnamed row types:
create table person_r(
  name row(firstname varchar(20), lastname varchar(20)),
  address row(street varchar(20), city varchar(20), zipcode varchar(20)),
  dateOfBirth date)
  • 362. Methods  Can add a method declaration with a structured type.
method ageOnDate (onDate date)
  returns interval year
 Method body is given separately.
create instance method ageOnDate (onDate date)
  returns interval year
  for CustomerType
begin
  return onDate - self.dateOfBirth;
end
 We can now find the age of each customer:
select name.lastname, ageOnDate (current_date)
from customer
  • 363. Object-Identity and Reference Types  Define a type Department with a field name and a field head which is a reference to the type Person, with table people as scope:
create type Department (
  name varchar (20),
  head ref (Person) scope people)
 We can then create a table departments as follows
create table departments of Department
 We can omit the declaration scope people from the type declaration and instead make an addition to the create table statement:
create table departments of Department
  (head with options scope people)
 Referenced table must have an attribute that stores the identifier, called the self-referential attribute
create table people of Person
  ref is person_id system generated;
  • 364. Path Expressions  Find the names and addresses of the heads of all departments: select head->name, head->address from departments  An expression such as “head->name” is called a path expression  Path expressions help avoid explicit joins • If department head were not a reference, a join of departments with people would be required to get at the address • Makes expressing the query much easier for the user
  • 365. Implementing O-R Features  Similar to how E-R features are mapped onto relation schemas  Subtable implementation • Each table stores primary key and those attributes defined in that table or, • Each table stores both locally defined and inherited attributes
  • 366. Persistent Programming Languages  Languages extended with constructs to handle persistent data  Programmer can manipulate persistent data directly • no need to fetch it into memory and store it back to disk (unlike embedded SQL)  Persistent objects: • Persistence by class - explicit declaration of persistence • Persistence by creation - special syntax to create persistent objects • Persistence by marking - make objects persistent after creation • Persistence by reachability - object is persistent if it is declared explicitly to be so or is reachable from a persistent object
  • 367. Comparison of O-O and O-R Databases  Relational systems • simple data types, powerful query languages, high protection.  Persistent-programming-language-based OODBs • complex data types, integration with programming language, high performance.  Object-relational systems • complex data types, powerful query languages, high protection.  Object-relational mapping systems • complex data types integrated with programming language, but built as a layer on top of a relational database system  Note: Many real systems blur these boundaries • E.g., persistent programming language built as a wrapper on a relational database offers first two benefits, but may have poor performance.
  • 368. Outline  Structure of XML Data  XML Document Schema  Querying and Transformation  Application Program Interfaces to XML  Storage of XML Data  XML Applications
  • 369. Introduction  XML: Extensible Markup Language  Defined by the WWW Consortium (W3C)  Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML  Documents have tags giving extra information about sections of the document • E.g., <title> XML </title> <slide> Introduction …</slide>  Extensible, unlike HTML • Users can add new tags, and separately specify how the tag should be handled for display
• 370. XML Introduction (Cont.)  The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents. • Much of the use of XML has been in data exchange applications, not as a replacement for HTML  Tags make data (relatively) self-documenting • E.g., <university> <department> <dept_name> Comp. Sci. </dept_name> <building> Taylor </building> <budget> 100000 </budget> </department> <course> <course_id> CS-101 </course_id> <title> Intro. to Computer Science </title> <dept_name> Comp. Sci. </dept_name> <credits> 4 </credits> </course> </university>
  • 371. Comparison with Relational Data  Inefficient: tags, which in effect represent schema information, are repeated  Better than relational tuples as a data-exchange format • Unlike relational tuples, XML data is self-documenting due to presence of tags • Non-rigid format: tags can be added • Allows nested structures • Wide acceptance, not only in database systems, but also in browsers, tools, and applications
  • 372. Structure of XML Data  Tag: label for a section of data  Element: section of data beginning with <tagname> and ending with matching </tagname>  Elements must be properly nested • Proper nesting  <course> … <title> …. </title> </course> • Improper nesting  <course> … <title> …. </course> </title> • Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.  Every document must have a single top-level element
• 373. Example of Nested Elements <purchase_order> <identifier> P-101 </identifier> <purchaser> …. </purchaser> <itemlist> <item> <identifier> RS1 </identifier> <description> Atom powered rocket sled </description> <quantity> 2 </quantity> <price> 199.95 </price> </item> <item> <identifier> SG2 </identifier> <description> Superb glue </description> <quantity> 1 </quantity> <unit-of-measure> liter </unit-of-measure> <price> 29.95 </price> </item> </itemlist> </purchase_order>
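Such nested elements are straightforward to traverse programmatically. A minimal sketch using Python's standard xml.etree.ElementTree, with the purchase order abbreviated to the fields needed for a total-price computation (variable names are illustrative):

    import xml.etree.ElementTree as ET

    doc = """
    <purchase_order>
      <identifier> P-101 </identifier>
      <itemlist>
        <item>
          <identifier> RS1 </identifier>
          <quantity> 2 </quantity>
          <price> 199.95 </price>
        </item>
        <item>
          <identifier> SG2 </identifier>
          <quantity> 1 </quantity>
          <price> 29.95 </price>
        </item>
      </itemlist>
    </purchase_order>
    """

    root = ET.fromstring(doc)
    # Walk the nested elements: each <item> is a child of <itemlist>
    total = 0.0
    for item in root.find("itemlist").findall("item"):
        qty = int(item.findtext("quantity").strip())
        price = float(item.findtext("price").strip())
        total += qty * price
    print("order total:", total)   # 429.85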
• 374. Structure of XML Data (Cont.)  Mixture of text with sub-elements is legal in XML. • Example: <course> This course is being offered for the first time in 2009. <course_id> BIO-399 </course_id> <title> Computational Biology </title> <dept_name> Biology </dept_name> <credits> 3 </credits> </course> • Useful for document markup, but discouraged for data representation
• 375. Attributes  Elements can have attributes <course course_id= “CS-101”> <title> Intro. to Computer Science</title> <dept_name> Comp. Sci. </dept_name> <credits> 4 </credits> </course>  Attributes are specified by name=value pairs inside the starting tag of an element  An element may have several attributes, but each attribute name can only occur once <course course_id = “CS-101” credits=“4”>
  • 376. Attributes vs. Subelements  Distinction between subelement and attribute • In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents • In the context of data representation, the difference is unclear and may be confusing  Same information can be represented in two ways • <course course_id= “CS-101”> … </course> • <course> <course_id>CS-101</course_id> … </course> • Suggestion: use attributes for identifiers of elements, and use subelements for contents
• 377. Namespaces  XML data has to be exchanged between organizations  Same tag name may have different meaning in different organizations, causing confusion on exchanged documents  Specifying a unique string as an element name avoids confusion  Better solution: use unique-name:element-name  Avoid using long unique names all over document by using XML Namespaces <university xmlns:yale=“http://www.yale.edu”> … <yale:course> <yale:course_id> CS-101 </yale:course_id> <yale:title> Intro. to Computer Science</yale:title> <yale:dept_name> Comp. Sci. </yale:dept_name> <yale:credits> 4 </yale:credits> </yale:course> … </university>
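A short sketch of how a namespace-aware tool resolves prefixed tags, here using Python's standard xml.etree.ElementTree with a prefix-to-URI map (the document content is abbreviated and illustrative):

    import xml.etree.ElementTree as ET

    doc = """
    <university xmlns:yale="http://www.yale.edu">
      <yale:course>
        <yale:course_id> CS-101 </yale:course_id>
        <yale:title> Intro. to Computer Science </yale:title>
      </yale:course>
    </university>
    """

    root = ET.fromstring(doc)
    ns = {"yale": "http://www.yale.edu"}          # prefix -> namespace URI
    for course in root.findall("yale:course", ns):
        # tags are matched against the full namespace URI, not the local prefix
        print(course.findtext("yale:course_id", namespaces=ns).strip())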
  • 378. XML Document Schema  Database schemas constrain what information can be stored, and the data types of stored values  XML documents are not required to have an associated schema  However, schemas are very important for XML data exchange • Otherwise, a site cannot automatically interpret data received from another site  Two mechanisms for specifying XML schema • Document Type Definition (DTD)  Widely used • XML Schema  Newer, increasing use
• 379. Document Type Definition (DTD)  The type of an XML document can be specified using a DTD  A DTD constrains the structure of XML data • What elements can occur • What attributes can/must an element have • What subelements can/must occur inside each element, and how many times.  DTD does not constrain data types • All values represented as strings in XML  DTD syntax • <!ELEMENT element (subelements-specification) > • <!ATTLIST element (attributes) >
• 380. Element Specification in DTD  Subelements can be specified as • names of elements, or • #PCDATA (parsed character data), i.e., character strings • EMPTY (no subelements) or ANY (anything can be a subelement)  Example <!ELEMENT department (dept_name, building, budget)> <!ELEMENT dept_name (#PCDATA)> <!ELEMENT budget (#PCDATA)>  Subelement specification may have regular expressions <!ELEMENT university ( ( department | course | instructor | teaches )+)>  Notation: • “|” - alternatives • “+” - 1 or more occurrences • “*” - 0 or more occurrences
• 381. XML data with ID and IDREF attributes <university-3> <department dept_name=“Comp. Sci.”> <building> Taylor </building> <budget> 100000 </budget> </department> <department dept_name=“Biology”> <building> Watson </building> <budget> 90000 </budget> </department> <course course_id=“CS-101” dept_name=“Comp. Sci” instructors=“10101 83821”> <title> Intro. to Computer Science </title> <credits> 4 </credits> </course> …. <instructor IID=“10101” dept_name=“Comp. Sci.”> <name> Srinivasan </name> <salary> 65000 </salary> </instructor> …. </university-3>
• 382. Limitations of DTDs  No typing of text elements and attributes • All values are strings, no integers, reals, etc.  Difficult to specify unordered sets of subelements • Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved) • (A | B)* allows specification of an unordered set, but  Cannot ensure that each of A and B occurs only once  IDs and IDREFs are untyped • The instructors attribute of a course may contain a reference to another course, which is meaningless  The instructors attribute should ideally be constrained to refer to instructor elements
• 383. XML Schema  XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports • Typing of values  E.g., integer, string, etc.  Also, constraints on min/max values • User-defined, complex types • Many more features, including  uniqueness and foreign key constraints, inheritance  XML Schema is itself specified in XML syntax, unlike DTDs • More-standard representation, but verbose  XML Schema is integrated with namespaces  BUT: XML Schema is significantly more complicated than DTDs.
  • 384. Querying and Transforming XML Data  Translation of information from one XML schema to another  Querying on XML data  Above two are closely related, and handled by the same tools  Standard XML querying/translation languages • XPath  Simple language consisting of path expressions • XSLT  Simple language designed for translation from XML to XML and XML to HTML • XQuery  An XML query language with a rich set of features
  • 385. Tree Model of XML Data  Query and transformation languages are based on a tree model of XML data  An XML document is modeled as a tree, with nodes corresponding to elements and attributes • Element nodes have child nodes, which can be attributes or subelements • Text in an element is modeled as a text node child of the element • Children of a node are ordered according to their order in the XML document • Element and attribute nodes (except for the root node) have a single parent, which is an element node • The root node has a single child, which is the root element of the document
  • 386. XPath  XPath is used to address (select) parts of documents using path expressions  A path expression is a sequence of steps separated by “/” • Think of file names in a directory hierarchy  Result of path expression: set of values that along with their containing elements/attributes match the specified path  E.g., /university-3/instructor/name evaluated on the university-3 data we saw earlier returns <name>Srinivasan</name> <name>Brandt</name>  E.g., /university-3/instructor/name/text( ) returns the same names, but without the enclosing tags
• 387. XPath (Cont.)  The initial “/” denotes root of the document (above the top-level tag)  Path expressions are evaluated left to right • Each step operates on the set of instances produced by the previous step  Selection predicates may follow any step in a path, in [ ] • E.g., /university-3/course[credits >= 4]  returns course elements with a credits value of 4 or more  /university-3/course[credits] returns course elements containing a credits subelement  Attributes are accessed using “@” • E.g., /university-3/course[credits >= 4]/@course_id  returns the course identifiers of courses with credits >= 4 • IDREF attributes are not dereferenced automatically (more on this later)
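As an illustration of evaluating such path expressions, the sketch below uses Python's standard xml.etree.ElementTree, which implements only a small subset of XPath (it has no numeric comparison predicates such as [credits >= 4]), so that predicate is applied in plain Python instead; the toy document is an abbreviated university-3:

    import xml.etree.ElementTree as ET

    doc = """
    <university-3>
      <course course_id="CS-101"><title>Intro. to CS</title><credits>4</credits></course>
      <course course_id="CS-347"><title>Database Concepts</title><credits>3</credits></course>
      <instructor IID="10101"><name>Srinivasan</name></instructor>
    </university-3>
    """
    root = ET.fromstring(doc)

    # /university-3/instructor/name/text()
    print([n.text for n in root.findall("instructor/name")])   # ['Srinivasan']

    # /university-3/course[credits >= 4]/@course_id
    # ElementTree's XPath subset has no ">=", so the predicate is checked in Python.
    ids = [c.get("course_id")
           for c in root.findall("course")
           if int(c.findtext("credits")) >= 4]
    print(ids)                                                  # ['CS-101']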
  • 388. Functions in XPath  XPath provides several functions • The function count() at the end of a path counts the number of elements in the set generated by the path  E.g., /university-2/instructor[count(./teaches/course)> 2] • Returns instructors teaching more than 2 courses (on university-2 schema) • Also function for testing position (1, 2, ..) of node w.r.t. siblings  Boolean connectives and and or and function not() can be used in predicates  IDREFs can be referenced using function id() • id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks • E.g., /university-3/course/id(@dept_name)  returns all department elements referred to from the dept_name attribute of course elements.
• 389. Sorting in XQuery  The order by clause can be used at the end of any expression. E.g., to return instructors sorted by name for $i in /university/instructor order by $i/name return <instructor> { $i/* } </instructor>  Use order by $i/name descending to sort in descending order  Can sort at multiple levels of nesting (sort departments by dept_name, and courses by course_id within each department) <university-1> { for $d in /university/department order by $d/dept_name return <department> { $d/* } { for $c in /university/course[dept_name = $d/dept_name] order by $c/course_id return <course> { $c/* } </course> } </department> } </university-1>
  • 390. Storage of XML Data  XML data can be stored in • Non-relational data stores  Flat files • Natural for storing XML • But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)  XML database • Database built specifically for storing XML data, supporting DOM model and declarative querying • Currently no commercial-grade systems • Relational databases  Data must be translated into relational form  Advantage: mature database systems  Disadvantages: overhead of translating data and queries
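A minimal sketch of the relational-storage option: “shred” XML elements into rows of a relational table, here using Python's standard sqlite3 and xml.etree.ElementTree (the schema and document are illustrative assumptions, not a general-purpose XML-to-relational mapping):

    import sqlite3
    import xml.etree.ElementTree as ET

    doc = """
    <university>
      <course course_id="CS-101"><title>Intro. to CS</title><credits>4</credits></course>
      <course course_id="CS-347"><title>Database Concepts</title><credits>3</credits></course>
    </university>
    """

    conn = sqlite3.connect(":memory:")
    conn.execute("create table course(course_id text primary key, title text, credits integer)")

    # Translate each <course> element into one relational tuple
    for c in ET.fromstring(doc).findall("course"):
        conn.execute("insert into course values (?, ?, ?)",
                     (c.get("course_id"), c.findtext("title"), int(c.findtext("credits"))))

    # Once shredded, ordinary SQL queries apply
    for row in conn.execute("select course_id, title from course where credits >= 4"):
        print(row)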
• 391. XML Applications  Storing and exchanging data with complex structures • E.g., the Open Document Format (ODF) standard for storing OpenOffice documents, and the Office Open XML (OOXML) standard for storing Microsoft Office documents • Numerous other standards for a variety of applications  ChemML, MathML  Standard for data exchange for Web services • remote method invocation over the HTTP protocol  Data mediation • Common data representation format to bridge different systems
• 392. Outline  Relevance Ranking Using Terms  Relevance Using Hyperlinks  Synonyms, Homonyms, and Ontologies  Indexing of Documents  Measuring Retrieval Effectiveness  Web Search Engines  Information Retrieval and Structured Data  Directories
  • 393. Information Retrieval Systems  Information retrieval (IR) systems use a simpler data model than database systems • Information organized as a collection of documents • Documents are unstructured, no schema  Information retrieval locates relevant documents, on the basis of user input such as keywords or example documents • e.g., find documents containing the words “database systems”  Can be used even on textual descriptions provided with non-textual data such as images  Web search engines are the most familiar example of IR systems
• 394. Information Retrieval Systems (Cont.)  Differences from database systems • IR systems don’t deal with transactional updates (including concurrency control and recovery) • Database systems deal with structured data, with schemas that define the data organization • IR systems deal with some querying issues not generally addressed by database systems  Approximate searching by keywords  Ranking of retrieved answers by estimated degree of relevance
• 395. Keyword Search  In full text retrieval, all the words in each document are considered to be keywords. • We use the word term to refer to the words in a document  Information-retrieval systems typically allow query expressions formed using keywords and the logical connectives and, or, and not • Ands are implicit, even if not explicitly specified  Ranking of documents on the basis of estimated relevance to a query is critical • Relevance ranking is based on factors such as  Term frequency – Frequency of occurrence of query keyword in document  Inverse document frequency – How many documents the query keyword occurs in » Fewer ⇒ give more importance to keyword  Hyperlinks to documents – More links to a document ⇒ document is more important
• 396. Relevance Ranking Using Terms  TF-IDF (Term Frequency / Inverse Document Frequency) ranking: • Let n(d) = number of terms in the document d • n(d, t) = number of occurrences of term t in the document d • Relevance of a document d to a term t: TF(d, t) = log( 1 + n(d, t) / n(d) )  The log factor is to avoid excessive weight to frequent terms • Relevance of document d to query Q: r(d, Q) = Σt∈Q TF(d, t) / n(t), where n(t) is the number of documents that contain term t
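The two formulas above can be computed directly. A small sketch in Python over a toy document collection (the documents and query are made up purely for illustration):

    import math

    docs = {
        "d1": "database systems store data in relational tables".split(),
        "d2": "information retrieval systems rank documents by relevance".split(),
        "d3": "database query languages include sql".split(),
    }

    def tf(d, t):
        # TF(d, t) = log(1 + n(d, t) / n(d))
        words = docs[d]
        return math.log(1 + words.count(t) / len(words))

    def n_docs_with(t):
        # n(t) = number of documents that contain term t
        return sum(1 for words in docs.values() if t in words)

    def relevance(d, query):
        # r(d, Q) = sum over t in Q of TF(d, t) / n(t)
        return sum(tf(d, t) / n_docs_with(t) for t in query if n_docs_with(t) > 0)

    query = ["database", "systems"]
    for d in sorted(docs, key=lambda d: relevance(d, query), reverse=True):
        print(d, round(relevance(d, query), 4))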
• 397. Relevance Ranking Using Terms (Cont.)  Most systems add to the above model • Words that occur in title, author list, section headings, etc. are given greater importance • Words whose first occurrence is late in the document are given lower importance • Very common words such as “a”, “an”, “the”, “it” etc. are eliminated  Called stop words • Proximity: if keywords in query occur close together in the document, the document has higher importance than if they occur far apart  Documents are returned in decreasing order of relevance score • Usually only top few documents are returned, not all
• 398. Similarity Based Retrieval  Similarity based retrieval - retrieve documents similar to a given document • Similarity may be defined on the basis of common words  E.g., find k terms in A with highest TF (d, t ) / n (t ) and use these terms to find relevance of other documents.  Relevance feedback: Similarity can be used to refine answer set to keyword query • User selects a few relevant documents from those retrieved by keyword query, and system finds other documents similar to these  Vector space model: define an n-dimensional space, where n is the number of words in the document set. • Vector for document d goes from origin to a point whose i th coordinate is TF (d,t ) / n (t ) • The cosine of the angle between the vectors of two documents is used as a measure of their similarity.
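A minimal sketch of the vector space model: build a vector per document with coordinates TF(d, t) / n(t) as defined above, then compare documents by the cosine of the angle between their vectors (toy documents, illustrative only):

    import math
    from collections import Counter

    docs = {
        "d1": "database systems store data in relational tables".split(),
        "d2": "relational database design and query processing".split(),
        "d3": "hiking trails in the mountains".split(),
    }

    def n_t(t):
        # number of documents containing term t
        return sum(1 for words in docs.values() if t in words)

    def vector(d):
        # i-th coordinate is TF(d, t) / n(t), following the slide's definition
        words = docs[d]
        return {t: math.log(1 + c / len(words)) / n_t(t) for t, c in Counter(words).items()}

    def cosine(d1, d2):
        v1, v2 = vector(d1), vector(d2)
        dot = sum(v1[t] * v2.get(t, 0.0) for t in v1)
        norm = math.sqrt(sum(x * x for x in v1.values())) * math.sqrt(sum(x * x for x in v2.values()))
        return dot / norm if norm else 0.0

    print(round(cosine("d1", "d2"), 3))   # higher: the documents share several terms
    print(round(cosine("d1", "d3"), 3))   # lower: almost no shared terms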
• 399. Relevance Using Hyperlinks  Number of documents relevant to a query can be enormous if only term frequencies are taken into account  Using term frequencies makes “spamming” easy  E.g., a travel agency can add many occurrences of the word “travel” to its page to make its rank very high  Most of the time people are looking for pages from popular sites  Idea: use popularity of Web site (e.g., how many people visit it) to rank site pages that match given keywords  Problem: hard to find actual popularity of site • Solution: next slide
• 400. Relevance Using Hyperlinks (Cont.)  Solution: use number of hyperlinks to a site as a measure of the popularity or prestige of the site • Count only one hyperlink from each site (why? - see previous slide) • Popularity measure is for site, not for individual page  But, most hyperlinks are to root of site  Also, concept of “site” difficult to define since a URL prefix like cs.yale.edu contains many unrelated pages of varying popularity  Refinements • When computing prestige based on links to a site, give more weight to links from sites that themselves have higher prestige  Definition is circular  Set up and solve system of simultaneous linear equations • Above idea is basis of the Google PageRank ranking mechanism
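A hedged sketch of the circular-prestige idea: prestige values defined in terms of each other can be obtained by simple fixed-point iteration over a toy link graph. This conveys only the general flavor of link-based ranking; the damping constant and the graph are illustrative assumptions, not Google's actual algorithm:

    # Toy link graph: site -> sites it links to (one link counted per site)
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    damping = 0.85
    prestige = {s: 1.0 / len(links) for s in links}

    # Solve the circular definition by repeated substitution (fixed-point iteration)
    for _ in range(50):
        new = {}
        for s in links:
            incoming = sum(prestige[t] / len(links[t]) for t in links if s in links[t])
            new[s] = (1 - damping) / len(links) + damping * incoming
        prestige = new

    for s, p in sorted(prestige.items(), key=lambda kv: -kv[1]):
        print(s, round(p, 3))        # "c" ends up with the highest prestige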
• 401. Relevance Using Hyperlinks (Cont.)  Connections to social networking theories that ranked prestige of people • E.g., the president of the U.S.A. has high prestige since many people know him • Someone known by multiple prestigious people has high prestige  Hub and authority based ranking • A hub is a page that stores links to many pages (on a topic) • An authority is a page that contains actual information on a topic • Each page gets a hub prestige based on prestige of authorities that it points to • Each page gets an authority prestige based on prestige of hubs that point to it • Again, prestige definitions are cyclic, and can be obtained by solving linear equations • Use authority prestige when ranking answers to a query
• 402. Synonyms and Homonyms  Synonyms • E.g., document: “motorcycle repair”, query: “motorcycle maintenance”  Need to realize that “maintenance” and “repair” are synonyms • System can extend query as “motorcycle and (repair or maintenance)”  Homonyms • E.g., “object” has different meanings as noun/verb • Can disambiguate meanings (to some extent) from the context  Extending queries automatically using synonyms can be problematic • Need to understand intended meaning in order to infer synonyms  Or verify synonyms with user • Synonyms may have other meanings as well
• 403. Concept-Based Querying  Approach • For each word, determine the concept it represents from context • Use one or more ontologies:  Hierarchical structure showing relationship between concepts  E.g., the ISA relationship that we saw in the E-R model  This approach can be used to standardize terminology in a specific field  Ontologies can link multiple languages  Foundation of the Semantic Web (not covered here)
• 404. Indexing of Documents  An inverted index maps each keyword Ki to a set of documents Si that contain the keyword • Documents identified by identifiers  Inverted index may record • Keyword locations within document to allow proximity based ranking • Counts of number of occurrences of keyword to compute TF  and operation: Finds documents that contain all of K1, K2, ..., Kn. • Intersection S1 ∩ S2 ∩ ... ∩ Sn  or operation: documents that contain at least one of K1, K2, …, Kn • Union S1 ∪ S2 ∪ ... ∪ Sn  Each Si is kept sorted to allow efficient intersection/union by merging • “not” can also be efficiently implemented by merging of sorted lists
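A minimal sketch of an inverted index with and/or keyword queries. Real systems keep each posting list sorted and answer queries by merging; Python sets are used below purely for brevity, and the index contents are made up:

    # Toy inverted index: keyword -> set of document identifiers containing it
    index = {
        "database": {1, 2, 4},
        "systems":  {1, 3, 4},
        "xml":      {2, 5},
    }

    def and_query(*terms):
        # documents containing all of the terms: intersection of the posting sets
        sets = [index.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def or_query(*terms):
        # documents containing at least one of the terms: union of the posting sets
        return set().union(*(index.get(t, set()) for t in terms))

    print(and_query("database", "systems"))   # {1, 4}
    print(or_query("database", "xml"))        # {1, 2, 4, 5}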
  • 405. Measuring Retrieval Effectiveness  Information-retrieval systems save space by using index structures that support only approximate retrieval. May result in: • false negative (false drop) - some relevant documents may not be retrieved. • false positive - some irrelevant documents may be retrieved. • For many applications a good index should not permit any false drops, but may permit a few false positives.  Relevant performance metrics: • precision - what percentage of the retrieved documents are relevant to the query. • recall - what percentage of the documents relevant to the query were retrieved.
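A small sketch computing both metrics for one query, assuming the set of truly relevant documents is known (the document identifiers are made up for illustration):

    def precision_recall(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        true_positives = retrieved & relevant
        precision = len(true_positives) / len(retrieved) if retrieved else 0.0
        recall = len(true_positives) / len(relevant) if relevant else 0.0
        return precision, recall

    # 4 documents retrieved, 3 of them relevant; 6 relevant documents exist in total
    p, r = precision_recall(retrieved={1, 2, 3, 7}, relevant={1, 2, 3, 4, 5, 6})
    print(p, r)   # 0.75 0.5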
• 406. Measuring Retrieval Effectiveness (Cont.)  Recall vs. precision tradeoff:  Can increase recall by retrieving many documents (down to a low level of relevance ranking), but many irrelevant documents would be fetched, reducing precision  Measures of retrieval effectiveness: • Recall as a function of number of documents fetched, or • Precision as a function of recall  Equivalently, as a function of number of documents fetched • E.g., “precision of 75% at recall of 50%, and 60% at a recall of 75%”  Problem: which documents are actually relevant, and which are not
• 407. Web Search Engines  Web crawlers are programs that locate and gather information on the Web • Recursively follow hyperlinks present in known documents, to find other documents  Starting from a seed set of documents • Fetched documents  Handed over to an indexing system  Can be discarded after indexing, or stored as a cached copy  Crawling the entire Web would take a very large amount of time • Search engines typically cover only a part of the Web, not all of it • May take months to perform a single crawl
• 408. Web Crawling (Cont.)  Crawling is done by multiple processes on multiple machines, running in parallel • Set of links to be crawled stored in a database • New links found in crawled pages added to this set, to be crawled later  Indexing process also runs on multiple machines • Creates a new copy of index instead of modifying old index • Old index is used to answer queries • After a crawl is “completed” new index becomes “old” index  Multiple machines used to answer queries • Indices may be kept in memory • Queries may be routed to different machines for load balancing
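A toy single-process crawler conveys the basic loop: keep a frontier of links to crawl, fetch a page, hand it to the indexer, and add newly discovered links to the frontier. The sketch below uses only the Python standard library; the seed URL and page limit are illustrative assumptions, and a real crawler would add politeness rules, parallelism, and persistent storage of the frontier as described above:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen
    from collections import deque

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            # collect the href of every anchor tag on the page
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def crawl(seed, max_pages=10):
        frontier, seen, pages = deque([seed]), {seed}, {}
        while frontier and len(pages) < max_pages:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
            except (OSError, ValueError):
                continue                       # skip unreachable or malformed URLs
            pages[url] = html                  # hand the fetched page to the indexer here
            parser = LinkExtractor()
            parser.feed(html)
            for link in parser.links:          # newly found links go back into the frontier
                absolute = urljoin(url, link)
                if absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)
        return pages

    # pages = crawl("https://example.org/")   # hypothetical seed URL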
• 409. Information Retrieval and Structured Data  Information retrieval systems originally treated documents as a collection of words  Information extraction systems infer structure from documents, e.g.: • Extraction of house attributes (size, address, number of bedrooms, etc.) from a text advertisement • Extraction of the topic and the people named in a news article  Relations or XML structures used to store extracted data • System seeks connections among data to answer queries • Question answering systems
  • 410. Directories  Storing related documents together in a library facilitates browsing • Users can see not only requested document but also related ones.  Browsing is facilitated by classification system that organizes logically related documents together.  Organization is hierarchical: classification hierarchy
  • 411. A Classification Hierarchy For A Library System
  • 412. A Classification DAG For a Library Information Retrieval System
• 413. Web Directories  A Web directory is just a classification directory on Web pages • E.g., Yahoo! Directory, Open Directory project • Issues:  What should the directory hierarchy be?  Given a document, which nodes of the directory are categories relevant to the document • Often done manually  Classification of documents into a hierarchy may be done based on term similarity