Final Review
Course Outline (lectures and their content)
Database System Concepts & Architecture
• Introduction to Data Models, Database Systems
• Three-Level Architecture & Data Independence
• Modern Database Applications
Entity-Relationship (ER) Model
• ER Model
• Introduction to Enhanced ER (EER) Model
Relational Model
• Relational Data Model
• ER- & EER-to-Relational Mapping
• Relational Algebra
SQL
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Introduction to Triggers & Stored Procedures
Database Design Theory & Methodology
• Functional Dependencies & Normalization
• Relational Database Design: Algorithms
Data Storage, Indexing
• Data Storage
• Hashing & Indexing Structures
Database Security
• Discretionary & Mandatory Access Control (DAC & MAC)
• Flow Control, Inference Problem
• Security Issues in Modern Data Management Systems
Emerging Technologies & Applications
• Presentations
Course Outline (lectures and their content)
Database System Concepts & Architecture
• Introduction to Data Models, Database Systems
• Three-Level Architecture & Data Independence
• Modern Database Applications
Entity-Relationship (ER) Model
• ER Model
• Introduction to Enhanced ER (EER) Model
Relational Model
• Relational Data Model
• ER- & EER-to-Relational Mapping
• Relational Algebra
SQL
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Introduction to Triggers & Stored Procedures
Database Design Theory & Methodology
• Functional Dependencies & Normalization
• Relational Database Design: Algorithms
Data Storage, Indexing, Query Processing & Physical Design
• Hashing & Indexing Structures (B-tree & R-tree families)
• Physical Database Design
Database Security
• Discretionary & Mandatory Access Control (DAC & MAC)
• Flow Control, Inference Problem
• Security Issues in Modern Data Management Systems
Emerging Technologies & Applications
• Introduction to XML, Data Mining & Data Warehousing, GIS, M-Commerce & LBS
• Emerging Database Technologies & Applications
Outline
File-based Approach
Database Approach
Three-Schema Architecture and Data Independence
Database Languages
Data Models, Database Schema and Database State
Data Management Systems Framework
File-based Approach and Database Approach
File-based approach:
• Problems
• Shared file approach
Database approach
• DBMS
• Characteristics of the Database Approach
Three-Schema Architecture and Data Independence
▪ Three-level architecture and data independence
Database Languages
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Data Models, Database Schema and Database State
Data Model
Database Schema
Schema Diagram
Database State (Snapshot)
Outline
What is ER Model? And Why?
Overview of Database Design Process
Example COMPANY Database
ER Model Concepts
ER Diagram
Alternative Diagrammatic Notations
Problems with ER Models
What is ER Model? And Why?
The ER model is a logical organisation of data within a database system
The ER modelling technique is based on the relational data model
Why use ER data modelling?
Overview of Database Design Process
Example COMPANY Database
ER Model Concepts
Entities and Attributes
Types of Attributes:
• Simple
• Composite
• Multi-valued
• Derived Attribute
Entity Types and Key Attributes
Relationships and Relationship Types
Weak Entity Types
Recursive relationships
ER Model Concepts
Structural constraints express the semantics of a relationship through its cardinality ratio and membership class
Cardinality ratio (functionality): specifies the maximum number of relationship instances that an entity can participate in for a binary relationship
• one-to-one (1:1)
• one-to-many (1:M) or many-to-one (M:1)
• many-to-many (M:N)
Membership class (participation constraint):
• Mandatory (total participation)
• Optional (partial participation)
Summary of the Notation for ER Diagrams
ER Diagram
(min, max) notation for relationship structural constraints, e.g. (0,1), (1,1), (4,N)
ER diagrams for the COMPANY schema, with structural
constraints specified using (min, max) notation
Alternative Diagrammatic Notations
• Symbols for entity type / class, attribute and relationship
• Displaying attributes
Problems with ER Models
Fan Trap
Chasm Trap
An Example of a Fan Trap
ER Model restructured to remove Chasm Trap
Outline
Limitations of Basic Concepts of the ER Model
Enhanced-ER (EER) Model Concepts
Subclasses and Superclasses
Specialization and Generalization
Specialization / Generalization Hierarchies, Lattices
and Shared Subclasses
Categories
Formal Definitions of EER Model
Database Design Modeling Tools
Subclasses and Superclasses
These are also called IS-A (IS-AN) relationships
A superclass/subclass relationship is one-to-one (1:1)
Inheritance in superclass/subclass relationships: a subclass inherits the attributes and relationships of its superclass
Generalization and Specialization
Constraints on Specialization and Generalization
Two basic constraints: disjointness and completeness
Disjointness Constraint:
• Disjoint or overlapping
Completeness Constraint:
• Total or partial
Example of Disjoint Partial Specialization
Example of Overlapping Total Specialization
Specialization / Generalization Hierarchies, Lattices and Shared Subclasses
A subclass may itself have further subclasses specified on it, forming a hierarchy or a lattice
A hierarchy has the constraint that every subclass has only one superclass (called single inheritance)
In a lattice, a subclass can be a subclass of more than one superclass (called multiple inheritance)
A subclass with more than one superclass is called a shared subclass
Specialization / Generalization Lattice Example (UNIVERSITY)
Categories
Two categories (union types):
OWNER and REGISTERED_VEHICLE
Database Design Modeling Tools
Relational Data Model and ER-/EER-to-Relational Mapping
Basic Concepts
Relational data model
Relation schema: R(A1, A2,…, An)
The degree of a relation
Domain D
Tuple
Cardinality
Database schema S = {R1, R2,…, Rm}
ER- & EER-to-Relational Mapping
ER-to-Relational:
• Step 1: Mapping of Regular Entity Types
• Step 2: Mapping of Weak Entity Types
• Step 3: Mapping of Binary 1:1 Relationship Types
• Step 4: Mapping of Binary 1:N Relationship Types
• Step 5: Mapping of Binary M:N Relationship Types
• Step 6: Mapping of Multivalued attributes
• Step 7: Mapping of N-ary Relationship Types
EER-to-Relational:
• Step 8: Options for Mapping Specialization or Generalization
• Step 9: Mapping of Union Types (Categories)
The ERD for the COMPANY database
Result of mapping the COMPANY ER schema into a relational schema
ER-to-Relational Mapping
Correspondence between ER and Relational Models (ER Model construct: Relational Model counterpart)
• Entity type: “Entity” relation
• 1:1 or 1:N relationship type: Foreign key (or “relationship” relation)
• M:N relationship type: “Relationship” relation and two foreign keys
• n-ary relationship type: “Relationship” relation and n foreign keys
• Simple attribute: Attribute
• Composite attribute: Set of simple component attributes
• Multivalued attribute: Relation and foreign key
• Value set: Domain
• Key attribute: Primary (or secondary) key
Q&A
Review
Let R be a 1:N relationship type between two entity types A and B (the N is on the B side, in Chen's notation). Which of the following statements are FALSE?
A. An entity a of A can be related to n entities b of B
B. An entity a of A can participate in n relationship instances (ri) of R
C. An entity b of B can participate in n relationship instances (ri) of R
D. Every entity b of B must participate in at least one relationship instance (ri) of R
Review
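[Figure: ER diagram in which entity types R1 (attribute A) and R2 (attribute B) are linked by relationship type X, with n on the R1 side and 1 on the R2 side]
A. R1 (A, B), R2 (B)
B. R1 (A), R2 (B, A)
C. R1 (A), R2 (B), X (A, B)
D. R1 (A), R2 (B), X (A, B)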
Review
[Figure: ER diagram with entity types Employee, Project, Sub-Project and Room, relationship types Works and Has, and attributes including Code, Name (First, Last) and Email]
Outline
Relational Algebra
• Unary Relational Operations
• Relational Algebra Operations from Set Theory
• Binary Relational Operations
• Additional Relational Operations
Brief Introduction to Relational Calculus
Relational Algebra Overview
Relational algebra is the basic set of operations for the
relational model
The result of an operation is a new relation
A sequence of relational algebra operations forms a
relational algebra expression
Relational Algebra consists of several groups of
operations
• Unary Relational Operations
• SELECT (symbol: σ (sigma))
• PROJECT (symbol: π (pi))
• RENAME (symbol: ρ (rho))
• Relational Algebra Operations from Set Theory
• UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, –)
• CARTESIAN PRODUCT (×)
• Binary Relational Operations
• JOIN (several variations of JOIN exist)
• DIVISION
• Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS (SUM, COUNT, AVG, MIN, MAX)
Unary Relational Operations: SELECT
The SELECT operation is denoted by σ (sigma): σ_{<selection condition>}(R)
Examples:
• Select the EMPLOYEE tuples whose department number is 4:
σ_{DNO = 4}(EMPLOYEE)
• Select the EMPLOYEE tuples whose salary is greater than $30,000:
σ_{SALARY > 30000}(EMPLOYEE)
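As a rough illustration of the two examples above, SELECT can be sketched in Python as a filter over a list of tuples. The sample rows and the helper name `select` are assumptions made for this sketch; they are not taken from the COMPANY database.

```python
# Hypothetical EMPLOYEE tuples (illustrative values only).
EMPLOYEE = [
    {"FNAME": "An", "LNAME": "Tran", "DNO": 4, "SALARY": 25000},
    {"FNAME": "Binh", "LNAME": "Le", "DNO": 5, "SALARY": 40000},
    {"FNAME": "Chi", "LNAME": "Pham", "DNO": 4, "SALARY": 43000},
]

def select(relation, condition):
    """sigma_<condition>(relation): keep only tuples satisfying the condition."""
    return [t for t in relation if condition(t)]

# sigma_{DNO = 4}(EMPLOYEE)
dept4 = select(EMPLOYEE, lambda t: t["DNO"] == 4)
# sigma_{SALARY > 30000}(EMPLOYEE)
high_paid = select(EMPLOYEE, lambda t: t["SALARY"] > 30000)

print([t["FNAME"] for t in dept4])
print([t["FNAME"] for t in high_paid])
```

SELECT never changes the schema: every result tuple has the same attributes as the tuples of EMPLOYEE.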
Unary Relational Operations: PROJECT
The PROJECT operation is denoted by π (pi): π_{<attribute list>}(R)
Example: To list each employee’s first and last name and salary, the following is used:
π_{LNAME, FNAME, SALARY}(EMPLOYEE)
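A minimal Python sketch of PROJECT follows. Because the result of a relational algebra operation is a relation (a set of tuples), the sketch also removes duplicate rows. The sample data and the helper name `project` are assumptions for illustration.

```python
# Hypothetical EMPLOYEE tuples (illustrative values only).
EMPLOYEE = [
    {"FNAME": "An", "LNAME": "Tran", "SALARY": 25000, "DNO": 4},
    {"FNAME": "Binh", "LNAME": "Le", "SALARY": 40000, "DNO": 5},
    {"FNAME": "Chi", "LNAME": "Pham", "SALARY": 43000, "DNO": 4},
]

def project(relation, attributes):
    """pi_<attributes>(relation): keep the listed columns, drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# pi_{LNAME, FNAME, SALARY}(EMPLOYEE)
names = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])
# pi_{DNO}(EMPLOYEE): two employees share DNO 4, so only two tuples remain
depts = project(EMPLOYEE, ["DNO"])
print(names)
print(depts)
```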
Unary Relational Operations: RENAME
The RENAME operation ρ (rho) can be expressed in any of the following forms:
• ρ_{S(B1, B2, …, Bn)}(R) changes both:
• the relation name to S, and
• the column (attribute) names to B1, B2, …, Bn
• ρ_{S}(R) changes:
• the relation name only, to S
• ρ_{(B1, B2, …, Bn)}(R) changes:
• the column (attribute) names only, to B1, B2, …, Bn
Relational Algebra Operations from Set Theory: UNION
UNION Operation
• Binary operation, denoted by ∪
• The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in both R and S
• Duplicate tuples are eliminated
• The two operand relations R and S must be “type compatible” (or UNION compatible)
Relational Algebra Operations from Set Theory: INTERSECTION
INTERSECTION is denoted by ∩
The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S
The two operand relations R and S must be “type compatible”
Relational Algebra Operations from Set Theory: SET DIFFERENCE
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –
The result of R – S is a relation that includes all tuples that are in R but not in S
The two operand relations R and S must be “type compatible”
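The three set operations can be tried out with Python's built-in `set` type, which eliminates duplicates in the same way. This is only a sketch: the two relations below are hypothetical, and type compatibility is assumed rather than checked (both hold tuples of the form (FNAME, LNAME)).

```python
# Two type-compatible relations: same number of attributes, same domains.
R = {("An", "Tran"), ("Binh", "Le")}
S = {("Binh", "Le"), ("Chi", "Pham")}

union = R | S          # tuples in R, in S, or in both
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

UNION and INTERSECTION are commutative, while SET DIFFERENCE is not: here `R - S` and `S - R` give different results.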
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
CARTESIAN (or CROSS) PRODUCT Operation
• Denoted by R(A1, A2, . . ., An) × S(B1, B2, . . ., Bm)
• The result is a relation Q of degree n + m, with attributes:
• Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
• Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R × S will have nR * nS tuples
• The two operands do NOT have to be “type compatible”
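The degree and size claims can be checked on a small example. The sketch below uses `itertools.product` from the standard library; the relations R and S are made up for illustration.

```python
from itertools import product

# R has degree 2 and 2 tuples; S has degree 1 and 3 tuples (hypothetical data).
R = [(1, "a"), (2, "b")]
S = [("x",), ("y",), ("z",)]

# Each result tuple is an R tuple followed by an S tuple, in that order.
RxS = [r + s for r, s in product(R, S)]

print(len(RxS))      # nR * nS tuples
print(len(RxS[0]))   # degree n + m
```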
Binary Relational Operations: JOIN
JOIN Operation (denoted by ⋈)
R ⋈_{<join condition>} S
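A JOIN can be read as a CARTESIAN PRODUCT followed by a SELECT on the join condition. The nested-loop sketch below follows that reading; the relations, attribute names and the helper `theta_join` are assumptions for illustration, and the two relations are given disjoint attribute names so that merged tuples do not clash.

```python
# Hypothetical relations (illustrative values only).
DEPARTMENT = [
    {"DNAME": "Research", "MGRSSN": "333"},
    {"DNAME": "Administration", "MGRSSN": "999"},
]
EMPLOYEE = [
    {"SSN": "333", "LNAME": "Pham"},
    {"SSN": "111", "LNAME": "Tran"},
]

def theta_join(r, s, condition):
    """Combine every pair of tuples (tr, ts) that satisfies the join condition."""
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]

# DEPARTMENT joined with EMPLOYEE on MGRSSN = SSN
dept_mgr = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
print(dept_mgr)
```

Department "Administration" has no matching employee in this sample, so it does not appear in the result; that is the case the OUTER JOIN variants are meant to cover.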
Additional Relational Operations
Outline
DDL: Data Definition Language
• Create
• Alter
• Drop
DML: Data Manipulation Language
• Select
• Insert
• Update
• Delete
DCL: Data Control Language
• Commit
• Rollback
• Grant
• Revoke
Dr. Dang Tran Khanh, Faculty of CSE, HCMUT ([email protected])
DDL: Create, Alter, Drop
CREATE SCHEMA
DDL: Create, Alter, Drop
CREATE TABLE
CREATE TABLE TableName
{(colName dataType [NOT NULL] [UNIQUE]
[DEFAULT defaultOption]
[CHECK searchCondition] [,...]}
[PRIMARY KEY (listOfColumns),]
{[UNIQUE (listOfColumns),] […,]}
{[FOREIGN KEY (listOfFKColumns)
REFERENCES ParentTableName [(listOfCKColumns)],
[ON UPDATE referentialAction]
[ON DELETE referentialAction ]] [,…]}
{[CHECK (searchCondition)] [,…] })
DDL: Create, Alter, Drop
CREATE TABLE
Default values
Primary key and referential integrity constraints
• referential triggered action clause of FK constraint:
ON DELETE <action>
ON UPDATE <action>
<action>: SET NULL, CASCADE, SET DEFAULT
Giving names to constraints
Specifying constraints on tuples using CHECK
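The constraint options listed above can be exercised with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table, column and constraint names are hypothetical, and SQLite is used only because it ships with Python. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
CREATE TABLE Department (
    Dnumber INTEGER PRIMARY KEY,
    Dname   VARCHAR(15) NOT NULL UNIQUE
);
CREATE TABLE Employee (
    Ssn     CHAR(9) PRIMARY KEY,
    Lname   VARCHAR(15) NOT NULL,
    Salary  DECIMAL(10,2) CHECK (Salary >= 0),   -- CHECK constraint
    Dno     INTEGER DEFAULT 1,                   -- default value
    CONSTRAINT EmpDeptFK                         -- named constraint
        FOREIGN KEY (Dno) REFERENCES Department (Dnumber)
        ON DELETE SET NULL ON UPDATE CASCADE
);
INSERT INTO Department VALUES (1, 'Headquarters'), (5, 'Research');
INSERT INTO Employee (Ssn, Lname, Salary, Dno)
    VALUES ('123456789', 'Smith', 30000, 5);
""")

# Deleting the referenced department triggers the ON DELETE SET NULL action.
conn.execute("DELETE FROM Department WHERE Dnumber = 5")
dno_after = conn.execute("SELECT Dno FROM Employee").fetchone()[0]
print(dno_after)
```

With `ON DELETE CASCADE` instead of `SET NULL`, the employee row itself would be removed together with its department.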
DDL: Create, Alter, Drop
DROP Command
DDL: Create, Alter, Drop
ALTER Command
Base tables: adding or dropping a column or
constraints, changing a column definition. Example:
ALTER TABLE Company.Employee ADD Job VARCHAR(15) NOT NULL;
DML: Select, Insert, Update, Delete
SELECT
SELECT [DISTINCT | ALL]
{* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList]
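To see how the clauses of the SELECT template fit together, the sketch below runs one query through Python's `sqlite3` module. The Employee table and its rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Ssn TEXT PRIMARY KEY, Lname TEXT, Salary REAL, Dno INTEGER);
INSERT INTO Employee VALUES
    ('1', 'Smith',  30000, 5),
    ('2', 'Wong',   40000, 5),
    ('3', 'Zelaya', 25000, 4),
    ('4', 'Borg',   55000, 1);
""")

rows = conn.execute("""
    SELECT Dno, COUNT(*) AS NumEmps, AVG(Salary) AS AvgSalary
    FROM Employee
    WHERE Salary > 20000          -- filters individual tuples
    GROUP BY Dno                  -- forms one group per department
    HAVING COUNT(*) > 1           -- filters whole groups
    ORDER BY Dno
""").fetchall()
print(rows)
```

WHERE is applied to tuples before grouping, whereas HAVING is applied to the groups afterwards; only department 5 has more than one employee in this sample.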
DML: Select, Insert, Update, Delete
Insert
DML: Select, Insert, Update, Delete
Delete
Removes tuples from a relation
Includes a WHERE-clause to select the tuples to be deleted
Tuples are deleted from only one table at a time (unless
CASCADE is specified on a referential integrity constraint)
A missing WHERE-clause specifies that all tuples in the relation
are to be deleted; the table then becomes an empty table
The number of tuples deleted depends on the number of tuples in
the relation that satisfy the WHERE-clause
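The effect of the WHERE-clause on DELETE can be observed in a short `sqlite3` session. The table and its three rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Ssn TEXT PRIMARY KEY, Lname TEXT, Dno INTEGER);
INSERT INTO Employee VALUES ('1', 'Smith', 5), ('2', 'Wong', 5), ('3', 'Zelaya', 4);
""")

def count_rows():
    """Number of tuples currently in Employee."""
    return conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

# With a WHERE-clause: only the matching tuples are removed.
cur = conn.execute("DELETE FROM Employee WHERE Dno = 5")
deleted = cur.rowcount
after_where = count_rows()

# Without a WHERE-clause: every tuple is removed, but the table still exists.
conn.execute("DELETE FROM Employee")
after_all = count_rows()

print(deleted, after_where, after_all)
```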
DML: Select, Insert, Update, Delete
Update
Used to modify attribute values of one or more
selected tuples
A WHERE-clause selects the tuples to be modified
An additional SET-clause specifies the attributes to be
modified and their new values
Each command modifies tuples in the same relation
Referential integrity should be enforced
Advanced DDL: Assertions & Triggers
VIEWs
SQL command: CREATE VIEW
• a view (table) name
• a possible list of attribute names
• a query to specify the view contents
Specify a different WORKS_ON table (view)
CREATE VIEW WORKS_ON_NEW AS
SELECT FNAME, LNAME, PNAME, HOURS
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE SSN=ESSN AND PNO=PNUMBER
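The WORKS_ON_NEW view can be created and queried with Python's `sqlite3` module as a quick check. The base tables below are cut down to the columns the view needs, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT);
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT);
CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, HOURS REAL,
                       PRIMARY KEY (ESSN, PNO));
INSERT INTO EMPLOYEE VALUES ('1', 'John', 'Smith');
INSERT INTO PROJECT  VALUES (10, 'ProductX');
INSERT INTO WORKS_ON VALUES ('1', 10, 32.5);

CREATE VIEW WORKS_ON_NEW AS
    SELECT FNAME, LNAME, PNAME, HOURS
    FROM EMPLOYEE, PROJECT, WORKS_ON
    WHERE SSN = ESSN AND PNO = PNUMBER;
""")

before = conn.execute("SELECT * FROM WORKS_ON_NEW").fetchall()

# A view is a virtual table: changes to the base tables show up on the next query.
conn.execute("UPDATE WORKS_ON SET HOURS = 40 WHERE ESSN = '1' AND PNO = 10")
after = conn.execute("SELECT HOURS FROM WORKS_ON_NEW").fetchone()[0]
print(before, after)
```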
DCL: Commit, Rollback, Grant, Revoke
Chapter 17: Transaction Processing
Chapter 23: DB security
Outline
Introduction
Functional dependencies (FDs)
• Definition of FD
• Direct, indirect, partial dependencies
• Inference Rules for FDs
• Equivalence of Sets of FDs
• Minimal Sets of FDs
Normalization
• 1NF and dependency problems
• 2NF – solves partial dependency
• 3NF – solves indirect dependency
• BCNF – well-normalized relations
Introduction
“Goodness” measures:
• Redundant information in tuples
• Update anomalies: modification, deletion,
insertion
• Reducing the NULL values in tuples
• Disallowing the possibility of generating
spurious tuples
Functional Dependencies (FDs)
Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs
FDs and keys are used to define normal
forms for relations
X -> Y holds if whenever two tuples have the
same value for X, they must have the same
value for Y
Examples:
• social security number determines employee name:
SSN -> ENAME
• project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}
• employee SSN and project number determine the hours per week that the employee works on the project:
{SSN, PNUMBER} -> HOURS
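The definition of X -> Y translates directly into a check over a relation instance. Note that an FD is a property of the schema: a single instance can show that an FD does not hold, but it cannot prove that it always holds. The helper `fd_holds` and the sample tuples are assumptions for this sketch.

```python
# Hypothetical EMP_PROJ tuples (illustrative values only).
EMP_PROJ = [
    {"SSN": "1", "PNUMBER": 10, "HOURS": 20.0, "ENAME": "Smith"},
    {"SSN": "1", "PNUMBER": 20, "HOURS": 10.0, "ENAME": "Smith"},
    {"SSN": "2", "PNUMBER": 10, "HOURS": 35.0, "ENAME": "Wong"},
]

def fd_holds(relation, lhs, rhs):
    """True if no two tuples agree on lhs but differ on rhs in this instance."""
    seen = {}
    for t in relation:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Remember the first Y value seen for each X value; a mismatch violates the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

print(fd_holds(EMP_PROJ, ["SSN"], ["ENAME"]))
print(fd_holds(EMP_PROJ, ["SSN", "PNUMBER"], ["HOURS"]))
print(fd_holds(EMP_PROJ, ["SSN"], ["HOURS"]))
```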
Functional Dependencies (FDs)
Direct dependency (fully functional dependency): All attributes in a relation R must be fully functionally dependent on the primary key (or the PK is a determinant of all attributes in R)
[Figure: Performer-id determines Performer-name, Performer-type and Performer-location]
Functional Dependencies (FDs)
Indirect dependency (transitive dependency): Value of an attribute is not determined directly by the primary key
[Figure: Performer-id determines Performer-name, Performer-type and Performer-location; Fee is determined by Performer-type]
Functional Dependencies (FDs)
Partial dependency
• Composite determinant: when more than one value is required to determine the value of another attribute, the combination of values is called a composite determinant
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
{SSN, PNUMBER} -> HOURS
Functional Dependencies (FDs)
Inference Rules for FDs
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
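In practice the inference rules are usually applied through the attribute closure X+, the set of all attributes determined by X under a given set of FDs. The sketch below uses the standard closure algorithm; the function name `closure` is an assumption, and the FD set is the EMP_PROJ example used earlier.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs."""
    result = set(attrs)            # reflexivity: X determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, FDS)))
print(sorted(closure({"SSN", "PNUMBER"}, FDS)))
```

Since the closure of {SSN, PNUMBER} contains every attribute of EMP_PROJ, {SSN, PNUMBER} is a superkey of that relation.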
Normalization
Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up
their attributes into smaller relations
Two new concepts:
• A Prime attribute must be a member of some
candidate key
• A Nonprime attribute is not a prime attribute: it
is not a member of any candidate key
Normalization
1NF and dependency problems
2NF – solves partial dependency
3NF – solves indirect dependency
BCNF – well-normalized relations
Normalization
First normal form (1NF): Disallows composite
attributes, multivalued attributes, and nested
relations
Normalization
Second normal form (2NF): every nonprime attribute must be fully functionally dependent on the primary key
Normalization
A relation schema R is in third normal form
(3NF) if it is in 2NF and no non-prime attribute
A in R is transitively dependent on the
primary key
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
E.g., consider EMP (SSN, Emp#, Salary).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key
General Normal Form Definitions
A relation schema R is in second normal form (2NF)
if every non-prime attribute A in R is fully functionally
dependent on every key of R
A relation schema R is in third normal form (3NF) if
whenever a FD X -> A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
Normalization
A relation schema R is in Boyce-Codd
Normal Form (BCNF) if whenever an
FD X -> A holds in R, then X is a
superkey of R
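The BCNF definition can be tested mechanically: for every nontrivial FD X -> A, check that the closure of X covers all attributes of R. The sketch below assumes that `fds` is the given set of dependencies for the schema being tested; the function names and sample schemas are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attributes, fds):
    """True if the left-hand side of every nontrivial FD is a superkey."""
    all_attrs = set(attributes)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                       # trivial FD, always allowed
        if closure(lhs, fds) != all_attrs:
            return False                   # lhs is not a superkey
    return True

EMP_PROJ = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")
EMP_PROJ_FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(is_bcnf(EMP_PROJ, EMP_PROJ_FDS))                   # SSN alone is not a superkey
print(is_bcnf(("SSN", "ENAME"), [({"SSN"}, {"ENAME"})]))
```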
Outline
Data Storage
• Disk Storage Devices
• Files of Records
• Operations on Files
• Unordered Files
• Ordered Files
• Hashed Files
• RAID Technology
Indexing Structures for Files
• Types of Single-level Ordered Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Indexes on Multiple Keys
Disk Storage Devices
Disks are divided into concentric circular
tracks on each disk surface.
A track is divided into smaller blocks or
sectors
A read-write head moves to the track that
contains the block to be transferred
A physical disk block address consists of:
• a cylinder number
• the track number or surface number (within the
cylinder)
• and block number (within track).
Disk Storage Devices
Records: Fixed and variable length records
• Records contain fields which have values of a
particular type
Blocking: Refers to storing a number of
records in one block on the disk.
• Blocking factor (bfr) refers to the number of
records per block.
Spanned Records:
• Refers to records that exceed the size of one or
more blocks and hence span a number of
blocks.
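For fixed-length, unspanned records the blocking factor is the block size divided by the record size, rounded down. The short calculation below illustrates this; the block size, record size and record count are assumed values chosen for the example.

```python
import math

block_size = 1024     # B: bytes per disk block (assumed)
record_size = 100     # R: bytes per fixed-length record (assumed)
num_records = 30000   # r: records in the file (assumed)

bfr = block_size // record_size              # records per block, unspanned
unused_per_block = block_size - bfr * record_size
num_blocks = math.ceil(num_records / bfr)    # b: blocks needed for the file

print(bfr, unused_per_block, num_blocks)
```

The unused space at the end of each block is what a spanned organization would try to recover.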
Files of Records
Unordered Files
Ordered Files
Also called a sequential file.
File records are kept sorted by the values of an
ordering field.
Insertion is expensive: records must be inserted in
the correct order.
• It is common to keep a separate unordered overflow (or transaction)
file for new records to improve insertion efficiency; this is periodically
merged with the main ordered file.
A binary search can be used to search for a record
on its ordering field value.
• This requires reading and searching about log2(b) of the b file blocks on average, an improvement over linear search.
Reading the records in order of the ordering field is
quite efficient.
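The gain from binary search is easy to quantify. The figures below assume a file of b = 3000 blocks, a value chosen only for illustration.

```python
import math

num_blocks = 3000   # b: blocks in the ordered file (assumed)

# Binary search on the ordering field: about log2(b) block accesses.
binary_search_accesses = math.ceil(math.log2(num_blocks))

# Linear search on an unordered file: about b/2 block accesses on average
# when the record exists.
linear_search_average = num_blocks / 2

print(binary_search_accesses, linear_search_average)
```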
Indexes as Access Paths
A single-level index is an auxiliary file that
makes it more efficient to search for a record in
the data file.
The index is usually specified on one field of
the file (although it could be specified on
several fields)
One form of an index is a file of entries <field
value, pointer to record>, which is ordered by
field value
The index is called an access path on the field.
Types of Single-Level Indexes
Primary Index
• Defined on an ordered data file
• The data file is ordered on a key field
• Includes one index entry for each block in the
data file; the index entry has the key field value
for the first record in the block, which is called
the block anchor
• A similar scheme can use the last record in a
block.
• A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the key of its anchor record rather than for every search value.
Primary index on the ordering key field
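A sparse primary index lookup can be sketched with the standard `bisect` module: binary-search the block anchors, then scan only the one block that could hold the key. The in-memory blocks and key values below are hypothetical stand-ins for disk blocks.

```python
from bisect import bisect_right

# Data file: blocks of records kept sorted on the key field (hypothetical keys).
blocks = [[2, 5, 8], [12, 15, 19], [23, 27, 31]]

# Primary index: one entry per block, holding the key of the block anchor
# (the first record in the block). The list position serves as the block pointer.
index_keys = [blk[0] for blk in blocks]

def lookup(key):
    """Return the number of the block containing key, or None if absent."""
    i = bisect_right(index_keys, key) - 1   # last anchor that is <= key
    if i < 0:
        return None                         # key is smaller than every anchor
    return i if key in blocks[i] else None  # search inside a single block

print(lookup(15), lookup(23), lookup(20), lookup(1))
```

The index has three entries for nine records, which is why it stays much smaller than the data file.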
Types of Single-Level Indexes
Clustering Index
• Defined on an ordered data file
• The data file is ordered on a non-key field, unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record.
• Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
• It is another example of a nondense index; insertion and deletion are relatively straightforward with a clustering index.
A Clustering Index Example
FIGURE 14.2 A clustering index on the DEPTNUMBER ordering non-key field of an EMPLOYEE file.
Another Clustering Index Example
Types of Single-Level Indexes
Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access
already exists.
• The secondary index may be on a field which is a
candidate key and has a unique value in every
record, or a non-key with duplicate values.
• The index is an ordered file with two fields.
• The first field is of the same data type as some non-ordering field of the data file that is an indexing field.
• The second field is either a block pointer or a record pointer.
• There can be many secondary indexes (and hence, indexing fields) for the same file.
• Includes one entry for each record in the data file; hence, it is a dense index
Example of a Dense Secondary Index
An Example of a Secondary Index
Properties of Index Types
Multi-Level Indexes
Because a single-level index is an ordered file, we can
create a primary index to the index itself;
• In this case, the original index file is called the first-level
index and the index to the index is called the second-
level index.
We can repeat the process, creating a third, fourth, ...,
top level until all entries of the top level fit in one disk
block
A multi-level index can be created for any type of first-
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block
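The number of levels follows from repeating the process described above until one block remains. The sketch below counts the levels; the 30,000 first-level entries and the fan-out of 68 entries per block are assumed figures for illustration.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Count index levels until the top level fits in one disk block."""
    levels = 0
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)   # blocks needed at this level
        levels += 1
        if blocks <= 1:
            return levels
        entries = blocks   # the next level has one entry per block of this level

levels = index_levels(30000, 68)
print(levels)
```

With these figures the levels occupy 442, 7 and 1 blocks, so a search reads one block per index level plus the data block.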
A Two-level Primary Index
FIGURE 14.8 A Node in a Search Tree with Pointers to Subtrees below It
Difference between B-tree and B+-tree
Q&A
Review
Which of the following joins requires each pair of join attributes to have the same name in the two relations participating in the join?
A. Theta join
B. Equijoin
C. Natural join
D. Both B and C