Final Review

The document outlines a comprehensive course on database systems, covering topics such as data models, database architecture, SQL, database design theory, data storage, indexing, security, and emerging technologies. It includes detailed lectures on the Entity-Relationship (ER) model, relational models, and methodologies for database design, along with practical applications and tools. Additionally, it addresses common problems in ER modeling and provides insights into advanced concepts like Enhanced ER (EER) models.

Uploaded by

lttin605
Review

Course Outline
Lecture | Content
Database System Concepts & Architecture | Introduction to Data Models, Database Systems; Three-Level Architecture & Data Independence; Modern Database Applications
Entity-Relationship (ER) Model | ER Model; Introduction to Enhanced ER (EER) Model
Relational Model | Relational Data Model; ER- & EER-to-Relational Mapping; Relational Algebra
SQL | Data Definition Language (DDL); Data Manipulation Language (DML); Introduction to Triggers & Stored Procedures
Database Design Theory & Methodology | Functional Dependencies & Normalization; Relational Database Design: Algorithms
Data Storage, Indexing | Data Storage; Hashing & Indexing Structures
Database Security | Discretionary & Mandatory Access Control (DAC & MAC); Flow Control, Inference Problem; Security Issues in Modern Data Management Systems
Emerging Technologies & Applications | Presentations
Course Outline
Lecture | Content
Database System Concepts & Architecture | Introduction to Data Models, Database Systems; Three-Level Architecture & Data Independence; Modern Database Applications
Entity-Relationship (ER) Model | ER Model; Introduction to Enhanced ER (EER) Model
Relational Model | Relational Data Model; ER- & EER-to-Relational Mapping; Relational Algebra
SQL | Data Definition Language (DDL); Data Manipulation Language (DML); Introduction to Triggers & Stored Procedures
Database Design Theory & Methodology | Functional Dependencies & Normalization; Relational Database Design: Algorithms
Data Storage, Indexing, Query Processing & Physical Design | Hashing & Indexing Structures (B-tree & R-tree families); Physical Database Design
Database Security | Discretionary & Mandatory Access Control (DAC & MAC); Flow Control, Inference Problem; Security Issues in Modern Data Management Systems
Emerging Technologies & Applications | Introduction to XML, Data Mining & Data Warehousing, GIS, M-Commerce & LBS; Emerging Database Technologies & Applications

Outline
▪ File-based Approach
▪ Database Approach
▪ Three-Schema Architecture and Data Independence
▪ Database Languages
▪ Data Models, Database Schema and Database State
▪ Data Management Systems Framework

File-based Approach and Database Approach
▪ File-based approach:
• Problems
• Shared file approach
▪ Database approach
• DBMS
• Characteristics of the Database Approach

Three-Schema Architecture and Data Independence
▪ Three-level architecture and data independence

Database Languages
▪ Data Definition Language (DDL)
▪ Data Manipulation Language (DML)
▪ Data Control Language (DCL)

Data Models, Database Schema and Database State
▪ Data Model
▪ Database Schema
▪ Schema Diagram
▪ Database State (Snapshot)

Outline
▪ What is ER Model? And Why?
▪ Overview of Database Design Process
▪ Example COMPANY Database
▪ ER Model Concepts
▪ ER Diagram
▪ Alternative Diagrammatic Notations
▪ Problems with ER Models

What is ER Model? And Why?
▪ The ER model is a logical organisation of data within a database system
▪ The ER modelling technique is based on the relational data model
▪ Why use ER data modelling?

Overview of Database Design Process

Example COMPANY Database

ER Model Concepts
▪ Entities and Attributes
▪ Types of Attributes:
• Simple
• Composite
• Multi-valued
• Derived
▪ Entity Types and Key Attributes
▪ Relationships and Relationship Types
▪ Weak Entity Types
▪ Recursive relationships

ER Model Concepts
▪ Structural constraints: one way to express the semantics of a relationship, using the cardinality ratio and the membership class
▪ Cardinality ratio (functionality): specifies the maximum number of relationship instances in which an entity can participate in a binary relationship
• one-to-one (1:1)
• one-to-many (1:M) or many-to-one (M:1)
• many-to-many (M:N)
▪ Membership class (participation constraint):
• Mandatory (total participation)
• Optional (partial participation)
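
The two structural constraints above can be illustrated on a small set of relationship instances. The following is a minimal Python sketch, not part of the lecture: the function names and the sample data are hypothetical, and it only describes one database state, whereas the real constraints are declared in the schema and must hold for every state.

```python
from collections import Counter

def cardinality_ratio(instances):
    """Classify a binary relationship from its instances, given as (a, b) pairs."""
    per_a = Counter(a for a, _ in instances)  # instances each A-entity takes part in
    per_b = Counter(b for _, b in instances)  # instances each B-entity takes part in
    a_many = max(per_a.values(), default=0) > 1
    b_many = max(per_b.values(), default=0) > 1
    if a_many and b_many:
        return "M:N"
    if a_many:
        return "1:M"  # one A-entity is related to many B-entities
    if b_many:
        return "M:1"
    return "1:1"

def participation(entities, instances, side):
    """Return 'total' if every entity appears in some instance, else 'partial'."""
    seen = {pair[side] for pair in instances}
    return "total" if set(entities) <= seen else "partial"

# Hypothetical state: departments (side 0) and the employees (side 1) working for them.
works_for = {("D1", "E1"), ("D1", "E2"), ("D2", "E3")}
print(cardinality_ratio(works_for))
print(participation(["E1", "E2", "E3", "E4"], works_for, 1))
```

Here employee E4 takes part in no instance, so the employee side shows optional (partial) participation in this state.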

Summary of the Notation for ER Diagrams
[Figure: summary of the notation for ER diagrams]

ER Diagram
(min, max) notation for relationship structural constraints
[Figure: example relationships annotated with (min, max) pairs such as (0,1), (1,1) and (4,N)]

ER diagrams for the COMPANY schema, with structural constraints specified using (min, max) notation

Alternative Diagrammatic Notations
[Figure panels: symbols for entity type / class, attribute and relationship; displaying attributes; displaying cardinality ratios; various (min, max) notations; notations for displaying specialization / generalization]

Problems with ER Models
▪ Fan Trap
▪ Chasm Trap

An Example of a Fan Trap
At which branch office does staff number SG37 work?

Restructuring ER model to remove Fan Trap
SG37 works at branch B003

An Example of a Chasm Trap
At which branch office is property PA14 available?

ER Model restructured to remove Chasm Trap

Outline
▪ Limitations of Basic Concepts of the ER Model
▪ Enhanced-ER (EER) Model Concepts
▪ Subclasses and Superclasses
▪ Specialization and Generalization
▪ Specialization / Generalization Hierarchies, Lattices and Shared Subclasses
▪ Categories
▪ Formal Definitions of EER Model
▪ Database Design Modeling Tools

Subclasses and Superclasses
▪ These are also called IS-A (IS-AN) relationships
▪ A superclass/subclass relationship is one-to-one (1:1)
▪ Inheritance in superclass/subclass relationships: attributes and relationships
▪ Generalization and Specialization

Constraints on Specialization and Generalization
▪ Two basic constraints: disjointness and completeness
▪ Disjointness constraint:
• Disjoint or overlapping
▪ Completeness constraint:
• Total or partial
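
To make the two constraints concrete, here is a minimal Python sketch that inspects one state of a specialization. The function name and the sample identifiers are hypothetical; in a real design the constraints are declared on the schema, and the check below only reports what a given population happens to satisfy.

```python
def classify_specialization(superclass, subclasses):
    """Describe a specialization by its disjointness and completeness.

    superclass: set of entity identifiers
    subclasses: dict mapping subclass name -> set of member identifiers
    """
    members = list(subclasses.values())
    union = set().union(*members)
    # Disjoint: no entity is a member of more than one subclass.
    disjointness = "disjoint" if sum(len(m) for m in members) == len(union) else "overlapping"
    # Total: every superclass entity belongs to at least one subclass.
    completeness = "total" if union == set(superclass) else "partial"
    return disjointness, completeness

# Hypothetical populations.
employees = {1, 2, 3, 4}
by_job = {"SECRETARY": {1}, "ENGINEER": {2}}
parts = {1, 2, 3}
by_source = {"MANUFACTURED": {1, 2}, "PURCHASED": {2, 3}}

print(classify_specialization(employees, by_job))
print(classify_specialization(parts, by_source))
```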

Example of Disjoint Partial Specialization

Example of Overlapping Total Specialization

Specialization / Generalization Hierarchies, Lattices and Shared Subclasses
▪ A subclass may itself have further subclasses specified on it, forming a hierarchy or a lattice
▪ A hierarchy has the constraint that every subclass has only one superclass (called single inheritance)
▪ In a lattice, a subclass can be a subclass of more than one superclass (called multiple inheritance)
▪ A subclass with more than one superclass is called a shared subclass
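
The distinction between a hierarchy and a lattice reduces to counting direct superclasses. A small Python sketch follows; the class names are an assumed fragment in the spirit of a university schema, not taken from the slide figure.

```python
def shared_subclasses(superclasses_of):
    """Return the subclasses that have more than one direct superclass."""
    return sorted(c for c, supers in superclasses_of.items() if len(supers) > 1)

def is_hierarchy(superclasses_of):
    """A hierarchy allows only single inheritance, so it has no shared subclass."""
    return not shared_subclasses(superclasses_of)

# Assumed fragment: each subclass maps to its direct superclasses.
university = {
    "EMPLOYEE": {"PERSON"},
    "STUDENT": {"PERSON"},
    "STUDENT_ASSISTANT": {"EMPLOYEE", "STUDENT"},
}
print(shared_subclasses(university))
print(is_hierarchy(university))
```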

Specialization / Generalization Lattice Example (UNIVERSITY)

Categories
Two categories (union types): OWNER and REGISTERED_VEHICLE

Database Design Modeling Tools
COMPANY | TOOL | FUNCTIONALITY
Embarcadero Technologies | ER Studio | Database modeling in ER and IDEF1X
Embarcadero Technologies | DB Artisan | Database administration and space and security management
Oracle | Developer 2000 and Designer 2000 | Database modeling, application development
Popkin Software | System Architect 2001 | Data modeling, object modeling, process modeling, structured analysis/design
Platinum Technology (Computer Associates) | Platinum Enterprise Modeling Suite: Erwin, BPWin, Paradigm Plus | Data, process, and business component modeling
Persistence Inc. | Pwertier | Mapping from O-O to relational model
Rational (IBM) | Rational Rose | Modeling in UML and application generation in C++ and JAVA
Rogue Ware | RW Metro | Mapping from O-O to relational model
Resolution Ltd. | Xcase | Conceptual modeling up to code maintenance
Sybase | Enterprise Application Suite | Data modeling, business logic modeling
Visio (Microsoft) | Visio Enterprise | Data modeling, design and reengineering Visual Basic and Visual C++

Relational Data Model and ER-/EER-to-Relational Mapping

Basic Concepts
▪ Relational data model
▪ Relation schema: R(A1, A2, …, An)
▪ The degree of a relation
▪ Domain D
▪ Tuple
▪ Cardinality
▪ Database schema S = {R1, R2, …, Rm}
[Figure: relational data model, showing a database schema, a relation schema, a relation and a tuple]

Relational Integrity Constraints
▪ Constraints are conditions that must hold on all valid relation instances. There are three main types of constraints:
1. Key constraints
2. Entity integrity constraints
3. Referential integrity constraints
▪ Other types:
• Semantic integrity constraints
• State/static constraints (so far)
• Transition/dynamic constraints
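
The three main constraint types can be observed with Python's standard sqlite3 module. This is a sketch under assumptions: the table and column names are illustrative, and SQLite only checks foreign keys after `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,          -- key constraint
    dname   TEXT NOT NULL UNIQUE          -- a second candidate key
);
CREATE TABLE employee (
    ssn TEXT NOT NULL PRIMARY KEY,        -- entity integrity: the key may not be NULL
    dno INTEGER REFERENCES department(dnumber)  -- referential integrity
);
INSERT INTO department VALUES (4, 'Administration');
""")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(violates("INSERT INTO department VALUES (4, 'Research')"))  # duplicate key value
print(violates("INSERT INTO employee VALUES (NULL, 4)"))          # NULL primary key
print(violates("INSERT INTO employee VALUES ('123456789', 99)"))  # no department 99
print(violates("INSERT INTO employee VALUES ('123456789', 4)"))   # valid tuple
```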

ER- & EER-to-Relational Mapping
▪ ER-to-relational mapping:
• Step 1: Mapping of Regular Entity Types
• Step 2: Mapping of Weak Entity Types
• Step 3: Mapping of Binary 1:1 Relationship Types
• Step 4: Mapping of Binary 1:N Relationship Types
• Step 5: Mapping of Binary M:N Relationship Types
• Step 6: Mapping of Multivalued Attributes
• Step 7: Mapping of N-ary Relationship Types
▪ EER-to-relational mapping:
• Step 8: Options for Mapping Specialization or Generalization
• Step 9: Mapping of Union Types (Categories)

The ERD for the COMPANY database

Result of mapping the COMPANY ER schema into a relational schema

ER-to-Relational Mapping
Correspondence between ER and Relational Models
ER Model | Relational Model
Entity type | "Entity" relation
1:1 or 1:N relationship type | Foreign key (or "relationship" relation)
M:N relationship type | "Relationship" relation and two foreign keys
n-ary relationship type | "Relationship" relation and n foreign keys
Simple attribute | Attribute
Composite attribute | Set of simple component attributes
Multivalued attribute | Relation and foreign key
Value set | Domain
Key attribute | Primary (or secondary) key
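
As an illustration of the correspondence, the sketch below maps a small fragment in the style of the COMPANY schema with Python's sqlite3 module. The exact attribute lists, and the `department` and `dept_locations` tables, are assumptions made for the example rather than the schema shown in the lecture figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: regular entity types become "entity" relations
CREATE TABLE employee   (ssn TEXT PRIMARY KEY, lname TEXT);
CREATE TABLE project    (pnumber INTEGER PRIMARY KEY, pname TEXT);
CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);

-- Step 5: an M:N relationship type becomes a "relationship" relation
-- whose primary key combines the two foreign keys
CREATE TABLE works_on (
    essn  TEXT    REFERENCES employee(ssn),
    pno   INTEGER REFERENCES project(pnumber),
    hours REAL,
    PRIMARY KEY (essn, pno)
);

-- Step 6: a multivalued attribute becomes a relation with a foreign key
CREATE TABLE dept_locations (
    dnumber   INTEGER REFERENCES department(dnumber),
    dlocation TEXT,
    PRIMARY KEY (dnumber, dlocation)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Three entity types, one M:N relationship type and one multivalued attribute yield five relations in this sketch.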

Q&A

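Review
[Figure: ERD exercise with entity types Customer (Khách hàng) and Item (Mặt hàng) linked by an n:n relationship Order (Đặt hàng); attribute labels include customer ID, customer name, phone, email, address (district, street, house number), item ID, item name, price revision (Lần giá), unit price, effective date, order date and quantity]

Review
[Figure: ERD exercise with entity types Employee, Project and Sub-Project, an n:n relationship Works and a relationship Has; attribute labels include Code, Name (First, Last), Phone, Email, Room and Hours]
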
Review
Choose the WRONG answers
A. The primary key of a relation generated from a 1-n binary relationship is the primary key of the relation on the 1 side of that relationship
B. The primary key of a relation generated from a weak entity type is the partial key of that weak entity type
C. A relation generated from a multivalued attribute has no primary key
D. Every relationship in an ERD/EERD must be mapped to a relation

Review
In ERD/EERD, choose the WRONG answers
A. The degree of a relationship is the number of entity types participating in that relationship
B. An entity type may have many key attributes
C. A weak entity type must always have a partial key
D. A weak entity type must always have a relationship with a strong entity type

Review
Let R be a 1-n relationship type between two entity types A and B (n is on the side of B in Chen's notation). Choose the WRONG answers:
A. An entity of A can be related to many entities of B
B. An entity of A can participate in n instances (ri) of R
C. An entity of B can participate in n instances (ri) of R
D. All entities of B must participate in at least one instance (ri) of R

Review
Mapping an ERD/EERD to a relational database schema belongs to which phase of the database design process?
A. Conceptual design
B. Logical design
C. Physical design
D. Functional analysis

Review
When designing an ERD/EERD for a library, which groups of objects can be considered entity types?
A. Book, author, publisher, provider, reader, library ticket
B. Book, library, author, reader, provider, library ticket
C. Book, author, publisher, provider, reader, nationality
D. Library, book, reader, author, library ticket, return ticket

Review
Choose the CORRECT answers in the relational data model
A. Tuple, row and record are equivalent terms
B. A relation may have many primary keys
C. A relation must have at least one candidate key
D. A relation may have many foreign keys

Review
Let R be a 1-n relationship between A and B (n is on B's side, and participation is optional on both sides). Which designs are acceptable?
A. Map R into a relation
B. Do not map R into a relation; instead, put the primary key of B into the relation of A as a foreign key
C. Do not map R into a relation; instead, put the primary key of A into the relation of B as a foreign key
D. All of the above are incorrect

Which objects in an ERD/EERD are always mapped into relations?
A. Strong entity type
B. Weak entity type
C. 1-1 relationship
D. 1-n relationship
E. n-n relationship
F. Ternary relationship
G. Multivalued attributes
H. Composite attributes
I. Derived attributes

Review
An ERD has two entity types A and B with an n-n relationship that has a multivalued attribute. According to the usual mapping rules, how many relations does the corresponding database schema have?
A. 2
B. 1
C. 4
D. 5

Review
▪ Given the following ERD
[Figure: EER diagram of Employee with subclasses Engineer, Officer, Manager, Full-time and Part-time]
▪ Which of the following statements is correct?
A. Some employees may be both full-time and part-time
B. An employee may be an engineer, an officer and a manager at the same time
C. An employee who is neither an officer nor a manager must be an engineer
D. The number of full-time and part-time employees may be smaller than the total number of employees

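Review
[Figure: entity types R1 (attribute A) and R2 (attribute B) joined by relationship X, with n on the R1 side and 1 on the R2 side]
A. R1 (A, B); R2 (B)
B. R1 (A); R2 (B, A)
C. R1 (A); R2 (B); X (A, B)
D. R1 (A); R2 (B); X (A, B)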
Outline
▪ Relational Algebra
• Unary Relational Operations
• Relational Algebra Operations from Set Theory
• Binary Relational Operations
• Additional Relational Operations
▪ Brief Introduction to Relational Calculus

Relational Algebra Overview
▪ Relational algebra is the basic set of operations for the relational model
▪ The result of an operation is a new relation
▪ A sequence of relational algebra operations forms a relational algebra expression

Relational Algebra Overview
▪ Relational algebra consists of several groups of operations
• Unary Relational Operations
  • SELECT (symbol: σ (sigma))
  • PROJECT (symbol: π (pi))
  • RENAME (symbol: ρ (rho))
• Relational Algebra Operations from Set Theory
  • UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, –)
  • CARTESIAN PRODUCT (x)
• Binary Relational Operations
  • JOIN (several variations of JOIN exist)
  • DIVISION
• Additional Relational Operations
  • OUTER JOINS, OUTER UNION
  • AGGREGATE FUNCTIONS (SUM, COUNT, AVG, MIN, MAX)
Unary Relational Operations: SELECT
▪ The SELECT operation is denoted by σ <selection condition>(R)
▪ Examples:
• Select the EMPLOYEE tuples whose department number is 4:
σ DNO = 4 (EMPLOYEE)
• Select the EMPLOYEE tuples whose salary is greater than $30,000:
σ SALARY > 30,000 (EMPLOYEE)
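
The two selections above can be mimicked in a few lines of Python. This is a minimal sketch, assuming a relation is stored as a list of dictionaries; the `select` helper and the sample tuples are hypothetical.

```python
def select(relation, condition):
    """sigma_condition(relation): keep the tuples for which the condition holds."""
    return [t for t in relation if condition(t)]

# Hypothetical EMPLOYEE state.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "DNO": 4, "SALARY": 43000},
    {"FNAME": "Ahmad", "LNAME": "Jabbar", "DNO": 4, "SALARY": 25000},
]

dept4 = select(EMPLOYEE, lambda t: t["DNO"] == 4)             # sigma DNO = 4
high_paid = select(EMPLOYEE, lambda t: t["SALARY"] > 30000)   # sigma SALARY > 30,000
print([t["LNAME"] for t in dept4])
print([t["LNAME"] for t in high_paid])
```

The result keeps every attribute of the operand, so its degree is unchanged; only the number of tuples can shrink.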

Unary Relational Operations: PROJECT
▪ The PROJECT operation is denoted by π (pi): π <attribute list>(R)
▪ Example: to list each employee's first and last name and salary, the following is used:
π LNAME, FNAME, SALARY (EMPLOYEE)
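
A projection can be sketched in Python as follows. The `project` helper and the sample data are hypothetical; the point worth noticing is that the result is a set of tuples, so duplicates produced by dropping columns are removed.

```python
def project(relation, attributes):
    """pi_attributes(relation): keep the listed columns and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # a relation is a set, so each tuple appears once
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Hypothetical EMPLOYEE state.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "DNO": 4, "SALARY": 43000},
    {"FNAME": "Ahmad", "LNAME": "Jabbar", "DNO": 4, "SALARY": 25000},
]

names = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])
departments = project(EMPLOYEE, ["DNO"])
print(names)
print(departments)
```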

Unary Relational Operations: RENAME
▪ The RENAME operation ρ (rho) can be expressed in any of the following forms:
• ρ S (B1, B2, …, Bn)(R) changes both:
  • the relation name to S, and
  • the column (attribute) names to B1, B2, …, Bn
• ρ S(R) changes:
  • the relation name only, to S
• ρ (B1, B2, …, Bn)(R) changes:
  • the column (attribute) names only, to B1, B2, …, Bn
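
The three forms of RENAME can be covered by one function with optional arguments. The sketch below assumes a hypothetical `Relation` record with a name, an attribute tuple and a set of rows; none of these names come from the lecture.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    name: str
    attributes: tuple
    rows: frozenset  # set of value tuples, in attribute order

def rename(r, new_name=None, new_attributes=None):
    """rho: change the relation name, the attribute names, or both."""
    attrs = tuple(new_attributes) if new_attributes else r.attributes
    if len(attrs) != len(r.attributes):
        raise ValueError("rename must keep the degree of the relation")
    return Relation(new_name or r.name, attrs, r.rows)

R = Relation("R", ("A1", "A2"), frozenset({(1, "x"), (2, "y")}))
both = rename(R, "S", ("B1", "B2"))              # rho S(B1, B2)(R)
name_only = rename(R, "S")                       # rho S(R)
attrs_only = rename(R, new_attributes=("B1", "B2"))  # rho (B1, B2)(R)
print(both.name, both.attributes)
```

Only the labels change; the tuples themselves are untouched in all three forms.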

Relational Algebra Operations from Set Theory: UNION
▪ UNION operation
• Binary operation, denoted by ∪
• The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in both R and S
• Duplicate tuples are eliminated
• The two operand relations R and S must be "type compatible" (or UNION compatible)

Relational Algebra Operations from Set Theory: INTERSECTION
▪ INTERSECTION is denoted by ∩
▪ The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S
▪ The two operand relations R and S must be "type compatible"

Relational Algebra Operations from Set Theory: SET DIFFERENCE (cont.)
▪ SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –
▪ The result of R – S is a relation that includes all tuples that are in R but not in S
▪ The two operand relations R and S must be "type compatible"
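
UNION, INTERSECTION and SET DIFFERENCE map directly onto Python set operators once type compatibility is checked. In this sketch a relation is a dictionary with an attribute tuple and a set of rows; the representation, the helper names and the sample data are assumptions, and compatibility is simplified to "same degree" because domains are not modelled.

```python
def check_type_compatible(r, s):
    """Operands need the same degree; matching domains are assumed, not checked."""
    if len(r["attributes"]) != len(s["attributes"]):
        raise ValueError("operands are not type compatible")

def union(r, s):
    check_type_compatible(r, s)
    # Rows live in a set, so duplicate tuples are eliminated automatically.
    return {"attributes": r["attributes"], "rows": r["rows"] | s["rows"]}

def intersection(r, s):
    check_type_compatible(r, s)
    return {"attributes": r["attributes"], "rows": r["rows"] & s["rows"]}

def difference(r, s):
    check_type_compatible(r, s)
    return {"attributes": r["attributes"], "rows": r["rows"] - s["rows"]}

# Hypothetical operands; the result takes its attribute names from the first one.
STUDENT = {"attributes": ("FN", "LN"),
           "rows": {("Susan", "Yao"), ("Ramesh", "Shah")}}
INSTRUCTOR = {"attributes": ("FNAME", "LNAME"),
              "rows": {("John", "Smith"), ("Susan", "Yao")}}

print(sorted(union(STUDENT, INSTRUCTOR)["rows"]))
print(sorted(intersection(STUDENT, INSTRUCTOR)["rows"]))
print(sorted(difference(STUDENT, INSTRUCTOR)["rows"]))
```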

Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
▪ CARTESIAN (or CROSS) PRODUCT operation
• Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)
• The result is a relation Q of degree n + m, with attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order
• Hence, if R has nR tuples (denoted as |R| = nR) and S has nS tuples, then R x S will have nR * nS tuples
• The two operands do NOT have to be "type compatible"
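
The degree and cardinality claims above are easy to check with `itertools.product`. A minimal sketch, with hypothetical attribute names and rows:

```python
from itertools import product

def cartesian_product(r_attrs, r_rows, s_attrs, s_rows):
    """R x S: every tuple of R concatenated with every tuple of S."""
    attrs = tuple(r_attrs) + tuple(s_attrs)                 # degree n + m
    rows = [tr + ts for tr, ts in product(r_rows, s_rows)]  # nR * nS tuples
    return attrs, rows

# Hypothetical operands with different degrees (no type compatibility needed).
emp_attrs, emp_rows = ("SSN", "LNAME"), [("123", "Smith"), ("333", "Wong")]
dep_attrs, dep_rows = ("ESSN", "DEPENDENT_NAME", "RELATIONSHIP"), [
    ("123", "Alice", "Daughter"), ("333", "Joy", "Spouse"), ("123", "Michael", "Son")]

attrs, rows = cartesian_product(emp_attrs, emp_rows, dep_attrs, dep_rows)
print(len(attrs), len(rows))
```

Most combinations in the result are meaningless on their own, which is why a product is normally followed by a selection.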
Binary Relational Operations: JOIN
▪ JOIN operation (denoted by ⋈): R ⋈ <join condition> S
▪ The general case of the JOIN operation is called a theta-join: R ⋈ θ S
▪ A join in which the only comparison operator used is = is called an EQUIJOIN
▪ NATURAL JOIN operation, denoted by *
• NATURAL JOIN was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition
▪ The OUTER JOIN operation
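
The join variants differ only in the condition and in how shared attributes are handled. The Python sketch below uses lists of dictionaries; the helper names and sample tuples are hypothetical, and `theta_join` assumes the two operands have no attribute names in common.

```python
def theta_join(r, s, condition):
    """R join_condition S: combined tuples of R and S that satisfy the condition."""
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]

def natural_join(r, s):
    """R * S: equijoin on all same-named attributes, keeping one copy of each."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

# Hypothetical data.
DEPARTMENT = [{"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333"},
              {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987"}]
EMPLOYEE = [{"SSN": "333", "LNAME": "Wong"}, {"SSN": "987", "LNAME": "Wallace"},
            {"SSN": "123", "LNAME": "Smith"}]
PROJECT = [{"PNAME": "ProductX", "DNUMBER": 5},
           {"PNAME": "Computerization", "DNUMBER": 4}]

# EQUIJOIN: the only comparison operator is "=".
dept_mgr = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
proj_dept = natural_join(PROJECT, DEPARTMENT)
print([(t["DNAME"], t["LNAME"]) for t in dept_mgr])
print([(t["PNAME"], t["DNAME"]) for t in proj_dept])
```

The equijoin result carries both MGRSSN and SSN, which always hold the same value; the natural join keeps a single DNUMBER column.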
Binary Relational Operations: DIVISION
▪ DIVISION operation
• The division operation is applied to two relations R(Z) ÷ S(X), where Z = X ∪ Y (Y is the set of attributes of R that are not attributes of S)
• The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S; i.e., for a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S
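
The definition of DIVISION is easier to follow on a tiny example. In this Python sketch the helper `divide` and the data are hypothetical: R records which employee works on which project, S lists the projects of interest, and the result holds the employees who work on all of them.

```python
def divide(r, s):
    """R(Z) / S(X): the Y-values of R that are combined with every tuple of S."""
    if not r:
        return []
    x = list(s[0]) if s else []
    y = [a for a in r[0] if a not in x]          # Y = Z - X
    required = {tuple(t[a] for a in x) for t in s}
    found = {}
    for t in r:
        y_value = tuple(t[a] for a in y)
        found.setdefault(y_value, set()).add(tuple(t[a] for a in x))
    # Keep a Y-value only if it appears with every X-value required by S.
    return [dict(zip(y, y_value)) for y_value, xs in found.items() if required <= xs]

# Hypothetical data.
WORKS_ON = [{"ESSN": "123", "PNO": 1}, {"ESSN": "123", "PNO": 2},
            {"ESSN": "453", "PNO": 1}]
TARGET_PNOS = [{"PNO": 1}, {"PNO": 2}]

print(divide(WORKS_ON, TARGET_PNOS))
```

Employee 453 is dropped because no tuple pairs that employee with project 2.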

Additional Relational Operations
▪ Aggregate Functions and Grouping
• Common functions applied to collections of numeric values include SUM, AVERAGE, MAXIMUM, and MINIMUM. The COUNT function is used for counting tuples or values
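
Grouping followed by aggregation can be sketched in Python as below. The `aggregate` helper, its argument format and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(relation, group_by, functions):
    """Group tuples on the group_by attributes, then apply each aggregate function.

    functions maps a result column name to a (function, attribute) pair.
    """
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in group_by)].append(t)
    result = []
    for key, tuples in groups.items():
        row = dict(zip(group_by, key))
        for name, (func, attr) in functions.items():
            row[name] = func([t[attr] for t in tuples])
        result.append(row)
    return result

# Hypothetical EMPLOYEE state.
EMPLOYEE = [
    {"SSN": "123", "DNO": 5, "SALARY": 30000},
    {"SSN": "333", "DNO": 5, "SALARY": 40000},
    {"SSN": "987", "DNO": 4, "SALARY": 43000},
]

per_dept = aggregate(EMPLOYEE, ["DNO"], {
    "COUNT_SSN": (len, "SSN"),
    "AVG_SALARY": (lambda values: sum(values) / len(values), "SALARY"),
})
print(per_dept)
```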

Outline
▪ DDL: Data Definition Language
• Create
• Alter
• Drop
▪ DML: Data Manipulation Language
• Select
• Insert
• Update
• Delete
▪ DCL: Data Control Language
• Commit
• Rollback
• Grant
• Revoke
Dr. Dang Tran Khanh, Faculty of CSE, HCMUT ([email protected])

DDL: Create, Alter, Drop
CREATE SCHEMA
▪ CREATE SCHEMA SchemaName AUTHORIZATION AuthorizationIdentifier;
▪ CREATE SCHEMA Company AUTHORIZATION JSmith;

DDL: Create, Alter, Drop
CREATE TABLE
CREATE TABLE TableName
{(colName dataType [NOT NULL] [UNIQUE]
[DEFAULT defaultOption]
[CHECK searchCondition] [,...]}
[PRIMARY KEY (listOfColumns),]
{[UNIQUE (listOfColumns),] […,]}
{[FOREIGN KEY (listOfFKColumns)
REFERENCES ParentTableName [(listOfCKColumns)],
[ON UPDATE referentialAction]
[ON DELETE referentialAction ]] [,…]}
{[CHECK (searchCondition)] [,…] })

DDL: Create, Alter, Drop
CREATE TABLE
▪ Default values
▪ Primary key and referential integrity constraints
• Referential triggered action clause of an FK constraint:
ON DELETE <action>
ON UPDATE <action>
<action>: SET NULL, CASCADE, SET DEFAULT
▪ Giving names to constraints
▪ Specifying constraints on tuples using CHECK
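
The features listed above can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the tables and constraint names are illustrative, SQLite's dialect differs in places from the generic syntax on the previous slide, and foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE employee (
    ssn    TEXT NOT NULL,
    salary REAL DEFAULT 0 CHECK (salary >= 0),          -- default value and CHECK
    CONSTRAINT emp_pk PRIMARY KEY (ssn)                 -- named constraint
);
CREATE TABLE dependent (
    essn           TEXT NOT NULL,
    dependent_name TEXT NOT NULL,
    CONSTRAINT dep_pk PRIMARY KEY (essn, dependent_name),
    CONSTRAINT dep_emp_fk FOREIGN KEY (essn) REFERENCES employee(ssn)
        ON DELETE CASCADE ON UPDATE CASCADE             -- referential triggered actions
);
INSERT INTO employee (ssn) VALUES ('123456789');        -- salary takes its default
INSERT INTO dependent VALUES ('123456789', 'Alice');
""")
default_salary = conn.execute("SELECT salary FROM employee").fetchone()[0]

conn.execute("DELETE FROM employee WHERE ssn = '123456789'")  # cascades to dependent
dependents_left = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

try:
    conn.execute("INSERT INTO employee VALUES ('987654321', -1)")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True   # the CHECK constraint refused the negative salary

print(default_salary, dependents_left, check_rejected)
```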

DDL: Create, Alter, Drop
DROP Command
▪ Used to drop named schema elements: tables, domains, constraints, and the schema itself
DROP SCHEMA Company CASCADE; (or RESTRICT)
DROP TABLE Dependent CASCADE; (or RESTRICT)
▪ Similarly, we can drop constraints & domains

DDL: Create, Alter, Drop
ALTER Command
▪ Base tables: adding or dropping a column or constraints, changing a column definition. Example:
ALTER TABLE Company.Employee ADD Job VARCHAR(15) NOT NULL;

DML: Select, Insert, Update, Delete
SELECT
SELECT [DISTINCT | ALL]
{* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList]
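
To see how the clauses of SELECT work together, the following sketch runs one query through Python's sqlite3 module. The table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, dno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Smith", 5, 30000), ("Wong", 5, 40000),
    ("Wallace", 4, 43000), ("Jabbar", 4, 25000), ("Borg", 1, 55000),
])

rows = conn.execute("""
    SELECT dno, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employee
    WHERE salary > 20000          -- filters tuples before grouping
    GROUP BY dno                  -- one result row per department
    HAVING COUNT(*) >= 2          -- filters whole groups
    ORDER BY dno
""").fetchall()
print(rows)
```

Department 1 passes the WHERE filter but is removed by HAVING, because its group contains a single employee.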

DML: Select, Insert, Update, Delete
Insert
▪ In its simplest form, it is used to add one or more tuples to a relation
INSERT INTO TABLE_NAME
VALUES (LIST_OF_VALUES)

DML: Select, Insert, Update, Delete
Delete
▪ Removes tuples from a relation
▪ Includes a WHERE-clause to select the tuples to be deleted
▪ Tuples are deleted from only one table at a time (unless CASCADE is specified on a referential integrity constraint)
▪ A missing WHERE-clause specifies that all tuples in the relation are to be deleted; the table then becomes an empty table
▪ The number of tuples deleted depends on the number of tuples in the relation that satisfy the WHERE-clause

DML: Select, Insert, Update, Delete
Update
▪ Used to modify attribute values of one or more selected tuples
▪ A WHERE-clause selects the tuples to be modified
▪ An additional SET-clause specifies the attributes to be modified and their new values
▪ Each command modifies tuples in the same relation
▪ Referential integrity should be enforced
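
The three modification commands can be exercised together with Python's sqlite3 module. This is a minimal sketch; the table and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, dno INTEGER, salary REAL)")

# INSERT: add tuples to the relation.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("111", 5, 30000), ("222", 5, 40000), ("333", 4, 25000)])

# UPDATE: the WHERE-clause picks the tuples, the SET-clause gives the new values.
updated = conn.execute(
    "UPDATE employee SET salary = salary * 1.1 WHERE dno = 5").rowcount

# DELETE with a WHERE-clause removes only the matching tuples.
deleted = conn.execute("DELETE FROM employee WHERE dno = 4").rowcount

# DELETE without a WHERE-clause empties the table but keeps its definition.
conn.execute("DELETE FROM employee")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(updated, deleted, remaining)
```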

Advanced DDL: Assertions & Triggers
▪ ASSERTIONs express constraints that do not fit in the basic SQL categories
▪ Mechanism: CREATE ASSERTION
• Components include: a constraint name, followed by CHECK, followed by a condition
▪ TRIGGERs specify the type of action to be taken when certain events occur and certain conditions are satisfied
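
The event, condition and action parts of a trigger can be seen in the sketch below, which uses Python's sqlite3 module. The tables and the trigger are hypothetical, and the syntax is SQLite's dialect; SQLite does not implement CREATE ASSERTION, so only the trigger half is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary REAL);
CREATE TABLE salary_log (ssn TEXT, old_salary REAL, new_salary REAL);

CREATE TRIGGER log_raise
AFTER UPDATE OF salary ON employee      -- event
FOR EACH ROW
WHEN NEW.salary > OLD.salary            -- condition
BEGIN
    INSERT INTO salary_log VALUES (OLD.ssn, OLD.salary, NEW.salary);  -- action
END;

INSERT INTO employee VALUES ('123456789', 30000);
UPDATE employee SET salary = 33000 WHERE ssn = '123456789';  -- a raise: logged
UPDATE employee SET salary = 31000 WHERE ssn = '123456789';  -- a cut: not logged
""")

log = conn.execute("SELECT * FROM salary_log").fetchall()
print(log)
```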

VIEWs
▪ SQL command: CREATE VIEW
• a view (table) name
• a possible list of attribute names
• a query to specify the view contents
▪ Specify a different WORKS_ON table (view)
CREATE VIEW WORKS_ON_NEW AS
SELECT FNAME, LNAME, PNAME, HOURS
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE SSN=ESSN AND PNO=PNUMBER
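
The view above can be created and queried with Python's sqlite3 module. In this sketch the three base tables are cut down to the columns the view needs, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT);
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT);
CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, HOURS REAL);

CREATE VIEW WORKS_ON_NEW AS
    SELECT FNAME, LNAME, PNAME, HOURS
    FROM EMPLOYEE, PROJECT, WORKS_ON
    WHERE SSN = ESSN AND PNO = PNUMBER;

INSERT INTO EMPLOYEE VALUES ('123456789', 'John', 'Smith');
INSERT INTO PROJECT  VALUES (1, 'ProductX');
INSERT INTO WORKS_ON VALUES ('123456789', 1, 32.5);
""")

before = conn.execute("SELECT * FROM WORKS_ON_NEW").fetchall()
conn.execute("UPDATE WORKS_ON SET HOURS = 20 WHERE ESSN = '123456789'")
after = conn.execute("SELECT * FROM WORKS_ON_NEW").fetchall()
print(before, after)
```

The view stores no data of its own: the change to the base table WORKS_ON is visible the next time the view is queried.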

DCL: Commit, Rollback, Grant, Revoke
▪ Chapter 17: Transaction Processing
▪ Chapter 23: DB security
Outline
 Introduction
 Functional dependencies (FDs)
• Definition of FD
• Direct, indirect, partial dependencies
• Inference Rules for FDs
• Equivalence of Sets of FDs
• Minimal Sets of FDs
 Normalization
• 1NF and dependency problems
• 2NF – solves partial dependency
• 3NF – solves indirect dependency
• BCNF – well-normalized relations

Introduction
 “Goodness” measures:
• Redundant information in tuples
• Update anomalies: modification, deletion,
insertion
• Reducing the NULL values in tuples
• Disallowing the possibility of generating
spurious tuples

Functional Dependencies (FDs)
 Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs
 FDs and keys are used to define normal
forms for relations
 X -> Y holds if whenever two tuples have the
same value for X, they must have the same
value for Y
 Examples:
• social security number determines employee name:
SSN -> ENAME
• project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}
• employee SSN and project number determine the hours per
week that the employee works on the project:
{SSN, PNUMBER} -> HOURS
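The definition of X -> Y can be checked against a concrete relation instance. The sketch below assumes tuples are represented as Python dictionaries; the three sample EMP_PROJ rows are hypothetical. Note that an instance can only show that an FD is violated; it cannot prove that the FD holds for every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by the given tuples."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample tuples of EMP_PROJ (illustration only).
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "Wong"},
]

print(fd_holds(emp_proj, ["SSN"], ["ENAME"]))             # SSN -> ENAME
print(fd_holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"]))  # {SSN, PNUMBER} -> HOURS
print(fd_holds(emp_proj, ["SSN"], ["HOURS"]))             # SSN alone -> HOURS
```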
Functional Dependencies (FDs)
 Direct dependency (full functional dependency): All
attributes in a relation R must be fully functionally dependent
on the primary key (that is, the PK is a determinant of all
attributes in R)
[Figure: dependency diagram over Performer-id, Performer-name, Performer-type, Performer-location]

Functional Dependencies (FDs)
 Indirect dependency (transitive
dependency): The value of an attribute is not
determined directly by the primary key
[Figure: dependency diagram over Performer-id, Performer-name, Performer-type, Fee, Performer-location]
Functional Dependencies (FDs)
 Partial dependency
• Composite determinant - when more than one value is required to
determine the value of another attribute, the combination of values is
called a composite determinant
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
{SSN, PNUMBER} -> HOURS

• Partial dependency - if the value of an attribute does not depend on
an entire composite determinant, but only on part of it, the relationship is
known as a partial dependency
SSN -> ENAME
PNUMBER -> {PNAME, PLOCATION}

Functional Dependencies (FDs)
 Inference Rules for FDs
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

Some additional inference rules that are useful:
(Decomposition) If X -> YZ, then X -> Y and X -> Z
(Union) If X -> Y and X -> Z, then X -> YZ
(Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
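A common way to apply these rules mechanically is to compute the closure X+ of an attribute set X, that is, every attribute that X determines. The following is a minimal sketch of the standard fixpoint computation, assuming FDs are stored as (lhs, rhs) pairs of Python sets. The FD list is the EMP_PROJ example from the earlier slide.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains X, it must also contain Y.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


EMP_PROJ_FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(sorted(closure({"SSN"}, EMP_PROJ_FDS)))
print(sorted(closure({"SSN", "PNUMBER"}, EMP_PROJ_FDS)))
```

An FD X -> Y can be inferred from the set exactly when Y is contained in the closure of X, so the second call shows that {SSN, PNUMBER} determines every attribute of EMP_PROJ.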

Functional Dependencies (FDs)

 Two sets of FDs F and G are equivalent if F+ = G+
 Definition: F covers G if G+ ⊆ F+. F and G are
equivalent if F covers G and G covers F
 There is an algorithm for checking equivalence of sets
of FDs (see chapter 10 [1])
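One standard way to test the "covers" relation uses attribute closures: F covers G if, for every X -> Y in G, Y is contained in the closure of X under F. The sketch below follows that idea and is not necessarily the exact algorithm given in [1]. The FD sets F, G and H are hypothetical examples.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """F covers G if every FD in G can be inferred from F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


# Hypothetical FD sets over attributes A, B, C.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
H = [({"A"}, {"B"})]

print(equivalent(F, G))  # both sets imply the same FDs
print(equivalent(F, H))  # H cannot derive B -> C
```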

Functional Dependencies (FDs)

 A set of FDs F is minimal if it satisfies the
following conditions:
(1) Every dependency in F has a single attribute for its RHS.
(2) We cannot remove any dependency from F and have a set of
dependencies that is equivalent to F.
(3) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y is a proper subset of X (Y ⊂ X),
and still have a set of dependencies that is equivalent to F

Normalization
 Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up
their attributes into smaller relations
 Two new concepts:
• A Prime attribute must be a member of some
candidate key
• A Nonprime attribute is not a prime attribute: it
is not a member of any candidate key

Normalization
 1NF and dependency problems
 2NF – solves partial dependency
 3NF – solves indirect dependency
 BCNF – well-normalized relations

Normalization
 First normal form (1NF): Disallows composite
attributes, multivalued attributes, and nested
relations

Normalization
 Second normal form (2NF) - every non-prime attribute must
be fully functionally dependent on the
primary key

Normalization
 A relation schema R is in third normal form
(3NF) if it is in 2NF and no non-prime attribute
A in R is transitively dependent on the
primary key
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we
consider this a problem only if Y is not a candidate
key. When Y is a candidate key, there is no problem
with the transitive dependency.
E.g., consider EMP(SSN, Emp#, Salary).
Here, SSN -> Emp# -> Salary and Emp# is a candidate
key

General Normal Form Definitions
 A relation schema R is in second normal form (2NF)
if every non-prime attribute A in R is fully functionally
dependent on every key of R
 A relation schema R is in third normal form (3NF) if
whenever a FD X -> A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R

Normalization
 A relation schema R is in Boyce-Codd
Normal Form (BCNF) if whenever an
FD X -> A holds in R, then X is a
superkey of R
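The general 3NF and BCNF definitions can be turned into a small checker. This is a sketch under stated assumptions: candidate keys are found by brute force (fine for slide-sized schemas only), and only the listed FDs are tested. The EMP example writes Emp# as EMPNO and adds EMPNO -> SSN as an assumption, since the note states that Emp# is a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for all minimal attribute sets whose closure is R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violations(attributes, fds):
    """Return (bcnf_violations, third_nf_violations) among the listed FDs."""
    prime = set().union(*candidate_keys(attributes, fds))
    bcnf, third = set(), set()
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue  # X is a superkey: allowed by both definitions
        for attr in rhs - lhs:  # ignore trivial parts of the FD
            bcnf.add((frozenset(lhs), attr))
            if attr not in prime:  # 3NF additionally tolerates prime A
                third.add((frozenset(lhs), attr))
    return bcnf, third


EMP = {"SSN", "EMPNO", "SALARY"}
EMP_FDS = [({"SSN"}, {"EMPNO"}), ({"EMPNO"}, {"SSN"}), ({"EMPNO"}, {"SALARY"})]

EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
EMP_PROJ_FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(violations(EMP, EMP_FDS))
print(violations(EMP_PROJ, EMP_PROJ_FDS))
```

EMP produces no violations, which matches the note on the 3NF slide. EMP_PROJ fails both tests because ENAME, PNAME and PLOCATION depend on only part of the key {SSN, PNUMBER}.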

Outline
 Data Storage
• Disk Storage Devices
• Files of Records
• Operations on Files
• Unordered Files
• Ordered Files
• Hashed Files
• RAID Technology
 Indexing Structures for Files
• Types of Single-level Ordered Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Indexes on Multiple Keys

Disk Storage Devices
 Disks are divided into concentric circular
tracks on each disk surface.
 A track is divided into smaller blocks or
sectors
 A read-write head moves to the track that
contains the block to be transferred
 A physical disk block address consists of:
• a cylinder number
• the track number or surface number (within the
cylinder)
• and block number (within track).

Disk Storage Devices
 Records: Fixed and variable length records
• Records contain fields which have values of a
particular type
 Blocking: Refers to storing a number of
records in one block on the disk.
• Blocking factor (bfr) refers to the number of
records per block.
 Spanned Records:
• Refers to records that exceed the size of one or
more blocks and hence span a number of
blocks.
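A short sketch of how the blocking factor is typically computed for fixed-length, unspanned records: bfr is the block size divided by the record size, rounded down. The block size, record size and record count below are hypothetical figures.

```python
import math


def blocking_factor(block_size, record_size):
    """Records per block for fixed-length, unspanned records."""
    return block_size // record_size


def blocks_needed(num_records, block_size, record_size):
    """Number of blocks the file occupies."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))


# Hypothetical figures: 512-byte blocks, 100-byte records, 30,000 records.
bfr = blocking_factor(512, 100)
blocks = blocks_needed(30_000, 512, 100)
unused_bytes_per_block = 512 - bfr * 100
print(bfr, blocks, unused_bytes_per_block)
```

The unused bytes at the end of each block are the space that a spanned organization would try to reclaim.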

Files of Records

 A file is a sequence of records, where each
record is a collection of data values (or data
items).
 A file descriptor (or file header) includes
information that describes the file
 A file can have fixed-length records or
variable-length records.
 File records can be unspanned or spanned

Unordered Files

 Also called a heap or a pile file.
 New records are inserted at the end of the file.
 A linear search through the file records is
necessary to search for a record.
• This requires reading and searching half the file
blocks on the average, and is hence quite
expensive.
 Record insertion is quite efficient.
 Reading the records in order of a particular
field requires sorting the file records.

Ordered Files
 Also called a sequential file.
 File records are kept sorted by the values of an
ordering field.
 Insertion is expensive: records must be inserted in
the correct order.
• It is common to keep a separate unordered overflow (or transaction)
file for new records to improve insertion efficiency; this is periodically
merged with the main ordered file.
 A binary search can be used to search for a record
on its ordering field value.
• This requires reading and searching about log2(b) blocks on the
average, where b is the number of file blocks, an improvement over linear search.
 Reading the records in order of the ordering field is
quite efficient.
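The difference between a linear search and a binary search on the ordering field can be sketched by counting block reads. The file is modelled as a Python list of blocks, each block a sorted list of keys; the 2,000 even-numbered keys and the blocking factor of 5 are hypothetical.

```python
def linear_search_blocks(blocks, key):
    """Scan blocks in order; return (found, number_of_blocks_read)."""
    reads = 0
    for block in blocks:
        reads += 1
        if key in block:
            return True, reads
    return False, reads


def binary_search_blocks(blocks, key):
    """Binary search over the blocks of an ordered file."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]  # one block read
        reads += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads


# Hypothetical ordered file: 2,000 even keys, 5 records per block (400 blocks).
records = list(range(0, 4000, 2))
bfr = 5
blocks = [records[i:i + bfr] for i in range(0, len(records), bfr)]

found_lin, reads_lin = linear_search_blocks(blocks, 3998)
found_bin, reads_bin = binary_search_blocks(blocks, 3998)
print(len(blocks), reads_lin, reads_bin)
```

Searching for the last key is the worst case for the linear scan, which reads all 400 blocks, while the binary search needs no more than 9 reads for a file of this size.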
Indexes as Access Paths
 A single-level index is an auxiliary file that
makes it more efficient to search for a record in
the data file.
 The index is usually specified on one field of
the file (although it could be specified on
several fields)
 One form of an index is a file of entries <field
value, pointer to record>, which is ordered by
field value
 The index is called an access path on the field.

Types of Single-Level Indexes

 Primary Index
• Defined on an ordered data file
• The data file is ordered on a key field
• Includes one index entry for each block in the
data file; the index entry has the key field value
for the first record in the block, which is called
the block anchor
• A similar scheme can use the last record in a
block.
• A primary index is a nondense (sparse) index,
since it includes an entry for each disk block of
the data file, keyed on its anchor record,
rather than for every search value.
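A minimal sketch of a sparse primary index with one entry per block, using the first key of each block as the block anchor. The three-block data file is hypothetical, and the index is kept as an in-memory list rather than as a file on disk.

```python
from bisect import bisect_right


def build_primary_index(blocks):
    """One entry per block: (anchor key, block number)."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]


def lookup(index, blocks, key):
    """Return the block number holding key, or None if the key is absent."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1  # last anchor <= key
    if pos < 0:
        return None  # key is smaller than every anchor
    block_no = index[pos][1]
    # Only this one data block has to be read and searched.
    return block_no if key in blocks[block_no] else None


# Hypothetical ordered data file with three blocks.
blocks = [[1, 4, 7], [10, 12, 15], [20, 23, 29]]
index = build_primary_index(blocks)

print(index)
print(lookup(index, blocks, 12), lookup(index, blocks, 8), lookup(index, blocks, 0))
```

The index has three entries for nine records, which is what makes it nondense.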
Primary index on the ordering key field

Types of Single-Level Indexes

 Clustering Index
• Defined on an ordered data file
• The data file is ordered on a non-key field,
unlike a primary index, which requires that the
ordering field of the data file have a distinct
value for each record.
• Includes one index entry for each distinct value
of the field; the index entry points to the first
data block that contains records with that field
value.
• It is another example of a nondense index.
Insertion and deletion are relatively
straightforward with a clustering index.
A Clustering Index Example
 FIGURE 14.2: A clustering index on the DEPTNUMBER ordering non-key field of an EMPLOYEE file.
Another Clustering Index Example

Types of Single-Level Indexes
 Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access
already exists.
• The secondary index may be on a field which is a
candidate key and has a unique value in every
record, or a non-key with duplicate values.
• The index is an ordered file with two fields.
• The first field is of the same data type as some non-
ordering field of the data file that is an indexing field.
• The second field is either a block pointer or a record
pointer.
• There can be many secondary indexes (and hence,
indexing fields) for the same file.
• Includes one entry for each record in the data file;
hence, it is a dense index
Example of a Dense Secondary Index

An Example of a Secondary Index

Properties of Index Types

Multi-Level Indexes
 Because a single-level index is an ordered file, we can
create a primary index to the index itself;
• In this case, the original index file is called the first-level
index and the index to the index is called the second-
level index.
 We can repeat the process, creating a third, fourth, ...,
top level until all entries of the top level fit in one disk
block
 A multi-level index can be created for any type of first-
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block
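The "repeat until the top level fits in one block" rule can be sketched as a short loop. The fan-out (index entries per block) and the entry counts below are hypothetical figures.

```python
import math


def index_levels(first_level_entries, fan_out):
    """Count index levels until the top level fits in a single block."""
    levels = 1
    blocks = math.ceil(first_level_entries / fan_out)  # blocks at level 1
    while blocks > 1:
        # The next level needs one entry per block of the level below.
        levels += 1
        blocks = math.ceil(blocks / fan_out)
    return levels


# Hypothetical figures: 30,000 first-level entries, 68 entries per block.
print(index_levels(30_000, 68))
print(index_levels(50, 68))
```

With these figures the first level occupies 442 blocks, the second level 7 blocks and the third level a single block. A first level that already fits in one block needs no further levels.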

A Two-level Primary Index

Multi-Level Indexes

 Such a multi-level index is a form of search
tree
• However, insertion and deletion of index
entries is a severe problem because every level
of the index is an ordered file.

A Node in a Search Tree with Pointers
to Subtrees below It
 FIGURE 14.8

Difference between B-tree and B+-tree

 In a B-tree, pointers to data records exist at all
levels of the tree
 In a B+-tree, all pointers to data records exist
at the leaf-level nodes
 A B+-tree can have fewer levels (or higher
capacity of search values) than the
corresponding B-tree

Q&A

Review

In relational algebra, which of the following
operators are commutative?
A. UNION
B. INTERSECTION
C. DIFFERENCE
D. CARTESIAN PRODUCT
E. SELECT
F. PROJECT

Review

Choose the CORRECT statement. Two relations R(A1, A2,
..., Am) and S(B1, B2, ..., Bn) are type compatible if:
A. m = n
B. R and S have the same number of records
C. m = n and dom(Ai) = dom(Bi) for i = 1..m
D. All of the above are correct

Review

Which of the following joins requires each pair of
join attributes to have the same name in the two
relations taking part in the join?
A. Theta join
B. Equijoin
C. Natural join
D. Both B and C are correct

Review

In SQL, when the AVG function (which computes the
average) is applied, NULL values are handled as follows:
A. Each NULL value is replaced by 0 before the
computation is performed
B. Rows with NULL values are ignored
C. An error is reported because the result cannot be computed when NULL values are present
D. None of the above

Review

Which of the following characteristics (criteria) are used to
judge whether a database is well designed
(goodness)?
A. Completely eliminating NULL values in the
tables
B. Minimizing NULL values in the tables
C. No update anomalies occur when the data is
updated
D. Minimizing update anomalies when the data is
updated

Review

SQL is a language that is:
A. Procedural
B. Non-procedural

Review

A primary index is an index specified on:
A. A key field on which the records are
physically ordered, and values in this field
may be duplicated
B. A key field on which the records are
physically ordered, and values in this field
may not be duplicated
C. A key field on which the records are
not physically ordered, and values in this field
may be duplicated
D. A key field on which the records are
not physically ordered, and values in this field
may not be duplicated
Review

The blocking factor (bfr) is:
A. The number of fields in each record
B. The number of records in each block
C. The number of blocks in each track
D. The number of fields in each block

Review

Choose the CORRECT statements:
A. A clustering index can only be applied to a file
that is ordered on the attribute being
indexed
B. At most 3 indexes can be defined on the same file
(one for each of the 3 index types)
C. A clustering index is a dense index
D. With a clustering index, the number of entries in the index file
equals the number of blocks in the data file

Review

 Given the relation R(A, B, C, D, E, F, G, H) and the
following set of functional dependencies:
1) A, B, C → F
2) F → G
3) G → H
4) B, C → D, E
 Find the key and normalize to 2NF and 3NF
