
DBI202

Uploaded by

thailqhe181201

COURSE: Introduction to Database System
(COURSE CODE: DBI202)

Course Objectives:

● Understand database concepts and database management
system software
● Understand the relational model of data and the algebraic
query language
● Understand data normalization and apply normalization
techniques in database design
● Be able to model an application’s data requirements using
conceptual modeling tools such as ER diagrams, and design
database schemas based on the conceptual model
● Be proficient in Structured Query Language (SQL), including
Data Definition Language (DDL) and Data Manipulation
Language (DML)
● Understand PL/SQL concepts and work with views, cursors,
stored procedures, functions, and database triggers
● Apply indexes in database design and query optimization

Recommended reading: Fundamentals of Database Systems,
Seventh Edition

Contents

Chapter 1. The Worlds of Database Systems
Chapter 2. The Relational Model of Data
Chapter 3. Design Theory for Relational Databases
Chapter 4. High-Level Database Models
Chapter 5. Self study
Chapter 6. The Database Language (SQL)
Chapter 7. Practical Issues of database application
Chapter 8. Constraints and T-SQL Programming

Assessment

1) On-going Assessment
- At least 2 progress tests: 10%
- Labs (5): 10%
- 1 assignment: 20%
- 1 practical exam: 30%
2) Final exam (60'): 30%
3) Final Result: 100%

Completion Criteria:
1) Every on-going assessment component >0
2) Final Exam Score >=4 & Final Result >=5

Materials

1. Textbook: A First Course in Database Systems, Jeffrey D.
Ullman, Prentice Hall, third edition
2. Slides, labs, assignment: LMS
Chapter 1: The Worlds of Database Systems
Objectives

Understand concepts of:


● Information, Data, Database
● Database Management System (DBMS)
● Database System

Contents

1.1 The Evolution of Database Systems


1.2 Overview of Database Management System

1.1 The Evolution of Database Systems


Data:
- Raw facts about things
- No contextual meaning
- Numbers, text, …
Information:
- Data with an exact meaning
- Data that has been processed and placed in an organized context

Database
- A collection of information that exists over a long period of
time.
- A collection of related data.
- Managed by a DBMS
Database Management System (DBMS)
- A software package/system that facilitates the creation and
maintenance of a computerized database
Database System
- The DBMS software together with the data itself.
Sometimes the applications are also included
The DBMS is expected to:
1) Allow users to create new databases and specify their schemas
2) Give users the ability to query the data
3) Support the storage of very large amounts of data
4) Enable durability
5) Control access to data from many users at once

Early DBMS
- 1960s: the first DBMSs were based on file systems

Responsibility    Supported by early DBMS?
(1)               Limited
(2)               Not directly supported
(3)               Yes
(4)               Not always supported
(5)               No

Hierarchical data model (tree-based model)
- Was used in early mainframe DBMSs
- The IBM Information Management System (IMS) is an
example of a hierarchical database system
Network data model (graph-based model)
- Invented by Charles Bachman in the late 1960s
- Standard specification published in 1969 by the Conference
on Data Systems Languages (CODASYL) Consortium
- The network model allows each record to have multiple
parent and child records
→ Does not support a high-level query language

Relational Database Systems


- 1970s: Edgar Frank "Ted" Codd defined the relational model,
based on relations
- A revolutionary idea that drove DBMS activity:
1. at IBM (System R, DB2)
2. at universities such as Berkeley (Ingres)
- SQL, the most important query language, was developed
by IBM in 1974
- 1979: Oracle V2, the first commercial RDBMS product to use
SQL

Book relation example (figure)

→ 2000s to now: NoSQL, NewSQL

Smaller and Smaller Systems


- Originally, DBMSs were large, expensive software running
on large computers
- Today, a DBMS can run on a PC, a mobile device, …
⇒ DB systems based on the relational model are available for
even very small machines
Bigger and Bigger Systems
- The size of data has been increasing continuously
- Many databases store petabytes and serve it all to users

Information Integration
- Join the information contained in many related databases
into a whole
- Example: a large company has many divisions, and each
division has built its own database of products and
employees on a different DBMS with a different structure
- How can we join these databases without problems?
- We need to build structures on top of the existing databases,
with the goal of integrating the information distributed
among them

Information Integration (con’t.)


● Two popular approaches
- Creation of data warehouses, where information from
many databases is copied periodically, with the
appropriate translation, to a central database
- Implementation of a middleware (mediator) that
supports an integrated model of the data of the
various databases, while translating between this
model and the actual models used by each database

1.2 Overview of DBMS


Database Management System
- DBMS components
- Database Users
- Database language
- Relational databases
DBMS components

Legend of the DBMS components diagram:
- Single box: system component
- Double box: in-memory data structure
- Solid line: control & data flow
- Dashed line: data flow only

Database Users
- Database administrators authorize access to the database,
coordinate and monitor its use, and acquire software and
hardware resources, …
- Database designers define the content, the structure, the
constraints, and the functions or transactions against the
database
- Database end users use the data for queries and reports, and
some of them actually update the database content

DDL - Data Definition Language Commands


- The DBA needs special authority to execute schema-altering
commands
- Schema-altering commands are known as DDL
commands and are used for defining data structures
- These commands are parsed by a DDL compiler and
passed to the execution engine, which then goes through the
index/file/record manager to alter the metadata (the schema
information for the database)
- Examples: CREATE, ALTER, DROP
DML - Data Manipulation Language Commands
- Are used by computer programs or DB users to retrieve,
insert, delete, and update data
- Do not affect the schema of the database, but affect the
content of the database or extract data from it
- DML processing has two separate subsystems
- Answering the query
- Transaction processing

1. Answering the query


- The query is parsed and optimized by the query compiler;
the result is a query plan
- The query plan is passed to the execution engine, which executes it
2. Transaction processing (will be discussed in the next
chapters)
- A transaction is a group of database operations.
- Transactions are processed by the transaction manager.
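
The split between DDL, DML and transactions can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: SQLite is used for convenience and is not the DBMS of this course, and the Student table and its rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines or alters the schema (metadata), not the stored rows.
conn.execute("CREATE TABLE Student (StudentID TEXT PRIMARY KEY, Name TEXT)")
conn.execute("ALTER TABLE Student ADD COLUMN Region INTEGER")

# DML: reads or changes the content and leaves the schema untouched.
conn.execute("INSERT INTO Student VALUES ('S01', 'An', 1)")
conn.execute("INSERT INTO Student VALUES ('S02', 'Binh', 2)")
conn.execute("UPDATE Student SET Region = 3 WHERE StudentID = 'S02'")
conn.execute("DELETE FROM Student WHERE StudentID = 'S01'")
conn.commit()

rows = conn.execute("SELECT StudentID, Name, Region FROM Student").fetchall()

# Transaction: a group of operations that takes effect entirely or not at all.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO Student VALUES ('S03', 'Chi', 1)")
        conn.execute("INSERT INTO Student VALUES ('S02', 'Dup', 1)")  # duplicate key
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(rows, count)
```

Because the second INSERT of the transaction violates the primary key, the whole group is rolled back, so the row for 'S03' is not kept either.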

The trends of DB design and DBMS


- Non relational databases (NoSQL)
- MongoDB
- Redis
- Multi-model databases
- Oracle database
- ArangoDB

Chapter 2: The Relational Model of Data
Objectives

- Understand what the relational model is and how to design a
database based on the relational model.
- Conceptualize data using the relational model.
- Understand the basic relational algebra operators under
set semantics.
- Express queries using relational algebra.

Contents

2.1 An Overview of Data Models


2.2 Basics of the Relational Model
2.3 An Algebraic Query Language

2.1 An Overview of Data Models


● Data model: a collection of concepts for describing data,
consisting of 3 parts:
- Structure of the data
Ex: arrays or objects
- Operations on the data
Queries and modifications on the data
- Constraints on the data
Limitations on what the data can be

● Important data models:
- The relational model, including object-relational
extensions
- The semi-structured data model, including XML and
related standards
● The semi-structured data model:
- Semi-structured data resembles trees or graphs rather
than tables or arrays
- XML is a way to represent data by hierarchically nested
tagged elements
- Operations involve following paths in the tree from an element
to one or more of its nested sub-elements, and so on
- Constraints involve the data type of the values associated with
a nested tag
2.2 Basics of the Relational Model
● Relational model
- A relation is made up of 2 parts:
1. Schema: specifies the name of the relation and the name
and domain/type of each attribute.
Ex: Student(StudentID: string, Name: string,
Registered: int, CounsellorNo: int, Region: int)
2. Instance: a table with rows and columns
Number of rows = cardinality; number of columns = degree/arity
● A simple way to think of it: a relation is a set of distinct rows
(tuples)
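
The two parts of a relation can be mirrored in a few lines of Python. This is only an illustrative sketch: the dictionary layout and the sample rows are assumptions, and only the Student schema comes from the text.

```python
# Schema: relation name plus the name and domain (type) of each attribute.
schema = {
    "name": "Student",
    "attributes": [("StudentID", str), ("Name", str), ("Registered", int),
                   ("CounsellorNo", int), ("Region", int)],
}

# Instance: a set of tuples, so a repeated row is stored only once.
instance = {
    ("S01", "An", 2019, 7, 1),
    ("S02", "Binh", 2020, 7, 2),
    ("S02", "Binh", 2020, 7, 2),  # duplicate of the previous row
}

degree = len(schema["attributes"])  # number of columns (arity)
cardinality = len(instance)         # number of distinct rows


def conforms(row):
    """Check that a row has one value of the right type per attribute."""
    return len(row) == degree and all(
        isinstance(value, domain)
        for value, (_, domain) in zip(row, schema["attributes"])
    )


print(degree, cardinality, all(conforms(r) for r in instance))
```

Using a Python set for the instance matches the set semantics above: the duplicate row does not increase the cardinality.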

- Database schema: a set of schemas for the relations of a
database
- An example of a DB schema:

● Entity: the main object about which we collect information
Example: STUDENT, COURSE, …
- Entity type: a set of entities that share the same properties
- Entity instance: one specific occurrence of an entity type
- Strong entity type: an entity type that exists independently
of other entity types
- Weak entity type: an entity type whose existence depends on
another entity type
Example: EMPLOYEE is a strong entity type;
DEPENDENT is a weak entity type that depends on EMPLOYEE
● Attribute: a property of an entity or of a relationship
Example: The STUDENT entity has attributes such as
Student_ID, Student_Name, Address, Major, …
- Simple attribute: an attribute that cannot be broken down
further
Example: color, weight, …
- Composite attribute: an attribute that can be split into
several components
Example: the Address attribute includes the components
Street, District, City, …
- Multi-valued attribute: an attribute that can hold more than
one value
Example: the COURSE entity has one multi-valued attribute,
Teacher: a course can be taught by more than one lecturer
- Key attribute: an attribute or a combination of attributes
that identifies an individual instance of an entity type
Example: Student_ID is a key attribute of the STUDENT entity
- Non-key attribute: an attribute that is not part of a key
attribute or candidate key
Example: in a relation STUDENT, the key attributes or
candidate keys can include StudentID, Email, …;
FirstName, Lastname, DateofBirth, … are non-key attributes
- Derived attribute: an attribute whose value is derived from
other attributes in the same table

- Candidate key: an attribute or group of attributes in a table
that can be used to identify each row of the table uniquely.
A table can have several candidate keys, but usually only one
of them is chosen as the primary key to represent each row
of the table
Example: in a relation EMPLOYEES there are several candidate
keys, such as EmployeeID, Email, SocialSecurityNumber, …
- Primary key: an attribute or set of attributes in a table that
uniquely identifies each row (record) of that table
Properties of a primary key: Unique, Immutable (cannot be
changed), Not Null
Example: in a relation STUDENT, the primary key is
StudentID

- Foreign key: an attribute or set of attributes in one table that
is the primary key of another relation

2.3 An Algebraic Query Language


Relational Algebra
- An algebra consists of operators and atomic operands
- Relational algebra is an example of an algebra, its atomic
operands are
1. Variables that stand for relations
2. Constants, which are finite relations
- Relational algebra is a set of operations on relations
- Operations operate on one or more relations to create a new
relation

Relational algebra operations fall into four classes


- Set operations – union, intersection, difference
- Selection and projection
- Cartesian product and joins
- Rename

● Set operations
R and S must be ‘type compatible’:
1. The same number of attributes
2. The domains of corresponding attributes must be
compatible
- Union
R ∪ S = { t | t ∈ R ∨ t ∈ S }
- Intersection
R ∩ S = { t | t ∈ R ∧ t ∈ S }
- Difference
R \ S = { t | t ∈ R ∧ t ∉ S }
- Intersection can be expressed in terms of set
difference
R ∩ S = R \ (R \ S)
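
Under set semantics these three operations map directly onto Python's set operators. The sketch below uses two small made-up relations with the same schema (name, gender); the rows are sample data only.

```python
# Two type-compatible relations: same number of attributes, same domains.
R = {("Carrie Fisher", "F"), ("Mark Hamill", "M")}
S = {("Carrie Fisher", "F"), ("Harrison Ford", "M")}

union = R | S          # tuples in R or in S
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

# Intersection expressed through set difference: R ∩ S = R \ (R \ S)
via_difference = R - (R - S)

print(sorted(union))
print(sorted(intersection), sorted(difference))
print(intersection == via_difference)
```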

Set operations example (tables shown as figures): Relation R,
Relation S, R ∪ S, R ∩ S, R \ S

Selection and Projection


● SELECTION
- R1 := σC(R2), where C is a condition on the attributes of R2
- Selections can be applied in any order or combined:
σC1(σC2(R)) = σC2(σC1(R)) = σC1 AND C2(R)

Movies

σlength>100(Movies)

● PROJECTION
- S := πA1,A2,…,An(R)
- A1, A2, …, An are attributes of R
- The schema of the result is S(A1, A2, …, An)
Movies

πtitle,year,length(Movies)

πgenre(Movies)
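
Selection and projection can be sketched as two small Python functions over a set of tuples. The helper names select and project, and the three Movies rows, are assumptions made for illustration; the Movies table of the slides is a figure that is not reproduced here.

```python
ATTRS = ("title", "year", "length", "genre")
Movies = {
    ("Star Wars", 1977, 124, "sciFi"),
    ("Galaxy Quest", 1999, 104, "comedy"),
    ("Wayne's World", 1992, 95, "comedy"),
}


def select(rel, attrs, cond):
    """σ: keep the tuples for which the condition is true."""
    return {t for t in rel if cond(dict(zip(attrs, t)))}


def project(rel, attrs, keep):
    """π: keep only the listed attributes; the set removes duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}


long_movies = select(Movies, ATTRS, lambda r: r["length"] > 100)
genres = project(Movies, ATTRS, ["genre"])


def c1(r):
    return r["length"] > 100


def c2(r):
    return r["genre"] == "comedy"


# σC1(σC2(R)) = σC2(σC1(R)) = σC1 AND C2(R)
nested_a = select(select(Movies, ATTRS, c2), ATTRS, c1)
nested_b = select(select(Movies, ATTRS, c1), ATTRS, c2)
combined = select(Movies, ATTRS, lambda r: c1(r) and c2(r))

print(sorted(long_movies))
print(sorted(genres))
print(nested_a == nested_b == combined)
```

Projecting onto genre returns two tuples although Movies has three rows, because a relation is a set and the repeated "comedy" collapses.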
● Cartesian product and joins
Cartesian product: R3 := R1 × R2

Relation R, Relation S, Cartesian product R × S

Theta join: R3 := R1 ⋈C R2, where C is the join condition

Relation U, Relation V

Figure 2.17: Result of U ⋈A<D V

Result of U ⋈A<D AND U.B≠V.B V

Natural join: R3 := R1 ⋈ R2

Relation R, Relation S, Natural join R ⋈ S
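
The three operations can be sketched in Python with rows stored as dictionaries. The relations U and V below are small sample instances chosen for illustration (the tables of the slides are figures that are not reproduced), and the function names and the "U.B"-style prefixes are assumptions of this sketch.

```python
U = [{"A": 1, "B": 2, "C": 3}, {"A": 6, "B": 7, "C": 8}, {"A": 9, "B": 7, "C": 8}]
V = [{"B": 2, "C": 3, "D": 4}, {"B": 2, "C": 3, "D": 5}, {"B": 7, "C": 8, "D": 10}]


def product(r, s, rname, sname):
    """Cartesian product; attribute names shared by both sides get a prefix."""
    shared = set(r[0]) & set(s[0]) if r and s else set()
    out = []
    for t in r:
        for u in s:
            row = {(f"{rname}.{k}" if k in shared else k): v for k, v in t.items()}
            row.update({(f"{sname}.{k}" if k in shared else k): v for k, v in u.items()})
            out.append(row)
    return out


def theta_join(r, s, rname, sname, cond):
    """Theta join: the product followed by a selection on the join condition."""
    return [row for row in product(r, s, rname, sname) if cond(row)]


def natural_join(r, s):
    """Natural join: pair tuples that agree on all common attributes."""
    out = []
    for t in r:
        for u in s:
            if all(t[k] == u[k] for k in set(t) & set(u)):
                out.append({**t, **u})
    return out


prod = product(U, V, "U", "V")
j1 = theta_join(U, V, "U", "V", lambda row: row["A"] < row["D"])
j2 = theta_join(U, V, "U", "V",
                lambda row: row["A"] < row["D"] and row["U.B"] != row["V.B"])
nat = natural_join(U, V)
print(len(prod), len(j1), len(j2), len(nat))
```

The natural join keeps a single copy of the common attributes B and C, while the product and the theta join keep both copies under prefixed names.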


● Rename
- The ρ operation gives a new schema to a relation
- ρS(A1,…,An)(R) makes S a relation with attributes A1,…,An
and the same tuples as R
- Simplified notation: S:=R (A1,A2,…,An)

Relation R, Relation S, R × ρS(X,C,D)(S)


Relation Expression
- Why we need relational expressions
- Relational algebra allows us to form expressions
- A relational expression is constructed by applying
operations to the results of other operations
- Expressions can be presented as an expression tree

The role of relational algebra in a DBMS


Example: What are the titles and years of movies made by Fox
that are at least 100 minutes long?
(1) Select those Movies tuples that have length ≥ 100
(2) Select those Movies tuples that have studioName = ‘Fox’
(3) Compute the intersection of (1) and (2)
(4) Project the relation from (3) onto attributes title and
year

Relation Expression

Figure 2.18: Expression tree for a relational algebra expression

πtitle,year(σlength≥100(Movies) ∩ σstudioName=‘Fox’(Movies))

An equivalent expression:
πtitle,year(σlength≥100 AND studioName=‘Fox’(Movies))
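
Both expressions can be evaluated step by step in Python to see that they return the same relation. The sketch assumes a three-row sample Movies instance and two hypothetical helpers, select and project; none of this is taken from the slides' figures.

```python
ATTRS = ("title", "year", "length", "genre", "studioName")
Movies = {
    ("Star Wars", 1977, 124, "sciFi", "Fox"),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount"),
}


def select(rel, cond):
    """σ over the Movies schema."""
    return {t for t in rel if cond(dict(zip(ATTRS, t)))}


def project(rel, keep):
    """π over the Movies schema."""
    idx = [ATTRS.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}


# Steps (1) to (4): two selections, their intersection, then a projection.
step1 = select(Movies, lambda r: r["length"] >= 100)
step2 = select(Movies, lambda r: r["studioName"] == "Fox")
step3 = step1 & step2
tree_result = project(step3, ["title", "year"])

# Equivalent expression with a single selection on the combined condition.
single_result = project(
    select(Movies, lambda r: r["length"] >= 100 and r["studioName"] == "Fox"),
    ["title", "year"],
)

print(tree_result, tree_result == single_result)
```

Each intermediate variable corresponds to one node of the expression tree, which is how a query plan is built up from relational algebra operators.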
Exercise 1:

a) What PC models have a speed of at least 3.00?
b) Which manufacturers make laptops with a hard disk of at
least 100GB?
c) Find the model number and price of all products (of any
type) made by manufacturer B.
d) Find the model numbers of all colour laser printers.
e) Find those manufacturers that sell laptops, but not PCs.
f) Find those manufacturers that sell all models of PCs and
laser printers.
g) Find those manufacturers whose laptops have all the RAM sizes
that manufacturer B’s laptops have.

Exercise 2:
Product (ProductCode, Name, PurchasePrice, SellPrice, Type,
SupplierCode)
Supplier (SupplierCode, SupplierName, Address)
Employee (EmployeeID, FullName, Gender, BirthDate, Address)
Invoice (InvoiceID, SellDate, EmployeeID)
InvoiceLine (ProductCode, InvoiceID, Quantity)

Write expressions of relational algebra to answer the following
queries:
a. Find the name and sell price of televisions supplied by
Samsung
b. Find the name and address of all suppliers who supply
television products
c. Find the names of all employees who were born in 1983
d. Find the name and type of all products sold on ‘23/05/2018’
e. Find the names of female employees who sold televisions
f. Find the name and address of suppliers who supply both
televisions and mobiles
g. List the name and price of all products sold by employee
“Nguyen Van A” in April 2018
h. Find the name and price of all mobile products of Samsung
sold in April 2018
i. Find the product with the highest SellPrice
j. Find the amount (Quantity * SellPrice) of each invoice line
of products sold on 30/04/2018
Exercise 3: We have a database schema that consists of
five relations:
Movies (title, year, length, genre, studioName, producer#)
StarIn (movieTitle, movieYear, starName)
MovieStar (name, address, gender, birthdate)
MovieExec (producer#, name, address, netWorth)
Studio (name, address, presC#)
Write expressions of relational algebra to answer the following
queries:
a. Find the title and genre of movies produced by Disney in
1990
b. Find the address and date of birth of Tom Cruise, who is a
movie star
c. Find the address of studio Film City
d. List the names of the female stars in the film ‘Bloody Moon’
e. Find the name of the producer of Star Wars
f. Find the names of executives that are worth more than
Subhash Ghai
g. Find the titles of movies that are no longer than ‘One Piece
Live Action’
h. List all the titles and years of movies that appear in both
the Movies and StarIn relations

Chapter 3: Design Theory for Relational Databases
Objectives

Understand concepts of:


- Functional Dependencies
- Normalization
- Decomposition
- Multi-valued Dependencies
Contents

3.1 Functional Dependencies & Rules about FDs


3.2 Key & Super-Key
3.3 Normal forms

3.1 Functional dependency


- A functional dependency (FD) is a constraint between two sets
of attributes in a relation
- A set of attributes X = {A1, A2, …, An} in R functionally
determines another set of attributes Y = {B1, B2, …, Bm}, also
in R (written X → Y), if and only if each X value is associated
with precisely one Y value
- A functional dependency A1A2…An → B1B2…Bm holds on
relation R if, whenever two tuples of R agree on all of the
attributes A1, A2, …, An, they also agree on all of the attributes
B1, B2, …, Bm

It is easy to see that the following FD holds:
- title, year → length, genre, studioName
Exercise: What about the FD
- title, year → starName

title, year → starName does not hold in the Movies1 relation
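
Whether an FD holds in a given relation instance can be checked mechanically: group the tuples by their left-side values and verify that each group has a single right-side value. The function name fd_holds and the three sample rows are assumptions of this sketch, since the Movies1 table itself is a figure that is not reproduced here.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


movies1 = [
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "sciFi",
     "studioName": "Fox", "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "sciFi",
     "studioName": "Fox", "starName": "Mark Hamill"},
    {"title": "Gone With the Wind", "year": 1939, "length": 231, "genre": "drama",
     "studioName": "MGM", "starName": "Vivien Leigh"},
]

fd1 = fd_holds(movies1, ["title", "year"], ["length", "genre", "studioName"])
fd2 = fd_holds(movies1, ["title", "year"], ["starName"])
print(fd1, fd2)
```

A check like this can only refute an FD. An FD is a statement about every legal instance of the schema, so passing on one instance does not prove it.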

3.2. Key and Super-key


- Review: key of a relation, candidate keys (alternate keys),
primary key

Definition of candidate key in DBMS:
- A super-key with no redundant attribute is known as a
candidate key. Candidate keys are selected from the set of
super-keys; the only requirement is that a candidate key
must not contain any redundant attribute. That is why
candidate keys are also called minimal super-keys.
- Candidate key example:
Take a table “Employee” with three attributes: Emp_Id,
Emp_Number and Emp_Name. Emp_Id and Emp_Number have
unique values, while Emp_Name can have duplicate values
because more than one employee can have the same name.

Emp_Id   Emp_Number   Emp_Name
E01      2264         Steve
E22      2278         Ajeet
E23      2288         Chaitanya
E45      2290         Robert

- Super-key
1. A set of attributes that contains a key is called a super-key
2. Every super-key satisfies the first condition of a key: it
functionally determines all other attributes of the relation
3. If L is a super-key, then there is a key K such that K ⊆ L
4. A key is also a super-key

Armstrong’s Axioms
- Fundamental rules: let X, Y, Z be sets of attributes
1. Reflexivity: if X is a subset of Y, then Y → X
2. Augmentation: if X → Y, then XZ → YZ for any Z
3. Transitivity: if X → Y and Y → Z, then X → Z

- Additional rules: let X, Y, Z, W be sets of attributes
1. Union/Combining: if X → Y and X → Z, then X → YZ
2. Decomposition/Splitting: if X → YZ, then X → Y and X → Z
3. Pseudo-transitivity: if X → Y and WY → Z, then WX → Z
- Trivial FDs: the right side is a subset of the left side
Ex: FLD → FD
- A set of FD’s S follows from a set of FD’s T if every relation
instance that satisfies all the FD’s in T also satisfies all the
FD’s in S
- Two sets of FD’s S and T are equivalent if and only if S
follows from T, and T follows from S

The Closure of Attributes


- The closure of a set of attributes {A1, A2, …, An} under the FD’s
in S (denoted {A1, A2, …, An}+) is the set of attributes B such
that every relation that satisfies all the FD’s in S also
satisfies A1A2…An → B
- That is, A1A2…An → B follows from the FD’s of S
- A1, A2, …, An ∈ {A1, A2, …, An}+, because A1A2…An → Ai is trivial

Algorithm 3.7: Closure of a set of attributes


- Input: A set of attributes {A1, A2, …, An} and a set of FD’s S
- Output: The closure {A1, A2, …, An}+
1. If necessary, split the FD’s of S, so that each FD in S has a
singleton right side
2. Let X be a set of attributes that will become the
closure. Initialize X to be {A1, A2, …, An}
3. Repeatedly search for some FD B1B2…Bm → C such
that B1, B2, …, Bm are in X, but C is not
a) If such a C is found, add it to X, and repeat the
search
b) If no such C is found, no more attributes can be
added to X
4. The set X is the correct value of {A1, A2, …, An}+

Example: R(A, B, C, D)
S = {A → B, B → C, C → D, D → A}
Compute {A}+? {B}+?
What are the keys of R?
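
Algorithm 3.7 translates almost line by line into Python. The sketch below assumes single-letter attribute names, so that an FD can be written as a pair of strings such as ("A", "B"); the function names closure and candidate_keys are illustrative. It is run on the example R(A, B, C, D) above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Left side already inside X, right side brings something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys


S = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
closure_a = closure("A", S)
closure_b = closure("B", S)
keys = candidate_keys("ABCD", S)
print(sorted(closure_a), sorted(closure_b), keys)
```

Because the four FD's form a cycle, the closure of every single attribute is {A, B, C, D}, so each attribute on its own is a key of R. The key search tries every subset of attributes, which is fine for small classroom examples but grows exponentially with the number of attributes.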

Closing Sets of Functional Dependencies

Given a set of FD’s S, any set of FD’s T that is equivalent to S is
said to be a basis for S

We work only with FD’s that have singleton right sides
A minimal basis for a set of FD’s S is a basis B that satisfies three
conditions:
- All the FD’s in B have singleton right sides
- If any FD is removed from B, the result is no longer a basis
- If for any FD in B we remove one or more attributes from
the left side, the result is no longer a basis

Example:
- R(A, B, C)
- S = {A → B, A → C, B → A, B → C, C → A, C → B, AB → C, BC → A,
AC → B, A → BC, B → AC, C → AB}
- R and its FD’s have several minimal bases, for example
1. {A → B, B → A, B → C, C → B}, or
2. {A → B, B → C, C → A}
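
The three conditions for a minimal basis can be tested with the attribute closure. The sketch below again assumes single-letter attributes and FD's written as pairs of strings; the function names are illustrative. It confirms that both sets listed in the example are minimal bases of S.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(fd, fds):
    """An FD X -> Y follows from fds exactly when Y is inside X+."""
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, fds)


def equivalent(s, t):
    return all(follows(fd, t) for fd in s) and all(follows(fd, s) for fd in t)


def is_minimal_basis(b, s):
    if not equivalent(b, s) or any(len(rhs) != 1 for _, rhs in b):
        return False
    for i, (lhs, rhs) in enumerate(b):
        rest = b[:i] + b[i + 1:]
        if equivalent(rest, s):
            return False  # this FD could be removed
        if len(lhs) > 1:
            for a in lhs:
                if equivalent(rest + [(lhs.replace(a, ""), rhs)], s):
                    return False  # this left side could be shortened
    return True


S = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B"),
     ("AB", "C"), ("BC", "A"), ("AC", "B"), ("A", "BC"), ("B", "AC"), ("C", "AB")]
T1 = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
T2 = [("A", "B"), ("B", "C"), ("C", "A")]

print(is_minimal_basis(T1, S), is_minimal_basis(T2, S), is_minimal_basis(S, S))
```

S is a basis of itself but not a minimal one: it contains FD's with two attributes on the right side as well as redundant FD's.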

What happens to a set of FD’s S of R when we project R onto some
of its attributes?
That is, suppose a relation R with a set of FD’s S, and R1 = πL(R).
What FD’s hold in R1?

Projecting Functional Dependencies


● The functional dependencies of the projection R1 are the
FD’s that
- Follow from S, and
- Involve only attributes of R1
● Algorithm 3.12: Projecting a Set of FD’s
- Input: R, R1 = πL(R), S a set of FD’s that hold in R
- Output: the set of FD’s that hold in R1
- Method:
1. Let T be the set of FD’s that hold in R1. Initially, T is
empty
2. For each set of attributes X of R1, compute X+.
Add to T all non-trivial FD’s X → A such that A is
both in X+ and an attribute of R1
3. Construct a minimal basis from T

● Algorithm 3.12: Projecting a Set of FD’s (cont.)
- Compute a minimal basis from T
1. If there is an FD F in T that follows from the other
FD’s in T, then remove F from T
2. Let Y → B be an FD in T with at least two attributes
in Y, and let Z be Y with one of its attributes
removed:
If Z → B follows from the FD’s in T
(including Y → B), then replace Y → B by Z → B
3. Repeat the above steps in all possible ways until
no more changes to T can be made
● Two observations
(1) Closing the empty set and the set of all attributes cannot
yield a nontrivial FD
(2) If we already know that the closure of some set X is
all attributes, then we cannot discover any new FD’s by
closing supersets of X

Example: Suppose R(A,B,C,D) has FD’s A→B, B→C, and C→D.
R1 = πA,C,D(R). Find the FD’s of R1.
- Compute the closures of the singleton sets
1. {A}+ = {A, B, C, D}; B is not in R1, so the new FD’s are A→C
and A→D
2. {C}+ = {C, D}, so the new FD is C→D
3. {D}+ = {D}, no new FD’s
- Compute the closures of the doubleton sets
1. Since {A}+ includes all attributes, we need not consider
supersets of {A}
2. {C,D}+ = {C,D}, no new FD’s hold in R1
- Finally, three FD’s hold in R1: A→C, A→D, C→D
- A→D follows by transitivity from A→C and C→D
- So a minimal basis is {A→C, C→D}
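
Algorithm 3.12 can be sketched in Python as two functions, one that collects the projected FD's and one that reduces them to a minimal basis. The sketch assumes single-letter attributes and uses illustrative function names. For simplicity it closes every proper subset of R1's attributes and does not apply the shortcut of skipping supersets of a key, so the redundant FD's it finds are removed by the minimization step.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(r1_attrs, fds):
    """All non-trivial FDs X -> A with X and A inside R1 (step 2)."""
    attrs = sorted(r1_attrs)
    t = []
    for size in range(1, len(attrs)):
        for x in combinations(attrs, size):
            cl = closure(x, fds)
            t.extend((frozenset(x), a) for a in attrs if a in cl and a not in x)
    return t


def minimal_basis(t):
    """Step 3: drop redundant FDs and shrink left sides until stable."""
    basis = list(t)
    changed = True
    while changed:
        changed = False
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] in closure(fd[0], rest):  # fd follows from the others
                basis = rest
                changed = True
        for lhs, rhs in list(basis):
            for a in lhs:
                smaller = lhs - {a}
                if smaller and rhs in closure(smaller, basis):
                    basis.remove((lhs, rhs))
                    if (smaller, rhs) not in basis:
                        basis.append((smaller, rhs))
                    changed = True
                    break
    return basis


S = [("A", "B"), ("B", "C"), ("C", "D")]
T = project_fds("ACD", S)
basis = minimal_basis(T)
print(sorted((sorted(lhs), rhs) for lhs, rhs in basis))
```

On the example above the result is the pair A→C and C→D, the same minimal basis as in the worked solution. When several minimal bases exist, which one is returned depends on the order in which the FD's are examined.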

Anomalies introduction
- Careless selection of a relational database schema can lead
to redundancy and related anomalies
- So, in this session we tackle the problems of relational
database design
- Problems such as redundancy that occur when we try to
cram too much into a single relation are called “anomalies”

The principal kinds of anomalies that we encounter are:

Redundancy: information may be repeated unnecessarily in
several tuples (e.g. the length and genre)

Update Anomalies: We may change information in one tuple
but leave the same information unchanged in another (e.g. if
we found that Star Wars is 125 minutes long, we may change
the length in the first tuple but not in the second and third
tuples)

Deletion Anomalies: If a set of values becomes empty, we may
lose other information as a side effect (e.g. if we delete “Fox”
from the set of studios, then we have no more studios for the
movie “Star Wars”)

Decomposition
● The accepted way to eliminate anomalies is the
decomposition of relations
● Decomposition of a relation R involves splitting the
attributes of R to make the schemas of 2 new relations
- Definition: Given a relation R(A1,..,An), we say R is
decomposed into S(B1,..,Bm) and T(C1,..,Ck) if:
1. {A1,..,An} = {B1,..,Bm} ∪ {C1,..,Ck}
2. S = πB1,..,Bm(R)
3. T = πC1,..,Ck(R)
Example: (the two relations of the decomposition were shown as
figures)

Discuss:
- The redundancy is eliminated (the length of each film
appears only once)
- The risk of an update anomaly is gone (we only have to
change the length of Star Wars in one tuple)
- The risk of a deletion anomaly is gone (if we delete all the
stars for Gone with the wind, that deletion makes the
movie disappear from the right but still be found in the
left)

Decomposition: The Good, Bad and Ugly


- We observed that a relation schema can exhibit anomalies
before it is decomposed into BCNF, and that the decomposition
removes them; that’s the “Good”

- However, decomposition can also have bad consequences:
1. We may be unable to recover the original information; OR
2. After reconstruction, the FDs may not hold

Example: Loss of information after decomposition




- Suppose we have R(A,B,C) but neither of the FD’s B→A nor
B→C holds.
- R is decomposed into R1 and R2 as above
- When we try to reconstruct R by the natural join of R1 and R2,
we get R3 = R1 ⋈ R2, but R3 ≠ R ⇒ we lost information
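
The loss of information can be reproduced with two tuples. The sketch assumes that R is split into R1(A,B) and R2(B,C), which share only the attribute B, and uses a made-up instance of R in which two tuples have the same B value.

```python
# R(A, B, C): B determines neither A nor C in this instance.
R = {(1, 2, 3), (4, 2, 5)}

# Decomposition by projection.
R1 = {(a, b) for a, b, c in R}   # R1(A, B)
R2 = {(b, c) for a, b, c in R}   # R2(B, C)

# Reconstruction by natural join on the common attribute B.
R3 = {(a, b, c) for a, b in R1 for b2, c in R2 if b == b2}

spurious = R3 - R
print(sorted(R3))
print(sorted(spurious))
```

The join returns every original tuple plus two tuples that were never in R. The information lost is the knowledge of which combinations were really there.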

Example: Dependency Loss


If we check the projected FD’s in the relations of the
decomposition, can we be sure that when we reconstruct
the original relation from the decomposition by joining, the
result will satisfy the original FD’s?

3.3 Normal Forms


Definitions:
- Multivalued Attributes (or repeating groups): non-key
attributes or groups of non-key attributes the values of
which are not uniquely identified by (directly or indirectly)
(not functionally dependent on) the value of the Primary
Key (or its part).
- Partial Dependency – when a non-key attribute is
determined by a part, but not the whole, of a COMPOSITE
primary key.
- Transitive Dependency – when a non-key attribute
determines another non-key attribute.

1NF
A relation R is in first normal form (1NF) if and only if all
underlying domains contain atomic values only.
Take the following table:

StudentID is the primary key.
Is it 1NF?
No. There are repeating groups (subject, subjectcost, grade).

How can you make it 1NF?
Create new rows so each cell contains only one value.

But now look: is the StudentID primary key still valid?
No. StudentID no longer uniquely identifies each row.

You now need to declare StudentID and Subject together to
uniquely identify each row.
So the new key is StudentID and Subject.
So we now have 1NF. Is it 2NF?

2NF
A relation R is in second normal form (2NF) if and only if it is in
1NF and every non-key attribute is fully dependent on the
primary key

StudentName & Address are dependent on StudentID (which is
part of the key)
But they are not dependent on Subject (the other part of the
key)

And 2NF requires…
All non-key fields are dependent on the ENTIRE key (StudentID
+ Subject)
So it’s not 2NF

How can we fix it?
Make new tables
1. Make a new table for each primary key field
2. Give each new table its own primary key
3. Move columns from the original table to the new table that
matches their primary key…

STEP 1

STUDENT TABLE (key = StudentID)


________________________________________________________________
STEP 2

STUDENT TABLE (key = StudentID)


SUBJECTS TABLE (key = Subject)
________________________________________________________________
STEP 3

STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)


________________________________________________________________
STEP 4 - cardinality
STUDENT TABLE (key = StudentID)
SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)

*NOTES:
1. Each student can only appear ONCE in the student table
2. Each subject can only appear ONCE in the subjects table
3. A subject can be listed MANY times in the results table (for
different students)
4. A student can be listed MANY times in the results table (for
different subjects)

A 2NF check:
STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)


- SubjectCost is only dependent on the primary key, Subject
- Grade is only dependent on the primary key (StudentID +
Subject)
- Name, Address are only dependent on the primary key
(StudentID)
So it is 2NF! But is it 3NF?

3NF
A relation R is in third normal form (3NF) if and only if it is in
2NF and every non-key attribute is non-transitively dependent
on the primary key.
An attribute C is transitively dependent on attribute A if there
exists an attribute B such that A → B and B → C.

Note that 3NF is concerned with transitive dependencies
which do not involve candidate keys. A relation with more than
one candidate key will clearly have transitive dependencies of
the form: primary_key → other_candidate_key →
any_non-key_column

A 3NF check:
STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)


- HouseName is dependent on both StudentID +
HouseColour
- Or HouseColour is dependent on both
StudentID+HouseName
- But either way, non-key fields are dependent on MORE
THAN THE PRIMARY KEY (studentID)
- And 3NF says that non-key fields must depend on nothing
but the key

Again, carve off the offending fields


A 3NF fix:
A 3NF win:
The reveal:

3NF - No transitive dependencies


Table contains data from an embedded entity with non-key
attributes.

BCNF is the same, but the embedded table may involve key
attributes.

BCNF
A relation R is in BCNF if and only if: whenever there is a
non-trivial FD A1A2..An → B1B2..Bm for R, it is the case that
{A1,..,An} is a super-key for R

That is: the left side of every non-trivial FD must be a super-key

Example: BCNF or not

The above relation is not in BCNF because:
Consider the FD:
{title, year} → {length, genre, studioName}
We know that {title, year} is not a super-key

The above relation is in BCNF because it has no non-trivial FD
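
The BCNF test reduces to one closure per FD: a non-trivial FD violates BCNF when the closure of its left side is not the whole relation. In the sketch below the attribute set with starName stands in for the relation of the first example (its figure is not reproduced), the second attribute set is an extra illustration, and the function names are assumptions.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left side is not a super-key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != all_attrs]


FDS = [(frozenset({"title", "year"}),
        frozenset({"length", "genre", "studioName"}))]

with_stars = frozenset({"title", "year", "length", "genre", "studioName", "starName"})
without_stars = with_stars - {"starName"}

violations_1 = bcnf_violations(with_stars, FDS)
violations_2 = bcnf_violations(without_stars, FDS)
print(len(violations_1), len(violations_2))
```

With starName in the schema, {title, year} does not determine every attribute, so the FD is a violation. Without starName the same left side is a key and the relation is in BCNF.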

Differences between BCNF and 3NF


BCNF decomposition algorithm (self studying)

● Input: A relation R with a set of FD’s F
● Output: A BCNF decomposition of R with lossless join
● Method:
- At each step, compute the keys of the current sub-relation R
- If R is not in BCNF, pick any FD X → Y that violates BCNF
- Break the relation into 2 sub-relations:
1. R1(XY)
2. R2(R − Y)
- This decomposition has a lossless join
- Project the FD’s onto each sub-relation
- Continue until there are no more offending FD’s
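
A recursive sketch of the method in Python is given below. It follows a common textbook variant that is not stated on the slide: the first sub-relation is taken as the closure X+ (rather than just XY) and the second as X together with the attributes outside X+. The function names and the example schema are assumptions, and the FD projection enumerates subsets, so this is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project(attrs, fds):
    """FDs X -> A that follow from fds and mention only attributes in attrs."""
    out = []
    for size in range(1, len(attrs)):
        for x in combinations(sorted(attrs), size):
            x = frozenset(x)
            for a in (closure(x, fds) & attrs) - x:
                out.append((x, frozenset({a})))
    return out


def bcnf_decompose(attrs, fds):
    """Split on the first violating FD, then recurse on both parts."""
    for lhs, rhs in fds:
        cl = closure(lhs, fds)
        if not rhs <= lhs and cl != attrs:            # lhs is not a super-key
            r1 = frozenset(cl & attrs)                # X+
            r2 = frozenset(lhs | (attrs - cl))        # X plus the rest
            return (bcnf_decompose(r1, project(r1, fds)) +
                    bcnf_decompose(r2, project(r2, fds)))
    return [attrs]


ALL = frozenset({"title", "year", "length", "genre", "studioName", "starName"})
FDS = [(frozenset({"title", "year"}),
        frozenset({"length", "genre", "studioName"}))]

result = bcnf_decompose(ALL, FDS)
print([sorted(r) for r in result])
```

The two parts share the attributes of X, and X is a key of the first part, which is what makes the join lossless.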

3NF decomposition algorithm (self studying)


● Input: A relation R with a set of FD’s F
● Output: A decomposition of R into a collection of relations,
all of which are in 3NF. This decomposition has a lossless
join and preserves dependencies.
● Method:
1. Find a minimal basis for F, say G.
2. For each FD X → A in G, use XA as the schema of one of the
relations in the decomposition.
3. If none of the relation schemas from step 2 is a super-key for
R, add another relation whose schema is a key for R.
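
The synthesis steps can be sketched in Python as follows. The sketch assumes that the FD's passed in already form a minimal basis (step 1 is not repeated here), uses single-letter attributes, and adds one tidy-up that is not on the slide: a schema contained in another schema is dropped. The example R(A, B, C, D, E) with AB → C, C → B, A → D and the function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(all_attrs, fds):
    """Shrink the full attribute set to one key by dropping attributes."""
    key = set(all_attrs)
    for a in sorted(all_attrs):
        if closure(key - {a}, fds) == set(all_attrs):
            key.discard(a)
    return frozenset(key)


def synthesize_3nf(all_attrs, basis):
    # Step 2: one schema XA per FD X -> A of the minimal basis.
    schemas = [frozenset(lhs) | frozenset(rhs) for lhs, rhs in basis]
    # Step 3: add a key if no schema is a super-key of R.
    if not any(closure(s, basis) == set(all_attrs) for s in schemas):
        schemas.append(find_key(all_attrs, basis))
    # Tidy-up (assumption): drop schemas that are subsets of another schema.
    return {s for s in schemas if not any(s < other for other in schemas)}


G = [("AB", "C"), ("C", "B"), ("A", "D")]
result = synthesize_3nf("ABCDE", G)
print(sorted("".join(sorted(s)) for s in result))
```

Attribute E appears in no FD, so none of the schemas built in step 2 is a super-key and a key relation has to be added; that relation is what keeps the join lossless.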

Summary 1
Decomposing a relation into BCNF is a solution for eliminating
anomalies
But a BCNF decomposition can cause dependency loss, and a
careless decomposition can also lose information
3NF is a relaxation of BCNF that keeps both the lossless-join and
dependency-preservation properties
Summary 2
Chapter 4: High-Level Database Models
Objectives

- Understand the database design process
- Understand data modeling based on the entity-relationship
model
- Design a suitable database that meets real-world business
requirements

Contents

- Database design process


- Entity-relationship model
- What are entity, entity set, attribute, relationship?
- Entity Relationship Diagram(ERD)
- Attributes on Relationships
- Weak Entities
- Sub-class

Data model - Overview


Database modeling and implementation process

Steps in Database Design


1. Requirements Analysis
- user needs; what must the database do?
2. Conceptual Design
- high level description (Entity Relationship diagram)
3. Logical Design
- translate ERD into DBMS data model
4. Schema Refinement
- consistency, normalization
5. Physical Design
- indexes, disk layout
6. Security Design
- who accesses what, and how

ERD - How to construct


1. Gather all the data that needs to be modeled.
2. Identify data that can be modeled as real world entities.
3. Identify the attributes for each entity.
4. Classify entity sets as weak or strong entity sets.
5. Classify entity attributes as key attributes, multi-valued
attributes, composite attributes, or derived attributes.
6. Identify the relationships between the different entities.
7. Draw the entities, their attributes and their relationships,
using the appropriate symbol for each kind of attribute.

Entity Relationship Diagram Notations


Comparison of E-R Modeling notations
ERD - Entity
● Entity:
- Real-world thing, distinguishable from other objects.
- Noun phrase
- Entity described by a set of attributes.
● Entity Set: A collection of similar entities. E.g., all
employees.
- All entities in an entity set have the same set of
attributes. (Until we consider hierarchies, anyway!)
- Each attribute has a domain.
Relationship
● Relationship: Association among two or more entities
- relationships can have their own attributes
(descriptive attributes).
- verb phrases

- 1-1
- 1-M/M-1
- M-M
- Degree Constraints
- Recursive relationship
- Unary, Binary, Ternary relationship

● Referential integrity constraint
- A value appearing in one context must also appear in
another
Type of Attributes

Key attribute

Multivalued attribute

Derived attribute

Composite attribute
Weak Entity Sets

Consider the relationship

- An entity set's key may be composed of attributes, some or
all of which belong to another entity set. Such an entity set
is called a weak entity set.

Requirements for Weak Entity Sets


● E is a weak entity set and R is a relationship from E to F
● R is called a supporting relationship for E if
- R is a binary, many-one relationship from E to F
- R has referential integrity from E to F
- The attributes that F supplies for the key of E are
key attributes of F
Weak Entity Sets
Example weak entity set

Subclasses in E/R Model


Consider Cartoons and Murder Mysteries, which are special kinds
of movies with some special properties of their own

Example COMPANY Database – Construct ERD


● Requirements of the Company (oversimplified for
illustrative purposes)
- The company is organized into DEPARTMENTs. Each
department has a name, number and an employee
who manages the department. We keep track of the
start date of the department manager.
- Each department controls a number of PROJECTs.
Each project has a name, number and is located at a
single location.

Example COMPANY Database (Cont.)


- We store each EMPLOYEE’s social security number,
address, salary, sex, and birthdate. Each employee works
for one department but may work on several projects. We
keep track of the number of hours per week that an
employee currently works on each project. We also keep
track of the direct supervisor of each employee.
- Each employee may have a number of DEPENDENTs. For
each dependent, we keep track of their name, sex,
birthdate, and relationship to the employee.

From ER Diagram to Relational Model


● Overview:
- 1 entity = 1 relation
- attributes of entity ~ attributes of relation
- key of entity ~ key of relation
● Convert 1-1 relationship
● Convert 1-M relationship
- Add the key attribute(s) of the one-side to the relation
of the M-side as a foreign key
● Convert M-M relationship
- Generate 1 new relation whose primary key combines the
keys of the two related relations. Other attributes of
the new relation ~ attributes of the relationship (if any)
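
The 1-M and M-M rules above can be sketched on part of the COMPANY example. This is a sketch under assumptions: SQLite (through Python's sqlite3 module) stands in for the course DBMS, the table and column names are hypothetical, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL
);
-- 1-M: the key of the one-side (Department) goes into the M-side (Employee).
CREATE TABLE Employee (
    ssn    TEXT PRIMARY KEY,
    salary REAL,
    dno    INTEGER NOT NULL REFERENCES Department(dnumber)
);
CREATE TABLE Project (
    pnumber INTEGER PRIMARY KEY,
    pname   TEXT NOT NULL
);
-- M-M: a new relation keyed by both participants, plus the attribute hours.
CREATE TABLE WorksOn (
    essn  TEXT    REFERENCES Employee(ssn),
    pno   INTEGER REFERENCES Project(pnumber),
    hours REAL,
    PRIMARY KEY (essn, pno)
);
INSERT INTO Department VALUES (5, 'Research');
INSERT INTO Employee   VALUES ('123456789', 30000, 5);
INSERT INTO Project    VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO WorksOn    VALUES ('123456789', 1, 32.5), ('123456789', 2, 7.5);
""")

rows = conn.execute("""
    SELECT e.ssn, d.dname, p.pname, w.hours
    FROM Employee e
    JOIN Department d ON d.dnumber = e.dno
    JOIN WorksOn w    ON w.essn = e.ssn
    JOIN Project p    ON p.pnumber = w.pno
    ORDER BY p.pname
""").fetchall()
```

The join recovers, for each employee, the department (1-M) and every project with its hours (M-M).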

From ER Diagram to Relational Model


Convert 1-1 relationship
For a one-to-one relationship without total participation
- Build a table with two columns, one for each participating
entity set's primary key. Add further columns, one for each
descriptive attribute of the relationship set (if any).
For a one-to-one relationship with one entity set having total
participation
- Add one extra column to the table of the entity set with
total participation, and store in it the primary key of the
entity set without total participation, as given by the
relationship.

Convert N-ary Relationship Set

P-Key1    P-Key2    P-Key3    A-Key    D-Attribute
9999      8888      7777      6666     Yes
1234      5678      9012      3456     No

* Primary key of this table is P-Key1 + P-Key2 + P-Key3

Representing Composite Attribute


- Relational Model Indivisibility Rule Applies
- One column for each component attribute
- NO column for the composite attribute itself

Representing Multivalue Attribute


● For each multivalued attribute in an entity set/relationship
set
- Build a new relation schema with two columns
- One column for the primary key of the entity
set/relationship set that has the multivalued attribute
- Another column for the multivalued attribute. Each
cell of this column holds only one value, so each
value is represented as a unique tuple
- Primary key for this schema is the union of all
attributes
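
A small sketch of the composite and multivalued attribute rules, using SQLite through Python's sqlite3 module as a stand-in DBMS. The composite attribute address (street, city), the multivalued attribute phone, and all names and values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    ssn    TEXT PRIMARY KEY,
    street TEXT,   -- component of the composite attribute address
    city   TEXT    -- component of the composite attribute address
);
-- One row per (employee, phone) value; the key is the union of both columns.
CREATE TABLE EmployeePhone (
    ssn   TEXT REFERENCES Employee(ssn),
    phone TEXT,
    PRIMARY KEY (ssn, phone)
);
""")
conn.execute("INSERT INTO Employee VALUES ('111', '1 Main St', 'Hanoi')")
conn.executemany("INSERT INTO EmployeePhone VALUES (?, ?)",
                 [("111", "0901"), ("111", "0902")])

phones = [row[0] for row in conn.execute(
    "SELECT phone FROM EmployeePhone WHERE ssn = '111' ORDER BY phone")]

try:
    conn.execute("INSERT INTO EmployeePhone VALUES ('111', '0901')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True   # the same value cannot be stored twice
```

There is no address column and no list-valued phone column: the components and the individual values each get their own cell.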
Example - Multivalue Attribute
Representing Class Hierarchy
Two general approaches depending on disjointness and
completeness
● For a non-disjoint and/or non-complete class hierarchy:
- Create a table for each super class entity set
according to the normal entity set translation method.
- Create a table for each subclass entity set with a
column for each attribute of that entity set plus
one for each attribute of the primary key of the
super class entity set
- This primary key from the super class entity set is
also used as the primary key for the new table

Example
Representing Class Hierarchy
Two general approaches depending on disjointness and
completeness
● For a disjoint AND complete class hierarchy:
- DO NOT create a table for the super class entity set
- Create a table for each subclass entity set that includes
all attributes of that subclass entity set and all
attributes of the super class entity set

Simple and Intuitive enough, need example?


Representing Aggregation

From E/R Relationship to Relations


Contracts(starName, title,year, studioOfStar_name,
producingStudio_name)

Combining Relations

- Suppose an entity set E and a many-one relationship R
from E to F. We can combine the two relations E and R into
one relation with a schema consisting of:
1. All attributes of E,
2. The key attributes of F, and
3. Any attributes belonging to relationship R

Handling Weak Entity Sets


● If W is a weak entity set, construct for W a relation whose
schema consists of:
- All attributes of W
- All attributes of supporting relationships for W
- For each supporting relationship for W, say a many-
one relationship from W to entity set E, all the key
attributes of E
● Rename attributes, if necessary, to avoid name conflicts
● Do not construct a relation for any supporting relationship
for W
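
A sketch of these rules, treating DEPENDENT from the COMPANY example as a weak entity set supported by EMPLOYEE (that reading, the column names, and the sample rows are assumptions). SQLite through Python's sqlite3 module stands in for the course DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (ssn TEXT PRIMARY KEY);
-- No separate table for the supporting relationship: the key of the
-- supporting entity set (essn) is part of the weak entity set's key.
CREATE TABLE Dependent (
    essn         TEXT NOT NULL REFERENCES Employee(ssn),
    name         TEXT NOT NULL,
    relationship TEXT,
    PRIMARY KEY (essn, name)
);
INSERT INTO Employee VALUES ('111'), ('222');
-- The same dependent name is allowed under different employees.
INSERT INTO Dependent VALUES ('111', 'An', 'child'), ('222', 'An', 'spouse');
""")

dependent_count = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]

try:
    conn.execute("INSERT INTO Dependent VALUES ('999', 'Binh', 'child')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True   # no employee '999', so referential integrity fails
```

The name alone does not identify a dependent; it does so only together with the key borrowed from Employee.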

SUBCLASS STRUCTURES TO RELATIONS

Converting Subclass Structures to Relations


How do we convert this structure to relations?

The principal conversion strategies


● Follow E/R viewpoint
- For each entity set E in the hierarchy, create a relation
that includes the key attributes from the root and any
attributes belonging to E
● Treat entities as object-oriented
- For each possible subtree that includes the root,
create one relation, whose schema includes all the
attributes of all the entity sets in the subtree
● Use null values
- Create only one relation with all attributes of all entity
sets in the hierarchy. Each entity is represented by
one tuple, and that tuple has a NULL value for
whatever attributes the entity does not have
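
The three strategies can be compared by generating the relation schemas each one produces. This is a sketch under assumptions: the attribute lists (length, genre, weapon) are illustrative and not taken from the slides, both subclasses are direct children of the root, and the function names are hypothetical.

```python
from itertools import combinations

root = "Movies"
key = ["title", "year"]
own_attrs = {
    "Movies": ["title", "year", "length", "genre"],
    "Cartoons": [],                    # assumed to add no attributes of its own
    "MurderMysteries": ["weapon"],
}
subclasses = ["Cartoons", "MurderMysteries"]


def er_style():
    """One relation per entity set: root key plus the set's own attributes."""
    return {e: (attrs if e == root else key + attrs)
            for e, attrs in own_attrs.items()}


def oo_style():
    """One relation per subtree that includes the root."""
    schemas = {}
    for size in range(len(subclasses) + 1):
        for combo in combinations(subclasses, size):
            attrs = list(own_attrs[root])
            for sub in combo:
                attrs += own_attrs[sub]
            schemas["+".join((root,) + combo)] = attrs
    return schemas


def null_style():
    """One relation with every attribute; missing values become NULL."""
    attrs = list(own_attrs[root])
    for sub in subclasses:
        attrs += own_attrs[sub]
    return {root: attrs}


er, oo, nulls = er_style(), oo_style(), null_style()
```

With two subclasses the E/R style gives 3 relations, the object-oriented style gives 4 (one per subtree), and the null style gives 1.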

E/R Style Conversion


An Object-Oriented Approach

Using Null Values

Unified Modeling Language (self studying)

Introduction
- UML is designed to model software in an object-oriented
style, but has been adapted as a database modeling
language
- UML offers much the same capabilities as the E/R model,
with the exception of multi-way relationships: UML supports
only binary relationships.

UML vs. E/R Model

UML Classes

Associations
Consider the associations among Movies, Stars, and Studios
in UML
Comparison with E/R Multiplicities
Self-Associations
An association can have both ends at the same class; such an
association is called a self-association
Example

Association Classes
Subclasses in UML
Consider Movies and its three subclasses.
Figure 4.40: Cartoons and murder mysteries as disjoint
subclasses of movies.
Aggregations and Compositions
UML-to-Relations Basics
Classes to Relations
- For each class, create a relation
1. name is the name of the class
2. attributes are the attributes of the class
Associations to Relations
- For each association, create a relation
1. name is the name of that association
2. attributes are the key attributes of the two connected
classes

From UML Subclasses to Relations


We can use any of the three strategies outlined for E/R to
convert a class and its subclasses to relations.
- E/R-style: each subclass’ relation stores only its own
attributes, plus key
- OO-style: relations store attributes of subclass and all
super-classes
- Nulls: One relation, with NULL’s as needed
From Aggregations and Composition to Relation

No relation for the aggregation or composition.


Add to the relation for the class at the non-diamond end the
key attribute(s) of the class at the diamond end.
- In the case of an aggregation, it is possible that these
attributes can be null.

The UML Analog of Weak Entity Sets


We use the composition, which goes from the weak class to
the supporting class, for a weak entity set.
Example:
Chapter 6: The Database Language (SQL)
Objectives

- Students can write a SQL script
- Students can compose SQL queries using set (and bag)
operators, correlated subqueries, aggregation queries
- Students can work proficiently with complex queries

Contents

- Integrity constraints
- Structured Query Language
- DDL
- DML
- DCL (self studying)
- Sub query

Review

Studied:
- ER diagram
- Relational model
- Convert ERD → Relational model
Now: we learn how to set up a relational database on a DBMS

REVIEW - Entity Relationship Diagram

COMPANY Database:

6.1 Integrity constraints


- Purpose: prevent semantic inconsistencies in data
- Kinds of integrity constraints:
1. Key Constraints (1 table): Primary key, Candidate key
(Unique)
2. Attribute Constraints (1 table): NULL / NOT NULL;
CHECK
3. Referential Integrity Constraints (2 tables): FOREIGN
KEY
4. Global Constraints (n tables): CHECK or CREATE
ASSERTION (self studying)
We will implement these constraints in SQL
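
The first three kinds of constraints can be tried out as follows. This is a sketch under assumptions: SQLite through Python's sqlite3 module stands in for the course DBMS, and the tables, columns, and the helper rejected are hypothetical. Global constraints (CREATE ASSERTION) are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Department (
    dnumber INTEGER PRIMARY KEY,           -- key constraint: primary key
    dname   TEXT NOT NULL UNIQUE           -- NOT NULL + candidate key (UNIQUE)
);
CREATE TABLE Employee (
    ssn    TEXT PRIMARY KEY,
    salary REAL CHECK (salary >= 0),       -- attribute constraint: CHECK
    dno    INTEGER NOT NULL
           REFERENCES Department(dnumber)  -- referential integrity: FOREIGN KEY
);
INSERT INTO Department VALUES (1, 'Research');
INSERT INTO Employee   VALUES ('111', 1000, 1);
""")


def rejected(sql):
    """True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "duplicate primary key": rejected("INSERT INTO Department VALUES (1, 'Sales')"),
    "duplicate unique name": rejected("INSERT INTO Department VALUES (2, 'Research')"),
    "NULL in NOT NULL column": rejected("INSERT INTO Employee VALUES ('222', 500, NULL)"),
    "negative salary": rejected("INSERT INTO Employee VALUES ('333', -5, 1)"),
    "unknown department": rejected("INSERT INTO Employee VALUES ('444', 500, 99)"),
}
```

Every entry in results is True: each bad row is refused by the DBMS, so the inconsistent data never reaches the tables.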
Comparison of Strings
Two strings are equal (=) if they are the same sequence of
characters
Other comparisons: <, >, ≤, ≥, ≠
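Suppose a = a1a2…an and b = b1b2…bm are two strings. The first
is less than the second if either:
- a is a proper prefix of b (n < m and ai = bi for all i,
1 ≤ i ≤ n), or
- ∃ k, 0 ≤ k < min(n,m): ai = bi for all i, 1 ≤ i ≤ k, and
ak+1 < bk+1
Example
- 'fodder' < 'foo'
- 'bar' < 'bargain'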
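
The rule above can be written as a short Python function. This is a sketch: less_than is a hypothetical name, and characters are compared by their code points, whereas a real DBMS may apply a collation that orders characters differently.

```python
def less_than(a, b):
    """Lexicographic a < b: first differing character decides, else the prefix rule."""
    for x, y in zip(a, b):
        if x != y:
            return x < y          # position k+1: first place where a and b differ
    return len(a) < len(b)        # no difference found: a < b only if a is a proper prefix


examples = [("fodder", "foo"), ("bar", "bargain"), ("foo", "fodder"), ("bar", "bar")]
outcomes = [less_than(a, b) for a, b in examples]
```

The first two pairs are the examples from the slide; the last two show that the comparison is strict.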

Pattern Matching in SQL


LIKE or NOT LIKE
SELECT               SELECT
FROM                 FROM
WHERE s LIKE p;      WHERE s NOT LIKE p;

Two special characters
- % means any sequence of 0 or more characters
- _ means any one character

Example 5.1:
Find all employees named 'Võ Việt Anh'

Example 5.2:
Find all employees whose name ends with 'Anh'
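
Both examples can be tried with SQLite through Python's sqlite3 module (a stand-in for the course DBMS). The Employee table, its single name column, and the sample names other than 'Võ Việt Anh' are assumptions made for illustration.

```python
import sqlite3

names = ["Võ Việt Anh", "Trần Anh", "Anh Tú", "Le Minh"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?)", [(n,) for n in names])

# Example 5.1: a pattern without % or _ matches only the exact name.
exact = [r[0] for r in conn.execute(
    "SELECT name FROM Employee WHERE name LIKE ?", (names[0],))]

# Example 5.2: % stands for any sequence of characters before 'Anh'.
ends_with_anh = [r[0] for r in conn.execute(
    "SELECT name FROM Employee WHERE name LIKE '%Anh'")]

# _ stands for exactly one character.
one_char = [r[0] for r in conn.execute(
    "SELECT name FROM Employee WHERE name LIKE '% M_nh'")]
```

'Anh Tú' is not returned by the second query because the pattern requires the name to end with 'Anh', not to start with it.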
USING ESCAPE keyword
- SQL allows us to specify any one character we like as the
escape character for a single pattern
- Example
- WHERE s LIKE '%20!%%' ESCAPE '!'
- Or WHERE s LIKE '%20@%%' ESCAPE '@'
➡ Matching any string s that contains the substring 20%
- WHERE s LIKE 'x%%x%' ESCAPE 'x'
➡ Matching any string s that begins and ends with the
character %
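
The two patterns can be checked against sample strings with SQLite through Python's sqlite3 module (a stand-in for the course DBMS). The helper names and the sample strings are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def contains_20_percent(s):
    """'!%' is a literal percent sign; the outer % are wildcards."""
    return bool(conn.execute(
        "SELECT ? LIKE '%20!%%' ESCAPE '!'", (s,)).fetchone()[0])


def wrapped_in_percent(s):
    """'x%' is a literal percent sign at both ends; the middle % is a wildcard."""
    return bool(conn.execute(
        "SELECT ? LIKE 'x%%x%' ESCAPE 'x'", (s,)).fetchone()[0])


checks = {
    "save 20% today": contains_20_percent("save 20% today"),
    "2023": contains_20_percent("2023"),
    "%abc%": wrapped_in_percent("%abc%"),
    "abc%": wrapped_in_percent("abc%"),
}
```

Only 'save 20% today' and '%abc%' match their patterns; the other two strings lack the required literal % characters.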

Dates and Times


Dates and times are special data types in SQL
A date constant's representation
- DATE '1948-05-14'
A time constant's representation
- TIME '15:00:02.5'
A combination of date and time
- TIMESTAMP '1948-05-14 12:00:00'
Operations on date and time
- Arithmetic operations
- Comparison operations

Null Values
Null value: special value in SQL
Some interpretations
- Value unknown: there is a value, but we don't know what it is
- Value inapplicable: there is no value that makes sense here
- Value withheld: we are not entitled to know the value that
belongs here
Null is not a constant
Two rules for operating upon a NULL value in WHERE clause
- Arithmetic operators on NULL values will return a NULL
value
- Comparisons with NULL values will return UNKNOWN

The Truth-Value UNKNOWN


Truth table for True, False, and Unknown
We can think of TRUE = 1; FALSE = 0; UNKNOWN = ½, so
- x AND y = MIN(x,y); x OR y = MAX(x,y); NOT x = 1 - x

SQL conditions in the WHERE clause produce three truth values:
True, False, and Unknown
Those tuples for which the condition has the value True become
part of the answer
Those tuples for which the condition has the value False or
Unknown are excluded from the answer
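
The numeric encoding above (TRUE = 1, FALSE = 0, UNKNOWN = ½) can be sketched in plain Python. The function names and the sample salaries are assumptions; None plays the role of NULL.

```python
TRUE, UNKNOWN, FALSE = 1, 0.5, 0


def and3(x, y):
    return min(x, y)


def or3(x, y):
    return max(x, y)


def not3(x):
    return 1 - x


def greater_than(a, b):
    """Comparison that yields UNKNOWN when either operand is NULL (None)."""
    if a is None or b is None:
        return UNKNOWN
    return TRUE if a > b else FALSE


salaries = [1000, None, 0]

# WHERE salary > 0 OR NOT (salary > 0): only tuples whose condition is TRUE are kept.
kept = [s for s in salaries
        if or3(greater_than(s, 0), not3(greater_than(s, 0))) == TRUE]
```

The condition looks like it should always hold, yet the tuple with the NULL salary is dropped because UNKNOWN OR UNKNOWN is still UNKNOWN.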
6.2 SQL Overview
- SQL (pronounced "sequel") is a database language designed for
managing data in relational database management systems, and
originally based upon relational algebra.
- There are many versions of the SQL standard
1. ANSI SQL (or SQL-86), SQL-92, SQL:1999
2. SQL:2003, SQL:2006, SQL:2008, and later revisions
- Transact-SQL (T-SQL) is Microsoft’s and Sybase’s proprietary
extension to SQL
- PL/SQL (Procedural Language/Structured Query Language) is
Oracle Corporation’s procedural extension for SQL and the
Oracle relational database
- Today, SQL is accepted as the standard RDBMS language

Data Definition Language - CREATE


- Database schema
Simple syntax: CREATE DATABASE dbname
Full syntax: https://docs.microsoft.com/en-us/sql/database
- Relational schema ~ table

CREATE TABLE tableName
(
fieldname1 datatype [integrity_constraints],
fieldname2 datatype [integrity_constraints],
...
)
Full syntax: https://docs.microsoft.com/en-us/sql/table
Data Definition Language - Demo

Data Definition Language - ALTER, DROP


Used to modify the structure of a table or database
- Add more columns
ALTER TABLE tableName
ADD columnName datatype [constraint]
- Remove columns
ALTER TABLE tableName
DROP COLUMN columnName
- Modify data type
ALTER TABLE tableName
ALTER COLUMN columnName datatype [NULL | NOT NULL]

- Add/remove constraints
ALTER TABLE tablename
ADD CONSTRAINT constraintName PRIMARY KEY (<attribute list>);
ALTER TABLE tablename
ADD CONSTRAINT constraintName FOREIGN KEY (<attribute list>)
REFERENCES parentTableName (<attribute list>);
ALTER TABLE tablename
ADD CONSTRAINT constraintName CHECK (expressionChecking);
ALTER TABLE tablename
DROP CONSTRAINT constraintName;

- DROP TABLE tableName


- DROP DATABASE dbName
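
A short sketch of CREATE, ALTER and DROP, run on SQLite through Python's sqlite3 module. This is a stand-in for the T-SQL syntax above, with differences: SQLite has no CREATE DATABASE or DROP DATABASE (a database is a file or, here, memory), and its ALTER TABLE does not support ALTER COLUMN or ADD CONSTRAINT. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # plays the role of CREATE DATABASE

conn.execute("""
CREATE TABLE Employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT NOT NULL
)
""")

# Add one more column to the existing table.
conn.execute("ALTER TABLE Employee ADD COLUMN salary REAL")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]

# Remove the table together with its data.
conn.execute("DROP TABLE Employee")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
```

After the ALTER the table has the columns ssn, name and salary; after the DROP no table is left in the database.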

The 21st slide in chapter 6
