DBMS-UNIT 2
Ordering of attributes in a relation schema R: We will consider the attributes in R(A1, A2, ..., An)
and the values in t = <v1, v2, ..., vn> to be ordered.
Values in a tuple: All values are considered atomic (indivisible). A special null value is used to
represent values that are unknown or inapplicable to certain tuples.
Notation: We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for
tuple t). Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes
Au, Av, ..., Aw, respectively.
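The notation above can be sketched in Python as follows. This is only an illustration: the schema, the sample tuple, and the helper names component and subtuple are assumptions made for this example.

```python
# Hypothetical schema and tuple, used only to illustrate the notation.
schema = ("Ename", "Ssn", "Bdate", "Address", "Dnumber")
t = ("Smith", "123456789", "1965-01-09", "Houston", 5)

def component(t, schema, attr):
    """Return t[Ai], the value of one attribute of tuple t."""
    return t[schema.index(attr)]

def subtuple(t, schema, attrs):
    """Return t[Au, ..., Aw], the values of the listed attributes in the order given."""
    return tuple(component(t, schema, a) for a in attrs)

print(component(t, schema, "Ssn"))
print(subtuple(t, schema, ["Ename", "Dnumber"]))
```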
II. Key constraints: These constraints restrict the values that a particular attribute (column) or
set of attributes may take.
a) Super key: A set of attributes SK of R such that no two tuples in any valid relation
instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in
r(R), t1[SK] ≠ t2[SK].
(A subset of attributes that gives unique tuples, i.e., no two distinct tuples can have the same values.)
E.g.: (USN, NAME, RNO, CLASS), (NAME, RNO, CLASS), (USN, NAME),
(RNO, CLASS), (NAME, RNO, PER)
Note: The set of all attributes always forms a super key.
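The super key condition can be checked against one relation instance with a short Python sketch. The sample STUDENT rows and the function name is_superkey are assumptions for illustration. Note that a single instance can only show that a set of attributes is not a super key; it cannot prove that the property holds for every valid instance.

```python
def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

# Hypothetical STUDENT instance (values invented for this example).
students = [
    {"USN": "U1", "NAME": "Asha", "RNO": 1, "CLASS": "A"},
    {"USN": "U2", "NAME": "Ravi", "RNO": 1, "CLASS": "B"},
    {"USN": "U3", "NAME": "Asha", "RNO": 2, "CLASS": "A"},
]

print(is_superkey(students, ["USN", "NAME"]))   # unique in this instance
print(is_superkey(students, ["RNO", "CLASS"]))  # unique in this instance
print(is_superkey(students, ["NAME"]))          # two rows share NAME
```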
b) Key: A super key K is a key if it is minimal, that is, removing any attribute from K leaves a
set of attributes that is no longer a super key.
Example: The CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
Key1 = {State, Reg#},
Key2 = {SerialNo}, which are also superkeys.
{SerialNo, Make} is a superkey but not a key.
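The minimality condition for the CAR example can be sketched as follows. The sample rows and the helper names are assumptions for illustration, and the check works on one instance only, so it reflects that instance rather than the schema as a whole.

```python
def is_superkey(rows, attrs):
    """True if the attrs values are distinct across all rows of this instance."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)

def is_key(rows, attrs):
    """A key is a superkey from which no single attribute can be removed."""
    if not is_superkey(rows, attrs):
        return False
    for removed in attrs:
        smaller = [a for a in attrs if a != removed]
        if is_superkey(rows, smaller):
            return False  # a proper subset is already a superkey
    return True

# Hypothetical CAR instance (values invented for this example).
cars = [
    {"State": "TX", "Reg#": "ABC-123", "SerialNo": "A69352", "Make": "Ford"},
    {"State": "FL", "Reg#": "ABC-123", "SerialNo": "B43696", "Make": "Oldsmobile"},
    {"State": "TX", "Reg#": "XYZ-789", "SerialNo": "X83554", "Make": "Oldsmobile"},
]

print(is_key(cars, ["State", "Reg#"]))
print(is_key(cars, ["SerialNo"]))
print(is_superkey(cars, ["SerialNo", "Make"]), is_key(cars, ["SerialNo", "Make"]))
```

Checking only the subsets obtained by removing one attribute is enough here, because any superset of a super key is itself a super key.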
c) Candidate keys: A relation schema may have more than one key; each of the keys is called a
candidate key.
d) Primary key: One of the candidate keys is chosen arbitrarily and designated as the primary key.
(Its values are unique for each tuple, are never null, and are used to identify each tuple.)
e) Foreign key: An attribute (or set of attributes) in a relation is a foreign key if it refers to
the primary key of another relation.
III) Null constraints: specify whether an attribute must always have a value or may be null.
RELATIONAL DATABASES SCHEMAS
Relational Database Schema: A set S of relation schemas that belong to the same database. S is
the name of the database. S = {R1, R2, ..., Rn}
4. Entity integrity constraints: Every relation has a primary key whose values are unique and
not null. The primary key attributes PK of each relation schema R in S cannot have null
values in any tuple of r(R). This is because primary key values are used to identify the individual
tuples: t[PK] ≠ null for any tuple t in r(R).
Note: Other attributes of R may be similarly constrained to disallow null values, even though
they are not members of the primary key.
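A minimal sketch of an entity integrity check, assuming null is represented by Python's None. The function name and the sample rows are hypothetical.

```python
def entity_integrity_violations(rows, pk):
    """Return the rows in which any primary key attribute is null (None)."""
    return [row for row in rows if any(row[a] is None for a in pk)]

# Hypothetical EMPLOYEE instance; the second row has a null primary key.
employee = [
    {"Ssn": "111", "Ename": "Smith"},
    {"Ssn": None, "Ename": "Wong"},
]

bad_rows = entity_integrity_violations(employee, ["Ssn"])
print(bad_rows)
```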
5) Referential integrity constraints: These maintain consistent common values among the rows of
two relations.
A constraint involving two relations (the previous constraints involve a single relation).
Used to specify a relationship among tuples in two relations: the referencing relation and
the referenced relation.
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that
reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is
said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
A referential integrity constraint can be displayed in a relational database schema as a
directed arc from R1.FK to R2.
EMPLOYEE
Ename Ssn Bdate Address Dnumber
DEPARTMENT
Dname Dnumber Dmgr_ssn
DEPT_LOCATIONS
Dnumber Dlocation
PROJECT
Pname Pnumber Plocation Dnum
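The referential integrity condition t1[FK] = t2[PK] can be checked with a sketch like the one below, using EMPLOYEE.Dnumber as a foreign key to DEPARTMENT.Dnumber. The sample rows and the function name are assumptions. The sketch also assumes the usual convention that a null foreign key value is allowed and references nothing.

```python
def dangling_references(referencing, fk, referenced, pk):
    """Return tuples of the referencing relation whose FK matches no PK value."""
    pk_values = {tuple(t2[a] for a in pk) for t2 in referenced}
    dangling = []
    for t1 in referencing:
        value = tuple(t1[a] for a in fk)
        if any(v is None for v in value):
            continue  # assumption: a null foreign key references nothing
        if value not in pk_values:
            dangling.append(t1)
    return dangling

# Hypothetical instances (values invented for this example).
department = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Administration"}]
employee = [
    {"Ssn": "111", "Dnumber": 5},
    {"Ssn": "222", "Dnumber": 7},     # no department 7 exists
    {"Ssn": "333", "Dnumber": None},  # not yet assigned to a department
]

violations = dangling_references(employee, ["Dnumber"], department, ["Dnumber"])
print(violations)
```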
Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation. Otherwise, if the
relation corresponds to a mixture of multiple entities and relationships, semantic ambiguities will
result and the relation cannot be easily explained.
EMP_DEPT
Ename Ssn Bdate Address Dnumber Dname Dmgr_ssn
EMP_PROJ
Ssn Pnumber Hours Ename Pname Plocation
For the EMP_PROJ relation in Figure 14.3(b), each tuple relates an employee to a project but also includes
the employee name (Ename), project name (Pname), and project location (Plocation). Although there is
nothing wrong logically with these two relations, they violate Guideline 1 by mixing attributes from
distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ
mixes attributes of employees and projects and the WORKS_ON relationship.
For example, compare the space used by the two base relations EMPLOYEE and
DEPARTMENT in the figure above with that for an EMP_DEPT base relation,
which is the result of applying the NATURAL JOIN operation to EMPLOYEE and
DEPARTMENT.
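The space comparison can be made concrete with a small sketch. The rows below are invented, Bdate and Address are left out for brevity, and natural_join is a simple nested-loop helper written for this example, not a library function.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they have in common."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def stored_values(rows):
    """Count the attribute values stored in a list of rows."""
    return sum(len(row) for row in rows)

employee = [
    {"Ename": "Smith", "Ssn": "111", "Dnumber": 5},
    {"Ename": "Wong", "Ssn": "222", "Dnumber": 5},
    {"Ename": "Narayan", "Ssn": "333", "Dnumber": 5},
    {"Ename": "Zelaya", "Ssn": "444", "Dnumber": 4},
]
department = [
    {"Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Dnumber": 4, "Dname": "Administration", "Dmgr_ssn": "555"},
]

emp_dept = natural_join(employee, department)
separate = stored_values(employee) + stored_values(department)
combined = stored_values(emp_dept)
research_copies = sum(1 for row in emp_dept if row["Dname"] == "Research")
print(separate, combined, research_copies)
```

In this instance the department name "Research" is stored once in DEPARTMENT but once per employee in EMP_DEPT, which is where the extra space goes.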
Storing natural joins of base relations leads to an additional problem referred to as update
anomalies. These can be classified into insertion anomalies, deletion anomalies, and modification
anomalies.
Insertion Anomalies. Insertion anomalies can be differentiated into two types, illustrated by the
following examples based on the EMP_DEPT relation:
To insert a new employee tuple into EMP_DEPT, we must include either the attribute
values for the department that the employee works for, or NULLs (if the employee does
not work for a department as yet).
For example, to insert a new tuple for an employee who works in department number 5,
we must enter all the attribute values of department 5 correctly so that they are consistent
with the corresponding values for department 5 in other tuples in EMP_DEPT.
In the design of Figure 14.2, we do not have to worry about this consistency problem
because we enter only the department number in the employee tuple; all other attribute
values of department 5 are recorded only once in the database, as a single tuple in the
DEPARTMENT relation.
It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee.
This violates the entity integrity for EMP_DEPT because its primary key Ssn cannot be
null.
Deletion Anomalies.
If we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that department
is lost inadvertently from the database.
This problem does not occur in the database of Figure 14.2 because DEPARTMENT
tuples are stored separately.
Modification Anomalies.
In EMP_DEPT, if we change the value of one of the attributes of a particular
department—say, the manager of department 5—we must update the tuples of all
employees who work in that department; otherwise, the database will become inconsistent.
If we fail to update some tuples, the same department will be shown to have two different
values for manager in different employee tuples, which would be wrong.
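The modification anomaly can be reproduced with a small sketch in which only one of the tuples for department 5 is updated. The rows, the new manager value "999", and the helper name are assumptions for illustration.

```python
# Hypothetical EMP_DEPT instance, reduced to the attributes that matter here.
emp_dept = [
    {"Ssn": "111", "Dnumber": 5, "Dmgr_ssn": "222"},
    {"Ssn": "222", "Dnumber": 5, "Dmgr_ssn": "222"},
    {"Ssn": "333", "Dnumber": 4, "Dmgr_ssn": "444"},
]

def managers_by_department(rows):
    """Collect every manager value recorded for each department number."""
    managers = {}
    for row in rows:
        managers.setdefault(row["Dnumber"], set()).add(row["Dmgr_ssn"])
    return managers

# A faulty update: the manager of department 5 is changed in one tuple only.
for row in emp_dept:
    if row["Dnumber"] == 5:
        row["Dmgr_ssn"] = "999"
        break

inconsistent = {d: m for d, m in managers_by_department(emp_dept).items() if len(m) > 1}
print(inconsistent)
```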
Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations. If any anomalies are present, note them clearly and
make sure that the programs that update the database will operate correctly.
Guideline 3: As far as possible, avoid placing attributes in a base relation whose values may
frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases
only and do not apply to a majority of tuples in the relation.
Guideline 4: Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way that guarantees
that no spurious tuples are generated. Avoid relations that contain matching attributes that are
not (foreign key, primary key) combinations because joining on such attributes may produce
spurious tuples.
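A short sketch of how spurious tuples arise when two relations are joined on an attribute that is neither a primary key nor a foreign key. The data and the relation names emp_locs and proj_locs are invented for this example; relations are modelled as Python sets of tuples.

```python
# Original relation: (Ename, Pnumber, Plocation)
original = {
    ("Smith", 1, "Houston"),
    ("Wong", 2, "Houston"),
    ("Zelaya", 3, "Stafford"),
}

# Decompose so that the two parts share only Plocation, which is not a key of either.
emp_locs = {(ename, loc) for ename, _, loc in original}
proj_locs = {(pnum, loc) for _, pnum, loc in original}

# Join the parts back together on Plocation.
rejoined = {
    (ename, pnum, loc1)
    for ename, loc1 in emp_locs
    for pnum, loc2 in proj_locs
    if loc1 == loc2
}

spurious = rejoined - original
print(sorted(spurious))
```

Every original tuple is still present after the join, but the two extra tuples pair employees with projects they never worked on.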
A functional dependency (FD), denoted by X -> Y, between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
The constraint is that, for any two tuples t1 and t2 in any relation instance r(R): if
t1[X] = t2[X], then t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined by,
the values of the X component.
X -> Y holds if whenever two tuples have the same value for X, they must have the
same value for Y.
X -> Y in R specifies a constraint on all relation instances r(R).
FDs are derived from the real-world constraints on the attributes.
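The definition of X -> Y can be tested on a given relation instance with a sketch like the following. The function name fd_holds and the sample rows are assumptions. Because an FD is a constraint on all instances, a check on one instance can only refute an FD, never establish it.

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True

# Hypothetical EMP_DEPT instance (values invented for this example).
emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4, "Dname": "Administration"},
]

print(fd_holds(emp_dept, ["Dnumber"], ["Dname"]))  # not violated by this instance
print(fd_holds(emp_dept, ["Dnumber"], ["Ssn"]))    # violated by the first two rows
```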
Note:
1. If a constraint on R states that there cannot be more than one tuple with a given X value
in any relation instance r(R), i.e., X is a candidate key of R, this implies that
X -> Y holds for any subset of attributes Y of R, because no two distinct tuples can have the
same value for X.
Examples of FD constraints
i. The lossless join or non-additive join property, which guarantees that the spurious tuple
generation problem does not occur.
ii. The dependency preservation property, which ensures that each functional dependency is
represented in some individual relation resulting after decomposition.
Normal form: A condition using keys and FDs of a relation to certify whether a relation schema is
in a particular normal form.
Denormalization: The process of storing the join of higher normal form relations as a base
relation, which is in a lower normal form.
One of the candidate keys is arbitrarily designated to be the primary key, and the
others are called secondary keys.
A prime attribute must be a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any
candidate key.
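Given the candidate keys of a relation, the prime and nonprime attributes follow directly. The sketch below uses the CAR schema with its two keys {State, Reg#} and {SerialNo}; the function name is an assumption.

```python
def prime_attributes(candidate_keys):
    """Prime attributes are those that belong to at least one candidate key."""
    return set().union(*candidate_keys)

car_attributes = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
car_candidate_keys = [{"State", "Reg#"}, {"SerialNo"}]

prime = prime_attributes(car_candidate_keys)
nonprime = car_attributes - prime
print(sorted(prime))
print(sorted(nonprime))
```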
FIRST NORMAL FORM:
It states that the domain of an attribute must include only atomic values and that the value of
any attribute in a tuple must be a single value from the domain of that attribute.
Therefore 1NF disallows:
composite attributes
multivalued attributes
nested relations; attributes whose values for an individual tuple are non-atomic
1NF permits only attribute values that are single atomic values.
Normalization into 1NF
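One common way to reach 1NF is to move a multivalued attribute into a separate relation together with the key of the original relation. The sketch below does this for a hypothetical DEPARTMENT relation with a multivalued Dlocations attribute, producing a relation shaped like DEPT_LOCATIONS(Dnumber, Dlocation). The data and the function name to_1nf are assumptions.

```python
def to_1nf(rows, key, multivalued, single_name):
    """Split a multivalued attribute into its own relation keyed by `key`."""
    base, detail = [], []
    for row in rows:
        # Keep every attribute except the multivalued one in the base relation.
        base.append({a: v for a, v in row.items() if a != multivalued})
        # Emit one atomic (key, value) row per element of the multivalued attribute.
        for value in row[multivalued]:
            detail.append({key: row[key], single_name: value})
    return base, detail

department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]

dept, dept_locations = to_1nf(department, "Dnumber", "Dlocations", "Dlocation")
print(dept)
print(dept_locations)
```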
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A
in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is
not a candidate key.
When Y is a candidate key, there is no problem with the transitive dependency.
E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
{student, course} is a candidate key for this relation and that the dependencies shown follow the
pattern in Figure 10.12 (b).
A relation NOT in BCNF should be decomposed so as to meet this property, while possibly
forgoing the preservation of all functional dependencies in the decomposed relations.
We have to settle for sacrificing the functional dependency preservation. But we cannot
sacrifice the non-additivity property after decomposition.
Out of the above three, only the 3rd decomposition will not generate spurious tuples after
the join (and hence has the non-additivity property).
A test to determine whether a binary decomposition (decomposition into two relations) is
non-additive (lossless):
Verify that the third decomposition above meets the property.
• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
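A minimal sketch of computing X+ with the usual fixed-point loop, which has the same effect as repeatedly applying the inference rules. As one application, it is used for the standard binary test: a decomposition {R1, R2} of R is non-additive if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) follows from F. The relation, FDs, and three decompositions below are assumptions taken from the common textbook TEACH(student, course, instructor) example, since they are not reproduced in these notes; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Compute X+, the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_nonadditive_binary(r1, r2, fds):
    """Binary lossless-join test based on the closure of the common attributes."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

# Assumed FDs: {student, course} -> instructor and instructor -> course.
fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

d1 = ({"student", "instructor"}, {"student", "course"})
d2 = ({"course", "instructor"}, {"course", "student"})
d3 = ({"instructor", "course"}, {"instructor", "student"})

print(sorted(closure({"instructor"}, fds)))
print([is_nonadditive_binary(r1, r2, fds) for r1, r2 in (d1, d2, d3)])
```

Under these assumptions only the third decomposition passes, because instructor is the common attribute and instructor -> course.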