
chap 5 dbms

Uploaded by

pavan
Copyright
© © All Rights Reserved

5. FUNCTIONAL DEPENDENCY AND NORMALIZATION FOR RELATIONAL DATABASES


Design guidelines for relation schemas.
The four guidelines are:
1) Semantics of the attributes.
2) Reducing redundant values in tuples.
3) Reducing null values in tuples.
4) Disallowing the possibility of generating spurious tuples.

1) Semantics or meaning of the attributes:

Whenever we group attributes to form a relation schema, we assume that a certain meaning is associated with the attributes. This meaning specifies how the attribute values in a tuple relate to one another.
For eg: A student relation has the attributes regno, name, age and marks, each of which has a meaning, and all of which describe the same student.
- The meaning of the relation should be easy to explain, and the attribute values in a tuple should relate to each other.
Guideline 1:- Design a relation such that it is easy to explain its meaning. Do not combine attributes from multiple entity types into a single relation.

2) Reducing the redundant values in tuples.

The schema design should minimize the redundancy of data. This minimizes the storage space occupied by the tables.
For eg:-
Consider the relations shown in diagrams (1) and (2).

Diagram-1
Employee
Emp_no   Name   Dob   Salary   Address   Dept_id
Department
Dept_num   Dept_name   Dept_location

Diagram-2
Dept_employees
Emp_no   Name   Dob   Salary   Dept_id   Dept_name   Dept_location

- The relation dept_employees causes a serious redundancy problem whenever several employees (say 5 to 10) work in each department. For n employees working in the same department, the values of dept_id, dept_name and dept_location are the same and are repeated n times.
- In diagram (1), by contrast, only the dept_id value is repeated n times in the employee relation for the n employees of a department. All other information about the department is stored only once, in the department relation.
Another serious problem that arises when working with the dept_employees relation is that of update anomalies.
Update anomalies :-
Update anomalies can be classified as:
1) Insertion anomalies
2) Deletion anomalies
3) Modification anomalies

(1) Insertion anomaly :-

An irregularity may occur while inserting tuples into the dept_employees relation of diagram (2).

- When a new department comes into existence and its employees are yet to be appointed, only the values of dept_id, dept_name etc. are known. There are no values for emp_no, name, dob and salary. Assuming emp_no is the primary key, this is a serious insertion anomaly, since the primary key of a relation cannot be null.
With the separate employee and department relations of diagram (1), the department details can be entered in the department relation irrespective of employee details.
- Similarly, it is difficult to enter the details of a new employee into the dept_employees relation when that employee has not yet been allocated a department.

(2) Deletion anomaly :-

Assume that a single employee works for a department. Deleting this last tuple also erases the existence of that department from the dept_employees relation of diagram (2). Such irregularities do not occur if the tables of diagram (1) are used, since the tuple for the department exists separately in the department table even when no employee works for it.

(3) Modification anomaly :-

Since many employees may work for a particular department, if an attribute value such as dept_location or dept_id is to be modified, the update must be applied to the tuples of all employees who work for that department. Otherwise the database becomes inconsistent, i.e. the same department may show different locations, or different department numbers may appear for the same department name, and so on.
To avoid the above problems, the employee and department tables must be kept separate.

Guideline 2: Design the base tables such that insertion, deletion and modification anomalies are avoided.
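
The modification anomaly can be illustrated with a minimal Python sketch. The rows, employee names and department values below are hypothetical sample data, not taken from the notes, and the helper locations_per_department is an illustrative name. The sketch updates the location in only one of two tuples of the same department and then detects the resulting inconsistency.

```python
# Hypothetical sample rows for the dept_employees relation of diagram (2).
dept_employees = [
    {"emp_no": 1, "name": "Asha", "dept_id": 10, "dept_name": "Sales", "dept_location": "Mysore"},
    {"emp_no": 2, "name": "Ravi", "dept_id": 10, "dept_name": "Sales", "dept_location": "Mysore"},
    {"emp_no": 3, "name": "Kiran", "dept_id": 20, "dept_name": "Audit", "dept_location": "Hassan"},
]


def locations_per_department(rows):
    """Map each dept_id to the set of locations recorded for it."""
    result = {}
    for row in rows:
        result.setdefault(row["dept_id"], set()).add(row["dept_location"])
    return result


# Update the location in only ONE of the two tuples of department 10.
dept_employees[0]["dept_location"] = "Bangalore"

# A department with more than one recorded location is inconsistent.
inconsistent = {
    dept: locs
    for dept, locs in locations_per_department(dept_employees).items()
    if len(locs) > 1
}
print(inconsistent)  # department 10 now carries two different locations
```

With separate employee and department tables the location would be stored once, so a single update could not leave two versions behind.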

3. Reducing null values in tuples :-

Attributes whose values are unknown or not applicable in many rows should be avoided, since null values lead to wastage of storage space.

Guideline 3:- Design the table such that, as far as possible, its attributes take actual values rather than null values. If null values must be used for a particular attribute, it should be in exceptional cases only.

4. Avoid generation of spurious tuples.

If tables are joined on attributes that are neither a primary key nor a foreign key, the result of such an inappropriate join contains a large number of fake (spurious) tuples. These tuples mostly represent wrong information, so the end user cannot draw useful conclusions from queries on such relations.

Guideline 4:- Design tables such that they can be joined on attributes that are a primary key or a foreign key. If joined on some other attributes, the join may produce spurious tuples.
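
The effect of joining on a non-key attribute can be sketched in Python as follows. The two small tables and the helper join are hypothetical illustrations. Both departments are assumed to share one location, which is what makes the non-key join go wrong.

```python
def join(left, right, attr):
    """Equi-join two lists of dict rows on one shared attribute.

    Where both rows carry the same attribute name, the value from the
    right-hand row is kept.
    """
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Hypothetical data: two departments that happen to share a location.
employees = [
    {"emp_no": 1, "name": "Asha", "dept_id": 10, "dept_location": "Mysore"},
    {"emp_no": 2, "name": "Ravi", "dept_id": 20, "dept_location": "Mysore"},
]
departments = [
    {"dept_id": 10, "dept_name": "Sales", "dept_location": "Mysore"},
    {"dept_id": 20, "dept_name": "Audit", "dept_location": "Mysore"},
]

# Join on the foreign key / primary key pair: one row per employee.
key_join = join(employees, departments, "dept_id")

# Join on a non-key attribute: every employee pairs with every department.
non_key_join = join(employees, departments, "dept_location")

spurious = [row for row in non_key_join if row not in key_join]
print(len(key_join), len(non_key_join), len(spurious))  # 2 4 2
```

The two extra rows claim that each employee also works in the other department, which is exactly the kind of wrong information Guideline 4 warns against.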

Functional Dependencies :-

A functional dependency (FD) is a constraint between two sets of attributes from the given database.
- A functional dependency is denoted as X → Y
i.e FD : X → Y
- This notation is read as “Y is functionally dependent on X”.
- The set of attributes X is termed the left hand side (LHS) of the FD.
- X or LHS is sometimes referred to as the determinant.
- The set of attributes Y is termed the right hand side (RHS) of the FD.
- Y or RHS is sometimes referred to as the dependent.
- The meaning of this constraint is that whenever any two tuples t1 and t2 of a relation instance r of R agree on their X values, i.e t1[X]=t2[X], they must also agree on their Y values, i.e t1[Y]=t2[Y].
- In other words, a functional dependency is a property of the meaning or semantics of the attributes, i.e. when the semantics of two sets of attributes in a relation R specify that an FD should exist, the FD is stated as a constraint.
- Relation states r of R that satisfy the functional dependency constraints are termed legal relation states or legal extensions of R.
- Thus FDs help database designers to understand and interpret how sets of attributes relate to one another, such that this dependency is maintained in all relation instances r of R.
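
The tuple-based definition above translates directly into a check on one relation instance. The following is a minimal Python sketch. The function name fd_holds and the sample rows are hypothetical. Note that such a check can only show that an FD is violated by the given tuples. It cannot prove that the FD holds for every legal state of R, because that depends on the semantics of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this list of dict rows.

    Two tuples that agree on all lhs attributes must agree on all rhs attributes.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical relation instance.
employees = [
    {"emp_id": 1, "emp_name": "Asha", "dept": "Sales"},
    {"emp_id": 2, "emp_name": "Ravi", "dept": "Sales"},
    {"emp_id": 3, "emp_name": "Asha", "dept": "Audit"},
]

print(fd_holds(employees, ["emp_id"], ["emp_name"]))  # True
print(fd_holds(employees, ["dept"], ["emp_name"]))    # False
```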

Eg of functional dependencies:
1. Emp_id → emp_name
2. Reg_no → std_name

Note that in each of the above, the LHS uniquely determines the RHS.
- For instance, the value of emp_id is distinct for each employee and hence uniquely identifies the individual employee's name.
- Similarly, in eg 2 the student's reg_no uniquely determines the student's name.
Alternatively, it is said that the student name or employee name is functionally determined by the respective reg_no or emp_id.

Advantages of FD’s are :


- To specify constraints on the set of legal relations
- To verify relations to see whether they are valid under a given set of FD constraints.

Inference rules for functional dependencies.


Rules of inference (deduction by reasoning), or axioms (facts taken as given), offer a simple and straightforward approach to infer (deduce) new dependencies from a given set of dependencies F specified on a relation R.
A set of inference rules was first developed by Armstrong.

The following six rules are the well known inference rules for functional dependencies. Here {...} ⊨ ... means that the dependencies in braces imply the dependency on the right.
IR1 (Reflexive rule) : If X ⊇ Y, then X → Y.
IR2 (Augmentation rule) : {X → Y} ⊨ XZ → YZ.
IR3 (Transitive rule) : {X → Y, Y → Z} ⊨ X → Z.
IR4 (Decomposition or projective rule) : {X → YZ} ⊨ X → Y.
IR5 (Union or additive rule) : {X → Y, X → Z} ⊨ X → YZ.
IR6 (Pseudotransitive rule) : {X → Y, WY → Z} ⊨ WX → Z.
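
One common way to apply such rules mechanically is to compute the closure of an attribute set, i.e. all attributes it determines under F. This procedure is not described in the notes above, so the following Python sketch is only an illustration under that assumption. The function name closure is hypothetical, and attributes are written as single letters so that a string such as "WX" stands for the set {W, X}.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a string of single-letter attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the LHS is already determined, the RHS is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The premises of IR6: X -> Y and WY -> Z.
fds = [("X", "Y"), ("WY", "Z")]
print("Z" in closure("WX", fds))  # True, matching the conclusion WX -> Z
print("Z" in closure("W", fds))   # False, W alone does not determine Z
```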

Proof of IR1 :-
Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some relation instance r of R such that t1[X]=t2[X]. Then t1[Y]=t2[Y], because X ⊇ Y. Hence X → Y must hold in r.

Proof of IR2 :- (By contradiction).

Assume that X → Y holds in a relation instance r of R but that XZ → YZ does not hold. Then there must exist two tuples t1 and t2 in r such that (1) t1[X]=t2[X], (2) t1[Y]=t2[Y], (3) t1[XZ]=t2[XZ] and (4) t1[YZ] ≠ t2[YZ]. This is not possible, because from (1) and (3) we deduce (5) t1[Z]=t2[Z], and from (2) and (5) we deduce (6) t1[YZ]=t2[YZ], contradicting (4).

Proof of IR3 :-
Assume that (1) X → Y and (2) Y → Z both hold in a relation r. Then for any two tuples t1 and t2 in r such that t1[X]=t2[X], we must have (3) t1[Y]=t2[Y], from assumption (1). Hence we must also have (4) t1[Z]=t2[Z], from (3) and assumption (2). Hence X → Z must hold in r.

Proof of IR4 (using IR1 through IR3)

1. X → YZ (given)
2. YZ → Y (using IR1 and knowing that YZ ⊇ Y)
3. X → Y (using IR3 on 1 and 2)

Proof of IR5 (using IR1 through IR3)

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmenting with X, notice that XX = X)
4. XY → YZ (using IR2 on 2 by augmenting with Y)
5. X → YZ (using IR3 on 3 and 4)

Proof of IR6 (using IR1 through IR3)

1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)

Normal forms based on primary keys :

The main objective of normalization theory is to eliminate or minimize undesirable properties in a given relation, such as (1) data redundancy and (2) update anomalies. The process of normalization is applied to relations based on their functional dependencies and primary keys.
- The normalization process takes a relation through a series of tests, one for each normal form, and in doing so guarantees a reasonably good database design. Normalization theory is built around the concept of normal forms. E. F. Codd first proposed the normalization process in 1972 and defined the first three normal forms: first, second and third normal form.
First normal form:-
A relation R is said to be in 1NF if the domain of each attribute contains only atomic values and each attribute has a single value in every tuple, i.e no multivalued attributes are allowed.
For eg :
Consider the department relation
Department
Dname            Dnumber   Dmgrssn   Dlocations
Research         5         123       {Bellary, Hassan, Tumkur}
Administration   4         234       Mysore
Head quarters    1         345       Bangalore

We assume that each dept can have a number of locations. The relation is not in 1NF because dlocations is not a single-valued attribute.
There are 3 main techniques to achieve 1NF for such a relation.
1) Remove the attribute dlocations that violates 1NF and place it in a separate relation dept_locations along with the primary key dnumber of department, as shown below.
Department
Dname   Dnumber   Dmgrssn

Dept_locations
Dnumber   Dlocation
The primary key of dept_locations is the combination of dnumber and dlocation. A distinct tuple exists in dept_locations for each location of a department. This decomposes the non-1NF relation into two 1NF relations.

2) Expand the key so that there is a separate tuple in the original department relation for each location of a department, as shown below.

Dname          Dnumber   Dmgrssn   Dlocation
Research       5         123       Bellary
Research       5         123       Hassan
Research       5         123       Tumkur
Admin          4         234       Mysore
Headquarters   1         345       Bangalore

In this case the primary key becomes {dnumber, dlocation}. This solution has the disadvantage of introducing redundancy in the relation.
3) If a maximum number of values is known for the attribute, for eg if it is known that at most 3 locations can exist for a department, replace the dlocations attribute by 3 atomic attributes dloc1, dloc2, dloc3.
Dname   Dnumber   Dmgrssn   Dloc1   Dloc2   Dloc3

This solution has the disadvantage of introducing null values if departments have fewer than 3 locations.
Of the 3 solutions, the first is the best, because it does not suffer from redundancy.
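
The first technique can be sketched in Python as follows, using the department rows from the example above. The function name to_1nf and the lower-case field names are illustrative choices. The sketch removes the multivalued dlocations attribute and moves each location into a separate dept_locations relation keyed by (dnumber, dlocation).

```python
# The non-1NF department relation from the example, with dlocations as a list.
department = [
    {"dname": "Research", "dnumber": 5, "dmgrssn": 123,
     "dlocations": ["Bellary", "Hassan", "Tumkur"]},
    {"dname": "Administration", "dnumber": 4, "dmgrssn": 234,
     "dlocations": ["Mysore"]},
    {"dname": "Headquarters", "dnumber": 1, "dmgrssn": 345,
     "dlocations": ["Bangalore"]},
]


def to_1nf(rows):
    """Split off the multivalued attribute into a separate relation."""
    # Department keeps every attribute except the multivalued one.
    dept = [{k: v for k, v in row.items() if k != "dlocations"} for row in rows]
    # Dept_locations gets one tuple per (dnumber, dlocation) pair.
    dept_locations = [
        {"dnumber": row["dnumber"], "dlocation": loc}
        for row in rows
        for loc in row["dlocations"]
    ]
    return dept, dept_locations


dept, dept_locations = to_1nf(department)
print(len(dept), len(dept_locations))  # 3 5
```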

Second normal form :


A relation R is said to be in 2NF if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
In other words, a table is in 2NF if every non-key field depends on the entire primary key and not on a part of the primary key.
- If the primary key of a relation consists of a single attribute, the relation (being in 1NF) is automatically in 2NF.
For eg : consider the following table employee_dept. The primary key is a composite key on emp_id and dept_num. However, the value of dept_name depends only on dept_num and not on the entire primary key.

Employee_dept
Emp_id   Dept_num   Dept_name
4030     212        DRDO
2017     101        ISRO
1140     177        HAL

This is not in 2NF.

- To normalize this relation, group dept_num and dept_name into a separate table. Now every non-key field is fully functionally dependent on the primary key in each relation.

Emp_dept
Emp_id   Dept_num
4030     212
2017     101
1140     177

Department table
Dept_num   Dept_name
101        ISRO
177        HAL
212        DRDO
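
The 2NF decomposition can be checked with a short Python sketch that projects the employee_dept rows of the example onto the two new tables and joins them back on dept_num. The helper project and the variable name dept are hypothetical. If the join reproduces the original rows exactly, nothing was lost and no spurious tuples appeared.

```python
employee_dept = [
    {"emp_id": 4030, "dept_num": 212, "dept_name": "DRDO"},
    {"emp_id": 2017, "dept_num": 101, "dept_name": "ISRO"},
    {"emp_id": 1140, "dept_num": 177, "dept_name": "HAL"},
]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# The two relations produced by the decomposition.
emp_dept = project(employee_dept, ["emp_id", "dept_num"])
dept = project(employee_dept, ["dept_num", "dept_name"])

# Join back on dept_num, which is the primary key of dept.
rejoined = [
    {**e, **d} for e in emp_dept for d in dept if e["dept_num"] == d["dept_num"]
]

by_emp = lambda row: row["emp_id"]
print(sorted(rejoined, key=by_emp) == sorted(employee_dept, key=by_emp))  # True
```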
Third normal form :
A relation R is said to be in 3NF if and only if it is in 2NF and no non-key field is determined by another non-key field.
- In other words, a non-key field cannot depend upon another non-key field in a given relation R.
3NF is based on eliminating transitive dependency.
For eg : consider the following relation.

HOSPITAL
Ward_no   Ward_name     Ward_capacity   Unit in charge
36        Mortuary      100             Dr. Vijaya
41        Casualty      10              Dr. Rinay
82        Labour        150             Dr. Shantala
90        Paediatrics   50              Dr. Sanjeev

In this eg, each ward in the hospital is headed only by the respective specialist doctor.
Although ward_no is the primary key and ward_name is fully dependent on ward_no, the non-key fields ward_name and unit in charge are dependent on each other,
i.e ward_no → ward_name
    ward_name → unit in charge
so unit in charge depends on ward_no only transitively, through ward_name. Both unit in charge and ward_name are non-key fields, and one depends on the other.
Hence this relation is not in 3NF.

- To normalize this relation into 3NF, group ward_no, ward_name and ward_capacity into the HOSPITAL relation, and ward_name and unit in charge into the SPECIALISTS relation.

Hospital
Ward_no   Ward_name     Ward_capacity
36        Mortuary      100
41        Casualty      10
82        Labour        150
90        Paediatrics   50

Specialists
Ward_name     Unit in charge
Mortuary      Dr. Vijaya
Casualty      Dr. Rinay
Labour        Dr. Shantala
Paediatrics   Dr. Sanjeev
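
The simplified 3NF test used in these notes (no non-key field determined by another non-key field) can be sketched in Python as below. The function name non_key_dependencies is hypothetical, and the FD ward_no → ward_capacity is an assumption that follows from ward_no being the primary key. The sketch lists the offending FDs before the decomposition and shows that none remain afterwards.

```python
def non_key_dependencies(key, fds):
    """Return the FDs in which a non-key field is determined by non-key fields only.

    key is a tuple of attribute names and fds is a list of (lhs, rhs) tuples.
    """
    key = set(key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not (set(lhs) & key) and not (set(rhs) <= key)
    ]


hospital_fds = [
    (("ward_no",), ("ward_name",)),
    (("ward_no",), ("ward_capacity",)),  # assumed, since ward_no is the primary key
    (("ward_name",), ("unit_in_charge",)),
]

# Original HOSPITAL relation with primary key ward_no.
print(non_key_dependencies(("ward_no",), hospital_fds))
# [(('ward_name',), ('unit_in_charge',))]

# After decomposition: Hospital (key ward_no) and Specialists (key ward_name).
print(non_key_dependencies(("ward_no",), hospital_fds[:2]))    # []
print(non_key_dependencies(("ward_name",), hospital_fds[2:]))  # []
```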
Boyce Codd normal form (BCNF):
The BCNF definition makes no explicit reference to 1NF and 2NF, nor to the concepts of full functional dependency and transitive dependency.
“A relation R is said to be in Boyce-Codd normal form (BCNF) if and only if every determinant (i.e LHS of an FD) is a candidate key.”
For eg : consider the student relation.
Student
Reg_no   Name   Class   Telephone
The telephone number is as good as a contact id that distinguishes each and every student. Let us assume that the reg_no, name and telephone attributes are each unique.

The following FDs can then be inferred for the STUDENT relation.
FD1 : reg_no → name
FD2 : reg_no → class
FD3 : reg_no → telephone
FD4 : name → class
FD5 : name → telephone
FD6 : telephone → name
FD7 : telephone → class

Since name and telephone are assumed unique, each of them also determines reg_no, so reg_no, name and telephone are all candidate keys. The student relation is therefore in BCNF, since every FD has a candidate key as its determinant (LHS).
BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in BCNF is also in 3NF, but the converse is not true.
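
The BCNF condition for the student example can be checked with a small Python sketch. The names closure and bcnf_violations are hypothetical, and the closure computation is a standard technique that the notes do not describe. Every determinant here is a single attribute, so it is a candidate key exactly when it determines all attributes of the relation. The sketch also shows why the uniqueness assumption matters: with only FD1 to FD7, name and telephone do not determine reg_no.

```python
ATTRS = {"reg_no", "name", "class", "telephone"}

# FD1 to FD7 as listed for the student relation.
LISTED_FDS = [
    ({"reg_no"}, {"name"}),
    ({"reg_no"}, {"class"}),
    ({"reg_no"}, {"telephone"}),
    ({"name"}, {"class"}),
    ({"name"}, {"telephone"}),
    ({"telephone"}, {"name"}),
    ({"telephone"}, {"class"}),
]

# Extra FDs that follow from assuming name and telephone are unique.
UNIQUENESS_FDS = [({"name"}, {"reg_no"}), ({"telephone"}, {"reg_no"})]


def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attrs]


print(len(bcnf_violations(ATTRS, LISTED_FDS)))                   # 4
print(len(bcnf_violations(ATTRS, LISTED_FDS + UNIQUENESS_FDS)))  # 0
```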
