Chapter 5 - Logical and Physical Database Design
5.1 Logical Database Design
Logical design is the process of constructing a model of the information used in an
enterprise based on a specific data model (e.g. relational, hierarchical, network, or
object), but independent of a particular DBMS and other physical considerations.
A collection of rules is applied during logical database design that helps us to
discover new entities and revise attributes in light of the entities discovered.
This set of rules is called the normalization process.
Before applying the rules of the relational data model, the first step is to convert the
conceptual design into a form suitable for the relational logical model, that is, into
tables.
5.1.2 Normalization
A relational database is merely a collection of data, organized in a particular manner. As
the father of the relational database approach, Codd created a series of rules called
normal forms that help define that organization.
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of information about the skill
C++ and its skill type is deleted from the database. We are then left with no
information about C++ or its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot record whether
Pascal is allowed as a value for skill, and we have no way to state the type of skill
that Pascal should be categorized under, until some employee actually holds it.
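These anomalies can be reproduced concretely. The sketch below, in Python with SQLite, uses a hypothetical single-table Employee_Skill design (the employee names and skill values are assumed for illustration, since the example table itself is not shown here) to demonstrate the deletion anomaly:

```python
import sqlite3

# Hypothetical unnormalized Employee_Skill table (assumed from the text's
# example): skill information lives only inside employee rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp_skill "
            "(empid INTEGER, name TEXT, skill TEXT, skilltype TEXT)")
con.executemany("INSERT INTO emp_skill VALUES (?,?,?,?)", [
    (12, "Abebe", "SQL", "Database"),
    (16, "Lemma", "C++", "Programming"),
])

# Deletion anomaly: removing employee 16 also destroys the only record
# stating that C++ is a 'Programming' skill.
con.execute("DELETE FROM emp_skill WHERE empid = 16")
rows = con.execute("SELECT * FROM emp_skill WHERE skill = 'C++'").fetchall()
print(rows)  # [] -- no information about C++ survives

# Insertion anomaly: we cannot record that 'Pascal' is a programming
# skill without inventing an employee who holds it.
```

Normalizing skills into their own relation removes both anomalies, since skill facts then exist independently of any employee row.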
Since the type of Wine served depends on the type of Dinner, we say Wine is
functionally dependent on Dinner.
Dinner → Wine

Dinner Course   Type of Wine   Type of Fork
Meat            Red            Meat fork
Fish            White          Fish fork
Cheese          Rose           Cheese fork

Since both Wine type and Fork type are determined by the Dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
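A functional dependency X → Y can be checked mechanically on sample data: every value of X must be paired with exactly one value of Y. A minimal sketch (the helper name holds_fd and the dinner rows are illustrative):

```python
# A functional dependency X -> Y holds when every value of X is paired
# with exactly one value of Y. This helper checks that on sample rows.
def holds_fd(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same X value paired with two Y values
    return True

dinner_table = [
    {"Dinner": "Meat",   "Wine": "Red",   "Fork": "Meat fork"},
    {"Dinner": "Fish",   "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Rose",  "Fork": "Cheese fork"},
]

print(holds_fd(dinner_table, ["Dinner"], ["Wine"]))  # True: Dinner -> Wine
print(holds_fd(dinner_table, ["Dinner"], ["Fork"]))  # True: Dinner -> Fork
```

Note that such a check only shows the dependency holds on the sample; whether it holds in general is a business rule, not a property of one table instance.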
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the
primary key (when we have a composite primary key), then that attribute is partially
functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} → C and B → C,
then C is partially functionally dependent on {A, B}.
Full Dependency
If an attribute which is not a member of the primary key is dependent not on some part of
the primary key but on the whole key (when we have a composite primary key), then that
attribute is fully functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} → C, and neither A → C nor B → C holds,
then C is fully functionally dependent on {A, B}.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form:
"If A implies B, and if also B implies C, then A implies C."
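In functional-dependency terms: if EmpID → DeptID and DeptID → DeptName, then EmpID → DeptName holds transitively. A tiny sketch with assumed employee and department data (the IDs and names are illustrative):

```python
# Transitive dependency sketch: EmpID -> DeptID and DeptID -> DeptName
# together imply EmpID -> DeptName.
emp_dept = {101: "D1", 102: "D2", 103: "D1"}      # EmpID -> DeptID
dept_name = {"D1": "Finance", "D2": "Research"}   # DeptID -> DeptName

# Composing the two mappings yields the transitive dependency.
emp_dept_name = {e: dept_name[d] for e, d in emp_dept.items()}
print(emp_dept_name[101])  # Finance
```

Third normal form removes such transitive dependencies by giving the intermediate determinant (here, the department) its own relation.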
First Normal Form (1NF): Remove all repeating groups. Distribute the multi-valued
attributes into different rows and identify a unique identifier for the relation, so that
it can be said to be a relation in a relational database.
EmpID  FirstName  LastName  SkillID  Skill  SkillType  School  SchoolAdd  SkillLevel
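Distributing a multi-valued attribute into separate rows can be sketched as follows (the employee names and skill lists are assumed for illustration; after flattening, {EmpID, Skill} identifies each row):

```python
# Unnormalized rows: Skill is a multi-valued attribute (a list).
raw = [
    (12, "Abebe", ["SQL", "C++"]),
    (16, "Lemma", ["Pascal"]),
]

# 1NF: one row per (employee, skill) pair -- no repeating groups.
flat = [(empid, name, skill)
        for empid, name, skills in raw
        for skill in skills]
for row in flat:
    print(row)
```

After this step every attribute holds a single atomic value per row, which is the precondition for the 2NF and 3NF analysis that follows.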
EMP_PROJ rearranged
EmpID  ProjNo  EmpName  ProjName  ProjLoc  ProjFund  ProjMangID  Incentive
Business rule: Whenever an employee participates in a project, he/she will be entitled for
an incentive.
This schema is in its 1NF since we don't have any repeating groups or attributes with the
multi-valued property. To convert it to 2NF we need to remove all partial dependencies
of non-key attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
EMPLOYEE
EmpID  EmpName
PROJECT
ProjNo  ProjName  ProjLoc  ProjFund  ProjMangID
EMP_PROJ
EmpID  ProjNo  Incentive
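The decomposition can be sketched as SQLite DDL: FD1 gives rise to an EMPLOYEE relation, FD2 to PROJECT, and FD3 leaves Incentive with the composite key in EMP_PROJ. The exact column types and spellings below are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (
    empid    INTEGER PRIMARY KEY,      -- FD1: EmpID -> EmpName
    empname  TEXT
);
CREATE TABLE project (
    projno     INTEGER PRIMARY KEY,    -- FD2: ProjNo -> remaining attrs
    projname   TEXT,
    projloc    TEXT,
    projfund   REAL,
    projmangid INTEGER
);
CREATE TABLE emp_proj (
    empid     INTEGER REFERENCES employee(empid),
    projno    INTEGER REFERENCES project(projno),
    incentive REAL,                    -- FD3: depends on the whole key
    PRIMARY KEY (empid, projno)
);
""")
```

Each non-key attribute now depends on the whole key of its own relation, so the schema is in 2NF.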
Generally, even though there are four additional levels of normalization, a table is
said to be normalized if it reaches 3NF. A database with all tables in 3NF is said to be
a normalized database.
A mnemonic for remembering the rationale for normalization up to 3NF could be the
following:
No Repeating or Redundancy: no repeating fields in the table.
The Fields Depend Upon the Key: the table should solely depend on the key.
The Whole Key: no partial key dependency.
And Nothing But The Key: no inter-data dependency.
So Help Me Codd: since Codd came up with these rules.
5.2 Physical Database Design
It is considered desirable to keep these three levels quite separate -- one of Codd's requirements for an
RDBMS is that it should maintain logical-physical data independence. The generality of the relational
model means that RDBMSs are potentially less efficient than those based on one of the older data
models, where access paths were specified once and for all at the design stage. However, the relational
data model does not preclude the use of traditional techniques for accessing data - it is still essential to
exploit them to achieve adequate performance with a database of any size.
We can consider the topic of physical database design from three aspects:
• What techniques for storing and finding data exist
• Which are implemented within a particular DBMS
• Which might be selected by the designer for a given application knowing the properties of the
data
Thus the purpose of physical database design is:
1. How to map the logical database design to a physical database design.
2. How to design base relations for the target DBMS.
3. How to design enterprise constraints for the target DBMS.
4. How to select appropriate file organizations based on analysis of transactions.
5. When to use secondary indexes to improve performance.
6. How to estimate the size of the database.
7. How to design user views.
8. How to design security mechanisms to satisfy user requirements.
Physical database design is the process of producing a description of the implementation of the database
on secondary storage.
Physical design describes the base relation, file organization, and indexes used to achieve efficient
access to the data, and any associated integrity constraints and security measures.
c) Choose indexes
To determine whether adding indexes will improve the performance of the system.
One approach is to keep tuples unordered and create as many secondary indexes as necessary.
Another approach is to order tuples in the relation by specifying a primary or clustering index.
In this case, choose the attribute for ordering or clustering the tuples as:
• Attribute that is used most often for join operations - this makes join operation more efficient, or
• Attribute that is used most often to access the tuples in a relation in order of that attribute.
If the ordering attribute chosen is a key of the relation, the index will be a primary index; otherwise,
it will be a clustering index.
Each relation can only have either a primary index or a clustering index.
Secondary indexes provide a mechanism for specifying an additional key for a base relation that can be
used to retrieve data more efficiently.
The overhead involved in the maintenance and use of secondary indexes has to be balanced against the
performance improvement gained when retrieving data.
This includes:
• Adding an index record to every secondary index whenever a tuple is inserted;
• Updating a secondary index when the corresponding tuple is updated;
• Increase in disk space needed to store the secondary index;
• Possible performance degradation during query optimization to consider all secondary
indexes.
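This trade-off can be observed directly in SQLite, where EXPLAIN QUERY PLAN reports whether a secondary index is used for a query; the table, column, and index names below are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee "
            "(empid INTEGER PRIMARY KEY, lastname TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?,?,?)",
                [(i, f"name{i}", f"D{i % 5}") for i in range(1000)])

# Secondary index on a non-key attribute used often in selections.
con.execute("CREATE INDEX idx_emp_dept ON employee(dept)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE dept = 'D3'").fetchall()
print(plan)  # the plan should mention idx_emp_dept for this equality search
```

Each insert or update on `employee` now also maintains `idx_emp_dept`, which is exactly the maintenance overhead the list above describes.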