Unit 3

Uploaded by

atharvnawale05
UNIT 2 - RELATIONAL DATABASE DESIGN

Unit 2 Relational Database Design 1


• Basic concepts
• CODD's Rules, Relational Integrity: Domain,
Referential Integrities, Enterprise Constraints.
• Database Design: Features of Good Relational
Designs, Normalization.
• Atomic Domains and First Normal Form
• Decomposition using Functional Dependencies
• Algorithms for Decomposition, 2NF, 3NF, 4NF and BCNF.



Basic concepts
• Relational Model Concepts
• Attribute: Each column in a table. Attributes are the properties that
define a relation, e.g., Student_Rollno, NAME, etc.
• Table: In the relational model, relations are stored in table format.
A table has rows and columns: rows represent records and columns
represent attributes.
• Tuple: A single row of a table, which contains a single record.
• Relation schema: The name of the relation together with its attributes.
• Degree: The total number of attributes in the relation.
• Cardinality: The total number of rows present in the table.
• Column: The set of values for a specific attribute.
• Relation instance: A finite set of tuples held in the relation at a given
time. Relation instances never have duplicate tuples.
• Relation key: One or more attributes whose values uniquely identify each
row of the relation.
• Attribute domain: The pre-defined set of values (type and scope) that an
attribute is allowed to take.
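The terms above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; the student table and its three rows are made up for illustration and are not part of the slides.

```python
import sqlite3

# Hypothetical STUDENT relation; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_rollno INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai"), (3, "Meera", "Pune")],
)

cursor = conn.execute("SELECT * FROM student")
attributes = [col[0] for col in cursor.description]  # attribute (column) names
tuples = cursor.fetchall()                            # the relation instance

degree = len(attributes)    # number of attributes in the relation
cardinality = len(tuples)   # number of tuples (rows) in the relation
print(attributes, degree, cardinality)
```

Here the degree is the column count and the cardinality is the row count, so adding a row changes the cardinality but not the degree.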
CODD's Rules
• Not every database that has tables and constraints can be called a
relational database system, and a database that merely uses the relational
data model is not automatically a Relational Database Management System
(RDBMS). A set of rules defines what a true RDBMS must satisfy. These rules
were published in 1985 by Dr. Edgar F. Codd (E.F. Codd), who originated the
relational model of database systems. Codd presented 13 rules, numbered 0
to 12, to test a DBMS against his relational model; a database that follows
the rules is called a true relational database (RDBMS). Because the
numbering starts at 0, these 13 rules are popularly known as Codd's 12
rules.

Rule 0: The Foundation Rule
• The database must be in relational form, so that the system can manage
the database entirely through its relational capabilities.
Rule 1: Information Rule
• All information in the database must be represented in one way only: as
values stored in the cells of tables, organized in rows and columns.
Rule 2: Guaranteed Access Rule
• Every single data item (atomic value) must be logically accessible from a
relational database using the combination of table name, primary key
value, and column name.
Rule 3: Systematic Treatment of Null Values
• Null values must be treated systematically in database records. A null
can mean missing data, no value in a cell, inapplicable information, or
unknown data. A primary key must not be null.
Rule 4: Active/Dynamic Online Catalog based on the relational model
• The description of the entire logical structure of the database, known
as the database dictionary (catalog), must be stored online in the same
relational form as ordinary data. Authorized users can then query the
catalog with the same query language they use to access the database.
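As one illustration of Rule 4, SQLite keeps its schema in a table named sqlite_master that can be read with an ordinary SELECT. The sketch below assumes Python's built-in sqlite3 module; the employee table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created only so that the catalog has something to list.
conn.execute("CREATE TABLE employee (e_id INTEGER PRIMARY KEY, e_name TEXT)")

# The catalog is itself a table, so it is queried with the same language
# that is used for the data.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog_rows)
```

Other systems expose their catalogs under different names, so the table name used here is specific to SQLite.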



Rule 5: Comprehensive Data Sub Language Rule
• A relational database may support several languages, but at least one of
them must have a well-defined, linear syntax, be expressible as character
strings, and comprehensively support data definition, view definition,
data manipulation, integrity constraints, authorization, and transaction
management operations. If the database allows access to the data without
going through such a language, it is considered a violation of this rule.

Rule 6: View Updating Rule
• All views that are theoretically updatable must also be updatable in
practice by the database system.

Rule 7: Relational Level Operation (High-Level Insert, Update and Delete) Rule
• A database system must support high-level relational operations such as
insert, update, and delete on whole sets of rows, not only on a single row
at a time. It should also support the union, intersection, and minus
operations.



Rule 8: Physical Data Independence Rule
• Applications must be independent of how the data is physically stored. If the
physical structure or storage of the database is changed, the change should have no
effect on external applications that access the data.
Rule 9: Logical Data Independence Rule
• It is similar to physical data independence. If any change occurs at the logical
level (table structures), it should not affect the user's view (application). For
example, if a table is split into two tables, or two tables are joined into a single
table, these changes should not impact the user's view or application.
Rule 10: Integrity Independence Rule
• Integrity constraints must be defined in the database itself, using the database
language (for example SQL), rather than in application programs. The database should
not rely on any external factor or application to maintain integrity. This also keeps
the database independent of each front-end application.
Rule 11: Distribution Independence Rule
• A database must work properly even if it is stored in different locations and used
by different end users. A user who accesses the database through an application
should not need to be aware that the data is distributed; the data should always
appear to be located at a single site. Queries written by end users should work the
same way regardless of where the data is stored.
Rule 12: Non Subversion Rule
• The non-subversion rule states that if a system provides a low-level or separate
language other than its relational language (such as SQL) to access the database, that
language must not be able to subvert or bypass the integrity rules when it changes data.



Relational Integrity: Domain, Referential Integrities, Enterprise Constraints
Integrity Constraints
• Integrity constraints are a set of rules used to maintain the quality of
information.
• Integrity constraints ensure that data insertion, updating, and other
processes are performed in such a way that data integrity is not
affected.
• Thus, integrity constraints guard against accidental damage to
the database.
Types of Integrity Constraint
• Domain constraints
• Entity integrity constraints
• Referential integrity constraints
• Key constraints



1. Domain constraints
• A domain constraint defines the valid set of values for an
attribute.
• The data type of a domain may be string,
character, integer, time, date, currency, etc.
The value of the attribute must belong to
the corresponding domain.
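A domain constraint of this kind can be sketched with a CHECK clause. The example assumes Python's built-in sqlite3 module; the person table, its rows, and the 0 to 150 age range are hypothetical choices made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK clause limits age to integers from 0 to 150.
conn.execute(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)
    )
    """
)

def try_insert(row):
    """Return True if the row satisfies the domain constraint, else False."""
    try:
        conn.execute("INSERT INTO person (id, name, age) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((1, "Tom", 30))        # 30 lies inside the domain
rejected = try_insert((2, "Johnson", -5))    # -5 lies outside the domain
wrong_type = try_insert((3, "Kate", "abc"))  # not an integer at all
print(accepted, rejected, wrong_type)
```

The typeof test is included because SQLite would otherwise store a non-numeric string in an INTEGER column; stricter systems reject the wrong type without it.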



2. Entity integrity constraints
• The entity integrity constraint states that a
primary key value can't be null.
• This is because the primary key value is used
to identify individual rows in a relation; if
the primary key had a null value, we could
not identify those rows.
• A table can contain null values in fields other
than the primary key.
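The entity integrity rule can be tried out as follows. This is a sketch using Python's built-in sqlite3 module with a hypothetical employee table. NOT NULL is written out explicitly because SQLite, unlike most SQL systems, does not imply it for a non-integer PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL is spelled out on the key column for SQLite.
conn.execute(
    "CREATE TABLE employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT, salary INTEGER)"
)

# A null in a non-key column is allowed.
conn.execute("INSERT INTO employee VALUES ('E1', 'Jack', NULL)")

# A null primary key is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'Harry', 60000)")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(null_key_rejected, row_count)
```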



3. Referential Integrity Constraints
• A referential integrity constraint is specified
between two tables.
• If a foreign key in Table 1 refers to the primary
key of Table 2, then every value of the
foreign key in Table 1 must either be null or be
present in Table 2.
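The rule can be sketched with two hypothetical tables, employee (Table 1) and department (Table 2), using Python's built-in sqlite3 module. The table names, column names, and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (d_no INTEGER PRIMARY KEY, d_name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT,
        d_no   INTEGER REFERENCES department (d_no)
    )
    """
)
conn.execute("INSERT INTO department VALUES (11, 'Sales')")

def try_insert(row):
    """Return True if the employee row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

matching = try_insert((1, "Jack", 11))     # 11 exists in department
null_fk = try_insert((2, "Harry", None))   # a null foreign key is allowed
dangling = try_insert((3, "John", 13))     # 13 has no matching department
print(matching, null_fk, dangling)
```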



4. Key constraints
• A key is an attribute, or set of attributes, used to
identify an entity uniquely within its entity
set.
• An entity set can have multiple keys, out of
which one is chosen as the primary key. A
primary key value must be unique and must not
be null in the relational table.



• Database normalization is a database schema design technique by which
an existing schema is modified to minimize redundancy and dependency
of data.
• Normalization splits a large table into smaller tables and defines
relationships between them to increase the clarity of the data organization.
• Normalization of a database is achieved by following a set of rules
called 'forms' when creating the database.
• Normalization is a process of organizing the data in a database to avoid
data redundancy and the insertion, update, and deletion anomalies.

• Anomalies in DBMS
• Three types of anomalies can occur when the database is not
normalized: insertion, update, and deletion anomalies.



Example: Suppose a manufacturing company stores employee details in a
table named employee that has four attributes: e_id for the employee's id,
e_name for the employee's name, e_address for the employee's address, and
e_dept for the department in which the employee works. At some point in
time the table looks like this:

e_id   e_name   e_address   e_dept
101    Rick     Delhi       D001
101    Rick     Delhi       D002
123    Maggie   Agra        D890
166    Glenn    Chennai     D900
166    Glenn    Chennai     D004



Update anomaly:
• In the above table, there are two rows for employee Rick because he
belongs to two departments of the company. To update Rick's address, we
have to update it in both rows, or the data will become inconsistent.
• If the correct address gets updated in one row but not in the other, then
according to the database Rick has two different addresses, which is
incorrect.
Insert anomaly:
• Suppose a new employee joins the company who is under training and not
yet assigned to any department. We would not be able to insert the
employee's data into the table if the e_dept field doesn't allow nulls.
Delete anomaly:
• Suppose the company closes department D890 at some point. Deleting the
rows that have e_dept as D890 would also delete the information of
employee Maggie, since she is assigned only to this department.
• To overcome these anomalies in DBMS, we need to normalize the data.
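The update and delete anomalies can be reproduced on the employee table from the example. This is a sketch using Python's built-in sqlite3 module; the new address "Mumbai" is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (e_id INTEGER, e_name TEXT, e_address TEXT, e_dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
        (166, "Glenn", "Chennai", "D900"),
        (166, "Glenn", "Chennai", "D004"),
    ],
)

# Update anomaly: only one of Rick's two rows is changed.
conn.execute(
    "UPDATE employee SET e_address = 'Mumbai' WHERE e_id = 101 AND e_dept = 'D001'"
)
rick_addresses = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT e_address FROM employee WHERE e_id = 101")
)

# Delete anomaly: closing department D890 also erases Maggie.
conn.execute("DELETE FROM employee WHERE e_dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE e_id = 123"
).fetchone()[0]

print(rick_addresses, maggie_rows)
```

After the partial update Rick has two addresses on record, and after the delete no row for Maggie remains.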



• Database Normalization Rules
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)



First Normal Form (1NF)
• Each table cell should contain a single value.
• Each record needs to be unique.
Sample Employee table, in which some employees work in multiple departments:

Employee   Age   Department
Melvin     32    Marketing, Sales
Edward     45    Quality Assurance
Alex       36    Human Resource

Employee table following 1NF:

Employee   Age   Department
Melvin     32    Marketing
Melvin     32    Sales
Edward     45    Quality Assurance
Alex       36    Human Resource
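The conversion to 1NF shown in the two tables can be sketched in Python. The function name to_first_normal_form is hypothetical, and the sketch assumes the departments in a cell are separated by commas, as in the sample table.

```python
# Rows copied from the sample Employee table; Department may hold several values.
employees = [
    ("Melvin", 32, "Marketing, Sales"),
    ("Edward", 45, "Quality Assurance"),
    ("Alex", 36, "Human Resource"),
]

def to_first_normal_form(rows):
    """Emit one row per (employee, department) so that every cell is atomic."""
    result = []
    for name, age, departments in rows:
        for department in departments.split(","):
            result.append((name, age, department.strip()))
    return result

employees_1nf = to_first_normal_form(employees)
for row in employees_1nf:
    print(row)
```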



Second Normal Form (2NF)
• The entity should already be in 1NF, and every non-key attribute of the
entity should depend on the entity's whole unique identifier (primary key),
not on just a part of it.

Sample Products table:

productID   product      Brand
1           Monitor      Apple
2           Monitor      Samsung
3           Scanner      HP
4           Head phone   JBL

Product table following 2NF:

Products Category table:

productID   product
1           Monitor
2           Scanner
3           Head phone

Brand table:

brandID   brand
1         Apple
2         Samsung
3         HP
4         JBL

Products Brand table:

pbID   productID   brandID
1      1           1
2      1           2
3      2           3
4      3           4
Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and contains no transitive
dependency of a non-prime attribute on a candidate key.
• 3NF is used to reduce data duplication. It is also used to achieve
data integrity.
• If there is no transitive dependency for non-prime attributes, then the
relation is in third normal form.
• A relation is in third normal form if it holds at least one of the following
conditions for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute, i.e., each element of Y is part of some candidate key.

EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID). This violates the rule of
third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:

EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
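The 3NF decomposition above can be sketched as two projections of the EMPLOYEE_DETAIL rows. The helper names project and fd_holds are hypothetical. Note that fd_holds only checks the sample rows: a dependency that holds in one instance is not proof that it holds for the schema in general.

```python
# Rows copied from the EMPLOYEE_DETAIL table; values kept as strings so that
# leading zeros in EMP_ZIP are preserved.
COLUMNS = ("EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY")
ROWS = [
    ("222", "Harry", "201010", "UP", "Noida"),
    ("333", "Stephan", "02228", "US", "Boston"),
    ("444", "Lan", "60007", "US", "Chicago"),
    ("555", "Katharine", "06389", "UK", "Norwich"),
    ("666", "John", "462007", "MP", "Bhopal"),
]

def project(rows, columns, wanted):
    """Project rows onto the wanted columns, dropping duplicate tuples."""
    idx = [columns.index(c) for c in wanted]
    seen = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in seen:
            seen.append(t)
    return seen

def fd_holds(rows, columns, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in these rows."""
    mapping = {}
    for row in rows:
        key = tuple(row[columns.index(c)] for c in lhs)
        value = tuple(row[columns.index(c)] for c in rhs)
        if mapping.setdefault(key, value) != value:
            return False
    return True

zip_determines_place = fd_holds(ROWS, COLUMNS, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"])
employee = project(ROWS, COLUMNS, ["EMP_ID", "EMP_NAME", "EMP_ZIP"])
employee_zip = project(ROWS, COLUMNS, ["EMP_ZIP", "EMP_STATE", "EMP_CITY"])
print(zip_determines_place, employee, employee_zip)
```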
Boyce Codd normal form (BCNF)
BCNF is the advanced version of 3NF and is stricter than 3NF. A table is in BCNF if,
for every non-trivial functional dependency X → Y, X is a super key of the table. In
other words, the table should be in 3NF, and for every FD the left-hand side must be
a super key.
Example: Let's assume there is a company where employees work in more than one
department. EMPLOYEE table:

EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549

In the above table, the functional dependencies are as follows:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}



The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet
each is the left-hand side of a functional dependency.
To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table:

EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table:

EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
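The BCNF test can be sketched with an attribute-closure computation. The function names closure, is_superkey, and bcnf_violations are hypothetical, and the check only considers the dependencies listed in this example rather than every dependency they imply.

```python
ATTRIBUTES = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
# Functional dependencies from the example, as (left-hand side, right-hand side).
FDS = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """True if attrs determines every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)

def bcnf_violations(all_attrs, fds):
    """Dependencies whose left-hand side is not a super key of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds if not is_superkey(lhs, all_attrs, fds)]

violations = bcnf_violations(ATTRIBUTES, FDS)
key_ok = is_superkey({"EMP_ID", "EMP_DEPT"}, ATTRIBUTES, FDS)
print(len(violations), key_ok)
```

Both dependencies violate BCNF in the original table. Running the same check on a decomposed table, for example bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, FDS[:1]), returns an empty list.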



Fourth normal form (4NF)
A relation is in 4NF if it is in Boyce Codd normal form and has no non-trivial
multi-valued dependency. A multi-valued dependency A →→ B exists when, for a single
value of A, multiple values of B exist independently of the other attributes of the
relation.

STUDENT

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
Hence, there is no relationship between COURSE and HOBBY. In the STUDENT relation, the
student with STU_ID 21 has two courses, Computer and Math, and two hobbies,
Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:



STUDENT_COURSE

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STUDENT_HOBBY

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey
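The effect of the 4NF decomposition can be sketched by joining the two tables back on STU_ID. The function name join_on_student is hypothetical. Under the stated assumption that courses and hobbies are independent, the join pairs every course of a student with every hobby of that student.

```python
# Rows copied from the STUDENT_COURSE and STUDENT_HOBBY tables.
student_course = [
    (21, "Computer"), (21, "Math"), (34, "Chemistry"), (74, "Biology"), (59, "Physics"),
]
student_hobby = [
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"), (74, "Cricket"), (59, "Hockey"),
]

def join_on_student(courses, hobbies):
    """Natural join on STU_ID: each course paired with each hobby of a student."""
    return [
        (stu_id, course, hobby)
        for stu_id, course in courses
        for hobby_stu_id, hobby in hobbies
        if stu_id == hobby_stu_id
    ]

combined = join_on_student(student_course, student_hobby)
rows_for_21 = [row for row in combined if row[0] == 21]
print(len(combined), rows_for_21)
```

Student 21 needs four rows in a single combined table (two courses times two hobbies), while the two 4NF tables record each course and each hobby only once.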



Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and contains no join dependency other
than those implied by its candidate keys; joining its decomposed tables
should be lossless.
• 5NF is reached by breaking the tables into as many smaller tables as
possible, as long as they can be rejoined without loss, in order to avoid
redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).

SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1



• In the above table, John takes both the Computer and Math classes for
Semester 1, but he doesn't take the Math class for Semester 2. In this case,
the combination of all three fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do not yet know the subject
or who will be teaching it, so we would leave Lecturer and Subject as NULL.
But all three columns together act as the primary key, so we can't leave the
other two columns blank.
• So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:

P1

SEMESTER     SUBJECT
Semester 1   Computer
Semester 1   Math
Semester 1   Chemistry
Semester 2   Math

P2

SUBJECT     LECTURER
Computer    Anshika
Computer    John
Math        John
Math        Akash
Chemistry   Praveen

P3

SEMESTER     LECTURER
Semester 1   Anshika
Semester 1   John
Semester 2   Akash
Semester 1   Praveen
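The decomposition into P1, P2 and P3 can be checked with a short sketch that rebuilds the original table from its three projections. The variable names are hypothetical, and the result only shows that the join is lossless for these sample rows.

```python
# Rows copied from the SUBJECT / LECTURER / SEMESTER table in this section.
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(semester, subject) for subject, lecturer, semester in original}
p2 = {(subject, lecturer) for subject, lecturer, semester in original}
p3 = {(semester, lecturer) for subject, lecturer, semester in original}

# Joining only P1 and P2 on SUBJECT produces combinations that never existed.
p1_join_p2 = {
    (subject, lecturer, semester)
    for semester, subject in p1
    for subject2, lecturer in p2
    if subject == subject2
}
spurious = p1_join_p2 - original

# Keeping only rows whose (SEMESTER, LECTURER) pair is in P3 completes the
# three-way join.
three_way = {row for row in p1_join_p2 if (row[2], row[1]) in p3}

print(len(p1_join_p2), sorted(spurious), three_way == original)
```

Joining two of the projections is not enough, since it yields spurious rows; only the join of all three returns exactly the original rows.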
Modelling Temporal Data
• A temporal table is a table that records the period of time when a row is
valid with respect to system time (or transaction time, when the
transaction is recorded), business time (or valid time, when the data is
valid with respect to information about the real world), or both.
• A period is an interval of time that is defined by two date or time columns
in a temporal table. A period contains a begin column and an end column.
The begin column indicates the beginning of the period, and the end
column indicates the end of the period. The beginning value of a period is
inclusive, but the ending value of a period is exclusive. For example, if the
begin column has a value of 01/01/1995, that date is included in the period
of the row, whereas if the end column has a value of 03/21/1995, that date
is not within the period of the row.
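The inclusive-begin, exclusive-end rule can be sketched as a half-open interval test. The function name in_period is hypothetical, and the dates from the example are read as month/day/year.

```python
from datetime import date

def in_period(day, begin, end):
    """True if day falls in the half-open period [begin, end)."""
    return begin <= day < end

# Dates taken from the example (written as MM/DD/YYYY in the text).
begin = date(1995, 1, 1)
end = date(1995, 3, 21)

first_day_included = in_period(date(1995, 1, 1), begin, end)
day_before_end_included = in_period(date(1995, 3, 20), begin, end)
end_day_included = in_period(date(1995, 3, 21), begin, end)
print(first_day_included, day_before_end_included, end_day_included)
```

With this convention, a row whose period ends on a given date and the next row whose period begins on that same date do not overlap.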
