TCS Aspire RDBMS - 2
Let us take a University database as an example and try to understand how an ER model is arrived at.
Example:
A university consists of a number of departments. Each department offers several courses.
Each course includes a number of modules. Students enroll in a particular course and study
modules towards the completion of that course. Each module is taught by a lecturer from the
appropriate department, and each lecturer teaches a group of students.
Entities
Entities are real world items or concepts that exist on their own and are represented as objects
or things of interest. An entity type is a collection of entities that share a common definition.
Identify all the nouns in our university example: university, department, course, module, student and lecturer.
This scenario consists of students, lecturers, modules, courses and departments. Physical things (things that exist in the world and that we can touch or feel), such as students and lecturers, and abstract things (ideas or concepts that cannot be physically touched, smelled, heard, tasted or seen), such as modules and departments, each make an entity type. If we take Student as an entity type, then each student in the university is an entity. Entities appear as nouns in the description because they are objects or things.
We can touch an entity that is a physical thing and think about an entity that is an abstract thing, but an entity type is simply an idea. Student is an entity type of physical things, while Scott, Nancy, Lindsey and Mackenzie are individual students we can touch (each student is an entity). Department is an entity type of abstract things, while IT, CSE, ECE and CIVIL are entities.
Entity Diagrams
The figure below represents the entities and their corresponding attributes in the University database.
The name of a relationship is written in a diamond (for example, Belongs to, as shown in Figure 5.1).
Cardinality Ratio
Each entity can be involved in three types of relationships, as shown below:
One to One (1:1)
Each student belongs to one University. We can illustrate this ratio by writing ones on the lines indicating the relationship, as shown in Figure 2.5.
One to Many (1:n)
A lecturer teaches many students; this One to Many relationship is illustrated in Figure 2.7.
Many to Many (m:n)
Each student takes many modules, and each module is taken by many students, as shown in Figure 2.9.
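A common way to carry cardinality ratios into tables is sketched below with Python's sqlite3 module (a choice made for this example, not something the ER model prescribes): a One to Many relationship becomes a foreign key column on the "many" side, and a Many to Many relationship becomes a separate linking (junction) table. All table names, column names and rows here (for example Takes and lecturer_id) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lecturer (lecturer_id INTEGER PRIMARY KEY, name TEXT);
    -- One to Many: each student row points to exactly one lecturer,
    -- while one lecturer can be referenced by many student rows.
    CREATE TABLE Student (
        student_id  INTEGER PRIMARY KEY,
        name        TEXT,
        lecturer_id INTEGER REFERENCES Lecturer(lecturer_id)
    );
    CREATE TABLE Module (module_id INTEGER PRIMARY KEY, title TEXT);
    -- Many to Many: a junction table holds one row per (student, module) pair.
    CREATE TABLE Takes (
        student_id INTEGER REFERENCES Student(student_id),
        module_id  INTEGER REFERENCES Module(module_id),
        PRIMARY KEY (student_id, module_id)
    );
    INSERT INTO Lecturer VALUES (1, 'Lecturer A');
    INSERT INTO Student VALUES (1, 'Scott', 1), (2, 'Nancy', 1);
    INSERT INTO Module VALUES (1, 'Databases'), (2, 'Networks');
    INSERT INTO Takes VALUES (1, 1), (1, 2), (2, 1);
""")

# One lecturer teaches many students.
students_per_lecturer = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE lecturer_id = 1").fetchone()[0]
# One student takes many modules, and one module is taken by many students.
modules_for_scott = conn.execute(
    "SELECT COUNT(*) FROM Takes WHERE student_id = 1").fetchone()[0]
students_in_databases = conn.execute(
    "SELECT COUNT(*) FROM Takes WHERE module_id = 1").fetchone()[0]
print(students_per_lecturer, modules_for_scott, students_in_databases)  # 2 2 2
```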
The ER model is built from the following elements:
Entities
Attributes
Relationships
Cardinality ratios
Now let's see what an ER model looks like when all these elements are put together. The final ER model of our University database is shown in Figure 2.10. This figure shows the entities and the relationships between them, which together depict the complete ER model of a University. Here Department, Course, Module, Lecturer and Student are the entities.
The relationships in Figure 2.10 are defined as follows. A Department offers many Courses, so these two entities have a One to Many relationship. A Department assigns many Lecturers (One(1) To Many(n)). Each Lecturer teaches many Students (One(1) To Many(n)). Every Student takes several Modules (Many(n) To Many(n)). Every Course includes many Modules (Many(n) To Many(n)). A Course is enrolled in by many Students (One(1) To Many(n)).
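One possible relational schema for the relationships listed above is sketched below with Python's sqlite3 module. It is a hedged illustration rather than the design behind Figure 2.10: every table and column name is hypothetical, each One to Many relationship is stored as a foreign key on the "many" side, and each Many to Many relationship (Student takes Module, Course includes Module) gets its own junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    -- Department offers many Courses (1:n): foreign key on the Course side.
    CREATE TABLE Course (
        course_id INTEGER PRIMARY KEY, title TEXT,
        dept_no   INTEGER REFERENCES Department(dept_no));
    -- Department assigns many Lecturers (1:n).
    CREATE TABLE Lecturer (
        lecturer_id INTEGER PRIMARY KEY, name TEXT,
        dept_no     INTEGER REFERENCES Department(dept_no));
    -- Lecturer teaches many Students (1:n); a Course is enrolled in by many Students (1:n).
    CREATE TABLE Student (
        student_id  INTEGER PRIMARY KEY, name TEXT,
        lecturer_id INTEGER REFERENCES Lecturer(lecturer_id),
        course_id   INTEGER REFERENCES Course(course_id));
    CREATE TABLE Module (module_id INTEGER PRIMARY KEY, title TEXT);
    -- Student takes Modules (m:n): junction table.
    CREATE TABLE Takes (
        student_id INTEGER REFERENCES Student(student_id),
        module_id  INTEGER REFERENCES Module(module_id),
        PRIMARY KEY (student_id, module_id));
    -- Course includes Modules (m:n): junction table.
    CREATE TABLE Includes (
        course_id INTEGER REFERENCES Course(course_id),
        module_id INTEGER REFERENCES Module(module_id),
        PRIMARY KEY (course_id, module_id));
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
# ['Course', 'Department', 'Includes', 'Lecturer', 'Module', 'Student', 'Takes']
```

Listing the tables is only a quick check that the script ran. The composite primary keys on the two junction tables stop the same pair from being stored twice.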
The ER Model for the above example is given below:
The complete ER model for our University database is shown in the diagram below. It is an integrated ER model containing the entities and relationships for a University database.
2.2. Normalization - First Normal Form, Second Normal Form and Third Normal Form
The database design technique that organizes tables in a manner that reduces redundancy and dependency of data is called Normalization. It is the systematic process of decomposing complex tables (relations) into smaller, easily manageable tables. Normalization helps ensure that the data accessed from the database is accurate. Without normalization, database systems can be inaccurate, redundant, slow and inefficient, and they might not produce the data that is expected. Listed below are the advantages of normalization.
Advantages
Helps to avoid update anomalies. That is, it isolates data so that additions, deletions,
and modifications of a field can be made in just one table. The changes are then
propagated to the rest of the database through the defined relationships.
Edgar Codd invented the relational model and proposed the theory of normalization with the introduction of First Normal Form. He continued to extend the theory with Second and Third Normal Forms. Later, Codd worked with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form (BCNF).
The theory of normalization is still developing; for example, discussions on Sixth Normal Form are in progress. However, in most practical applications normalization achieves its best in Third Normal Form. The evolution of normalization theories is illustrated below:
What is a KEY?
A KEY is a value used to uniquely identify a row in a table. It could be a single column or a combination of multiple columns.
Note: The columns in a table that are NOT used to uniquely identify a record or row in a table
are called non-key columns.
What is a Primary Key?
A primary key is a single column value that is used to uniquely identify a database record.
The primary key column in a table cannot have duplicate values. Each primary key value must be unique.
The primary key column must have a value when a new record is inserted into the table.
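The two primary key rules can be tried out with Python's sqlite3 module. In this sketch the table layout and the sample names are hypothetical. Note that SQLite, unlike the SQL standard, does not automatically forbid NULL in most PRIMARY KEY columns, so NOT NULL is written out explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite would otherwise accept NULL
# in a non-INTEGER primary key column.
conn.execute("CREATE TABLE Student (StudentId INT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Student VALUES (1, 'Scott')")

def try_insert(student_id, name):
    """Return None if the row is accepted, or the database error message if it is rejected."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?)", (student_id, name))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate_error = try_insert(1, "Nancy")     # same key value as Scott: uniqueness violated
missing_error = try_insert(None, "Lindsey")  # no key value at all: NOT NULL violated
row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(duplicate_error)
print(missing_error)
print(row_count)  # 1
```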
Example:
The table below contains the details of students. Here StudentId is the primary key; it uniquely identifies each student's details in the table.
Table 2.1
Functional Dependency
In simple terms, functional dependency can be explained as follows. If knowing the value of one attribute lets you determine the value of another attribute, the second attribute is said to be functionally dependent on the first. In the Student table given below, if we know the attribute 'StudentId', we can get the attribute 'Name', so Name is functionally dependent on StudentId (written StudentId → Name). Here StudentId is the determinant and Name is the dependent.
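The idea of a functional dependency can be expressed as a small check over a set of rows: StudentId → Name holds when no two rows share a StudentId but differ in Name. This is a minimal sketch; the helper fd_holds and the sample rows are hypothetical. Keep in mind that sample data can only show that a dependency is violated; whether it really holds is a rule of the business the table models.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if each value of the determinant columns maps to exactly one
    value of the dependent columns in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True

# Hypothetical sample rows; column names follow the Student table in the text.
students = [
    {"StudentId": 1, "Name": "Scott", "Dept_No": 10},
    {"StudentId": 2, "Name": "Nancy", "Dept_No": 10},
    {"StudentId": 3, "Name": "Scott", "Dept_No": 20},
]

print(fd_holds(students, ["StudentId"], ["Name"]))  # True: StudentId -> Name
print(fd_holds(students, ["Name"], ["StudentId"]))  # False: two students are named Scott
```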
For example, let's consider the Student table given below. Table 2.2 stores student details (StudentId, Name, Languages Known), the student's department details (Dept_No, Dept_Name) and lecturer details (LecturerInCharge, Designation).
In this approach, several languages are packed into a single field, and the same language and department details are stored again for every student. Such a table is called an unnormalized table. Instead of storing the same data again and again, we can normalize the data and create related tables.
Let's see how to normalize the Student table (which is not normalized), create related tables and learn the normal forms along the way:
Student Table (Unnormalized Table):
Table 2.2
First Normal Form
To move from unnormalized form to first normal form, all multi-valued attributes (called repeating groups) must be removed. All attributes must be atomic.
Table 2.2 is not in 1NF since it has repeating groups (more than one value in a field). The column "Languages Known" holds (English, Hindi and Tamil) in Row (Tuple) 1 and (English and Hindi) in Row (Tuple) 2. To satisfy 1NF, we create a separate row for each value in Languages Known, duplicating the values in the remaining columns. Table 2.3 represents the result.
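The 1NF step described above (one row per value of Languages Known, with the other columns copied) can be sketched in a few lines of Python. The helper name to_first_normal_form, the comma separator and the StudentId/Name values are assumptions; only the language lists come from the description of Table 2.2.

```python
# Unnormalized rows: "Languages Known" holds several values in one field.
# StudentId and Name values are hypothetical; other columns are omitted.
unnormalized = [
    {"StudentId": 1, "Name": "Scott", "Languages Known": "English, Hindi, Tamil"},
    {"StudentId": 2, "Name": "Nancy", "Languages Known": "English, Hindi"},
]

def to_first_normal_form(rows, multi_valued_col, sep=","):
    """Create one row per value of the multi-valued column, copying the other columns."""
    result = []
    for row in rows:
        for value in row[multi_valued_col].split(sep):
            atomic_row = dict(row)  # copy the remaining columns unchanged
            atomic_row[multi_valued_col] = value.strip()
            result.append(atomic_row)
    return result

first_nf = to_first_normal_form(unnormalized, "Languages Known")
for row in first_nf:
    print(row)
print(len(first_nf))  # 5
```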
Second Normal Form
A relation in 1NF is in second normal form (2NF) if it has no partial dependencies.
Partial dependency
A partial dependency is a functional dependency on part of the primary key instead of the entire primary key.
We cannot bring our simple database into second normal form unless we partition the columns in Table 2.3. Here, assume that StudentId and Dept_No together act as the key (composite key). As per 2NF, all non-key attributes must depend on the whole key.
In Table 2.3, however, 'Dept_Name' depends only on Dept_No: you can get the department name from Dept_No alone, without knowing StudentId. Likewise, the student's own details can be identified by just providing 'StudentId', so for those columns StudentId acts as the primary key. These are partial dependencies, so split the table as given below to satisfy 2NF.
Student
Table 2.4
Department
Table 2.5
Languages
Table 2.6
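Splitting a table is a projection: each new table keeps a subset of the columns, and duplicate rows are dropped. The sketch below applies that to hypothetical rows shaped like Table 2.3; the values, the lecturer name Ravi and the exact column lists are assumptions. LecturerInCharge and Designation stay with Department because the text still finds them in the Department table (Table 2.9) at the 3NF step.

```python
def project(rows, columns):
    """Keep only `columns` and drop duplicate rows (a relational projection)."""
    result = []
    for row in rows:
        picked = {col: row[col] for col in columns}
        if picked not in result:  # duplicates are dropped, first-seen order is kept
            result.append(picked)
    return result

# Hypothetical 1NF rows shaped like Table 2.3 (one language per row).
common = {"Dept_No": 10, "Dept_Name": "CSE",
          "LecturerInCharge": "Ravi", "Designation": "Professor"}
first_nf = [
    {"StudentId": 1, "Name": "Scott", **common, "Languages Known": "English"},
    {"StudentId": 1, "Name": "Scott", **common, "Languages Known": "Hindi"},
    {"StudentId": 2, "Name": "Nancy", **common, "Languages Known": "English"},
]

# 2NF split: no column depends on only part of a composite key any more.
student = project(first_nf, ["StudentId", "Name", "Dept_No"])
department = project(first_nf, ["Dept_No", "Dept_Name", "LecturerInCharge", "Designation"])
languages = project(first_nf, ["StudentId", "Languages Known"])
print(len(student), len(department), len(languages))  # 2 1 3
```

After the split, each department's details are stored once instead of once per student and language.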
Introducing Foreign Key
A foreign key is a field in a table that matches the primary key column of another table. Foreign keys are used to cross-reference tables.
In Table 2.7, Dept_No is the foreign key.
Table 2.7
The foreign key ensures that a row in one table is mapped to a corresponding row in another table.
A foreign key does not have to be unique; most often it is not unique.
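Foreign key behaviour can be observed with Python's sqlite3 module. Two points from the text show up directly: several Student rows may share the same Dept_No, and a Dept_No with no matching Department row is rejected. The sample rows are hypothetical, and the PRAGMA line is specific to SQLite, which leaves foreign key enforcement off unless it is enabled on each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only after this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (Dept_No INTEGER PRIMARY KEY, Dept_Name TEXT);
    CREATE TABLE Student (
        StudentId INTEGER PRIMARY KEY,
        Name      TEXT,
        Dept_No   INTEGER REFERENCES Department(Dept_No)  -- the foreign key
    );
    INSERT INTO Department VALUES (10, 'CSE');
    -- Two students share Dept_No 10: foreign key values need not be unique.
    INSERT INTO Student VALUES (1, 'Scott', 10), (2, 'Nancy', 10);
""")

try:
    # Dept_No 99 has no matching Department row, so this insert is rejected.
    conn.execute("INSERT INTO Student VALUES (3, 'Lindsey', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

shared_dept_count = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Dept_No = 10").fetchone()[0]
print(orphan_rejected, shared_dept_count)  # True 2
```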
Foreign Key
Table 2.8
Department
Table 2.9
Transitive functional dependencies
When changing a non-key column might cause any of the other non-key columns to change, it is called a transitive functional dependency. Attributes that are not part of the key must not depend on any non-key attribute.
Consider Table 2.9. Changing the non-key column 'Lecturer In Charge' may change 'Designation'. Here Dept_No acts as the key, and all other columns are non-key attributes. As per 3NF, non-key attributes should not depend on any other non-key attribute, but 'Designation' depends on 'Lecturer In Charge'. Both 'Lecturer In Charge' and 'Designation' are non-key attributes, so this forms a transitive dependency (Dept_No → Lecturer In Charge → Designation). To satisfy 3NF, we will split the table shortly.
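Why this transitive dependency matters can be shown with a small update anomaly. In the hypothetical rows below, the same lecturer is in charge of two departments, so his designation is stored twice; an update that touches only one row leaves the table contradicting itself. The lecturer name and designations are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Department table before 3NF: Dept_No -> LecturerInCharge -> Designation.
conn.executescript("""
    CREATE TABLE Department (
        Dept_No          INTEGER PRIMARY KEY,
        Dept_Name        TEXT,
        LecturerInCharge TEXT,
        Designation      TEXT
    );
    INSERT INTO Department VALUES
        (10, 'CSE', 'Ravi', 'Associate Professor'),
        (20, 'IT',  'Ravi', 'Associate Professor');
""")

# Ravi is promoted, but the update touches only one of the two rows
# that store his designation.
conn.execute("UPDATE Department SET Designation = 'Professor' WHERE Dept_No = 10")

designations = {row[0] for row in conn.execute(
    "SELECT Designation FROM Department WHERE LecturerInCharge = 'Ravi'")}
print(sorted(designations))  # ['Associate Professor', 'Professor']
```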
Third Normal Form
Third normal form (3NF) is the third step in database normalization, and it builds on the first (1NF) and second (2NF) normal forms.
Third Normal Form (3NF) states that all column references in referenced data that are not dependent on the primary key should be removed. Another way of putting this is that only foreign key columns should be used to reference another table, and the other columns from the parent table should not exist in the referencing table.
Second Normal Form (2NF) deals with multi-column (composite) primary keys, where partial dependencies can occur. 3NF also covers tables with single-column keys, by removing the transitive functional dependencies described above.
3NF Rules
Rule 1- The table is in second normal form (2NF).
Rule 2- The table has no transitive functional dependencies, as explained above.
A table has to be divided if it is to be moved from second normal form (2NF) into third normal form (3NF). In Table 2.9, Dept_No acts as the key and all other columns are non-key attributes. As per third normal form, non-key attributes should not depend on any other non-key attribute. However, 'Designation' depends on 'Lecturer In Charge', and both are non-key attributes, so they form a transitive dependency. To satisfy 3NF, split the table as follows.
Student
Table 2.10
Department
Table 2.11
Lecturer
Table 2.12
Languages
Table 2.13
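The 3NF split of the Department table can be sketched as below, using Python's sqlite3 module. Designation moves into a Lecturer table keyed by LecturerInCharge, and Department keeps only a reference to the lecturer; the Student and Languages tables are left out, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: Designation now depends only on the key of its own table.
    CREATE TABLE Lecturer (
        LecturerInCharge TEXT PRIMARY KEY,
        Designation      TEXT);
    CREATE TABLE Department (
        Dept_No          INTEGER PRIMARY KEY,
        Dept_Name        TEXT,
        LecturerInCharge TEXT REFERENCES Lecturer(LecturerInCharge));
    INSERT INTO Lecturer VALUES ('Ravi', 'Associate Professor');
    INSERT INTO Department VALUES (10, 'CSE', 'Ravi'), (20, 'IT', 'Ravi');
""")

# A promotion is now a single-row update, so the data cannot contradict itself.
conn.execute("UPDATE Lecturer SET Designation = 'Professor' WHERE LecturerInCharge = 'Ravi'")

# Joining the two tables rebuilds the earlier (2NF) view of each department.
rows = conn.execute("""
    SELECT d.Dept_No, d.Dept_Name, d.LecturerInCharge, l.Designation
    FROM Department d
    JOIN Lecturer l ON d.LecturerInCharge = l.LecturerInCharge
    ORDER BY d.Dept_No
""").fetchall()
for row in rows:
    print(row)
# (10, 'CSE', 'Ravi', 'Professor')
# (20, 'IT', 'Ravi', 'Professor')
```

Because the designation is stored once, one UPDATE keeps every department consistent, and the join shows that the original rows can still be rebuilt.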
The example given above cannot be decomposed further to attain higher forms of normalization because it is already normalized to the highest level. Normally, only complex databases need the next levels of normalization.
2.3. Joins
What are Joins?
A join is a technique where records from two or more tables are retrieved through a single SQL query and shown as a single output. As the output forms a set, it can be saved as a table or used as it is. A join combines columns from two tables by using values common to both tables, so that data from more than one table appears in a single result set. A join condition is used in the WHERE clause of SELECT, UPDATE and DELETE queries, or in the ON clause when the JOIN keyword is used.
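The effect of a join condition on the size of the result can be sketched with Python's sqlite3 module. The two hypothetical tables below have 30 and 10 rows; the table names TableA and TableB and their single id column are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER)")
conn.execute("CREATE TABLE TableB (id INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?)", [(i,) for i in range(1, 31)])  # 30 rows
conn.executemany("INSERT INTO TableB VALUES (?)", [(i,) for i in range(1, 11)])  # 10 rows

# Without a join condition every row of TableA is paired with every row of TableB.
without_condition = conn.execute("SELECT COUNT(*) FROM TableA, TableB").fetchone()[0]
# With a join condition only pairs whose id values match are kept.
with_condition = conn.execute(
    "SELECT COUNT(*) FROM TableA, TableB WHERE TableA.id = TableB.id").fetchone()[0]
print(without_condition, with_condition)  # 300 10
```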
Note: If the join condition is omitted, the query returns the Cartesian product of the tables (a Cartesian product is all possible combinations of rows in all tables): each row of the first table is joined with every row of the second table. For example, if the first table has 30 rows and the second table has 10 rows, the result will have 30 * 10, or 300 rows. Such a query can take a long time to execute.
Let's use the two tables below to explain join conditions.
Table "Student"
Table 2.14
Table "Department"
Table 2.15
In the above example, the column common to both tables is Dept_No. Using Dept_No, the Student and Department tables can be joined to combine data from both tables, as shown below.
Let's consider a scenario: retrieve the details of the students who belong to the 'CSE' department. We have to join the two tables on the common column present in both of them.
Table 2.16
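The 'CSE' scenario above could be queried as sketched below with Python's sqlite3 module. The sketch assumes Student has the columns StudentId, Name and Dept_No and Department has Dept_No and Dept_Name, following the column names used earlier; the rows are hypothetical. Both the WHERE-clause form and the JOIN ... ON form are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Dept_No INTEGER PRIMARY KEY, Dept_Name TEXT);
    CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Dept_No INTEGER);
    INSERT INTO Department VALUES (10, 'CSE'), (20, 'IT');
    INSERT INTO Student VALUES (1, 'Scott', 10), (2, 'Nancy', 20), (3, 'Lindsey', 10);
""")

# Join condition in the WHERE clause, as described in the text.
where_style = conn.execute("""
    SELECT s.StudentId, s.Name, d.Dept_Name
    FROM Student s, Department d
    WHERE s.Dept_No = d.Dept_No AND d.Dept_Name = 'CSE'
    ORDER BY s.StudentId
""").fetchall()

# The same query written with the JOIN ... ON syntax.
join_style = conn.execute("""
    SELECT s.StudentId, s.Name, d.Dept_Name
    FROM Student s
    JOIN Department d ON s.Dept_No = d.Dept_No
    WHERE d.Dept_Name = 'CSE'
    ORDER BY s.StudentId
""").fetchall()

print(where_style)  # [(1, 'Scott', 'CSE'), (3, 'Lindsey', 'CSE')]
```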
2.4. Summary
The database design technique that organizes tables in a manner that reduces redundancy and dependency of data is called Normalization.
The normal forms covered here are First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF).
A key is a value used to uniquely identify a row in a table. One or more columns can be used to form a key for a table.
A primary key is a single column value used to identify a database record uniquely.
A composite key is a primary key formed by combining multiple columns and is used to identify a record uniquely.
A field in a table that matches the primary key column of another table is called a foreign key. Foreign keys are used to cross-reference tables.
Third Normal Form states that all column references in referenced data that are not dependent on the primary key (transitive dependencies) should be removed.
A join is a means of combining fields from two tables by using values common to both. It allows us to combine data from more than one table into a single result set.
ADDITIONAL DATA
SQL JOIN
An SQL JOIN clause is used to combine rows from two or more tables, based on a common
field between them.
The most common type of join is the SQL INNER JOIN (simple join). An SQL INNER JOIN returns all rows from multiple tables where the join condition is met.
Let's look at a selection from the "Orders" table:
OrderID | CustomerID | OrderDate
10308   |            | 1996-09-18
10309   | 37         | 1996-09-19
10310   | 77         | 1996-09-20
Next, look at a selection from the "Customers" table:
CustomerName        | ContactName    | Country
Alfreds Futterkiste | Maria Anders   | Germany
                    | Ana Trujillo   | Mexico
                    | Antonio Moreno | Mexico
Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the
"Customers" table. The relationship between the two tables above is the "CustomerID"
column.
Then, if we run the following SQL statement (that contains an INNER JOIN):
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
The result of the query looks like this:
OrderID | CustomerName       | OrderDate
10308   |                    | 9/18/1996
10365   |                    | 11/27/1996
10383   |                    | 12/16/1996
10355   |                    | 11/15/1996
10278   | Berglunds snabbköp | 8/12/1996
INNER JOIN: Returns all rows when there is at least one match in BOTH tables
LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table
RIGHT JOIN: Returns all rows from the right table, and the matched rows from the left table
FULL JOIN: Returns all rows from both tables, matching them where possible; where a row has no match, the other table's columns are filled with NULL
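The difference between INNER JOIN and LEFT JOIN is easiest to see when one customer has no orders. The tables below are hypothetical and much smaller than the Orders and Customers examples above. The sketch uses only INNER and LEFT JOIN because RIGHT and FULL OUTER JOIN are available in SQLite only from version 3.39.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Customer A'), (2, 'Customer B');
    -- Customer B has no orders.
    INSERT INTO Orders VALUES (100, 1, '1996-09-18'), (101, 1, '1996-09-19');
""")

# INNER JOIN keeps only customers that have a matching order.
inner_rows = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers c
    INNER JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID, o.OrderID
""").fetchall()

# LEFT JOIN keeps every customer; missing order columns come back as NULL (None).
left_rows = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID, o.OrderID
""").fetchall()

print(inner_rows)  # [('Customer A', 100), ('Customer A', 101)]
print(left_rows)   # [('Customer A', 100), ('Customer A', 101), ('Customer B', None)]
```

A RIGHT JOIN can usually be rewritten as a LEFT JOIN with the two tables swapped, which is one way to express it on SQLite versions that lack it.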