What Is a Database System?
The database (DB) course is about:
How to organize data
Efficient and effective data retrieval
Secure and reliable storage of data
Maintaining consistent data
Making information useful for decision making
Data management has passed through different levels of development along with
developments in technology and services. These levels can best be described in
three categories. Even though each new level brings an advantage and overcomes
a problem, all methods of data handling are still in use to some extent. The
three major levels are:
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
• Files are used to store information for as many events and objects as the organization has.
• Each file, containing various kinds of information, is labelled and stored in one or more cabinets.
• The cabinets can be kept in safe places for security, depending on the sensitivity of the information they contain.
• Insertion and retrieval are done by searching first for the right cabinet, then for the right file, then for the information.
• An indexing system can be used to facilitate access to the data.
2. Traditional File Based Approach
• This approach is the decentralized, computerized method of data handling.
• A collection of application programs performs services for the end users. In such systems, every application program that provides a service to end users defines and manages its own data.
• File-based systems were an early attempt to computerize the manual filing system.
• Such systems have a number of programs for each of the different applications in the organization.
• Since every application defines and manages its own data, the system is subject to a serious data duplication problem.
• A file, in the traditional file-based approach, is a collection of records that contain logically related data.
Limitations of the Traditional File Based approach
• Separation or isolation of data: information available in one application may not be known to other applications.
• Duplication or redundancy of data
• Data dependency on the application
• Incompatible file formats between different applications and programs, creating inconsistency
• Fixed query processing, which is defined during application development
The limitations of the traditional file-based data handling approach arise
from two basic reasons.
1. The definition of the data is embedded in the application program, which makes it difficult to modify the data definition.
2. There is no control over the access and manipulation of the data beyond that imposed by the application programs.
The most significant problem experienced by the traditional file-based approach
to data handling is the “update anomalies”. There are three types of update
anomalies:
1. Modification Anomalies: a problem experienced when one or more data values are modified in one application program but not in others containing the same data set.
2. Deletion Anomalies: a problem encountered when a record set is deleted from one application but remains untouched in other application programs.
3. Insertion Anomalies: a problem encountered when one cannot decide whether the data to be inserted is valid and consistent with other similar data sets.
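A modification anomaly can be illustrated with a minimal Python sketch. The two dictionaries stand in for the private files of two applications; the file names, the employee id, and the helper functions are hypothetical and only serve the illustration.

```python
# Hypothetical illustration: two applications each keep their own copy of
# the same employee record, as in the traditional file-based approach.
payroll_file = {"E12": {"name": "Abebe", "address": "Piazza"}}
personnel_file = {"E12": {"name": "Abebe", "address": "Piazza"}}

def update_address(app_file, emp_id, new_address):
    """Update the address in ONE application's private file only."""
    app_file[emp_id]["address"] = new_address

def inconsistent_ids(file_a, file_b):
    """Return the ids whose records differ between the two files."""
    shared = file_a.keys() & file_b.keys()
    return sorted(k for k in shared if file_a[k] != file_b[k])

# The payroll application changes the address; personnel is never told.
update_address(payroll_file, "E12", "Mexico")
conflicts = inconsistent_ids(payroll_file, personnel_file)
print(conflicts)
```

Because nothing outside the applications coordinates the two copies, the inconsistency is only found if someone explicitly compares the files.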
3. Database Approach
• A database is a computerized record-keeping system, a kind of electronic filing cabinet.
• A database is a storehouse for a collection of computerized data files.
• A database is a shared collection of logically related data designed to meet the information needs of an organization. Since it is a shared corporate resource, the database is integrated, with minimal or no duplication.
• A database is a collection of logically related data, where the logically related data comprise the entities, attributes, and relationships of an organization's information.
• In addition to containing the data required by an organization, a database also contains a description of the data, which is called “Metadata”, “Data Dictionary”, “Systems Catalogue”, or “Data about Data”. Since a database contains information about the data (metadata), it is called a self-descriptive collection of integrated records.
• The purpose of a database is to store information and to allow users to retrieve and update that information on demand.
• A database is designed once and used simultaneously by many users.
• Unlike the traditional file-based approach, the database approach provides program-data independence, that is, the separation of the data definition from the application. Thus the application is unaffected by changes made to the data structure and file organization.
Why Database?
• Compactness (no voluminous papers)
• Speed: retrieval and update are fast
• Less drudgery: less file maintenance
• Currency: accurate and up-to-date information is available on demand
• Centralized information control
Limitations of Database Approach
• Complexity in designing and managing data
• High cost incurred to develop and maintain
• Reduced performance due to centralization
• High impact on the system when a failure occurs
Functions of Database Management System (DBMS)
A full-scale DBMS should provide the following services to the user.
1. Data storage, retrieval and update in the database
2. A user-accessible catalogue
3. Transaction support service: all-or-none transactions, which minimize data inconsistency.
4. Concurrency control services: simultaneous access to and update of the database by different users should be handled correctly.
5. Recovery services: a mechanism for recovering the database after a failure must be available.
6. Authorization services (security): must support access and authorization control for the database administrator and users.
7. Support for data communication: should provide the facility to integrate with data transfer software or data communication managers.
8. Integrity services: rules about the data and the changes made to it, the correctness and consistency of stored data, and the quality of data based on business constraints.
9. Services to promote data independence between the data and the application
10. Utility services: sets of utility facilities such as
   • Importing data
   • Statistical analysis support
   • Index reorganization
   • Garbage collection
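The all-or-none behaviour of the transaction support service (item 3) can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the account table, its CHECK rule, and the transfer helper are hypothetical and not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance may never become negative.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)        # both updates succeed
failed = transfer(conn, 1, 2, 500)   # debit violates the CHECK rule
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected, yet the rollback undoes it, so the stored balances reflect only the first transfer.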
The DBMS is software that helps to design, handle, and use data using the
database approach. Taking the DBMS as a system, one can describe it with respect
to its environment, that is, the other systems interacting with the DBMS. The
DBMS environment has five components.
b. Logical Design: a higher-level conceptual abstraction that uses a selected
specific data model to implement the data structure. It is independent of
any particular DBMS and involves no physical considerations.
c. Physical Design: the physical implementation of the upper-level
design of the database with respect to the internal storage and file
structure of the database for the selected DBMS.
2. DataBase Designer (DBD)
• Identifies the data to be stored and chooses the appropriate structures to represent and store the data.
• Should understand the user requirements and should choose how the user views the database.
• Is involved in the design phase, before the implementation of the database system.
There are two kinds of database designers: one involved in the logical
and conceptual design, and another involved in the physical design.
2. Physical DBD
• Takes the logical design specification as input and decides how it should be physically realized.
• Maps the logical data model onto the specified DBMS with respect to tables and integrity constraints (DBMS-dependent design).
• Selects specific storage structures and access paths to the database.
• Designs the security measures required on the database.
4. End Users
Workers whose jobs require accessing the database frequently for various
purposes. There are different groups of users in this category.
1. Naïve Users:
• A sizable proportion of users
• Unaware of the DBMS
• Only access the database based on their access level and demand
• Use standard and pre-specified types of queries.
2. Sophisticated Users
• Are familiar with the structure of the database and the facilities of the DBMS.
• Have complex requirements
• Have higher-level queries
• Are most of the time engineers, scientists, business analysts, etc.
3. Casual Users
• Access the database occasionally.
• Need different information from the database each time.
• Use sophisticated database queries to satisfy their needs.
• Are most of the time middle- to high-level managers.
These users can again be classified as “Actors on the Scene” and “Workers
Behind the Scene”.
History of Database Systems
The purpose and origin of the Three-Level database architecture
• All users should be able to access the same data.
• A user's view is unaffected by, or immune to, changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
ANSI-SPARC Architecture and Database Design Phases
The contents of the external, conceptual and internal levels
The purpose of the external/conceptual and the conceptual/internal mappings
External Level: Users' view of the database. Describes the part of the database
that is relevant to a particular user. Different users have their own
customized view of the database, independent of other users.
Defines DBMS schemas at three levels:
Internal schema at the internal level to describe physical storage structures and
access paths. Typically uses a physical data model.
External schemas at the external level to describe the various user views. Usually
uses the same data model as the conceptual level.
Data Independence
Logical Data Independence:
• Refers to the immunity of external schemas to changes in the conceptual schema.
• Conceptual schema changes, e.g. addition or removal of entities, should not require changes to external schemas or rewrites of application programs.
• It is the capacity to change the conceptual schema without having to change the external schemas and their application programs.
Physical Data Independence:
• Internal schema changes, e.g. using different file organizations or storage structures/devices, should not require changes to the conceptual or external schemas.
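Logical data independence can be sketched with a view that plays the role of an external schema. The following is a minimal illustration using Python's sqlite3 module; the table, view, and column names are hypothetical. The application only ever runs APP_QUERY against the view, and the conceptual schema is restructured underneath it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema, version 1: one base table (hypothetical names).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Almaz', 'CS')")
# External schema: the only object the application queries.
conn.execute("CREATE VIEW student_view AS SELECT id, name, dept FROM student")

APP_QUERY = "SELECT name, dept FROM student_view WHERE id = 1"
before = conn.execute(APP_QUERY).fetchone()

# Conceptual schema, version 2: the department moves to its own table.
# Only the external/conceptual mapping (the view definition) is rewritten.
conn.executescript("""
    DROP VIEW student_view;
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept TEXT);
    INSERT INTO department VALUES (10, 'CS');
    CREATE TABLE student2 (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO student2
        SELECT s.id, s.name, d.dept_id
        FROM student s JOIN department d ON d.dept = s.dept;
    DROP TABLE student;
    CREATE VIEW student_view AS
        SELECT s.id, s.name, d.dept
        FROM student2 s JOIN department d USING (dept_id);
""")
after = conn.execute(APP_QUERY).fetchone()
print(before, after)
```

The application query is byte-for-byte the same before and after the change, which is the property the definition above describes.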
Database Languages
Data Definition Language (DDL)
• Allows the DBA or user to describe and name the entities, attributes and relationships required for the application.
• Provides a specification notation for defining the database schema.
SQL is the most widely used non-procedural query language.
• Record-based
• Physical
Object-based Data Models
• Entity-Relationship
• Semantic
• Functional
• Object-Oriented
Record-based Data Models
Consist of a number of fixed-format records. Each record type defines a fixed
number of fields, and each field is typically of a fixed length.
• Relational Data Model
• Network Data Model
• Hierarchical Data Model
1. Hierarchical Model
2. Network Model
3. Relational Data Model
1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or segment
• The top node is the root node
• Nodes are arranged in a hierarchical structure, as a sort of upside-down tree
• A parent node can have more than one child node
• A child node can only have one parent node
• The relationship between parent and child is one-to-many
• Relation is established by creating physical link between stored
records (each is stored with a predefined access path to other
records)
• To add new record type or relationship, the database must be
redefined and then stored in a new form.
[Figure: hierarchical model example with Department as the parent node of Employee and Job]
2. Network Model
• Allows record types to have more than one parent, unlike the hierarchical model
• The network data model sees records as set members
• Each set has an owner and one or more members
• Does not allow many-to-many relationships between entities
• Like the hierarchical model, the network model is a collection of physically linked records
• Allows member records to have more than one owner
[Figure: network model example with the record types Department, Job, Employee, Activity, and Time Card]
• Records are related by the data stored jointly in the fields of records in two tables or files. The related tables contain the information that creates the relation.
• The tables seem to be independent but are related somehow.
• The user does not need to consider the physical storage.
• Many tables are merged together to come up with a new virtual view of the relationship.
Alternative terminologies
Relation    Table    File
Tuple       Row      Record
Attribute   Column   Field
Relational Data Model
All values in a column represent the same attribute and have the same
data format
Building Blocks of the Relational Data Model
The building blocks of the relational data model are:
1. The ENTITIES (persons, places, things etc.) which the organization has to
deal with. Relations can also describe relationships
Types of Attributes
3. The RELATIONSHIPS between entities which exist and must be taken into
account when processing information.
Relational Constraints
• Relational Integrity
   o Domain Integrity: no value of an attribute should be beyond the allowable limits.
   o Entity Integrity: in a base relation, no attribute of a primary key can be null.
   o Referential Integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value in its home relation or the foreign key value must be null (foreign key to primary key match-ups).
   o Enterprise Integrity: additional rules specified by the users or database administrators of a database are incorporated.
• Key constraints
If tuples need to be unique in the database, we must make each tuple distinct.
To do this we need relational keys that uniquely identify each tuple in a
relation.
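The domain, entity, referential, and key rules above can be tried out with a small sketch using Python's sqlite3 module. The dept and student tables, the year range, and the rejected helper are hypothetical choices made for the illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and that it needs an explicit NOT NULL on a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE student (
        stud_id TEXT PRIMARY KEY NOT NULL,            -- entity integrity, key
        year    INTEGER CHECK (year BETWEEN 1 AND 5), -- domain integrity
        dept_id INTEGER REFERENCES dept(dept_id)      -- referential integrity
    );
    INSERT INTO dept VALUES (1, 'Computer Science');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "valid row":         rejected("INSERT INTO student VALUES ('S1', 2, 1)"),
    "null primary key":  rejected("INSERT INTO student VALUES (NULL, 2, 1)"),
    "duplicate key":     rejected("INSERT INTO student VALUES ('S1', 3, 1)"),
    "year out of range": rejected("INSERT INTO student VALUES ('S2', 9, 1)"),
    "unknown dept":      rejected("INSERT INTO student VALUES ('S3', 2, 99)"),
    "null foreign key":  rejected("INSERT INTO student VALUES ('S4', 2, NULL)"),
}
for case, was_rejected in results.items():
    print(case, "->", "rejected" if was_rejected else "accepted")
```

The last case is accepted because a foreign key value is allowed to be null, as the referential integrity rule states.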
• Relational languages and views
The languages in relational database management systems are the DDL
and the DML, which are used to define or create the database and to
manipulate the database.
Purpose of a view
   o Hides unnecessary information from users
   o Provides powerful flexibility and security
   o Provides a customized view of the database for users
   o Updates on views derived from several relations are not allowed, but a view of one base relation can be updated.
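How a view hides information can be sketched as follows with Python's sqlite3 module. The employee table, its salary column, and the view name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER
    );
    INSERT INTO employee VALUES (12, 'Abebe', 'IT', 9000), (16, 'Lemma', 'HR', 7000);
    -- The view exposes names and departments but hides salaries.
    CREATE VIEW employee_public AS SELECT emp_id, name, dept FROM employee;
""")

cur = conn.execute("SELECT * FROM employee_public ORDER BY emp_id")
visible_columns = [d[0] for d in cur.description]
visible_rows = cur.fetchall()
print(visible_columns)
print(visible_rows)
```

A user who is given access only to employee_public never sees the salary column. One caveat about the tool rather than the concept: SQLite treats every view as read-only unless INSTEAD OF triggers are defined, which is stricter than the update rule stated above.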
Schemas and Instances and Database State
Schemas
• Database Schema (intension): specifies the name of each relation, plus the name and type of each column.
   o refers to a description of the database (its intension)
   o is specified during database design
   o is not expected to change frequently
• Schema diagram
   o a convention to display some aspects of a schema visually
• Schema construct
   o refers to each object in the schema (e.g. STUDENT)
E.g.: STUDENT (FName, LName, Id, Year, Dept, Sex)
• Three-Schema Architecture
   o Internal schema (or internal level): describes the physical storage structure of the database (data storage, access paths)
Instances
• Database state (snapshot or extension): the collection of data in the database at a particular point in time.
   o Refers to the actual data in the database at a specific time
   o The state of the database changes any time we add, delete, or update a record
   o Valid state: a state that satisfies the structure and constraints specified in the schema and enforced by the DBMS
Database Design
Database design consists of several tasks:
• Requirements Analysis
• Conceptual Design and Schema Refinement
• Logical Design
• Physical Design
• Tuning
Conceptual Database Design
Conceptual design is the process of constructing a model of the
information used in an enterprise, independent of all physical
considerations.
It is the source of information for the logical design phase.
Community User’s view
Conceptual design revolves around discovering and analyzing organizational
and user data requirements.
The important activities are to identify
• Entities
• Attributes
• Relationships
• Constraints
and, based on these components, to develop the ER model using ER diagrams.
Developing an E-R Diagram
Graphical Representations in ER Diagramming
[Figure: ER diagram symbols, including key, relationship, and weak relationship]
Example 1
[Figure: ER diagram relating Students and Course through the relationship Enrolled_In; attribute labels shown: Id, Gpa, Age, Cour_Id, Stud_Id, Grade]
Entity versus Attributes
Consider designing a database of employees for an organization: should
address be an attribute of Employees or an entity (connected to Employees
by a relationship)?
Structural Constraints on Relationship
One-to-one relationship:
• A customer is associated with at most one loan via the relationship borrower
• A loan is associated with at most one customer via borrower
[Figure: Employee Manages Branch, with multiplicities 1..1 and 0..1]
One-To-Many Relationships
• In a one-to-many relationship, a loan is associated with at most one customer via borrower, while a customer is associated with several (including 0) loans via borrower.
[Figure: Employee Leads Project, with multiplicities 0..* and 1..1]
Many-To-Many Relationship
• A customer is associated with several (possibly 0) loans via borrower
• A loan is associated with several (possibly 0) customers via borrower
[Figure: Instructor Teaches Course, with multiplicities 1..* and 0..*]
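The three cardinality types can be told apart by counting, for each entity, how many partners it has in the relationship. The sketch below does this in Python for hypothetical occurrence lists of the Manages, Leads, and Teaches relationships; the entity identifiers and the classify function are made up for the illustration. It classifies what a given set of occurrences exhibits, which is not the same as the constraint a designer declares.

```python
from collections import Counter

def classify(pairs):
    """Classify a binary relationship from its (left, right) instance pairs."""
    pairs = set(pairs)
    # Largest number of right-hand partners of any left-hand entity,
    # and largest number of left-hand partners of any right-hand entity.
    left_max = max(Counter(left for left, _ in pairs).values(), default=0)
    right_max = max(Counter(right for _, right in pairs).values(), default=0)
    if left_max <= 1 and right_max <= 1:
        return "one-to-one"
    if left_max <= 1 or right_max <= 1:
        return "one-to-many"
    return "many-to-many"

# Hypothetical relationship occurrences.
manages = [("Emp1", "Bra1"), ("Emp2", "Bra2")]
leads = [("Emp1", "Proj1"), ("Emp1", "Proj2"), ("Emp3", "Proj3")]
teaches = [("Ins1", "Cou1"), ("Ins1", "Cou2"), ("Ins2", "Cou1")]

kinds = [classify(manages), classify(leads), classify(teaches)]
print(kinds)
```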
Participation of an Entity Set in a Relationship Set
Problem in ER Modeling
The Entity-Relationship Model is a conceptual data model that views the real
world as consisting of entities and relationships. The model visually represents
these concepts by the Entity-Relationship diagram. The basic constructs of the ER
model are entities, relationships, and attributes. Entities are concepts, real or
abstract, about which information is collected. Relationships are associations
between the entities. Attributes are properties which describe the entities.
While designing the ER model one may face design problems called connection
traps. Connection traps are problems arising from misinterpreting certain
relationships.
Example:
Problem: Which car (Car1, Car3 or Car5) is used by Employee 6 (Emp6) working
in Branch 1 (Bra1)? From this ER model one cannot tell which car is used by
which staff member, since a branch can have more than one car and a branch is
also populated by more than one employee. Thus we need to restructure the
model to avoid the connection trap.
To avoid the Fan Trap problem we can go for restructuring of the E-R Model.
This will result in the following E-R Model.
[Figure: occurrence diagram linking branches Bra1 to Bra4, employees Emp1 to Emp7, and cars Car1 to Car7]
2. Chasm Trap:
Occurs where a model suggests the existence of a relationship between
entity types, but the pathway does not exist between certain entity
occurrences.
May exist when one or more relationships with a minimum multiplicity
(cardinality) of zero form part of the pathway between related entities.
Example:
If we have a set of projects that are not currently active, then we cannot
assign a project manager to these projects. So there are projects with no
project manager, which makes the minimum participation zero.
Problem:
How can we identify which BRANCH is responsible for which PROJECT?
We know that, whether the PROJECT is active or not, there is a responsible
BRANCH. But which branch it is remains a question, and since there is a
minimum participation of zero between EMPLOYEE and PROJECT we cannot
identify the BRANCH responsible for each PROJECT.
The solution to this chasm trap problem is to add another relationship
between the extreme entities (BRANCH and PROJECT).
Enhanced E-R (EER) Models
Object-oriented extensions to E-R model
EER is important when we have a relationship between two entities
and the participation between entity occurrences is partial. In such
cases EER is used to reduce the complexity of participation and
relationships.
ER diagrams consider entity types to be primitive objects
EER diagrams allow refinements within the structures of entity types
EER Concepts
Generalization
Specialization
Sub classes
Super classes
Attribute Inheritance
Constraints on specialization and generalization
Generalization
• Generalization occurs when two or more entities represent categories of the same real-world object.
• Generalization is the process of defining a more general entity type from a set of more specialized entity types.
• A generalization hierarchy is a form of abstraction which specifies that two or more entities that share common attributes can be generalized into a higher-level entity type.
• Is considered a bottom-up definition of entities.
• A generalization hierarchy depicts the relationship between a higher-level superclass and a lower-level subclass.
Generalization hierarchies can be nested. That is, a subtype of one
hierarchy can be a supertype of another. The level of nesting is limited
only by the constraint of simplicity.
Example: Account is a generalized form of Saving and Current
Accounts
Specialization
• Is the result of taking a subset of a higher-level entity set to form a lower-level entity set.
• The specialized entities have an additional set of attributes (distinguishing characteristics) that distinguish them from the generalized entity.
• Is considered a top-down definition of entities.
• The specialization process is the inverse of the generalization process: identify the distinguishing features of some entity occurrences, and specialize them into different subclasses.
• Reasons for specialization
   o Attributes that apply only partially to the superclass
   o Relationship types that apply only partially to the superclass
• In many cases, an entity type has numerous sub-groupings of its entities that are meaningful and need to be represented explicitly. This requires the representation of each subgroup in the ER model. The generalized entity is a superclass and the set of specialized entities are subclasses of that superclass.
   o Example: Saving Accounts and Current Accounts are specialized entities of the generalized entity Accounts. Manager, Sales, and Secretary are specialized employees.
Subclass/Subtype
• An entity type whose tuples have attributes that distinguish its members from the tuples of the generalized or superclass entities.
• When one generalized superclass has various subgroups with distinguishing features, and these subgroups are represented in specialized form, the groups are called subclasses.
• Subclasses can be either mutually exclusive (disjoint) or overlapping (inclusive).
• A single subclass may inherit attributes from two distinct superclasses.
• A mutually exclusive category/subclass is one where an entity instance can be in only one of the subclasses.
E.g.: An EMPLOYEE can be either SALARIED or PART-TIMER but not both.
• An overlapping category/subclass is one where an entity instance may be in two or more subclasses.
E.g.: A PERSON who works for a university can be both an EMPLOYEE and a STUDENT at the same time.
Superclass /Supertype
• An entity type whose tuples share common attributes. Attributes that are shared by all entity occurrences (including the identifier) are associated with the supertype.
• Is the generalized entity.
• We can also have subclasses of a subclass, forming a hierarchy of specialization.
• Superclass attributes are shared by all subclasses of that superclass.
• Subclass attributes are unique to the subclass.
Attribute Inheritance
• An entity that is a member of a subclass inherits all the attributes of the entity as a member of the superclass.
• The entity also inherits all the relationships in which the superclass participates.
• An entity may belong to more than one subclass category.
• All entities/subclasses of a generalized entity or superclass share a common unique identifier attribute (primary key), i.e. the primary keys of the superclass and its subclasses are always identical.
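Attribute inheritance has a close analogue in class inheritance, which gives a quick way to see the rule at work. The sketch below models the Account, Saving Account, and Current Account example with Python dataclasses; the attribute names (account_no, balance, interest_rate, overdraft_limit) are assumptions, since the notes do not list the attributes.

```python
from dataclasses import dataclass, fields

@dataclass
class Account:                  # superclass (generalized entity)
    account_no: str             # shared identifier (primary key)
    balance: float

@dataclass
class SavingAccount(Account):   # subclass with its own extra attribute
    interest_rate: float

@dataclass
class CurrentAccount(Account):  # subclass with its own extra attribute
    overdraft_limit: float

saving = SavingAccount("A-100", 2500.0, 0.05)
saving_attrs = [f.name for f in fields(saving)]
current_attrs = [f.name for f in fields(CurrentAccount)]
print(saving_attrs)
print(current_attrs)
```

Both subclasses carry account_no and balance from the superclass, including the identifier, while interest_rate and overdraft_limit remain specific to one subclass each.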
Constraints on specialization and generalization
Completeness Constraint.
• The Completeness Constraint addresses the issue of whether or not an
occurrence of a Superclass must also have a corresponding Subclass
occurrence.
• The completeness constraint requires that all instances of the subtype be
represented in the supertype.
• The Total Specialization Rule specifies that an entity occurrence should
at least be a member of one of the subclasses. Total Participation of
superclass instances on subclasses is diagrammed with a double line from
the Supertype to the circle as shown below.
• The Partial Specialization Rule specifies that it is not necessary for all
entity occurrences in the superclass to be a member of one of the
subclasses. Here we have an optional participation on the specialization.
Partial Participation of superclass instances on subclasses is diagrammed
with a single line from the Supertype to the circle.
Disjointness Constraints.
• Specifies whether one entity occurrence can be a member of more than one subclass, i.e. it is a type of business rule that deals with the situation where an entity occurrence of a superclass may also have more than one subclass occurrence.
• The Disjoint Rule restricts one entity occurrence of a superclass to be a member of only one of the subclasses. Example: an EMPLOYEE can be either SALARIED or PART-TIMER, but not both at the same time.
• The Overlap Rule allows one entity occurrence to be a member of more than one subclass. Example: a PERSON working at the university can be both a STUDENT and an EMPLOYEE at the same time.
• This is diagrammed by placing either the letter "d" for disjoint or "o" for overlapping inside the circle on the generalization hierarchy portion of the E-R diagram.
From the two types of constraints we get four possible combinations: disjoint
and total, disjoint and partial, overlapping and total, and overlapping and
partial.
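The completeness and disjointness rules can both be checked mechanically once the members of the superclass and of each subclass are known. A minimal Python sketch follows; the entity identifiers and the specialization_kind function are hypothetical.

```python
def specialization_kind(superclass, subclasses):
    """Describe a specialization given sets of entity identifiers.

    superclass: set of all superclass occurrences.
    subclasses: dict mapping a subclass name to the set of its members.
    """
    covered = set().union(*subclasses.values())
    # Total: every superclass occurrence belongs to at least one subclass.
    total = superclass <= covered
    names = list(subclasses)
    # Overlapping: some occurrence belongs to two different subclasses.
    overlapping = any(
        subclasses[a] & subclasses[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    return ("total" if total else "partial",
            "overlapping" if overlapping else "disjoint")

employees = {"E1", "E2", "E3"}
employee_kind = specialization_kind(
    employees, {"SALARIED": {"E1", "E2"}, "PART_TIMER": {"E3"}})

persons = {"P1", "P2", "P3"}
person_kind = specialization_kind(
    persons, {"EMPLOYEE": {"P1", "P2"}, "STUDENT": {"P2"}})

print(employee_kind, person_kind)
```

In the second case P3 belongs to no subclass (partial) and P2 belongs to both (overlapping).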
Normalization
A relational database is a collection of data organized in a particular
manner. As the father of the relational database approach, Codd created a series
of rules called normal forms that help define that organization.
All the normalization rules will eventually remove the update anomalies that
may exist during data manipulation after implementation. The update
anomalies are:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Pitfalls of Normalization
Example of problems related with Anomalies
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of information about the
skill C++ and its skill type is deleted from the database. We will then have no
information about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide
whether Pascal is allowed as a value for skill, and we have no clue about the
type of skill that Pascal should be categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look
for every occurrence of Helico and change the value of School_Add from Piazza
to Mexico, which is prone to error.
Functional Dependency (FD)
Before moving to the definition and application of normalization, it is important to have
an understanding of "functional dependency."
Data Dependency
The logical associations between data items that point the database designer in the
direction of a good database design are referred to as determinant or dependent
relationships.
The essence of this idea is that if the existence of something, call it A, implies that B
must exist and have a certain value, then we say that "B is functionally dependent on
A." We also often express this idea by saying that "A determines B," or that "B is a
function of A," or that "A functionally governs B." Often, the notions of functionality and
functional dependency are expressed briefly by the statement, "If A, then B." It is
important to note that the value B must be unique for a given value of A, i.e., any given
value of A must imply just one and only one value of B, in order for the relationship to
qualify for the name "function." (However, this does not necessarily prevent different
values of A from implying the same value of B.)
X → Y holds if, whenever two tuples have the same value for X, they also have the
same value for Y.
Example
Dinner Course   Type of Wine
Meat            Red
Fish            White
Cheese          Rose
Since the type of wine served depends on the type of dinner, we say Wine is
functionally dependent on Dinner.
Dinner → Wine
Since both the wine type and the fork type are determined by the dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
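The definition of a functional dependency translates directly into a check over the tuples of a relation. The following Python sketch tests Dinner → Wine on the dinner table; the holds function and the extra violating row are hypothetical additions for the illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first tuple with a given X value fixes the Y value;
        # any later tuple with the same X must agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

dinner = [
    {"Dinner": "Meat", "Wine": "Red"},
    {"Dinner": "Fish", "Wine": "White"},
    {"Dinner": "Cheese", "Wine": "Rose"},
]
fd_holds = holds(dinner, ["Dinner"], ["Wine"])

# Hypothetical extra row: Meat served with a second wine breaks the FD.
fd_broken = holds(dinner + [{"Dinner": "Meat", "Wine": "White"}],
                  ["Dinner"], ["Wine"])
print(fd_holds, fd_broken)
```

A check like this can only show that a dependency is not contradicted by the current data; whether it is a real rule of the enterprise is a design decision.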
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the
primary key (when we have a composite primary key), then that attribute is partially
functionally dependent on the primary key.
Full Dependency
If an attribute which is not a member of the primary key is dependent not on some part of
the primary key but on the whole key (when we have a composite primary key), then that
attribute is fully functionally dependent on the primary key.
That is, if {A,B} → C holds and neither A → C nor B → C holds, then C is fully
functionally dependent on {A,B}.
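The difference between partial and full dependency can be tested by checking every proper subset of the composite key. The Python sketch below does so on a small hypothetical relation with key {EmpID, ProjNo}; the Hours attribute, the sample values, and both functions are assumptions introduced for the illustration.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def dependency_kind(rows, key, attr):
    """Classify how attr depends on the composite key: full, partial, or none."""
    if not holds(rows, key, [attr]):
        return "none"
    # A proper subset of the key that already determines attr
    # makes the dependency partial.
    for size in range(1, len(key)):
        for part in combinations(key, size):
            if holds(rows, part, [attr]):
                return "partial"
    return "full"

# Hypothetical sample data.
rows = [
    {"EmpID": 12, "ProjNo": 1, "EmpName": "Abebe", "Hours": 10},
    {"EmpID": 12, "ProjNo": 2, "EmpName": "Abebe", "Hours": 20},
    {"EmpID": 16, "ProjNo": 1, "EmpName": "Lemma", "Hours": 15},
]
key = ["EmpID", "ProjNo"]
name_kind = dependency_kind(rows, key, "EmpName")
hours_kind = dependency_kind(rows, key, "Hours")
print(name_kind, hours_kind)
```

EmpName is determined by EmpID alone, so its dependency on the key is partial; Hours needs both EmpID and ProjNo, so its dependency is full.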
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form:
"If A implies B, and if also B implies C, then A implies C."
Example:
If Abebe is a Human, and if every Human is an Animal, then Abebe must be an Animal.
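Applied to functional dependencies, transitivity means that A → B and B → C together give A → C. One common way to work this out is to compute the closure of a set of attributes, sketched here in Python; the closure function and the attribute names A, B, C are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined,
            # the right-hand side is determined as well.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, fds)
b_closure = closure({"B"}, fds)
print(sorted(a_closure), sorted(b_closure))
```

C appears in the closure of A only through B, which is exactly the transitive dependency A → C.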
Steps of Normalization:
There are various levels or steps in normalization, called normal forms. The level of
complexity, the strength of the rule, and the degree of decomposition increase as we
move from a lower normal form to a higher one. Each normal form below represents a
stronger condition than the previous one.
Example for First Normal form (1NF )
UNNORMALIZED
EmpID  FirstName  LastName  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   SQL     Database     AAU     Sidist_Kilo  5
                            VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     C++     Programming  Unity   Gerji        6
                            IP      Programming  Jimma   Jimma City   4
28     Chane      Kebede    SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     SQL     Database     Helico  Piazza       9
                            Prolog  Programming  Jimma   Jimma City   8
                            Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    Cisco   Networking   AAU     Sidist_Kilo  7
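Bringing the unnormalized table into 1NF means repeating the employee values for every skill so that each row holds atomic values only. A minimal Python sketch, using the first two employees from the table, is given below; the nested "Skills" representation and the to_1nf function are assumptions made for the illustration.

```python
# Two employees from the unnormalized table, each with a repeating group.
unnormalized = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
     "Skills": [("SQL", "Database", "AAU", "Sidist_Kilo", 5),
                ("VB6", "Programming", "Helico", "Piazza", 8)]},
    {"EmpID": 16, "FirstName": "Lemma", "LastName": "Alemu",
     "Skills": [("C++", "Programming", "Unity", "Gerji", 6),
                ("IP", "Programming", "Jimma", "Jimma City", 4)]},
]

def to_1nf(records):
    """Flatten each repeating group into one row per (employee, skill)."""
    flat = []
    for rec in records:
        for skill, skill_type, school, school_add, level in rec["Skills"]:
            flat.append((rec["EmpID"], rec["FirstName"], rec["LastName"],
                         skill, skill_type, school, school_add, level))
    return flat

rows_1nf = to_1nf(unnormalized)
for row in rows_1nf:
    print(row)
```

After flattening, EmpID alone no longer identifies a row, so a composite key such as (EmpID, Skill) is needed.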
Second Normal form 2NF
No partial dependency of a non-key attribute on part of the primary key. This results
in a set of relations in Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., non-composite) key is
automatically also in 2NF.
EMP_PROJ rearranged
EMP_PROJ (EmpID, ProjNo, EmpName, ProjName, ProjLoc, ProjFund, ProjMangID)
This schema is in 1NF since it has no repeating groups or multi-valued attributes.
To convert it to 2NF we need to remove all partial dependencies of non-key
attributes on part of the primary key.
EmpID → EmpName
ProjNo → ProjName, ProjLoc, ProjFund, ProjMangID
As we can see, some non-key attributes are dependent on only part of the
primary key. These collections of attributes should therefore be moved to new relations.
EMPLOYEE (EmpID, EmpName)
PROJECT (ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)
EMP_PROJ (EmpID, ProjNo)
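The 2NF decomposition amounts to three projections of the original relation. The Python sketch below performs them and then joins the pieces back to confirm that no information is lost; the sample rows and the project helper are hypothetical.

```python
ATTRS = ["EmpID", "ProjNo", "EmpName", "ProjName", "ProjLoc", "ProjFund", "ProjMangID"]

# Hypothetical sample rows for the 1NF relation EMP_PROJ.
rows = [
    dict(zip(ATTRS, (12, 1, "Abebe", "Payroll", "Piazza", 50000, 28))),
    dict(zip(ATTRS, (16, 1, "Lemma", "Payroll", "Piazza", 50000, 28))),
    dict(zip(ATTRS, (12, 2, "Abebe", "Inventory", "Gerji", 30000, 65))),
]

def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

employee = project(rows, ["EmpID", "EmpName"])
project_rel = project(rows, ["ProjNo", "ProjName", "ProjLoc", "ProjFund", "ProjMangID"])
emp_proj = project(rows, ["EmpID", "ProjNo"])

# Joining the three relations on their keys rebuilds the original rows.
rejoined = sorted(
    (e, p, name, pname, ploc, pfund, pmang)
    for e, p in emp_proj
    for e2, name in employee if e2 == e
    for p2, pname, ploc, pfund, pmang in project_rel if p2 == p
)
original = project(rows, ATTRS)
print(employee)
print(project_rel)
print(emp_proj)
```

Each employee name and each set of project details is now stored once, instead of once per assignment.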
Third Normal Form (3NF )
Eliminate Columns Not Dependent On Key - If attributes do not contribute to a
description of the key, remove them to a separate table.
This level avoids update and delete anomalies.
This schema is in its 2NF since the primary key is a single attribute.
Let’s take StudID, Year and Dormitary and see the dependencies.
Year   Dormitary
1      401
3      403
Boyce-Codd Normal Form (BCNF):
Isolate Independent Multiple Relationships - No table may contain two or more 1:n or N:M
relationships that are not directly related.
The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent, as shown below.
Domain-Key Normal Form (DKNF):
Def: A table is in DKNF if every constraint on the table is a logical consequence of the
definitions of keys and domains.
Physical Database Design
Methodology for Relational Database
The Logical database design is concerned with the what;
The Physical database design is concerned with the how.
Physical database design is the process of producing a
description of the implementation of the database on
secondary storage. It describes the base relations, file
organization, and indexes used to achieve effective access to
the data along with any associated integrity constraints and
security measures.
Sources of information for the physical design process include
the global logical data model and the documentation that describes
the model.
Physical design also describes the storage structures and access
methods used to achieve efficient access to the data.
Knowledge of the DBMS that is selected to host the database
systems, with all its functionalities, is required since
functionalities of current DBMS vary widely.
To understand the functionality of the transactions that
will run on the database and to analyze the important
transactions
1. Translate logical data model for target DBMS
This phase is the translation of the global logical data model to produce
a relational database schema in the target DBMS. This includes creating
the data dictionary based on the logical model and information
gathered.
After the creation of the data dictionary, the next activity is to
understand the functionality of the target DBMS so that all necessary
requirements are fulfilled for the database intended to be developed.
Most of the time derived attributes are not expressed in the logical
model but are included in the data dictionary. Whether to store
derived attributes in a base relation or calculate them when required is a
decision to be made by the designer, considering the performance
impact.
All the enterprise level constraints and the definition method in the
target DBMS should be fully documented.
Relational Query Languages
Query languages: Allow manipulation and retrieval of data from a
database.
Query Languages != programming languages!
QLs not intended to be used for complex calculations.
QLs support easy, efficient access to large data sets.
Relational model supports simple, powerful query languages.
Two mathematical Query Languages form the basis for Relational languages
Relational Algebra:
Relational Calculus:
Relational Algebra
The basic set of operations for the relational model is known as the relational
algebra. These operations enable a user to specify basic retrieval requests.
The result of the retrieval is a new relation, which may have been formed
from one or more relations. The algebra operations thus produce new
relations, which can be further manipulated using operations of the same
algebra.
Table 1:
Sample table used to illustrate the different kinds of relational
operations. The relation contains information about employees, the
IT skills they have, and the school where they acquired each skill.
Employee
EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo  5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji        6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo  10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza       8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza       9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji        5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City   8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo  7
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City   4
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo  6
Selection
Selects the subset of tuples/rows in a relation that satisfy a selection
condition.
Selection operation is a unary operator (it is applied to a single
relation)
The Selection operation is applied to each tuple individually
The degree of the resulting relation is the same as the original relation
but the cardinality (no. of tuples) is less than or equal to the original
relation.
The Selection operator is commutative.
A set of conditions can be combined using the Boolean operations ∧ (AND),
∨ (OR), and ~ (NOT).
No duplicates in result!
Schema of result identical to schema of (only) input relation.
Result relation can be the input for another relational algebra operation!
(Operator composition.)
It is a filter that keeps only those tuples that satisfy a qualifying
condition (those satisfying the condition are selected while others are
discarded.)
Notation:
σ <Selection Condition> <Relation Name>
Example: Find all Employees with skill type of Database.
If the query is all employees with a SkillType Database and School Unity the
relational algebra operation and the resulting relation will be as follows.
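As a rough illustration of the two selection queries above, the following Python sketch treats a relation as a list of dictionaries and selection as a filter. The rows are a subset of Table 1 with only some of its attributes, and the helper name `select` is an assumption made for the example.

```python
# A subset of the Employee rows from Table 1 (only some attributes kept).
employee = [
    {"EmpID": 12, "FName": "Abebe", "Skill": "SQL", "SkillType": "Database", "School": "AAU"},
    {"EmpID": 16, "FName": "Lemma", "Skill": "C++", "SkillType": "Programming", "School": "Unity"},
    {"EmpID": 28, "FName": "Chane", "Skill": "SQL", "SkillType": "Database", "School": "AAU"},
    {"EmpID": 65, "FName": "Almaz", "Skill": "SQL", "SkillType": "Database", "School": "Helico"},
    {"EmpID": 24, "FName": "Dereje", "Skill": "Oracle", "SkillType": "Database", "School": "Unity"},
]

def select(rows, condition):
    """Selection: keep the tuples for which the condition holds."""
    return [row for row in rows if condition(row)]

# Employees with SkillType = Database.
database_emps = select(employee, lambda t: t["SkillType"] == "Database")

# Conditions combined with AND: SkillType = Database and School = Unity.
database_unity = select(
    employee,
    lambda t: t["SkillType"] == "Database" and t["School"] == "Unity",
)

print([t["EmpID"] for t in database_emps])
print([t["EmpID"] for t in database_unity])
```

The result has the same attributes as the input (same degree) and never more tuples than the input.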
Projection
Selects certain attributes while discarding the others from the base
relation.
The PROJECT operation creates a vertical partitioning: one part with the
needed columns (attributes), containing the result of the operation, and the
other containing the discarded columns.
Deletes attributes that are not in projection list.
Schema of result contains exactly the fields in the projection list, with
the same names that they had in the (only) input relation.
Projection operator has to eliminate duplicates!
Note: real systems typically don’t do duplicate elimination
unless the user explicitly asks for it.
If the Primary Key is in the projection list, then duplication will not
occur
Duplicate removal is necessary to ensure that the resulting table is
also a relation.
Notation:
π <Selected Attributes> <Relation Name>
Example: To display Name, Skill, and Skill Level of an employee, the query
and the resulting relation will be:
If we want to have the Name, Skill, and Skill Level of an employee with Skill
SQL and SkillLevel greater than 5 the query will be:
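A minimal Python sketch of projection with duplicate elimination, and of projection applied to the result of a selection, is given here for illustration. The rows are a subset of Table 1, FName stands in for the employee name, and the helper name `project` is an assumption made for the example.

```python
# A subset of the Employee rows from Table 1 (only some attributes kept).
employee = [
    {"EmpID": 12, "FName": "Abebe", "Skill": "SQL", "SkillType": "Database", "SkillLevel": 5},
    {"EmpID": 16, "FName": "Lemma", "Skill": "C++", "SkillType": "Programming", "SkillLevel": 6},
    {"EmpID": 28, "FName": "Chane", "Skill": "SQL", "SkillType": "Database", "SkillLevel": 10},
    {"EmpID": 65, "FName": "Almaz", "Skill": "SQL", "SkillType": "Database", "SkillLevel": 9},
]

def project(rows, attrs):
    """Projection: keep the listed attributes and eliminate duplicate tuples."""
    result = []
    for row in rows:
        slim = tuple(row[a] for a in attrs)
        if slim not in result:  # duplicate elimination keeps the result a relation
            result.append(slim)
    return result

# Projecting on a non-key attribute produces duplicates that must be removed.
skill_types = project(employee, ["SkillType"])

# Selection first (Skill = SQL and SkillLevel > 5), then projection.
sql_above_5 = [t for t in employee if t["Skill"] == "SQL" and t["SkillLevel"] > 5]
name_skill_level = project(sql_above_5, ["FName", "Skill", "SkillLevel"])

print(skill_types)
print(name_skill_level)
```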
Rename Operation
We may want to apply several relational algebra operations one after
the other. The query could be written in two different forms:
1. Write the operations as a single relational algebra expression
by nesting the operations.
2. Apply one operation at a time and create intermediate result
relations. In the latter case, we must give names to the
relations that hold the intermediate results; this is the Rename
Operation.
If we want to have the Name, Skill, and Skill Level of an employee with salary
greater than 1500 and working for department 5, we can write the expression
for this query using the two alternatives:
The relation Result will then be equivalent to the relation we get using the
first alternative.
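The two alternatives can be compared with a short Python sketch. Everything in it is hypothetical: the Salary and DeptNo attributes, the sample rows, and the helper and relation names (`select`, `project`, `dep5_emps`, `result`) are assumptions made only to illustrate nesting versus named intermediate relations.

```python
# Hypothetical rows; Salary and DeptNo are assumed attributes for this example.
employee = [
    {"Name": "Abebe", "Skill": "SQL", "SkillLevel": 5, "Salary": 2000, "DeptNo": 5},
    {"Name": "Lemma", "Skill": "C++", "SkillLevel": 6, "Salary": 1200, "DeptNo": 5},
    {"Name": "Chane", "Skill": "SQL", "SkillLevel": 10, "Salary": 1800, "DeptNo": 4},
]

def select(rows, condition):
    """Selection: keep the tuples for which the condition holds."""
    return [row for row in rows if condition(row)]

def project(rows, attrs):
    """Projection with duplicate elimination."""
    result = []
    for row in rows:
        slim = tuple(row[a] for a in attrs)
        if slim not in result:
            result.append(slim)
    return result

wanted = ["Name", "Skill", "SkillLevel"]

# Alternative 1: a single nested expression.
nested = project(
    select(employee, lambda t: t["Salary"] > 1500 and t["DeptNo"] == 5),
    wanted,
)

# Alternative 2: one operation at a time, naming each intermediate relation.
dep5_emps = select(employee, lambda t: t["Salary"] > 1500 and t["DeptNo"] == 5)
result = project(dep5_emps, wanted)

print(nested == result)
```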
UNION Operation
The result of this operation, denoted by R ∪ S, is a relation that includes
all tuples that are either in R or in S or in both R and S. Duplicate tuples
are eliminated.
The two operands must be “type compatible”.
Type Compatibility
The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have
the same number of attributes, and the domains of corresponding
attributes must be compatible; that is, Dom(Ai)=Dom(Bi) for i=1, 2, ..., n.
INTERSECTION Operation
The result of this operation, denoted by R ∩ S, is a relation that includes
all tuples that are in both R and S. The two operands must be "type
compatible".
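UNION and INTERSECTION map directly onto Python set operations once type compatibility has been checked. The sketch below is illustrative only: the relations R and S, their schemas, and the helper name `type_compatible` are assumptions made for the example, with Python types standing in for attribute domains.

```python
# Hypothetical schemas: one domain (here a Python type) per attribute position.
r_schema = (int, str)
s_schema = (int, str)

# Relations as sets of tuples, so duplicate tuples cannot occur.
R = {(1, "Abebe"), (2, "Belay")}
S = {(2, "Belay"), (3, "Kefle")}

def type_compatible(schema1, schema2):
    """Same number of attributes, and Dom(Ai) = Dom(Bi) for every position i."""
    return len(schema1) == len(schema2) and all(
        a == b for a, b in zip(schema1, schema2)
    )

if type_compatible(r_schema, s_schema):
    union = R | S          # tuples in R, in S, or in both
    intersection = R & S   # tuples in both R and S

print(sorted(union))
print(sorted(intersection))
```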
CARTESIAN (cross product) Operation
This operation is used to combine tuples from two relations in a combinatorial
fashion. That means every tuple in the first relation (R) is paired with
every tuple in the second relation (S).
In general, the result of R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm) is a relation Q
of degree n + m with attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order,
where R has n attributes and S has m attributes.
The resulting relation Q has one tuple for each combination of tuples, one
from R and one from S.
Hence, if R has |R| tuples and S has |S| tuples, then R x S will have |R| * |S|
tuples.
Example
Employee
ID FName LName
123 Abebe Lemma
567 Belay Taye
822 Kefle Kebede
Dept
DeptID DeptName MangID
2 Finance 567
3 Personnel 123
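For illustration, the Cartesian product of the two example relations can be computed in Python with `itertools.product`. Representing each relation as a list of tuples is an assumption made for this sketch.

```python
from itertools import product

# The Employee and Dept relations from the example, as lists of tuples.
employee = [
    (123, "Abebe", "Lemma"),
    (567, "Belay", "Taye"),
    (822, "Kefle", "Kebede"),
]
dept = [
    (2, "Finance", 567),
    (3, "Personnel", 123),
]

# Every Employee tuple is concatenated with every Dept tuple.
cross = [e + d for e, d in product(employee, dept)]

for row in cross:
    print(row)

# Degree is 3 + 3 = 6 attributes; cardinality is 3 * 2 = 6 tuples.
print(len(cross[0]), len(cross))
```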
JOIN Operation
The sequence of Cartesian product followed by select is used quite commonly
to identify and select related tuples from two relations, so a special operation,
called JOIN, was created for it. Thus in the JOIN operation, the Cartesian product
and the Selection operations are used together.
The JOIN operation is denoted by the ⋈ symbol.
This operation is very important for any relational database with more than a
single relation, because it allows us to process relationships among relations.
The general form of a join operation on two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
R ⋈ <Join Condition> S
where R and S can be any relations that result from general relational algebra
expressions. Since JOIN operates on two relations, it is a binary operation.
EQUIJOIN Operation
The most common use of join involves join conditions with equality
comparisons only ( = ). Such a join, where the only comparison operator used
is =, is called an EQUIJOIN. In the result of an EQUIJOIN we always have one or
more pairs of attributes (whose names need not be identical) that have
identical values in every tuple, since we used the equality comparison operator.
For example, the above JOIN expression is an EQUIJOIN since the
comparison operator used is the equal to operator ( = ).
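A minimal Python sketch of an EQUIJOIN, written as a Cartesian product followed by a selection, is shown here for illustration. It reuses the Employee and Dept relations from the Cartesian product example; the join condition ID = MangID is an assumption chosen for the sketch.

```python
from itertools import product

# Employee(ID, FName, LName) and Dept(DeptID, DeptName, MangID).
employee = [
    (123, "Abebe", "Lemma"),
    (567, "Belay", "Taye"),
    (822, "Kefle", "Kebede"),
]
dept = [
    (2, "Finance", 567),
    (3, "Personnel", 123),
]

# Cartesian product followed by a selection on Employee.ID = Dept.MangID.
equijoin = [e + d for e, d in product(employee, dept) if e[0] == d[2]]

for row in equijoin:
    print(row)
```

Each result tuple carries the pair of attributes ID and MangID with identical values, which is the redundancy an EQUIJOIN always leaves in its result.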
NATURAL JOIN Operation
The NATURAL JOIN was created to get rid of the second (or extra) attribute
that appears in the result of an EQUIJOIN.
The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have the same
name in both relations. If this is not the case, a renaming operation on
the attributes is applied first.
Notation:
SEMIJOIN Operation
SEMIJOIN is another version of the JOIN operation, where the resulting
relation contains only the attributes of the first relation, for those
tuples that are related to tuples in the second relation.
R ⋉ <Join Condition> S
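As an illustrative sketch, a SEMIJOIN can be written in Python as a filter on the first relation that only checks for a matching tuple in the second. The relations are the Employee and Dept examples used for the Cartesian product, and the join condition ID = MangID is an assumption made for the example.

```python
# Employee(ID, FName, LName) and Dept(DeptID, DeptName, MangID).
employee = [
    (123, "Abebe", "Lemma"),
    (567, "Belay", "Taye"),
    (822, "Kefle", "Kebede"),
]
dept = [
    (2, "Finance", 567),
    (3, "Personnel", 123),
]

# Keep each Employee tuple that matches at least one Dept tuple;
# no Dept attributes appear in the result.
semijoin = [e for e in employee if any(e[0] == d[2] for d in dept)]

print(semijoin)
```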
Relational Calculus
A relational calculus expression creates a new relation, which is
specified in terms of variables that range over rows of the stored
database relations (in tuple calculus) or over columns of the stored
relations (in domain calculus).
If COND is a predicate, then the set of all tuples for which the predicate
COND evaluates to true is expressed as follows:
{t | COND(t)}
Where t is a tuple variable and COND (t) is a conditional
expression involving t. The result of such a query is the set of all
tuples t that satisfy COND (t).
If we have set of predicates to evaluate for a single query, the predicates
can be connected using ∧(AND), ∨(OR), and ~(NOT)
evaluated to either true or false. And A and B are either
constant or variables.
- Formulae should be unambiguous and should make sense.
- To find only the EmpID, FName, LName, Skill, and School (where the
skill was acquired) of employees with a skill level
greater than or equal to 8, the tuple-based relational calculus
expression will be:
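The form {t | COND(t)} reads naturally as a set comprehension. The following Python sketch is only an analogy for the query just described, not the calculus notation itself; the rows are a subset of Table 1 and the variable names are assumptions made for the example.

```python
# A subset of the Employee rows from Table 1 (only some attributes kept).
employee = [
    {"EmpID": 12, "FName": "Abebe", "LName": "Mekuria", "Skill": "SQL", "School": "AAU", "SkillLevel": 5},
    {"EmpID": 28, "FName": "Chane", "LName": "Kebede", "Skill": "SQL", "School": "AAU", "SkillLevel": 10},
    {"EmpID": 25, "FName": "Abera", "LName": "Taye", "Skill": "VB6", "School": "Helico", "SkillLevel": 8},
    {"EmpID": 65, "FName": "Almaz", "LName": "Belay", "Skill": "SQL", "School": "Helico", "SkillLevel": 9},
    {"EmpID": 94, "FName": "Alem", "LName": "Kebede", "Skill": "Cisco", "School": "AAU", "SkillLevel": 7},
]

# t ranges over the tuples of Employee; the condition plays the role of COND(t).
# Only the requested attributes of each qualifying tuple are kept.
result = {
    (t["EmpID"], t["FName"], t["LName"], t["Skill"], t["School"])
    for t in employee
    if t["SkillLevel"] >= 8
}

for row in sorted(result):
    print(row)
```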
Quantifiers in Relational Calculus
- To tell how many instances a predicate applies to, we can use
the two quantifiers of predicate logic: the existential quantifier (∃)
and the universal quantifier (∀).
- A relational calculus expression written using the Existential Quantifier
can also be written using the Universal Quantifier.
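The relationship between the two quantifiers can be checked with Python's built-in `any` (existential) and `all` (universal). This is an illustrative sketch: the skill levels are taken from a few rows of Table 1, and the predicate `p` is an assumption made for the example.

```python
# Skill levels of a few employees from Table 1.
skill_levels = [5, 6, 10, 8, 9]

def p(level):
    """Example predicate: the skill level is at least 8."""
    return level >= 8

# Existential quantifier: there exists a tuple satisfying p.
exists_p = any(p(x) for x in skill_levels)

# Universal quantifier: every tuple satisfies p.
forall_p = all(p(x) for x in skill_levels)

# "There exists x with p(x)" is the same as "not for all x, not p(x)".
exists_via_forall = not all(not p(x) for x in skill_levels)

# "For all x, p(x)" is the same as "there is no x with not p(x)".
forall_via_exists = not any(not p(x) for x in skill_levels)

print(exists_p, exists_via_forall)
print(forall_p, forall_via_exists)
```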
Example: