0% found this document useful (0 votes)
39 views

Topic 3 - Database Design

This document discusses database design, which involves collecting requirements, conceptual design, logical design, and physical design. Conceptual design creates an abstract model of the data independent of any database system. It involves identifying entities, relationships between entities, and building a conceptual data model. Logical design maps the conceptual schema to a specific database implementation model. Physical design specifies storage structures, access methods, and file organization. The entity-relationship model and relational model are also introduced.

Uploaded by

Richard Kyuli
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views

Topic 3 - Database Design

This document discusses database design, which involves collecting requirements, conceptual design, logical design, and physical design. Conceptual design creates an abstract model of the data independent of any database system. It involves identifying entities, relationships between entities, and building a conceptual data model. Logical design maps the conceptual schema to a specific database implementation model. Physical design specifies storage structures, access methods, and file organization. The entity-relationship model and relational model are also introduced.

Uploaded by

Richard Kyuli
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

CHAPTER 3: DATABASE DESIGN

Introduction to Database Design

Database design: The process of creating a design for a database that will support the enterprise’s
operations and objectives. After data requirements collection and analysis, database design is
performed in three main phases (stages) namely:
 Conceptual design,
 Logical design
 Physical design.
The database design process consists of a number of steps listed below.

Step 1: Data Requirements Collection and Analysis

 Prospective users are interviewed to understand and document data requirements


 This step results in a concise set of user requirements, which should be detailed and complete.
 The functional requirements should be specified, as well as the data requirements. Functional
requirements consist of user operations that will be applied to the database, including
retrievals and updates.
 Functional requirements can be documented using diagrams such as sequence diagrams,
data flow diagrams, scenarios, etc.
Step 2: Conceptual Design
 Once the requirements are collected and analyzed, the designers go about creating the
conceptual schema.
 Conceptual schema: concise description of data requirements of the users, and includes a
detailed description of the entity types, relationships and constraints.
 The concepts do not include implementation details; therefore the end users easily understand
them, and they can be used as a communication tool.
 The conceptual schema is used to ensure all user requirements are met, and they do not
conflict.
Step 3: Logical Design: Database Implementation

 Many DBMS systems use an implementation data model, so the conceptual schema is
transformed from the high-level data model into the implementation data model.
 This step is called logical design or data model mapping, which results in the implementation
data model of the DBMS.
Step 4: Physical Design

 Internal storage structures, indexes, access paths and file organizations are specified.
Application programs are designed and implemented
The Entity Relationship Model

 In software Engineering, an entity relational model is an abstract and conceptual


representation of data.
 Entity-relationship modeling is a database modeling method, used to produce a type of
conceptual schema or semantic data model of a system, often a relational database, and its
requirements in a top-down fashion.
 Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or
ERDs.
 In 1976, Entity relationship model developed by Chen,
 ER Model is high level Conceptual model which used conceptual design of database
whereas Relational model are used to logical design of database

Relational Model

The most popular data model in DBMS is the Relational Model. It is more scientific a model than
others. This model is based on first-order predicate logic and defines a table as an n-ary relation.

The main highlights of this model are :

 Data is stored in tables called relations.


 Relations can be normalized.
 In normalized relations, values saved are atomic values.
 Each row in a relation contains a unique value.
 Each column in a relation contains values from a same domain.
ER Diagram

 A database can be modelled as:


o A collection of entities
o Relationship among the entities
 An entity is an real world object that exist and it is distinguishable from other entities
Example: Person, company, event, plant

 All the entities in the data model have attributes also known as properties of an entity
Example: people have names and addresses

 An Entity set is a set of an entity of all same type that shares the same properties.
Example: set of all persons, companies, trees

ER Diagram

Rectangles represent entity sets.


Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Underline indicates primary key attributes
Ellipses represent an attributes
Double Lines represent total participation of an entity in a relationship set
Double rectangle represent a weak entity set

Phase 1: Conceptual design


This is the process of constructing an abstract model of data to be included in the database. This
model should be independent of any database management system (DBMS) and other physical
considerations.
Steps followed
a) Identification of entities
b) Identification of relationships
c) Construction of conceptual data model.
a) Identify entities
 It is a thing or object of importance about which data must be captured.
 To identify entities in the requirements, map nouns to entities
 Database entities appear in a data model as a rectangle with a title.
 The title (in upper case) is the name of the entity.
Types of entities
i. Strong entity: This is an entity that has a primary key (unique identifier).A weak entity is
existence dependent.

Example: In the Case of Client entity Client_no is the primary key of Client entity which is
used to uniquely identified among the Client ‘s entity set. In the case of Customer Entity,
Customer_id is the primary key of Customer Entity which is used to uniquely identify among
the Customer’s entity set. Strong Entity type is represented by rectangle Symbol

Strong entity

ii. Weak entity: This is an entity that does not have sufficient attributes to form a primary key
(unique identifier). Entity type which is dependent on some other entity type is called as Weak
entity type. Weak entity type is dependent on a strong entity and cannot exist on its own. The
existence of a weak entity is indicated by a double rectangle in the ER diagram.
Weak entity

iii. Recursive Entity: This is an entity in which a relationship can exist between occurrences of
the same entity. This occurs in a unary relationship. For instance, a doctor can be a patient.
They are represented as follows:

Treat

Doctor

iv. Composite Entities: An entity created to resolve Many to Many relationship to one-to-many. It
is also referred to as a bridge entity. Bridge entity is composed of the primary keys of each of
the entities to be connected. A composite entity is represented by a diamond shape within a
rectangle in an ER Diagram.
Remarks:
Generalization: The process of defining a more general entity type from a set of more
specialized entity types. Bottom-Up
Specialization: The process of defining one or more subtypes of the supertype and forming
supertype/subtype relationships. Top-Down

b) Identification of relationships
A relationship is a meaningful association between entity types.
Classifying relationships:
i. Degree
ii. Cardinality
iii. Optionality
1. Classification according to degree
Degree: The number of entity types participating in a relationship.
i. Unary (recursive) relationship: One entity type participates in a relationship.
DOCTOR DOCTOR

ii. Binary relationship: Two entity types participate in a relationship.


DOCTOR PATIENT

iii. Ternary relationship: Three entity types participate in a relationship.

PRESCRIPTION PHARMICST

PATIENT


iv. n-ary relationship: n entity types participate in a relationship.

2. Classification according to cardinality(connectivity)

In a relationship, cardinality refers to the number of entity occurrences involved in the relationship. It
also defined as minimum and maximum number of entity occurrences that can participate in a
relationship.
Examples
 One-to-one − One entity from entity set A can be associated with at most one entity of entity
set B and vice versa.

 One-to-many − One entity from entity set A can be associated with more than one entities of
entity set B however an entity from entity set B, can be associated with at most one entity.

 Many-to-one − More than one entities from entity set A can be associated with at most one
entity of entity set B, however an entity from entity set B can be associated with more than one
entity from entity set A.
 Many-to-many − One entity from A can be associated with more than one entity from B and
vice versa.

3. Classification according to optionality(participation)


Optionality refers to the minimum number of entities in a relationship. Can be mandatory or optional
 Examples
 Mandatory: There must be an entity occurrence participating in the relationship
 Optionality: There may be an entity occurrence participating in the relationship. To
show optionality, put a circle or `0' at the `optional end' of the relationship.

c) Building conceptual data model


The data model that results after conceptual design is called conceptual data model
Choose a notation and design the data model. Examples of notations are Chen’s notation , Rein85
notation, Crow’s foot notation and IDEF1X notations. Most of the differences concern how
relationships are specified and how attributes are shown. In almost all variations, entities are
depicted as rectangles with either pointed or rounded corners. The entity name appears inside.
Relationships can be displayed as diamonds or can be simply line segments between two entities.
For Relationships, there is need to convey relationship name, degree, cardinality and optionality
where necessary. Any notation selected must be used consistently.

Chen’s notation Crow’s foot Rein85 IDEF1X


 Relationship Name:  Relationship name:  Relationship  Relationship
Displayed just inside the Relationships name name: name:
relationship diamond. indicated next to entity Relationships name Relationships name
 Cardinality: The above or below the link. indicated next to indicated next to
cardinality is indicated by  Cardinality: Crow's relationship relationship
placing the appropriate feet are used to show a diamond. diamond.
numbers beside the entities many side of a  Cardinality: Not  Cardinality: Not
using the format (x,y), which relationship. A single line shown except the shown except the
represent the minimum and indicates a one side of connectivity. connectivity.
maximum value, the relationship.  Optionality:  Optionality:
respectively.  Optionality: Optional Optional participation Optional
 Optionality: Mandatory participation is shown is shown with an participation is
participation indicated by with an open circle next open circle next to shown with an open
one/ two line(s) linking the to optional end. optional end. circle next to
entity to the diamond next Mandatory participation Mandatory optional end.
to mandatory end. Optional is shown with two participation is Mandatory
participation indicated by a vertical lines. shown with participation is
circle near optional end. continuous link line shown with
next to mandatory continuous link line
end. next to mandatory
end.
Comparison of ERD notations.
a) Chen’s notation
Chen model moved conceptual modeling into the practical database design arena by establishing
basic building blocks: entities and relationships. Dominant player in the CASE tool market during the
1980s and early 1990s.
b) Crow’s Foot notation
This model combines connectivity and cardinality information in a single symbol set. Popularized by
the Knowledgeware modeling tool. Cardinality is limited to 0,1 or N.
c) Rein85 notation
This model is based on the same modeling conventions as the Crow’s Foot model, it’s symbols are
different. It does not recognize cardinalities explicitly, relying on connectivities to lead to logical
cardinality conclusions
d) IDEF1X notation
This is a derivative of the integrated computer-aided manufacturing (ICAM) studies of the late 1970s.
Became the source of graphical methods for defining the functions, data structures and dynamics of
manufacturing businesses. The integration of these methods became known as IDEF (ICAM
Definition). Hughes Aircraft developed the original version named IDEF1. Later, the extended version
was named IDEF1X.
Example 1: Chen’s notation

M
M 1
1
DOCTOR PATIENT MEDICATION
treat takes

(0, N) (0, N)
(1, 1) (1, 1)

The above model will be read as follows:


Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to one
PATIENT.

Example1: Crow’s foot notation.


Treat Takes
PATIENT
DOCTOR
MEDICATION

The above model will be read as follows:


Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to one
PATIENT.
Example1: Rein85
Treat
Takes

DOCTOR PATIENT
MEDICATION

The above model will be read as follows:


Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to one
PATIENT.

Example 1 : IDEF1X
The above model will be read as follows:
Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to one
PATIENT.
Treats Takes

DOCTOR PATIENT MEDICATION

The features of the conceptual data model include:


i. Important entities and the relationships among them.
ii. No attribute is specified.
iii. No primary key is specified.
Example: A supplier receives orders for products. A SUPPLIER may receive one or more ORDERS.
Each one or more ORDERS must be send to the SUPPLIER. Each order consists of one or several
products. But each PRODUCT is belonging to one ORDER.
Receives
SUPPLIER
PRODUCT

has

ORDER

Figure 1: Conceptual data model

 From the figure above, the conceptual data model is the entities that describe the data and the
relationships between those entities.

Phase 2: Logical database design

The process of constructing a database based on specific database management system (DBMS)
and database model. For example a relational Database model. It considers how the data are
represented using the structures offered by the DBMS:
Steps followed:
 Select database model.
 Map Entity-Relationship Diagrams(from conceptual data model into logical model)
 Identify the attributes
 Normalize database
 Design the ERD (logical data model) and validate with the users.
 Create a data dictionary

i). Selecting database model


A database model is a collection of concepts and rules for the description of the structure of the
database. Some of the models one can choose are flat file database, hierarchical, network, relational
and object oriented.

ii). Map Entity-Relationship Diagrams


-Identify all the entities and relationships as captured in conceptual data model.
iii). Identify attributes

Attributes

 Entities are represented by means of their properties, called attributes.


 Attributes are explicit values such as a quality or characteristic associated with someone or
something (entity).
 They represent characteristics or values about a database relation (an entity)
 All attributes have values. For example, a student entity may have name, class, and age as
attributes.
 There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of attributes
i. Key attributes: Attributes which uniquely identify an instance of an entity (identifier). These
attributes are distinct for each individual entity. For example a student Number or Car
registration number etc.
ii. Non key attributes: These are attributes which describe a non-unique characteristics of an
entity instance (descriptors). Examples year, color, model.
iii. Required attributes: These are attributes which must have a value or a value must be known
for each entity occurrence. Example employee number.
iv. Optional attributes: These are attributes which could have a value or a value may be known
for each entity occurrence. Example an employee; employees spouse is optional since not
every employee will have a spouse.
v. Composite attribute: This is an attribute which can be divided into smaller subparts. These
subparts represent basic attributes with independent meanings of their own. For example,
take Name attributes. We can divide it into sub-parts like First_name, Middle_name, and
Last_name.
vi. Simple attribute: This is an attribute that cannot be divided into subparts. It is therefore said to
be simple or atomic attribute. For example, Employee Number is a simple attribute. Age of
a person is a simple attribute.
vii. Single-valued attribute: It is an attribute that can have single value at a particular instance of
time. For instance, a person cannot have more than one age value.
viii. Multi-valued attribute: It is an attribute that can have more than one value at one time. For
instance, a person can have more than one degree at one time, therefore attribute degree is
said to be a multi-valued attribute.
ix. Stored attribute: An attribute that supplies a value to the related attribute.
x. Derived attribute: An attribute whose value is derived from a stored attribute. These attributes
are usually created by a formula or certain operation on other attributes. Example: Age=
[Current-year]- [year_of_birth]
xi. Candidate Key: An attribute or set of attributes that uniquely identifies individual occurrences
of an entity type.
xii. Primary Key: An attribute that is used to identify an entity type.e.g. entity type student can be
identified by student registration number.
xiii. Composite Key: A candidate key that consists of two or more attributes.
e.g. VoterID+NationalID

iv). Normalization of the database


Data is organized into database tables by using normal forms rules or conditions. These rules help
to make a good database design. Usually, data is organized using first, second and third normal
forms.

 Begin with a list of all of the fields that must appear in the database.
 Do not include computed fields
 You can get this information from a printed document used by the system e.g. reports.
 Additional attributes besides those for the entities described on the document can be added to the
database.

v). Design the ERD (logical data model) and validate with the users.
In this step we use the normalized forms and standard notations to draw the ERD. Such notations
include Rein95, IDEFl1X, Chen, Crow’s foot among others.

Chen Crow’s Foot Rein85 IDEFI1X


Entity

Relationship
line
Relationship

Optional symbol

One(1) symbol 1

Many(M) M
symbol
Composite
entity
Weak entity
A logical data model describes the data in as much detail as possible, without regard to how they will
be physically implemented in the database. A logical data model is used to explore the domain
concepts, and their relationships, of the problem. This could be done for the scope of a single project
or for the entire enterprise. The logical data model depict the logical entity types, typically referred to
simply as entity types, the data attributes describing those entities, and the relationships between the
entities.

Features of a logical data model include:

 Includes all entities and relationships among them.


 All attributes for each entity are specified.
 The primary key for each entity is specified.
 Foreign keys (keys identifying the relationship between different entities) are specified.
 Normalization occurs at this level.

Figure 2: Logical Data model using crow’s foot.

Logical data model versus Conceptual data model

In a logical data model, primary keys are present, whereas in a conceptual data model, no primary
key is present.
In a logical data model, all attributes are specified within an entity. No attributes are specified in a
conceptual data model.
Relationships between entities are specified using primary keys and foreign keys in a logical data
model. In a conceptual data model, the relationships are simply stated, not specified, so we simply
know that two entities are related, but we do not specify what attributes are used for this relationship.

Phase 3: Physical database design


This is a process of constructing a model of information used in an organization based on physical
considerations like software and hardware. Physical design is concerned on how the database is
organized as files and what kind of structures to use for efficiency of database processing
Steps:
• Select DBMS
• Select storage devices
• Determine access methods
• Design files and indexes
• Determine database distribution
• Produce the physical model and validate with the users
i). Select DBMS
Factors considered when selecting the DBMS
 Costs
 Features and Tools
 Underlying model
 Portability
 DBMS hardware requirements
 Security
 Usability

Main steps to selecting a DBMS:


 Define terms of reference of study
 Shortlist two or three products
 Evaluate products
 Recommend selection and produce report.
Examples of DBMS
 Microsoft Access
 Microsoft SQL server
 Oracle
 Informix
 MySQL
Functions of DBMS
 Database creation (storing data in a defined database) using data definition language
 Retrieval (query and reporting) using data manipulation language.
 Update (Changing the contents of the database) using data manipulation language.
 Programming user facilities for system development.
 Database revision and restructuring using data manipulation language.
 Database integrity control using data definition language
 Performance Monitoring
 Concurrency control; controlling simultaneous access to the database

ii). Select storage devices


When selecting storage devices consider:
 Speed with which data can be accessed
 Cost per unit of data
 Reliability: loss on power failure or system crash and physical failure of the storage device
Media
 Cache: Fastest and most costly form of storage; volatile; managed by the computer system
hardware.
 Main memory: Fast access but generally too small (or too expensive) to store the entire
database.
 Flash memory: Data survives power failure. Data can be written at a location only once, but
location can be erased and written to again. Reads are roughly as fast as main memory. But
writes are slow (few microseconds), erase is slower.
 Magnetic-disk: Data is stored on spinning disk, and read/written magnetically. It is a primary
medium for the long-term storage of data; typically stores entire database. Data must be moved
from disk to main memory for access, and written back for storage. It is possible to read data on
disk in any order, unlike magnetic tape. Survives power failures and system crashes. However,
disk failure can destroy data.
 Optical storage: This is a non-volatile, data is read optically from a spinning disk using a laser.
Reads and writes are slower than with magnetic disk.
 Tape storage: Is a non-volatile storage, used primarily for backup (to recover from disk failure),
and for archival data. Allows sequential-access and thus much slower than disk. Very high
capacity.

iii). Determine access methods


In this step identify specific structures and access methods for the data to achieve optimum
performance for the database system. The access method depends on storage structure. Some of
the access methods are:
 Sequential
 Indexed sequential
 Hashing

iv). Design files and indexes


Creating set of tables and constraints on these tables from the information given in logical data
model. The tables are designed and their field properties specified.
SUPPLIER TABLE
FieldName Data type Length
SupplierID Number 8
SupplierName Text 50
Address Text 50
Street Text 50
ORDER TABLE
FieldName Data type Length
OrderID Text 50
OrderName Text 50
Dateprepared Date/Time
SupplierID Number 8
PRODUCT TABLE
FieldName Data type Length
ProductID Text 30
ProductName Text 60
Quantity Number 8
OrderID Text 50

v). Determine database distribution


There is need to consider location and number of users when deciding the database distribution. For
instance the database may single user database in which case the database can be stored in a local
computer, multiuser database may stored in a dedicated server where users can access
simultaneously. If the users are not within the same location and the organization has several
branches in different locations, then a distributed database may be used.

Physical data model

Physical data model represents how the model will be built in the database.
A physical database model shows all table structures, including column name, column data type,
column constraints, primary key, foreign key, and relationships between tables.
Features of a physical data model include:
i. Specification of all tables and columns.
ii. Foreign keys are used to identify relationships between tables.
iii. De-normalization may occur based on user requirements.
iv. Physical considerations may cause the physical data model to be quite different from the
logical data model.
v. Physical data model will be different for different RDBMS. For example, data type for a
column may be different between MySQL and SQL Server.

The steps for physical data model design are as follows:

i. Convert entities into tables.


ii. Convert relationships into foreign keys.
iii. Convert attributes into columns.
iv. Modify the physical data model based on physical constraints / requirements

SUPPLIER
1 PRODUCT

SupplierID(PK): Number ∞ ProductID (PK): Text(30)


SupplerName: ProductName : Text (60)
Text(50)
Quantity :Number
Address:
Text(50) OrderID(FK) :Text(50)

ORDER

OrderID (PK) : Text(50)


OrderName :
Text(50) 1
Figure 3: Physical data model

Logical data model versus Physical data model

Comparing the logical data model shown above with the logical data model diagram, we see the main
differences between the two:

i. Entity names are now table names.


ii. Attributes are now column names.
iii. Data type for each column is specified. Data types can be different depending on the
actual database being used.

Revision questions
a) Describe four functions of a database management system (DBMS). [8 Marks]
b) Describe the three levels of database architecture. [6 Marks]
c) Discuss the logical and physical data independence. [4 Marks]
d) Define the following attributes as used in relational database design. [6 Marks]
i). Multi-valued attribute
ii). Composite attribute
iii). Derived attribute
e) Identify at least three attributes for each of the following entities below and choose one of your
attributes as the identifier for the entities.
i). Customer. [3 Marks]
ii). Organization. [3 Marks]
f) Discuss the differences in the following data modelling notations.
i). Chen
ii). IDEF1X
iii). Rein85
iv). Crow’s foot
g) Describe the process of physical database design. [8 Marks]
h) Briefly explain the following database storage devices.
i). Optical storage
ii). Magnetic disk
iii). Tape storage [6 Marks]
i) Explain the following database access methods.
i). Hash
ii). Indexed sequential access [6 Marks]

You might also like