lecture 3_

Data modeling is a critical part of database design, focusing on the logical and physical structure of databases to meet user needs. It involves several steps, including planning, conceptual design, and implementation, and produces outputs like entity-relationship diagrams and data documents. The process ensures that all necessary data objects are accurately represented, facilitating effective database development and integrity.

Uploaded by

kbjoash
Copyright
© All Rights Reserved

Data Modelling

Data Modeling In the Context of Database Design

 Database design is defined as: "design the logical and physical structure of one
or more databases to accommodate the information needs of the users in an
organization for a defined set of applications". The design process roughly
follows five steps:

1. planning and analysis
2. conceptual design
3. logical design
4. physical design
5. implementation

 The data model is one part of the conceptual design process. The other is
typically the functional model. The data model focuses on what data should be
stored in the database, while the functional model deals with how the data is processed.
 To put this in the context of the relational database, the data model is used to
design the relational tables. The functional model is used to design the queries
which will access and perform operations on those tables.
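
The split between the two models can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module; the employee table, its columns, and the helper headcount_by_department are hypothetical names, not part of the lecture. The CREATE TABLE statement stands in for the data model (what is stored) and the query stands in for the functional model (how it is processed).

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Data model: WHAT is stored (hypothetical table and columns).
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        department  TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO employee (employee_id, name, department) VALUES (?, ?, ?)",
    [(1, "Ada", "Research"), (2, "Grace", "Research"), (3, "Linus", "Support")],
)


# Functional model: HOW the stored data is processed (a query over the table).
def headcount_by_department(connection):
    rows = connection.execute(
        "SELECT department, COUNT(*) FROM employee "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    return dict(rows)


print(headcount_by_department(conn))  # {'Research': 2, 'Support': 1}
```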
Components of A Data Model

 The data model gets its inputs from the planning and analysis
stage. Here the modeler, along with analysts, collects
information about the requirements of the database by
reviewing existing documentation and interviewing end-users.

 The data model has two outputs:

 The first is an entity-relationship diagram, which represents the
data structures in pictorial form. Because the diagram is easily
learned, it is a valuable tool for communicating the model to the
end-user.
 The second is a data document, which describes in detail the
data objects, relationships, and rules required by the database.
This document serves as a data dictionary: it provides the detail
required by the database developer to construct the physical
database.
Why is Data Modeling Important?
Data modeling is probably the most labor-intensive and time-consuming part
of the development process. Why bother, especially if you are pressed for
time?
 A common response by practitioners who write on the subject is that you
should no more build a database without a model than you should build a
house without blueprints.
 The goal of the data model is to make sure that all the data objects
required by the database are completely and accurately represented.
Because the data model uses easily understood notations and natural
language, it can be reviewed and verified as correct by the end-users.
 The data model is also detailed enough to be used by the database
developers as a "blueprint" for building the physical database. The
information contained in the data model will be used to define the
relational tables, primary and foreign keys, stored procedures, and
triggers.
 A poorly designed database will require more time in the long-term.
Without careful planning you may create a database that omits data
required to create critical reports, produces results that are incorrect or
inconsistent, and is unable to accommodate changes in the user's
requirements.
The Entity-Relationship Model
 The Entity-Relationship (ER) model was originally proposed by Peter Chen in
1976 as a way to unify the network and relational database views.
Simply stated, the ER model is a conceptual data model that views the
real world as entities and relationships.
 A basic component of the model is the Entity-Relationship diagram,
which is used to visually represent data objects. Since Chen wrote his
paper, the model has been extended, and today it is commonly used for
database design.
 For the database designer, the utility of the ER model is:
 it maps well to the relational model. The constructs used in the ER model
can easily be transformed into relational tables.
 it is simple and easy to understand with a minimum of training. Therefore,
the model can be used by the database designer to communicate the
design to the end user.
 In addition, the model can be used as a design plan by the database
developer to implement a data model in specific database management
software.
Basic Constructs of E-R Modeling
 The ER model views the real world as a construct of entities and
associations between entities.
 Entities are the principal data objects about which information is to
be collected. Entities are usually recognizable concepts, either
concrete or abstract, such as persons, places, things, or events
which have relevance to the database. Some specific examples of
entities are EMPLOYEES, PROJECTS, INVOICES.
 An entity is analogous to a table in the relational model.
 Entities are classified as independent or dependent (in some
methodologies, the terms used are strong and weak, respectively).
An independent entity is one that does not rely on another for
identification. A dependent entity is one that relies on another for
identification.
 An entity occurrence (also called an instance) is an individual
occurrence of an entity. An occurrence is analogous to a row in the
relational table.
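
A small sketch may help make these terms concrete. It assumes Python dataclasses as a stand-in for entities; INVOICE comes from the examples above, while InvoiceLine and all attribute names are hypothetical. The class plays the role of the entity, each object is one occurrence, and the dependent entity can only be identified together with its parent's key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    """Independent entity: identified by its own key attribute."""
    invoice_id: int
    customer: str


@dataclass(frozen=True)
class InvoiceLine:
    """Dependent entity (hypothetical): needs the parent invoice for identification."""
    invoice_id: int   # borrowed from the parent INVOICE
    line_no: int      # unique only within one invoice
    description: str

    @property
    def identifier(self):
        return (self.invoice_id, self.line_no)


# Each object below is one entity occurrence (analogous to one row).
invoices = [Invoice(1, "Acme"), Invoice(2, "Globex")]
lines = [
    InvoiceLine(1, 1, "Design"),
    InvoiceLine(1, 2, "Hosting"),
    InvoiceLine(2, 1, "Audit"),
]

# line_no alone does not identify a line; the pair (invoice_id, line_no) does.
line_no_is_unique = len({line.line_no for line in lines}) == len(lines)
identifier_is_unique = len({line.identifier for line in lines}) == len(lines)
print(line_no_is_unique, identifier_is_unique)  # False True
```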
Data Modeling As Part of Database Design

 The data model is one part of the conceptual design process. The
other is the functional model.
 The data model focuses on what data should be stored in the
database while the functional model deals with how the data is
processed. To put this in the context of the relational database, the
data model is used to design the relational tables. The functional
model is used to design the queries that will access and perform
operations on those tables.
 Data modeling is preceded by planning and analysis. The effort
devoted to this stage is proportional to the scope of the database. The
planning and analysis of a database intended to serve the needs of an
enterprise will require more effort than one intended to serve a small
workgroup.
 The information needed to build a data model is gathered during the
requirements analysis. Although not formally considered part of the
data modeling stage by some methodologies, in reality the
requirements analysis and the ER diagramming part of the data model
are done at the same time.
Requirements Analysis
 The goals of the requirements analysis are:

 to determine the data requirements of the database in terms of primitive
objects
 to classify and describe the information about these objects
 to identify and classify the relationships among the objects
 to determine the types of transactions that will be executed on the
database and the interactions between the data and the transactions
 to identify rules governing the integrity of the data

Information needed for the requirements analysis can be gathered in
several ways:
• review of existing documents - such documents include existing
forms and reports, written guidelines, job descriptions, personal
narratives, and memoranda. Paper documentation is a good way
to become familiar with the organization or activity you need to
model.
• interviews with end users - these can be a combination of
individual or group meetings. Try to keep group sessions to under
five or six people. If possible, try to have everyone with the same
function in one meeting. Use a blackboard, flip charts, or
overhead transparencies to record information gathered from the
interviews.
• review of existing automated systems - if the organization
already has an automated system, review the system design
specifications and documentation

The requirements analysis is usually done at the same time as the
data modeling. As information is collected, data objects are identified
and classified as entities, attributes, or relationships; assigned names;
and defined using terms familiar to the end-users. The objects are
Attributes

 Attributes are data objects that either identify or
describe entities. Attributes that identify entities are
called key attributes.
 Attributes that describe an entity are called non-key
attributes. The process for identifying attributes is similar
to that for identifying entities, except that now you look for
and extract those names that appear to be descriptive noun phrases.
Validating Attributes
 Attribute values should be atomic, that is, represent a
single fact. Having disaggregated data allows simpler
programming, greater reusability of data, and easier
implementation of changes. Normalization also depends
upon the "single fact" rule being followed.
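
As a rough illustration of the "single fact" rule, the sketch below takes a record whose phones field packs several facts into one value and splits it into atomic rows. The field names and the comma-separated format are assumptions made for the example only.

```python
def split_atomic(record):
    """Turn one record with a multi-valued 'phones' field into atomic rows."""
    phones = [p.strip() for p in record["phones"].split(",") if p.strip()]
    return [{"employee_id": record["employee_id"], "phone": p} for p in phones]


# Non-atomic: one attribute value holds two facts.
raw = {"employee_id": 1, "phones": "555-0100, 555-0101"}

# Atomic: one fact per attribute value.
atomic_rows = split_atomic(raw)
for row in atomic_rows:
    print(row)
```

With atomic rows, a single phone number can be changed or removed without parsing and rewriting a combined string.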
Relationships

 Relationships are associations between entities. Typically, a relationship is
indicated by a verb connecting two or more entities. For example:
 employees are assigned to projects

 As relationships are identified, they should be classified in terms of cardinality,
optionality, direction, and dependence. As a result of defining the
relationships, some relationships may be dropped and new relationships
added.
 Cardinality quantifies the relationships between entities by measuring how
many instances of one entity are related to a single instance of another. To
determine the cardinality, assume the existence of an instance of one of the
entities. Then determine how many specific instances of the second entity
could be related to the first. Repeat this analysis reversing the entities. For
example:

 employees may be assigned to no more than three projects at a time;
every project has at least two employees assigned to it.

 Here the cardinality of the relationship from employees to projects is three
(a maximum); from projects to employees, the cardinality is two (a minimum).
Because each employee can relate to several projects and each project to several
employees, this relationship can be classified as a many-to-many relationship.
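
The two rules in this example can be checked mechanically. The sketch below is one possible way to do it, assuming assignments are available as (employee, project) pairs; the function name check_cardinality and the sample data are hypothetical.

```python
from collections import defaultdict

MAX_PROJECTS_PER_EMPLOYEE = 3   # from the example rule in the text
MIN_EMPLOYEES_PER_PROJECT = 2   # from the example rule in the text


def check_cardinality(assignments):
    """Return a list of rule violations for (employee, project) pairs."""
    projects_of = defaultdict(set)
    employees_of = defaultdict(set)
    for employee, project in assignments:
        projects_of[employee].add(project)
        employees_of[project].add(employee)

    violations = []
    for employee, projects in sorted(projects_of.items()):
        if len(projects) > MAX_PROJECTS_PER_EMPLOYEE:
            violations.append(f"{employee} is assigned to {len(projects)} projects")
    for project, employees in sorted(employees_of.items()):
        if len(employees) < MIN_EMPLOYEES_PER_PROJECT:
            violations.append(f"{project} has only {len(employees)} employee(s)")
    return violations


assignments = [("ann", "P1"), ("bob", "P1"), ("ann", "P2")]
print(check_cardinality(assignments))  # ['P2 has only 1 employee(s)']
```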
 If a relationship can have a cardinality of zero, it is an optional
relationship. If it must have a cardinality of at least one, the
relationship is mandatory. Optional relationships are typically
indicated by the conditional tense. For example:

 an employee may be assigned to a project

 Mandatory relationships, on the other hand, are indicated by words
such as must have.
 For example: a student must register for at least three courses each
semester.

 In the case of the specific relationship forms (1:1 and 1:M), there is
always a parent entity and a child entity. In one-to-many
relationships, the parent is always the entity with the cardinality of
one. In one-to-one relationships, the choice of the parent entity
must be made in the context of the business being modeled. If a
decision cannot be made, the choice is arbitrary.
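
In a relational implementation, optionality often shows up as whether the child's foreign key column may be null. The following sketch assumes SQLite through Python's sqlite3 module and a hypothetical schema in which an employee must belong to a department (mandatory) and may be assigned to a project (optional); none of these table or column names come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        -- mandatory relationship: the foreign key may not be null
        department_id INTEGER NOT NULL REFERENCES department(department_id),
        -- optional relationship: the foreign key may be null
        project_id    INTEGER REFERENCES project(project_id)
    );
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO project VALUES (10, 'Website');
    """
)


def try_insert(department_id, project_id):
    """Return True if the employee row is accepted, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (department_id, project_id) VALUES (?, ?)",
                (department_id, project_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, None),   # no project: allowed, the relationship is optional
    try_insert(1, 10),     # with a project: allowed
    try_insert(None, 10),  # no department: rejected, the relationship is mandatory
]
print(results)  # [True, True, False]
```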
Primary and Foreign Keys
 Primary and foreign keys are the most basic components on which
relational theory is based.
 Primary keys enforce entity integrity by uniquely identifying entity
instances.
 Foreign keys enforce referential integrity by completing an
association between two entities. The next step in building the
basic data model is to:
1. identify and define the primary key attributes for each entity
2. validate primary keys and relationships
3. migrate the primary keys to establish foreign keys
Composite Keys

 Sometimes it requires more than one attribute to
uniquely identify an entity.
 A primary key that is made up of more than one
attribute is known as a composite key.
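
A composite key can be sketched with SQLite via Python's sqlite3 module. The assignment table below is a hypothetical example: neither employee_id nor project_id is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (
        employee_id INTEGER NOT NULL,
        project_id  INTEGER NOT NULL,
        role        TEXT,
        PRIMARY KEY (employee_id, project_id)  -- composite key
    )
    """
)


def assign(employee_id, project_id, role):
    """Return True if the row is accepted, False if the key already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO assignment VALUES (?, ?, ?)",
                (employee_id, project_id, role),
            )
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    assign(1, 10, "lead"),    # new pair
    assign(1, 11, "tester"),  # same employee, different project
    assign(2, 10, "dev"),     # same project, different employee
    assign(1, 10, "dev"),     # pair (1, 10) already exists
]
print(outcomes)  # [True, True, True, False]
```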
Validate Keys and Relationships

 Basic rules governing the identification and migration of primary
keys are:
• Every entity in the data model shall have a primary key whose values uniquely
identify entity instances.
• The primary key attribute cannot be optional (i.e., have null values).
• The primary key cannot have repeating values. That is, the attribute may not
have more than one value at a time for a given entity instance.
This is known as the No Repeat Rule.
• Entities with compound primary keys cannot be split into multiple entities with
simpler primary keys. This is called the Smallest Key Rule.
• Two entities may not have identical primary keys with the exception of entities
within generalization hierarchies.
• The entire primary key must migrate from parent entities to child entities and
from supertype, generic entities, to subtypes, category entities.
Foreign Keys
A foreign key is an attribute that completes a relationship by identifying the parent entity.
Foreign keys provide a method for maintaining integrity in the data (called referential
integrity) and for navigating between different instances of an entity. Every relationship in the
model must be supported by a foreign key.

Identifying Foreign Keys


 Every dependent and category (subtype) entity in the model must have a foreign key for
each relationship in which it participates. Foreign keys are formed in dependent and
subtype entities by migrating the entire primary key from the parent or generic entity. If
the primary key is composite, it may not be split.

Foreign Key Ownership


 Foreign key attributes are not considered to be owned by the entities to which they
migrate, because they are reflections of attributes in the parent entities. Thus, each
attribute in an entity is either owned by that entity or belongs to a foreign key in that
entity.
 If the primary key of a child entity contains all the attributes in a foreign key, the child
entity is said to be "identifier dependent" on the parent entity, and the relationship is
called an "identifying relationship."
 If any attributes in a foreign key do not belong to the child's primary key, the child is not
identifier dependent on the parent, and the relationship is called "non-identifying."
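
The distinction between the two kinds of relationship reduces to a subset test on attribute names. The helper below is a minimal sketch of that test; the function name and the sample attribute names are hypothetical.

```python
def classify_relationship(child_primary_key, foreign_key):
    """Classify a relationship from the child's primary key and foreign key attributes."""
    child_primary_key = set(child_primary_key)
    foreign_key = set(foreign_key)
    if not foreign_key:
        raise ValueError("a relationship needs at least one foreign key attribute")
    # Identifying: every foreign key attribute is part of the child's primary key.
    if foreign_key <= child_primary_key:
        return "identifying"
    return "non-identifying"


# Invoice line keyed by (invoice_id, line_no), foreign key invoice_id.
print(classify_relationship({"invoice_id", "line_no"}, {"invoice_id"}))  # identifying

# Employee keyed by employee_id, foreign key department_id.
print(classify_relationship({"employee_id"}, {"department_id"}))  # non-identifying
```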
Generalization

The process of categorizing entities by their similarities and differences is known
as generalization.

 A generalization hierarchy is a structured grouping of entities that share
common attributes. It is a powerful and widely used method for representing
common characteristics among entities while preserving their differences. It is
the relationship between an entity and one or more refined versions.
 The entity being refined is called the supertype and each refined version is
called the subtype.

 Generalization hierarchies should be used when
1) a large number of entities appear to be of the same type
2) attributes are repeated for multiple entities
or
3) the model is continually evolving.
 Generalization hierarchies improve the stability of the model by allowing
changes to be made only to those entities germane to the change and simplify
the model by reducing the number of entities in the model.
Types of Hierarchies
 A generalization hierarchy can either be overlapping or disjoint. In
an overlapping hierarchy an entity instance can be part of multiple
subtypes. For example, to represent people at a university you
have identified the supertype entity PERSON which has three
subtypes, FACULTY, STAFF, and STUDENT.
 It is quite possible for an individual to be in more than one
subtype: for example, a staff member who is also registered as a
student.
 In a disjoint hierarchy, an entity instance can be in only one
subtype. For example, the entity EMPLOYEE may have two
subtypes, CLASSIFIED and WAGES. An employee may be one type
or the other but not both.
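
Whether a hierarchy behaves as overlapping or disjoint can be checked by looking for instances that appear in more than one subtype. The sketch below uses the PERSON and EMPLOYEE examples from the text; the member names and the helper function are made up for illustration.

```python
from collections import Counter


def instances_in_multiple_subtypes(subtypes):
    """Return the instances that belong to more than one subtype."""
    counts = Counter(
        member for members in subtypes.values() for member in set(members)
    )
    return {member for member, n in counts.items() if n > 1}


# Overlapping hierarchy: PERSON with FACULTY, STAFF, and STUDENT.
person_subtypes = {
    "FACULTY": {"ana"},
    "STAFF": {"kim"},
    "STUDENT": {"kim", "lee"},  # kim is staff and also a student
}

# Disjoint hierarchy: EMPLOYEE with CLASSIFIED and WAGES.
employee_subtypes = {"CLASSIFIED": {"ana", "kim"}, "WAGES": {"lee"}}

print(instances_in_multiple_subtypes(person_subtypes))    # {'kim'}
print(instances_in_multiple_subtypes(employee_subtypes))  # set()
```

An empty result is what a disjoint hierarchy requires; a non-empty result is acceptable only in an overlapping one.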
Rules of Generalization

 The primary rule of generalization hierarchies is that each
instance of the supertype entity must appear in at least one
subtype; likewise, an instance of the subtype must appear in the
supertype.
 Subtypes can be a part of only one generalization hierarchy. That
is, a subtype cannot be related to more than one supertype.
However, generalization hierarchies may be nested by having the
subtype of one hierarchy be the supertype for another.
 Subtypes may be the parent entity in a relationship but not the
child. If this were allowed, the subtype would inherit two primary
keys.
 Generalization hierarchies are a structure that enables the
modeler to represent entities that share common characteristics
but also have differences.
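
The primary rule stated above (every supertype instance appears in at least one subtype, and every subtype instance appears in the supertype) can be expressed as two set differences. This is a sketch only; check_hierarchy and the sample identifiers are hypothetical.

```python
def check_hierarchy(supertype_ids, subtypes):
    """Report instances that break the primary rule of generalization."""
    supertype_ids = set(supertype_ids)
    in_some_subtype = set().union(*subtypes.values())
    return {
        # supertype instances that appear in no subtype
        "missing_from_subtypes": supertype_ids - in_some_subtype,
        # subtype instances that have no supertype instance
        "missing_from_supertype": in_some_subtype - supertype_ids,
    }


person_ids = {1, 2, 3}
subtypes = {"FACULTY": {1}, "STAFF": {2}, "STUDENT": {2, 4}}

report = check_hierarchy(person_ids, subtypes)
print(report)
# {'missing_from_subtypes': {3}, 'missing_from_supertype': {4}}
```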
Data Integrity Rules
Data integrity is one of the cornerstones of the relational model. Simply
stated, data integrity means that the data values in the database are
correct and consistent.
 Data integrity is enforced in the relational model by entity and
referential integrity rules. Although not part of the relational model,
most database software enforces attribute integrity through the use of
domain information.
Entity Integrity
 The entity integrity rule states that for every instance of an entity, the
value of the primary key must exist, be unique, and not be null.
Without entity integrity, the primary key could not fulfill its role of
uniquely identifying each instance of an entity.
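
As a sketch of how a database can enforce this rule, the example below uses SQLite through Python's sqlite3 module with a hypothetical employee table keyed by badge_no. One SQLite-specific detail matters here: for historical reasons SQLite accepts NULL in a primary key column of an ordinary table unless the column is an INTEGER PRIMARY KEY or is declared NOT NULL, so the sketch declares NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    # NOT NULL is spelled out because SQLite would otherwise allow a NULL key here.
    "CREATE TABLE employee (badge_no TEXT PRIMARY KEY NOT NULL, name TEXT NOT NULL)"
)


def add_employee(badge_no, name):
    """Return True if the row is accepted, False if entity integrity is violated."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?)", (badge_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_employee("E-1", "Ada"),    # accepted
    add_employee("E-1", "Grace"),  # rejected: duplicate primary key
    add_employee(None, "Linus"),   # rejected: null primary key
]
print(results)  # [True, False, False]
```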
Referential Integrity
 The referential integrity rule states that every foreign key value must
match a primary key value in an associated table. Referential integrity
ensures that we can correctly navigate between related entities.
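
A minimal sketch of this rule, again assuming SQLite via Python's sqlite3 module and hypothetical project and invoice tables: a child row is accepted only when its foreign key value matches an existing primary key value in the parent table. SQLite typically leaves foreign key enforcement off unless it is enabled per connection, which is why the PRAGMA appears first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript(
    """
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE invoice (
        invoice_id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES project(project_id)
    );
    INSERT INTO project VALUES (1, 'Website');
    """
)


def add_invoice(invoice_id, project_id):
    """Return True if the invoice is accepted, False if its foreign key has no match."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO invoice VALUES (?, ?)", (invoice_id, project_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_invoice(10, 1))   # True: project 1 exists
print(add_invoice(11, 99))  # False: there is no project 99
```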
Insert and Delete Rules

 A foreign key creates a hierarchical relationship
between two associated entities. The entity
containing the foreign key is the child, or
dependent, and the entity containing the primary
key from which the foreign key values are obtained
is the parent.
 In order to maintain referential integrity between
the parent and child as data is inserted into or deleted
from the database, certain insert and delete rules
must be considered.
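
The slides do not list the individual rules at this point. As one illustration only, SQL lets a foreign key declare what happens when a parent row is deleted; the sketch below contrasts two such actions, ON DELETE RESTRICT and ON DELETE CASCADE, using SQLite via Python's sqlite3 module. The department, employee, and office tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (department_id INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        department_id INTEGER NOT NULL
            REFERENCES department(department_id) ON DELETE RESTRICT
    );
    CREATE TABLE office (
        office_id     INTEGER PRIMARY KEY,
        department_id INTEGER NOT NULL
            REFERENCES department(department_id) ON DELETE CASCADE
    );
    INSERT INTO department VALUES (1), (2);
    INSERT INTO employee VALUES (100, 1);
    INSERT INTO office VALUES (200, 2);
    """
)


def delete_department(department_id):
    """Return True if the parent row was deleted, False if a delete rule blocked it."""
    try:
        with conn:
            conn.execute(
                "DELETE FROM department WHERE department_id = ?", (department_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


blocked = delete_department(1)    # employee 100 still refers to it (RESTRICT)
cascaded = delete_department(2)   # office 200 is removed along with it (CASCADE)
offices_left = conn.execute("SELECT COUNT(*) FROM office").fetchone()[0]
print(blocked, cascaded, offices_left)  # False True 0
```

Which action is appropriate depends on the business rule being modeled: RESTRICT protects child rows from losing their parent, while CASCADE treats the children as part of the parent.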
