UML, ER and Dimensional Modelling

UML, ER and DIMENSIONAL MODELS
Stefano Dalla Palma

A model is an abstraction
that describes a system

Software size grows rapidly

UML is a way of visualizing a software program using
diagrams
Structural diagrams: Class Diagram, Object Diagram, Package Diagram, Component Diagram, Deployment Diagram, Composite Structure Diagram
Behavioural diagrams: Sequence Diagram, Activity Diagram, Use Case Diagram, Collaboration Diagram, Statechart Diagram

Why do we use UML?
A complex enterprise application with many
collaborators will require a solid foundation of
planning and clear, concise communication among
team members as the project advances. Visualizing
user interactions, processes, and the structure of the
system you're trying to build will help save time down
the line and make sure everyone on the team is on the
same page.

Actor
Actors are roles: external entities (users, systems, the environment)
that interact with the system

Use Case
A use case is a class of functionalities provided by the
system as a flow of events. It consists of a unique name,
participating actors, entry conditions, event flows, exit
conditions and special requirements

[Figure: use case diagram with the actors JADS Student (<<User>>), JADS Professor (<<User>>) and System Notifier (<<System>>) and the use cases RegisterGrades, NotifyPostponedExam, AddExam and BookExam]

Include
UC2
<<include>>
UC1
Include is used to extract use case fragments that
are duplicated in multiple use cases for the purpose of
reuse

[Figure: ATMUser performs DepositCash, WithdrawCash and TransferFunds; each of them <<include>> Authenticate]

Extend
UC2
<<extend>>
UC1
Extend models exceptional or rare cases
It is used when a use case adds steps to another first-
class use case

[Figure: ATMUser use case diagram in which Cancel and CardNotSupported are attached to other use cases through <<extend>> relationships]

Inheritance
Inheritance models the hierarchy between actors and
use cases. It is used to specialize a more general use
case adding more details
[Figure: UC2 and UC1 connected by a generalization arrow]

[Figure: the ATMUser use cases <<include>> Authenticate; AuthenticateWithPin and AuthenticateWithFingerprint specialize Authenticate]

Access modifier
Class members have different access levels (their visibility) depending on the access modifier
Visibility markers signify who can access the information
contained within a class: + for public, - for private, # for protected and ~ for package

Inheritance
Inheritance (or generalization) is a
relationship between two classes
where one class is a specialized
version of another
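
As a minimal sketch of generalization in code, a specialized class inherits the attributes and operations of the general one and may refine them. The class names Person and Student and their attributes are illustrative assumptions, not taken from the slides.

```python
class Person:
    """General class (hypothetical example)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"{self.name} is a person"


class Student(Person):
    """Specialized version of Person: inherits name, adds student_number."""

    def __init__(self, name: str, student_number: int) -> None:
        super().__init__(name)
        self.student_number = student_number

    def describe(self) -> str:
        # Refine the inherited operation with student-specific details.
        return f"{self.name} is student number {self.student_number}"
```

Every Student can be used wherever a Person is expected, which is what the hollow-triangle generalization arrow expresses in a class diagram.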

Associations
Associations model static relationships between classes
Unidirectional association
Bidirectional association

In a unidirectional association one class is aware of the
other and interacts with it
It is modelled with a straight connecting line that points an open
arrowhead from the knowing class to the known class

In a bidirectional association both classes are aware of
each other and their relationships with the other
[Figure: an association labelled with its name and reading direction, the role at each end and the multiplicities (here * and 0…1)]
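
One possible way to picture the difference in code is by looking at which class holds a reference to the other. The sketch below is only an illustration; Professor, Course, Department and Employee are hypothetical names chosen for the example.

```python
class Course:
    """Known class: it holds no reference back to Professor."""

    def __init__(self, title):
        self.title = title


class Professor:
    """Knowing class of a unidirectional association Professor -> Course."""

    def __init__(self, name):
        self.name = name
        self.courses = []

    def teach(self, course):
        self.courses.append(course)


class Employee:
    """One end of a bidirectional association with Department."""

    def __init__(self, name):
        self.name = name
        self.department = None  # multiplicity 0..1


class Department:
    """Other end of the bidirectional association."""

    def __init__(self, name):
        self.name = name
        self.employees = []  # multiplicity *

    def hire(self, employee):
        # Keep both ends of the association in sync.
        self.employees.append(employee)
        employee.department = self
```

In the bidirectional case each object can navigate to the other, at the price of having to update both references together.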

Unary association
[Figure: a class associated with itself through marriedTo; the roles wife and husband each have multiplicity 0…1]

Binary association
1-to-1 association: hasCapital (1, 1)
1-to-many association: administers (1, *)
many-to-many association: attends (*, *)

n-ary association: more than two classes are
involved in the relationship
[Figure: ternary association Grade; its ends have multiplicities *, * and 0…1, the last one with the role examiner]

An association class is necessary when a many-to-many association
carries attributes of its own. It is possible, but not necessary, when modeling
1-to-1 or 1-to-many associations
[Figure: association class]
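
A minimal sketch of an association class, assuming a hypothetical many-to-many association between Student and Course whose links carry a grade. The names and the grade attribute are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str


@dataclass(frozen=True)
class Course:
    title: str


@dataclass(frozen=True)
class Enrollment:
    """Association class: one Student-Course link plus its own attribute."""

    student: Student
    course: Course
    grade: int


def grades_of(student, enrollments):
    """Return {course title: grade} for the links that involve `student`."""
    return {e.course.title: e.grade for e in enrollments if e.student == student}
```

The grade belongs neither to the student alone nor to the course alone, which is why it is stored on the link itself.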

Dependency
A dependency is a relationship between two elements in
which a change to an independent element can have
repercussions on a dependent element
Class A depends on class B, which means that changing
the structure or the behaviour of B could affect the
structure or the behaviour of A
[Figure labels: Independent class, Dependent class]

Aggregation
Aggregation is a special association that denotes a “consists of”
hierarchy between a class A, the aggregate, and a class B, its part. It
is used to express that a class is part of another class. Aggregation
is a weak ownership between two classes, meaning that parts can
exist independently of the whole

A student is part of a lab class
A course is part of a study program
A car consists of an engine, a battery, four wheels (multiplicity 4,4) and at
least two and at most five doors (multiplicity 2,5)

Composition
Composition is a special type of aggregation that denotes a strong
ownership between class A, the whole, and class B, its part. There exists a
dependency between the two classes. If the composite object (the whole) is
deleted, also its parts are deleted. Furthermore, one part can only be
contained in at most one composite object at one specific point in time

A lecture hall is part of a building. If the building is
deleted, the lecture hall is also deleted
A city is part of a country and the country is part of the
world. If the country is deleted, the city is also deleted. If
the world is deleted…
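
The difference between weak ownership (aggregation) and strong ownership (composition) can be sketched as follows. This is only one way to mirror the idea in code; the classes and the demolish method are hypothetical names introduced for the example.

```python
class Student:
    def __init__(self, name):
        self.name = name


class LabClass:
    """Aggregate: refers to students that exist independently of it."""

    def __init__(self, title):
        self.title = title
        self.students = []

    def enroll(self, student):
        self.students.append(student)


class LectureHall:
    def __init__(self, name, building):
        self.name = name
        self.building = building  # contained in at most one composite


class Building:
    """Composite: creates its lecture halls and takes them down with it."""

    def __init__(self, name):
        self.name = name
        self.halls = []

    def add_hall(self, hall_name):
        hall = LectureHall(hall_name, self)
        self.halls.append(hall)
        return hall

    def demolish(self):
        # Deleting the whole also deletes its parts.
        for hall in self.halls:
            hall.building = None
        self.halls.clear()
```

A Student is created outside LabClass and survives it, whereas a LectureHall is only ever created through its Building and is released when the building is demolished.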

Sequence diagrams
describe interactions among classes, or
among actors and the system, in terms of
an exchange of messages over time

Participant
A participant (or class role) describes the way an object will
behave in context. Use the UML object symbols to
illustrate roles, but don’t list object attributes

Lifeline
Lifelines are vertical dashed lines that indicate the
object's presence over time. They represent either roles
or object instances that participate in the sequence
being modelled

Execution
Occurrence
Activation boxes represent the time an object needs to
complete a task. When an object is busy executing a
process or waiting for a reply message, use a thin
rectangle placed vertically on its lifeline

Messages
Messages are arrows that represent communication
between objects
The message (or method) name is placed above the arrowed line. The message that is
being sent to the receiving object represents an operation (or method) that the
receiving object's class implements.

Synchronous
message
A synchronous message requires a response before the
interaction can continue. It's usually drawn using a line
with a solid arrowhead pointing from one object to
another

Asynchronous
message
An asynchronous message doesn’t need a reply for
interaction to continue. Like synchronous messages,
they are drawn with an arrow connecting two lifelines;
however, the arrowhead is usually open and there is
no return message depicted
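
In code, the two kinds of message roughly correspond to a blocking call versus a fire-and-forget hand-off. The sketch below is an illustration under that assumption; Printer and its methods are hypothetical, and the queue-plus-thread mechanism is just one of several ways to send a message asynchronously.

```python
import queue
import threading


class Printer:
    """Hypothetical receiver used to contrast the two kinds of message."""

    def __init__(self) -> None:
        self.inbox: queue.Queue = queue.Queue()
        self.printed: list[str] = []

    def print_now(self, document: str) -> str:
        # Synchronous message: the caller waits for the return value.
        self.printed.append(document)
        return f"printed {document}"

    def submit(self, document) -> None:
        # Asynchronous message: enqueue and return at once, no reply.
        self.inbox.put(document)

    def run(self) -> None:
        # The receiver's own thread handles queued messages in order.
        while True:
            document = self.inbox.get()
            if document is None:  # sentinel value: stop the worker
                break
            self.printed.append(document)


printer = Printer()
worker = threading.Thread(target=printer.run)
worker.start()
reply = printer.print_now("report")  # caller blocks until the reply arrives
printer.submit("invoice")            # caller continues immediately
printer.submit(None)                 # ask the worker to stop
worker.join()
```

The value returned by print_now plays the part of the reply; submit has no counterpart for it, just as an asynchronous message has no return arrow in the diagram.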

Reply message
A reply message is drawn with a dotted line and an open
arrowhead pointing back to the original lifeline

Self message
A self message is a message an object sends to itself,
usually shown as a U shaped arrow pointing back to
itself

Create message
A create message creates a new object. Similar to a
reply message, it’s depicted with a dashed line and an
open arrowhead that points to the rectangle
representing the object created

Destroy message
A destroy message destroys an object. It can be shown
by an arrow with an X at the end

Sequence diagrams represent the dynamic behaviour of the system
in terms of interactions between objects and actors
They are complementary to class diagrams. That is, with class
diagrams one knows the static structure of a system and the
relationships between classes, but not how classes interact with each
other over time
They require a lot of time, but it’s worth it!
The sequence diagram is so useful because it shows
the interaction logic between the objects in the system in the time
order in which the interactions take place

Activity diagrams
illustrate the dynamic nature of the system by modelling the
flow of control from activity to activity. An activity is an
operation on some class in the system that results in a
change in the state of the system. Typically, activity
diagrams are used to model workflows, business
processes and internal operations

Initial state
Initial state (or start point) is shown by a small filled
circle followed by an arrow and represents the initial
action or the start point for any activity diagram

Activity
An activity (or action state) represents an ongoing
execution of a step in a workflow (such as an operation
or a transaction) that results in a change of the system
state

Action flow
Action flows, also called edges and paths, illustrate
transitions from one activity state to another

Branching
A diamond represents a decision (or branch) with
alternate paths
When an activity requires a decision prior to moving on to the next
activity, add a diamond between the two activities. The outgoing
alternatives should be labelled with a condition or guard
expression. Guards are statements that must be true before moving
to the next activity
Guards: [“Condition” is true], [“Condition” is false]

Synchronization
A fork node is used to split a single incoming flow into multiple
concurrent flows. A join node joins multiple concurrent flows back
into a single outgoing flow. A fork and join node used together are
often referred to as synchronization
Fork node
Join node
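
The decision, fork and join nodes map quite directly onto control flow in code. The following is a minimal sketch of a hypothetical order-processing activity; the function names and the stock rule are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def in_stock(order):
    # Guard evaluated at the decision node (illustrative rule).
    return order["quantity"] <= 10


def pack(order):
    return f"packed {order['item']}"


def invoice(order):
    return f"invoiced {order['item']}"


def process_order(order):
    # Decision node: exactly one outgoing path is taken.
    if not in_stock(order):  # guard [out of stock]
        return ["rejected"]
    # Fork node: one incoming flow becomes two concurrent flows.
    with ThreadPoolExecutor(max_workers=2) as pool:
        packing = pool.submit(pack, order)
        billing = pool.submit(invoice, order)
        # Join node: wait for both flows before continuing.
        results = [packing.result(), billing.result()]
    return results + ["shipped"]
```

Calling result() on both futures is what makes the join a synchronization point: shipping cannot start until packing and invoicing have both finished.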

Final state
A final state is shown by a small filled circle nested
inside another empty circle and represents the final
action state for any activity diagram

What is a database?
A database is a collection of correlated data organized in a
systematic way. With “data” we mean known facts that can be
stored. For example the name, surname and birthdate of a
person. A database must represent the aspects of the real world
under consideration, called the miniworld. It must contain
correlated data. Indeed, a random collection of data cannot be
considered a database. Finally, it must be designed for a
specific use. A database can have any size and complexity.

A phonebook is a
small database

What about Facebook and
Google databases?

4.000.000.000.000.000
bytes of data are stored by Facebook every day

The equivalent of
14 billion high-quality
3-minute songs

Example: University
We want to build a database for the University to manage students,
courses (with prerequisites) and passed exams. We can organize it in 4
entities:
1. STUDENT: contains data associated with each student
2. COURSE: contains data associated with each course
3. PREREQUISITES: contains the prerequisites for attending each course
4. GRADE: contains data about the grades for each student and each
exam
To define the database, we need to specify the structure of each
entity. That is, we have to specify the attributes (data elements) of each
record and the type of each data element.
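
As a rough sketch of what "specifying the attributes and their types" can look like, the four entities could be declared as tables in an in-memory SQLite database. The column names, types and keys below are assumptions chosen for illustration, not a schema given in the slides.

```python
import sqlite3

# Hypothetical structure for the four entities of the University example.
SCHEMA = """
CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    year           INTEGER
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    semester  INTEGER
);
CREATE TABLE prerequisite (
    course_id          INTEGER NOT NULL REFERENCES course(course_id),
    required_course_id INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (course_id, required_course_id)
);
CREATE TABLE grade (
    student_number INTEGER NOT NULL REFERENCES student(student_number),
    course_id      INTEGER NOT NULL REFERENCES course(course_id),
    grade          INTEGER NOT NULL,
    PRIMARY KEY (student_number, course_id)
);
"""


def create_university_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the structure above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Each CREATE TABLE statement names the data elements of one record and gives each of them a type, which is exactly the information the definition step asks for.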

Data Model
A data model is a set of concepts that can be used to describe the
structure of a database, that is, (1) the types of data, (2)
relationships between data and (3) semantic constraints on data

Data models are classified depending on the concepts used to describe the structure
of the database
High-level data models (or conceptual data models): provide concepts that are close to
the way users perceive the data.
Representational data models (or implementation data models): provide concepts
comprehensible to end users, but not too distant from the way the data are
physically organized.
Low-level data models (or physical data models): provide concepts that describe
the details of how data are stored on the machine.

High-level data models use concepts such as entities, attributes and
relationships
STUDENT: Name, Student Number, Year
COURSE: Name, Semester, Lecturer
PREREQUISITES: Course, Propaedeutic Course, Lecturer
GRADE: Student, Course, Grade

Entity
An entity is a class of specific objects (facts, persons,
vehicles or otherwise) which have common properties
and are represented in the database

It can be either a physical or an abstract object

Attribute
Attributes are properties that describe an entity

The entity Employee has the attributes Name, Surname,
Salary, Birthdate and Address

There exist several types of attribute: separable (composite), with more values (multi-valued) and calculable (derived)

Multi-valued Attribute
A multi-valued attribute is an attribute that can have
multiple values

Key attribute
A key attribute is an attribute whose value is unique
for each instance of the entity. Several attributes together can form a key.
In such a case those attributes can be
grouped in a composite attribute that becomes the “key
attribute”
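
The following sketch illustrates multi-valued attributes and simple versus composite keys. Department, Car and EntitySet are hypothetical names introduced only for this example (Python 3.9 or later is assumed for the set[str] annotation).

```python
from dataclasses import dataclass, field


@dataclass
class Department:
    number: int                                       # key attribute
    name: str
    locations: set[str] = field(default_factory=set)  # multi-valued attribute


@dataclass
class Car:
    country: str  # country and plate together form a composite key
    plate: str
    colour: str


class EntitySet:
    """Stores entities and rejects two entities with the same key value."""

    def __init__(self, key):
        self._key = key        # function that extracts the key of an entity
        self._entities = {}

    def add(self, entity):
        key_value = self._key(entity)
        if key_value in self._entities:
            raise ValueError(f"duplicate key: {key_value!r}")
        self._entities[key_value] = entity

    def __len__(self):
        return len(self._entities)
```

For example, EntitySet(key=lambda d: d.number) enforces a single-attribute key, while EntitySet(key=lambda c: (c.country, c.plate)) enforces a composite one: two cars may share a plate as long as the countries differ.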

The Company database example
Hi dude! I need a database for my Company. Can you
help me?
Hi, sure. Tell me more about it.
The company is organized in departments and each
department has a name, a number that identifies it
and an employee that manages it (a manager). We need
to keep track of the manager’s start date. Each
department manages several projects. Each project
has a name, a number and a single location. A
department, instead, can have multiple locations.
Mmm… got it. What about employees?

For each employee we need to keep track of their name,
SSN, address, salary, sex and birthdate. Each
employee works for a department and can work on
several projects. We also want to record the total
weekly hours that an employee has spent on a project
and the supervisor of each employee. Finally, each
employee has a list of dependents for which we store
name, sex, birthdate and relationship with the
employee.
Ok bro! I will work on it and I’ll let you
know. See ya!

Let’s find the entities!
The company is organized in departments and
each department has a name, a number that identifies
it and an employee that manages it (a manager). We
need to keep track of the manager’s start date. Each
department manages several projects. Each project
has a name, a number and a single location. A
department, instead, can have multiple locations.

Let’s find the entities!
For each employee we need to keep track of their
name, SSN, address, salary, sex and birthdate. Each
employee works for a department and can work on
several projects. We also want to record the total
weekly hours that an employee has spent on a project
and the supervisor of each employee. Finally,
each employee has a list of dependents for which
we store name, sex, birthdate and relationship with
the employee.

The initial design is typically not complete. Some aspects of
the requirements will be represented as relationships.
Indeed, we now have to represent the fact that an employee
can work on multiple projects and the total weekly hours of
an employee on each project. We also need to represent
the fact that an employee manages a department and when
she started doing it, and the fact that each employee has a
supervisor

Relationship
A relationship relates two or more entities with a specific
meaning
A diamond-shaped box is used to display a relationship type connected to the participating
entity types via straight lines. Note that the relationship type is not shown with an arrow. The
name should typically be readable from left to right and top to bottom

Constraint on Relationship
Each relationship type has constraints that limit the possible entity
combinations that can participate in its instances. They are
determined by the context of the mini-world that the relationship
represents. We can distinguish between two types of constraint:
(1) Cardinality Ratio and (2) Participation Constraint
[Figure: a relationship with a (min, max) cardinality on each side]

Cardinality Ratio
The Cardinality Ratio specifies the minimum and maximum number
of relationship instances in which an instance of the entity can participate. A cardinality
can be any non-negative number, with the constraint that the
minimum cardinality must be less than or equal to the maximum
cardinality. The minimum cardinality can be omitted, but the
maximum cardinality must always be present
[Figure: a relationship with a (min, max) cardinality on each side]

Participation Constraint
The Participation Constraint specifies the minimum number of
relationship instances in which each instance of the entity must participate. That is, it
specifies whether the existence of an entity depends on another
entity. There exist two types:
(1) Mandatory participation, written (1, max), and (2) Optional participation, written (0, max)
[Figure: different representations of mandatory and optional participation]

Employee (1,1), Department (0,N)
This is an N-to-1 relationship. Each department can have zero or more employees (0,N)
and an employee can work for only one department (1,1). The participation is mandatory for the
entity Employee and optional for the entity Department
Mandatory participation means that each instance in the set of the entity Employee must be
related to an instance of the entity Department
Employee (0,1), Department (1,1)
This is a 1-to-1 relationship. A department must be managed by exactly one employee and
an employee can manage at most one department. Therefore, the participation is
mandatory for the entity Department and optional for the entity Employee
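
A (min, max) constraint can be checked mechanically by counting how often each entity instance appears in the relationship. The helper below is a small sketch of that idea; cardinality_violations is a hypothetical function, and the relationship is assumed to be given as a list of (employee, department) pairs that only mention known entities.

```python
def cardinality_violations(entities, pairs, position, minimum, maximum=None):
    """Return the entities that break a (min, max) constraint.

    `pairs` holds the relationship instances as tuples, `position` selects
    the side of the tuple the entities sit on, and maximum=None stands for N.
    """
    counts = {entity: 0 for entity in entities}
    for pair in pairs:
        counts[pair[position]] += 1
    return [
        entity
        for entity, n in counts.items()
        if n < minimum or (maximum is not None and n > maximum)
    ]


employees = ["alice", "bob", "carol"]
departments = ["sales", "research"]
works_for = [("alice", "sales"), ("bob", "sales")]

# Employee side is (1,1): carol works for no department, so she is reported.
missing = cardinality_violations(employees, works_for, 0, 1, 1)
# Department side is (0,N): an empty department is allowed.
empty_ok = cardinality_violations(departments, works_for, 1, 0)
```

Here missing contains only carol, because her participation is mandatory, while empty_ok is an empty list, because participation is optional on the Department side.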

Recursive Relationship
When an entity type participates in a relationship type more than
once with different roles, we refer to that relationship as recursive.
Role Name
Any entity that participates in a relationship type has a role. The name of the
role specifies the role the entity has in each instance of the relationship. For example,
in a “MARRIED_TO” relationship an entity PERSON can have the role of either husband
or wife.
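
The supervisor requirement of the Company example is another recursive relationship: the same entity type takes part twice, once in each role. A minimal sketch follows, in which the function names are hypothetical and the supervision chain is assumed to contain no cycles.

```python
class Employee:
    def __init__(self, name):
        self.name = name
        self.supervisor = None  # this employee in the role of supervisee
        self.supervisees = []   # this employee in the role of supervisor


def supervise(supervisor, supervisee):
    """Create one instance of the recursive relationship."""
    supervisee.supervisor = supervisor
    supervisor.supervisees.append(supervisee)


def chain_of_command(employee):
    """Names of the supervisors above `employee`, nearest first."""
    chain = []
    current = employee.supervisor
    while current is not None:
        chain.append(current.name)
        current = current.supervisor
    return chain
```

Both ends of the relationship are Employee objects; only the role names tell them apart.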

Let’s find the relationships!
[…] each department has […] an employee that manages it (a
manager). We need to keep track of the manager’s start date.
Each department manages several projects. […] Each
employee works for a department and can work on
several projects. We also want to record the total weekly hours
that an employee has spent on a project and the supervisor of
each employee. Finally, each employee has a list of
dependents […].

Why do we use Dimensional
Modelling?
Each modelling technique gives us a different way to store the data,
and different ways of storing data give us different advantages. For
example, ER Modelling gives us the advantage of storing data in such
a way that there is less redundancy. Dimensional modelling, on the
other hand, gives us the advantage of storing data in such a fashion
that it is easier to retrieve information from the data once the
data is stored in the database. Be careful! A dimensional model is not a
good solution if the primary purpose of your data modelling is to
reduce storage space requirements, reduce redundancy, speed up
loading time, etc.

Dimensional Model
is a database structure that is optimized for
online queries and Data Warehousing tools. It
consists of dimensions and facts
Dimensions
are the objects of context, that is, the things that
are being spoken about, and are the entry
points for getting at the facts
Facts
are numeric values that a business wishes to
count

Dimensional Models are designed for reading, summarizing and analyzing numeric
information, whereas Relational Models are optimized for adding and maintaining
data using real-time operational systems.

The goals of dimensional modelling can be summarized as
Faster Data Retrieval
Better Understandability
Extensibility

The four key decisions made during the design of a
dimensional model are:
Select the business process
Declare the grain
Identify the dimensions
Identify the facts

Select the business process
Business processes are the operational activities performed by your organization,
such as taking an order, processing an insurance claim, registering students for a
class, or snapshotting every account each month. Business process events generate
or capture performance metrics that translate into facts in a fact table. Most fact
tables focus on the results of a single business process. Choosing the process is
important because it defines a specific design target and allows the grain,
dimensions, and facts to be declared. Each business process corresponds to a row
in the enterprise data warehouse bus matrix

Declare the grain
Declaring the grain is the pivotal step in a dimensional design. The grain establishes
exactly what a single fact table row represents. The grain declaration becomes a
binding contract on the design. The grain must be declared before choosing
dimensions or facts because every candidate dimension or fact must be consistent
with the grain. This consistency enforces a uniformity on all dimensional designs that
is critical to Business Intelligence application performance and ease of use. Atomic
grain refers to the lowest level at which data is captured by a given business
process. You should start by focusing on atomic-grained data because it withstands
the assault of unpredictable user queries

Identify the dimensions
Dimensions provide the “who, what, where, when, why, and how” context surrounding
a business process event. Dimension tables contain the descriptive attributes used
by applications for filtering and grouping the facts. With the grain of a fact table
firmly in mind, all the possible dimensions can be identified. Whenever possible, a
dimension should be single valued when associated with a given fact row.
Dimension tables are sometimes called the “soul” of the data warehouse because
they contain the entry points and descriptive labels that enable the Data
Warehouse/Business Intelligence system to be leveraged for business analysis. A
disproportionate amount of effort is put into the data governance and development
of dimension tables because they are the drivers of the user’s BI experience.

Identify the facts
Facts are the measurements that result from a business process event and are
almost always numeric. A single fact table row has a one-to-one relationship to a
measurement event as described by the fact table’s grain. Thus a fact table
corresponds to a physical observable event, and not to the demands of a particular
report. Within a fact table, only facts consistent with the declared grain are allowed.
For example, in a retail sales transaction, the quantity of a product sold and its
extended price are good facts, whereas the store manager’s salary is disallowed.
A Fact Table consists of one or more numeric facts of importance to a business. At
the lowest grain, a fact table row corresponds to a measurement event and vice
versa. In addition to numeric measures, a fact table always contains foreign keys for
each of its associated dimensions. Details about the dimensions (e.g. customer name,
customer address) are stored in the dimension table (i.e. customer).

Identify dimensions
The business objective is to create a data model that can store and report the
number of burgers and fries sold from a specific McDonald’s outlet per day
In the statement above, we are speaking about 3 different things: we
are speaking about food, some specific McDonald’s store and some
specific day. Burgers and fries are two different members of the food
dimension. We will have to create separate tables for separate
dimensions
Identify facts
In the statement above, the number of burgers/fries sold is a measure.
Examples of facts are the number of products sold, the value of products sold, the number
of products produced and the number of service calls received
Businesses need to monitor these facts closely and to sum them using different
dimensions:
1. The value of products sold this quarter versus last quarter
2. The value of products sold by store
3. The value of products sold by channel (e.g. delivery apps, in-store shopping, etc.)
Businesses will often need to sum facts by multiple dimensions:
1. The value of products sold by store, by product type and by day of week
2. The value of products sold by product and by channel
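
Summing a fact by one or more dimensions amounts to grouping the measurement rows by the chosen dimension values. The sketch below shows the idea on a handful of made-up rows; the store names, channels and values are invented demo data, and sum_by is a hypothetical helper.

```python
from collections import defaultdict

# Invented demo rows: each one is a measurement with its dimension values.
SALES = [
    {"store": "Centre", "product": "burger", "channel": "in-store", "value": 12.0},
    {"store": "Centre", "product": "fries", "channel": "delivery", "value": 4.5},
    {"store": "Station", "product": "burger", "channel": "delivery", "value": 8.0},
    {"store": "Station", "product": "fries", "channel": "in-store", "value": 3.0},
    {"store": "Centre", "product": "burger", "channel": "delivery", "value": 6.0},
]


def sum_by(rows, dimensions, fact="value"):
    """Sum `fact` over `rows`, grouped by the given dimension names."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[fact]
    return dict(totals)


by_store = sum_by(SALES, ["store"])
by_product_and_channel = sum_by(SALES, ["product", "channel"])
```

The same fact is reused unchanged in both calls; only the list of dimensions changes, which is why dimensions are described as the entry points to the facts.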

Identify the attributes or properties of dimensions
We need to know which properties or attributes of each dimension we need to
store in our table. This is important since knowing the properties lets us decide which
columns must be created in each dimension table.
Each dimension might have a number of different properties, but for a given context, not all
of them are relevant for us. As an example, let's take the dimension food. We can think of
many different attributes of food, e.g. the name of the food, its price, its total
calories, its colour and so on. But we need to check which of these
attributes are relevant to us, that is, which of these attributes are required for reporting
on this data. For the business objective stated above, we only need to know one attribute of
the food dimension, i.e. the name of the food (burger or fries)
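
Putting the pieces together, the burgers-and-fries objective could be sketched as one fact table surrounded by three dimension tables. The table names, column names, outlet names, dates and quantities below are all assumptions invented for the illustration.

```python
import sqlite3

# Hypothetical tables: three dimensions and one fact table that refers to them.
STAR_SCHEMA = """
CREATE TABLE dim_food  (food_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, day  TEXT NOT NULL);
-- Grain: one row per food item, per store, per day.
CREATE TABLE fact_sales (
    food_id       INTEGER NOT NULL REFERENCES dim_food(food_id),
    store_id      INTEGER NOT NULL REFERENCES dim_store(store_id),
    date_id       INTEGER NOT NULL REFERENCES dim_date(date_id),
    quantity_sold INTEGER NOT NULL,
    PRIMARY KEY (food_id, store_id, date_id)
);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create the tables in memory and load a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_food VALUES (?, ?)", [(1, "burger"), (2, "fries")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Outlet A"), (2, "Outlet B")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, "2024-01-01"), (2, "2024-01-02")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 120), (2, 1, 1, 200), (1, 2, 1, 80), (1, 1, 2, 150)],
    )
    return conn


def units_sold(conn, store, day):
    """Return {food name: quantity sold} for one outlet on one day."""
    rows = conn.execute(
        """
        SELECT f.name, s.quantity_sold
        FROM fact_sales AS s
        JOIN dim_food  AS f ON f.food_id  = s.food_id
        JOIN dim_store AS o ON o.store_id = s.store_id
        JOIN dim_date  AS d ON d.date_id  = s.date_id
        WHERE o.name = ? AND d.day = ?
        ORDER BY f.name
        """,
        (store, day),
    )
    return dict(rows.fetchall())
```

The food dimension keeps only the name, as discussed above; further attributes would become extra columns of dim_food without touching the fact table.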

Approaches to Dimensional Model
Star Schema
A fact table in the middle connected to a set of dimension tables
Snowflake Schema
A refinement of the star schema where some dimensional hierarchy is
normalized into a set of smaller dimension tables, forming a shape
similar to a snowflake
Fact Constellation
Multiple fact tables share dimension tables, viewed as a collection
of stars, therefore called galaxy schema or fact constellation
