Unit - 2: Entity Relationship Diagram - ER Diagram in DBMS
An Entity–relationship model (ER model) describes the structure of a database with the help of a diagram, which is known as an Entity Relationship Diagram (ER Diagram). An ER model is a design or blueprint of a database that can later be implemented as a database. The main components of an E-R model are entity sets and relationship sets.
An ER diagram shows the relationships among entity sets. An entity set is a group of similar entities, and these entities can have attributes. In DBMS terms, an entity set typically becomes a table, its entities become the rows, and its attributes become the columns, so by showing the relationships among tables and their attributes, an ER diagram shows the complete logical structure of a database. Let's have a look at a simple ER diagram to understand this concept.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The relationship between Student and College is many to one, as a college can have many students, but a student cannot study in multiple colleges at the same time. The Student entity has attributes such as Stu_Id, Stu_Name and Stu_Addr, and the College entity has attributes such as Col_ID and Col_Name.
Here are the geometric shapes and their meanings in an E-R diagram. We will discuss these terms in detail in the next section (Components of an ER Diagram) of this guide, so don't worry too much about them now; just go through them once.
Components of an ER Diagram
As shown in the above diagram, an ER diagram has three main components:
1. Entity
2. Attribute
3. Relationship
1. Entity
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its relationship with another entity is called a weak entity. A weak entity is represented by a double rectangle. For example, a bank account cannot be uniquely identified without knowing the bank to which the account belongs, so a bank account is a weak entity.
Weak Entities
A weak entity is a type of entity that doesn't have a key attribute of its own. It can be identified uniquely by considering the primary key of another (owner) entity together with its own partial key. For that, weak entity sets need to have total participation in the identifying relationship.
In the above example, "Trans No" is a discriminator within a group of transactions in an ATM. Let's learn more about a weak entity by comparing it with a strong entity.
Strong Entity Set vs. Weak Entity Set
Key: A strong entity set always has a primary key. A weak entity set does not have enough attributes to build a primary key.
Key notation: A strong entity set's primary key is represented by an underline. A weak entity set contains a partial key, represented by a dashed underline.
Name: A strong entity set is called a dominant entity set. A weak entity set is called a subordinate entity set.
Identification: In a strong entity set, the primary key is one of its attributes and identifies its members. In a weak entity set, members are identified by the combination of the primary key of the strong (owner) entity set and the weak entity set's own partial key.
Relationship symbol: A relationship between two strong entity sets is shown by a single diamond. The relationship between a strong and a weak entity set is shown by a double diamond.
Connecting line: The line connecting a strong entity set to the relationship is single. The line connecting the weak entity set to its identifying relationship is double.
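When a weak entity set is turned into a table, its primary key is usually the owner's primary key plus the partial key. The sketch below uses Python's built-in sqlite3 module; the table and column names (ACCOUNT, ATM_TRANSACTION, ACC_NO, TRANS_NO, AMOUNT) are hypothetical, chosen to echo the bank account and "Trans No" examples above, and ON DELETE CASCADE is one possible design choice for expressing existence dependence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: ACCOUNT is the owner (strong) entity, ATM_TRANSACTION the weak entity.
conn.executescript("""
CREATE TABLE ACCOUNT (
    ACC_NO TEXT NOT NULL PRIMARY KEY            -- owner entity: has its own key
);
CREATE TABLE ATM_TRANSACTION (
    ACC_NO   TEXT    NOT NULL REFERENCES ACCOUNT(ACC_NO) ON DELETE CASCADE,
    TRANS_NO INTEGER NOT NULL,                  -- partial key (discriminator)
    AMOUNT   INTEGER,
    PRIMARY KEY (ACC_NO, TRANS_NO)              -- owner's primary key + partial key
);
INSERT INTO ACCOUNT VALUES ('A-1'), ('A-2');
-- The same TRANS_NO may repeat across accounts; it is unique only within one account.
INSERT INTO ATM_TRANSACTION VALUES ('A-1', 1, 500), ('A-2', 1, 200);
""")

try:
    conn.execute("INSERT INTO ATM_TRANSACTION VALUES ('A-1', 1, 900)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True   # (A-1, 1) already exists
print(duplicate_rejected)       # True

# The weak entity depends on its owner: deleting the account removes its transactions.
conn.execute("DELETE FROM ACCOUNT WHERE ACC_NO = 'A-1'")
print(conn.execute("SELECT ACC_NO, TRANS_NO FROM ATM_TRANSACTION").fetchall())  # [('A-2', 1)]
```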
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity in an entity set. For example, a student's roll number can uniquely identify a student among a set of students. A key attribute is represented by an oval, like other attributes, but its text is underlined.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented with a double oval in an ER diagram. For example, a person can have more than one phone number, so the phone number attribute is multivalued.
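In a relational table, a multivalued attribute is commonly stored as a separate table with one row per value, rather than packing several values into one column. A minimal sqlite3 sketch, assuming hypothetical PERSON and PERSON_PHONE tables (the name and numbers are borrowed from the domain-constraint example later in these notes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical tables: one row per (person, phone number) pair.
conn.executescript("""
CREATE TABLE PERSON (
    PERSON_ID INTEGER PRIMARY KEY,
    NAME      TEXT NOT NULL
);
CREATE TABLE PERSON_PHONE (
    PERSON_ID INTEGER NOT NULL REFERENCES PERSON(PERSON_ID),
    PHONE     TEXT    NOT NULL,
    PRIMARY KEY (PERSON_ID, PHONE)   -- a person cannot list the same number twice
);
""")
conn.execute("INSERT INTO PERSON VALUES (1, 'Bikash')")
conn.executemany("INSERT INTO PERSON_PHONE VALUES (?, ?)",
                 [(1, "123456789"), (1, "234456678")])

phones = [p for (p,) in conn.execute(
    "SELECT PHONE FROM PERSON_PHONE WHERE PERSON_ID = 1 ORDER BY PHONE")]
print(phones)  # ['123456789', '234456678']
```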
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is represented by a dashed oval in an ER diagram. For example, a person's age is a derived attribute, as it changes over time and can be derived from another attribute (date of birth).
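Because a derived attribute is computed rather than stored, it is typically produced on demand from the stored attribute. A minimal sketch of deriving age from date of birth; the function name `age_on` is hypothetical:

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derive age (completed years) from the stored date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

dob = date(2000, 5, 20)
print(age_on(dob, date(2024, 5, 19)))  # 23 (birthday not yet reached)
print(age_on(dob, date(2024, 5, 20)))  # 24
```

Only the date of birth needs to be stored; the age stays correct as time passes without any updates.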
E-R diagram with multivalued and derived attributes:
3. Relationship
Cardinality: defines the numerical nature of the relationship between two entities or entity sets, i.e., how many instances of one can be associated with an instance of the other.
1. One to One Relationship
When a single instance of an entity is associated with a single instance of another entity, it is called a one to one relationship. For example, a person has only one passport, and a passport is given to only one person.
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of another entity, it is called a one to many relationship. For example, a customer can place many orders, but an order cannot be placed by many customers.
3. Many to One Relationship
When more than one instance of an entity is associated with a single instance of another entity, it is called a many to one relationship. For example, many students can study in a single college, but a student cannot study in many colleges at the same time.
4. Many to Many Relationship
When more than one instance of an entity is associated with more than one instance of another entity, it is called a many to many relationship. For example, a student can be assigned to many projects, and a project can be assigned to many students.
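The four cardinalities can be checked against sample relationship instances. A minimal sketch, assuming relationship instances are given as (a, b) pairs between entity sets A and B; the function `cardinality` and the sample values are hypothetical. Note that real cardinality is a design rule for all possible data, so this only classifies the instances it is shown.

```python
def cardinality(pairs):
    """Classify observed relationship instances (a, b) between entity sets A and B."""
    a_to_b, b_to_a = {}, {}
    for a, b in set(pairs):
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    many_b_per_a = any(len(bs) > 1 for bs in a_to_b.values())  # some A linked to several Bs
    many_a_per_b = any(len(a_set) > 1 for a_set in b_to_a.values())  # some B linked to several As
    if many_a_per_b and many_b_per_a:
        return "many-to-many"
    if many_b_per_a:
        return "one-to-many"
    if many_a_per_b:
        return "many-to-one"
    return "one-to-one"

print(cardinality([("Person1", "Passport1"), ("Person2", "Passport2")]))  # one-to-one
print(cardinality([("Cust1", "Order1"), ("Cust1", "Order2")]))          # one-to-many
print(cardinality([("Stu1", "College1"), ("Stu2", "College1")]))        # many-to-one
print(cardinality([("Stu1", "ProjA"), ("Stu1", "ProjB"), ("Stu2", "ProjA")]))  # many-to-many
```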
Total participation of an entity set means that each entity in the entity set must take part in at least one relationship in the relationship set. For example, in the diagram below, each college must have at least one associated student.
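Total participation can be tested by checking that every entity appears in at least one relationship instance. A minimal sketch; the helper `has_total_participation` and the sample colleges and students are hypothetical:

```python
def has_total_participation(entities, pairs, position=0):
    """True if every entity occurs in at least one relationship instance.

    position selects which side of each (a, b) pair the entity set occupies.
    """
    participating = {pair[position] for pair in pairs}
    return set(entities) <= participating

colleges = {"C1", "C2"}
enrolled = [("Stu1", "C1"), ("Stu2", "C1")]   # (student, college) pairs
print(has_total_participation(colleges, enrolled, position=1))  # False: C2 has no student
```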
Steps to Create an ERD (E-R Diagram)
Following are the steps to create an ERD.
DBMS Generalization
Generalization is a process in which the common attributes of more than one entities form a
new entity. This newly formed entity is called generalized entity.
Generalization Example
Let's say we have two entities, Student and Teacher.
Attributes of Entity Student are: Name, Address & Grade
Attributes of Entity Teacher are: Name, Address & Salary
These two entities have two common attributes, Name and Address, so we can make a generalized entity with these common attributes. Let's have a look at the ER model after generalization.
The ER diagram after generalization:
We have created a new generalized entity, Person, and this entity has the common attributes of both entities. As you can see in the following ER diagram, after the generalization process the entities Student and Teacher only have the specialized attributes Grade and Salary respectively, and their common attributes (Name & Address) are now associated with the new entity Person, which is in a relationship with both entities (Student & Teacher).
Note:
1. Generalization uses a bottom-up approach, where two or more lower level entities combine to form a new higher level entity.
2. The new generalized entity can further combine with other lower level entities to create an even higher level generalized entity.
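In program code, the same bottom-up grouping is often mirrored with inheritance: the shared attributes move to a base class and each subclass keeps only its specialized attribute. A minimal sketch using Python dataclasses; this is an analogy for the ER model, not how a DBMS stores it, and the sample names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Person:            # generalized (higher level) entity: the common attributes
    name: str
    address: str

@dataclass
class Student(Person):   # lower level entity keeps only its specialized attribute
    grade: str

@dataclass
class Teacher(Person):
    salary: int

s = Student(name="Asha", address="Delhi", grade="A")
t = Teacher(name="Ravi", address="Pune", salary=50000)
print(isinstance(s, Person), isinstance(t, Person))  # True True
print(s)  # Student(name='Asha', address='Delhi', grade='A')
```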
DBMS Specialization
Specialization is a process in which an entity is divided into sub-entities. You can think of it as the reverse of generalization: in generalization, two entities combine to form a new higher level entity. Specialization is a top-down process.
The idea behind specialization is to find subsets of entities that have a few distinguishing attributes. For example, consider an entity Employee, which can be further classified into the sub-entities Technician, Engineer & Accountant because these sub-entities have some distinguishing attributes.
Specialization Example
In this diagram, we can see that we have a higher level entity “Employee”, which we have divided into the sub-entities “Technician”, “Engineer” & “Accountant”. All of these are employees of a company, but their roles are completely different and they have a few different attributes. Just for the example, I have shown that a Technician handles service requests, an Engineer works on a project and an Accountant handles the credit & debit details. All three employee types have a few attributes in common, such as name & salary, which we have left associated with the parent entity “Employee”, as shown in the above diagram.
DBMS Aggregation
Aggregation is a process in which a single entity alone is not able to make sense in a relationship, so the relationship between two entities acts as one entity. I know it sounds confusing, but don't worry; the example we will take will clear up all doubts.
Aggregation Example
In the real world, a manager not only manages the employees working under them but also has to manage the project. In such a scenario, if the entity “Manager” makes a “manages” relationship with either the “Employee” or the “Project” entity alone, it will not make any sense, because the manager has to manage both. In these cases the relationship between the two entities acts as one entity. In our example, the relationship “Works-On” between “Employee” & “Project” acts as one entity that has a relationship “Manages” with the entity “Manager”.
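One common way to implement aggregation in tables is to let the outer relationship's foreign key point at the composite key of the inner relationship. A minimal sqlite3 sketch; all table and column names (EMPLOYEE, PROJECT, MANAGER, WORKS_ON, MANAGES and their ID columns) are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID  INTEGER PRIMARY KEY);
CREATE TABLE PROJECT  (PROJ_ID INTEGER PRIMARY KEY);
CREATE TABLE MANAGER  (MGR_ID  INTEGER PRIMARY KEY);

-- The Works-On relationship between Employee and Project.
CREATE TABLE WORKS_ON (
    EMP_ID  INTEGER NOT NULL REFERENCES EMPLOYEE(EMP_ID),
    PROJ_ID INTEGER NOT NULL REFERENCES PROJECT(PROJ_ID),
    PRIMARY KEY (EMP_ID, PROJ_ID)
);

-- Aggregation: Manages relates a Manager to a whole Works-On pair,
-- so its foreign key references WORKS_ON's composite key.
CREATE TABLE MANAGES (
    MGR_ID  INTEGER NOT NULL REFERENCES MANAGER(MGR_ID),
    EMP_ID  INTEGER NOT NULL,
    PROJ_ID INTEGER NOT NULL,
    PRIMARY KEY (MGR_ID, EMP_ID, PROJ_ID),
    FOREIGN KEY (EMP_ID, PROJ_ID) REFERENCES WORKS_ON(EMP_ID, PROJ_ID)
);

INSERT INTO EMPLOYEE VALUES (1), (2);
INSERT INTO PROJECT  VALUES (10);
INSERT INTO MANAGER  VALUES (100);
INSERT INTO WORKS_ON VALUES (1, 10);
INSERT INTO MANAGES  VALUES (100, 1, 10);
""")

try:
    # Employee 2 does not work on project 10, so there is no Works-On pair to manage.
    conn.execute("INSERT INTO MANAGES VALUES (100, 2, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```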
Terminologies
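STUDENT TABLE
STUD_NO  STUD_NAME  STUD_PHONE  STUD_STATE  STUD_COUNTRY  STUD_AGE
1        RAM        9716271721  Haryana     India         20
2        RAM        9898291281  Punjab      India         19
3        SUJIT      7898291981  Rajasthan   India         18
4        SURESH     9985286317  Punjab      India         21
STUDENT_COURSE TABLE
STUD_NO  COURSE_NO  COURSE_NAME
1        C1         DBMS
2        C2         Computer Networks
1        C2         Computer Networks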
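Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For example, STUD_NO is a candidate key for the relation STUDENT.
The candidate key can be simple (having only one attribute) or composite as well.
Example: {STUD_NO, COURSE_NO} is a composite candidate key for the relation STUDENT_COURSE.
Table STUDENT_COURSE
STUD_NO  TEACHER_NO  COURSE_NO
1        001         C001
2        056         C005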
Primary Key
There can be more than one candidate key in a relation, out of which one is chosen as the primary key. For example, STUD_NO and STUD_PHONE are both candidate keys for the relation STUDENT, but STUD_NO can be chosen as the primary key (only one out of the candidate keys).
It is a unique key.
It identifies exactly one tuple (record) at a time.
It has no duplicate values; all its values are unique.
It cannot be NULL.
A primary key need not be a single column; more than one column together can also form the primary key of a table.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE), where STUD_NO is the primary key
Table STUDENT
STUD_NO SNAME ADDRESS PHONE
Super Key
The set of attributes that can uniquely identify a tuple is known as a super key. For example, STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a group of one or more keys that identifies rows in a table.
Adding zero or more attributes to a candidate key generates a super key.
A candidate key is a super key, but the reverse is not true.
The attributes added on top of the candidate key may contain NULL values.
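The relationship between super keys and candidate keys can be checked mechanically on a table instance: a super key is any attribute set with no repeated value combination, and a candidate key is a super key with no smaller super key inside it. A minimal sketch over four columns of the STUDENT table above; `is_superkey` and `candidate_keys` are hypothetical helper names. On such a small sample, {STUD_NAME, STUD_STATE} also happens to be unique, which is why real keys are decided from the meaning of the schema, not from one instance.

```python
from itertools import combinations

# Rows adapted from the STUDENT table above (only four of its columns kept).
STUDENT = [
    {"STUD_NO": 1, "STUD_NAME": "RAM", "STUD_PHONE": "9716271721", "STUD_STATE": "Haryana"},
    {"STUD_NO": 2, "STUD_NAME": "RAM", "STUD_PHONE": "9898291281", "STUD_STATE": "Punjab"},
    {"STUD_NO": 3, "STUD_NAME": "SUJIT", "STUD_PHONE": "7898291981", "STUD_STATE": "Rajasthan"},
    {"STUD_NO": 4, "STUD_NAME": "SURESH", "STUD_PHONE": "9985286317", "STUD_STATE": "Punjab"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows):
    """Minimal super keys: no proper subset of them is also a super key."""
    columns = list(rows[0])
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Skip supersets of keys already found; they are super keys but not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_superkey(rows, combo):
                found.append(combo)
    return found

print(is_superkey(STUDENT, ("STUD_NO", "STUD_NAME")))  # True: contains STUD_NO
print(is_superkey(STUDENT, ("STUD_NAME",)))            # False: RAM appears twice
print(candidate_keys(STUDENT))
# [('STUD_NO',), ('STUD_PHONE',), ('STUD_NAME', 'STUD_STATE')]
```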
Alternate Key
A candidate key other than the primary key is called an alternate key.
All the candidate keys that are not chosen as the primary key are alternate keys.
It is also called a secondary key.
Like every candidate key, it uniquely identifies a record, so its values are not repeated.
Example:
Consider the table shown above. STUD_NO and PHONE are both candidate keys for the relation STUDENT, but once STUD_NO is chosen as the primary key, PHONE becomes an alternate key.
Foreign Key
If an attribute can only take values that are present as values of some other attribute, it is a foreign key to the attribute it refers to. The relation being referenced is called the referenced relation, and the corresponding attribute is called the referenced attribute. The relation that refers to the referenced relation is called the referencing relation, and the corresponding attribute is called the referencing attribute. The referenced attribute of the referenced relation should be its primary key.
It is an attribute that acts as a primary key in one table and as a foreign key in another table.
It links two or more relations (tables).
It acts as a cross-reference between the tables.
For example, DNO is a primary key in the DEPT table and a non-key attribute in EMP.
Example:
Refer to Table STUDENT shown above. STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in the STUDENT relation.
Table STUDENT_COURSE
STUD_NO  TEACHER_NO  COURSE_NO
1        005         C001
2        056         C005
It is worth noting that, unlike the primary key of a relation, a foreign key can be NULL and may contain duplicate values, i.e., it need not follow the uniqueness constraint. For example, STUD_NO in the STUDENT_COURSE relation that lists course names (shown under Terminologies) is not unique: it is repeated in the first and third tuples. However, STUD_NO in the STUDENT relation is a primary key, so it must always be unique and cannot be NULL.
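The difference between the two sides can be seen in SQL. A minimal sqlite3 sketch, assuming simplified STUDENT and STUDENT_COURSE tables (only the columns needed here; the row with course 'C3' and a NULL student is hypothetical): repeated and NULL foreign-key values are accepted, while a value absent from the referenced primary key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # foreign keys are off by default in SQLite
conn.executescript("""
CREATE TABLE STUDENT (
    STUD_NO INTEGER NOT NULL PRIMARY KEY
);
CREATE TABLE STUDENT_COURSE (
    STUD_NO   INTEGER REFERENCES STUDENT(STUD_NO),  -- foreign key: NULL allowed
    COURSE_NO TEXT NOT NULL
);
INSERT INTO STUDENT VALUES (1), (2), (3), (4);
""")

# Repeated (1 twice) and NULL foreign-key values are both accepted.
conn.executemany("INSERT INTO STUDENT_COURSE VALUES (?, ?)",
                 [(1, "C1"), (2, "C2"), (1, "C2"), (None, "C3")])

try:
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (9, 'C1')")  # no student 9
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```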
Composite Key
Sometimes a table does not have a single column/attribute that uniquely identifies all of its records. To uniquely identify the rows of such a table, a combination of two or more columns/attributes can be used. Some combinations can still give duplicate values in rare cases, so we need to find the optimal set of attributes that uniquely identifies rows in the table.
It acts as the primary key if there is no single-column primary key in the table.
Two or more attributes are used together to make a composite key.
Different combinations of attributes may give different accuracy in terms of identifying rows uniquely.
Example:
FULLNAME + DOB can be combined to identify a student and access their details.
Conclusion
In conclusion, the relational model makes use of a number of keys: Candidate keys
allow for distinct identification, the Primary key serves as the chosen identifier,
Alternate keys offer other choices, and Foreign keys create vital linkages that
guarantee data integrity between tables. The creation of strong and effective
relational databases requires the thoughtful application of these keys.
Many-to-Many M:N Relationship in DBMS
This type of relationship exists when each record of the first table can be associated with one or more records of the second table, and a single record of the second table may be related to one or more records of the first table. A many-to-many relationship is formed by two one-to-many relationships connected through an ‘associative table’ or ‘linking table’. The linking (bridging) table connects the two tables by having fields that are the primary keys of the other two tables. The following example will help us understand this.
Example
When the entity types ‘Customer’ and ‘Product’ are combined, each
customer can purchase several products, and a product can be
purchased by multiple customers.
To grasp the concept of a linking table in this context, consider the ‘Order’ entity as a linking table that connects the ‘Customer’ and ‘Product’ entities. This many-to-many relationship can be broken down into two one-to-many relationships. First, each ‘Customer’ can have several ‘Orders’, whereas each ‘Order’ is associated with only one ‘Customer’. Second, each ‘Order’ is associated with only one ‘Product’, although several orders for the same ‘Product’ may exist.
The concept of linking in the previous example can be understood by
considering all of the attributes of the entities ‘Order,’ ‘Customer,’ and
‘Product.’ The primary keys of both the ‘Product’ and ‘Customer’ entities are
included in the connecting table, i.e. the ‘Order’ table, as can be seen in the
above example. When referring to the respective table from the ‘Order’ table,
these keys operate as foreign keys.
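A minimal sqlite3 sketch of this design, with hypothetical table, column and sample names; the linking table is called ORDERS here because ORDER is an SQL keyword. Joining through it answers the question from both sides of the many-to-many relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
CREATE TABLE PRODUCT  (PRODUCT_ID  INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
-- Linking table: each order refers to exactly one customer and one product.
CREATE TABLE ORDERS (
    ORDER_ID    INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER NOT NULL REFERENCES CUSTOMER(CUSTOMER_ID),
    PRODUCT_ID  INTEGER NOT NULL REFERENCES PRODUCT(PRODUCT_ID)
);
INSERT INTO CUSTOMER VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO PRODUCT  VALUES (10, 'Pen'), (20, 'Book');
INSERT INTO ORDERS (CUSTOMER_ID, PRODUCT_ID) VALUES (1, 10), (1, 20), (2, 10);
""")

JOIN_ALL = """
    FROM CUSTOMER c
    JOIN ORDERS  o ON o.CUSTOMER_ID = c.CUSTOMER_ID
    JOIN PRODUCT p ON p.PRODUCT_ID  = o.PRODUCT_ID
"""

def products_of(customer_name):
    """Follow Customer -> Orders -> Product."""
    sql = "SELECT DISTINCT p.NAME" + JOIN_ALL + "WHERE c.NAME = ? ORDER BY p.NAME"
    return [name for (name,) in conn.execute(sql, (customer_name,))]

def customers_of(product_name):
    """Follow Product -> Orders -> Customer."""
    sql = "SELECT DISTINCT c.NAME" + JOIN_ALL + "WHERE p.NAME = ? ORDER BY c.NAME"
    return [name for (name,) in conn.execute(sql, (product_name,))]

print(products_of("Asha"))  # ['Book', 'Pen']
print(customers_of("Pen"))  # ['Asha', 'Ravi']
```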
Relational Algebra
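1. Select Operation:
o It selects the tuples that satisfy a given predicate. It is denoted by sigma (σ).
Notation: σ p(r)
Where r is a relation and p is a propositional logic formula, which may use connectors like AND, OR and NOT, and relational operators like =, ≠, ≥, <, >, ≤.
Output:
BRANCH_NAME LOAN_NO AMOUNT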
2. Project Operation:
o This operation shows the list of attributes that we wish to appear in the result. The rest of the attributes are eliminated from the table.
o It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
Input:
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are in R, in S, or in both R and S.
o It eliminates duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
A union operation must satisfy the following conditions:
o R and S must have the same number of attributes, with compatible domains.
o Duplicate tuples are eliminated automatically.
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R and S.
o It is denoted by ∩.
Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in S.
o It is denoted by minus (-).
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
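The select, project, union, intersection and difference operations above can be reproduced with Python sets, since relational algebra uses set semantics (no duplicate tuples). A minimal sketch, assuming a relation is modelled as a pair (attribute names, set of tuples); the helper names are hypothetical, and the data is the DEPOSITOR and BORROW relations from the example.

```python
# A relation is modelled as (attribute names, set of tuples).
DEPOSITOR = (("CUSTOMER_NAME", "ACCOUNT_NO"), {
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
})
BORROW = (("CUSTOMER_NAME", "LOAN_NO"), {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
})

def select(rel, predicate):
    """σ: keep the tuples for which predicate(row_as_dict) is true."""
    attrs, rows = rel
    return attrs, {r for r in rows if predicate(dict(zip(attrs, r)))}

def project(rel, *wanted):
    """∏: keep only the named attributes; the set removes duplicates."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in wanted]
    return tuple(wanted), {tuple(r[i] for i in idx) for r in rows}

def _check_compatible(r, s):
    if len(r[0]) != len(s[0]):
        raise ValueError("relations must have the same number of attributes")

def union(r, s):
    _check_compatible(r, s)
    return r[0], r[1] | s[1]

def intersection(r, s):
    _check_compatible(r, s)
    return r[0], r[1] & s[1]

def difference(r, s):
    _check_compatible(r, s)
    return r[0], r[1] - s[1]

borrowers = project(BORROW, "CUSTOMER_NAME")
depositors = project(DEPOSITOR, "CUSTOMER_NAME")
print(sorted(n for (n,) in union(borrowers, depositors)[1]))         # 10 distinct names
print(sorted(n for (n,) in intersection(borrowers, depositors)[1]))  # ['Jones', 'Smith']
print(sorted(n for (n,) in difference(borrowers, depositors)[1]))    # ['Curry', 'Hayes', 'Jackson', 'Williams']
print(sorted(select(BORROW, lambda row: row["CUSTOMER_NAME"] == "Smith")[1]))
# [('Smith', 'L-11'), ('Smith', 'L-23')]
```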
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each
row in the other table. It is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
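The Cartesian product pairs every tuple of one relation with every tuple of the other, so its size is the product of the two sizes (3 × 3 = 9 here). A minimal sketch using `itertools.product` on the EMPLOYEE and DEPARTMENT data above; the helper name `cartesian_product` is hypothetical:

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

def cartesian_product(r, s):
    """E X D: concatenate every tuple of r with every tuple of s."""
    return [e + d for e, d in product(r, s)]

result = cartesian_product(EMPLOYEE, DEPARTMENT)
print(len(result))  # 9
for row in result:
    print(*row)     # first line: 1 Smith A A Marketing
```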
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.
ρ(STUDENT1, STUDENT)
Join Operations:
A Join operation combines related tuples from different relations, if and only if a
given join condition is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Result of EMPLOYEE ⋈ SALARY:
EMP_CODE EMP_NAME SALARY
101 Stephan 50000
102 Jack 30000
103 Harry 25000
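A natural join can be understood as a Cartesian product followed by a selection on the attributes the two relations share, with the duplicated column dropped. A minimal sketch of that idea on the EMPLOYEE and SALARY data above; `natural_join` is a hypothetical helper, not a library function:

```python
from itertools import product

EMPLOYEE = [(101, "Stephan"), (102, "Jack"), (103, "Harry")]   # (EMP_CODE, EMP_NAME)
SALARY = [(101, 50000), (102, 30000), (103, 25000)]            # (EMP_CODE, SALARY)

def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """R ⋈ S: keep pairs of tuples that agree on every shared attribute name."""
    common = [a for a in r_attrs if a in s_attrs]
    s_extra = [a for a in s_attrs if a not in common]
    out_rows = []
    for r, s in product(r_rows, s_rows):               # Cartesian product...
        rd, sd = dict(zip(r_attrs, r)), dict(zip(s_attrs, s))
        if all(rd[a] == sd[a] for a in common):       # ...then the join condition
            out_rows.append(r + tuple(sd[a] for a in s_extra))
    return list(r_attrs) + s_extra, out_rows

attrs, rows = natural_join(("EMP_CODE", "EMP_NAME"), EMPLOYEE,
                           ("EMP_CODE", "SALARY"), SALARY)
print(attrs)  # ['EMP_CODE', 'EMP_NAME', 'SALARY']
for row in rows:
    print(*row)
```

Real database engines avoid building the full product (they use index, hash or merge joins), but the result is the same.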
In models like Entity-Relationship models, we did not have such features. Database constraints can be categorized into 3 main categories:
1. Constraints that are inherent in the data model are called Implicit Constraints.
2. Constraints that are directly applied in the schemas of the data model, by specifying them in the DDL (Data Definition Language), are called Schema-Based Constraints or Explicit Constraints.
3. Constraints that cannot be directly applied in the schemas of the data model are called Application-based or Semantic Constraints.
Relational Constraints
These are restrictions, or sets of rules, imposed on the database contents. They validate the quality of the database and check operations like data insertion, updating and other processes so that they are performed without affecting the integrity of the data. They protect the database against threats and damage.
Mainly, constraints on a relational database are of 4 types:
Domain constraints
Key constraints or Uniqueness Constraints
Entity Integrity constraints
Referential integrity constraints
1. Domain Constraints
Every domain must contain atomic values (smallest indivisible units), which means composite and multi-valued attributes are not allowed.
We perform a datatype check here: when we assign a data type to a column, we limit the values it can contain. E.g., if we assign the datatype int to the attribute age, we can't give it values of any other datatype.
Example:
EID  Name          Phone
01   Bikash Dutta  123456789, 234456678
Explanation: In the above relation, Name is a composite attribute and Phone is a multi-valued attribute, so it violates the domain constraint.
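In SQL, a domain can be narrowed with a data type plus a CHECK constraint. A minimal sqlite3 sketch with a hypothetical PERSON table; because SQLite column types are only "affinities" (an INTEGER column will still store a non-numeric string), the sketch adds `typeof(AGE) = 'integer'` to enforce the data type check described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the domain of AGE is restricted to non-negative integers.
conn.execute("""
CREATE TABLE PERSON (
    NAME TEXT NOT NULL,
    AGE  INTEGER CHECK (typeof(AGE) = 'integer' AND AGE >= 0)
)""")

def try_insert(name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO PERSON VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Bikash", 30))      # True
print(try_insert("Paul", "thirty"))  # False: text is outside the domain
print(try_insert("Tuhin", -5))       # False: violates AGE >= 0
```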
2. Key Constraints or Uniqueness Constraints
These are called uniqueness constraints since they ensure that every tuple in the relation is unique.
A relation can have multiple keys or candidate keys (minimal superkeys), out of which we choose one as the primary key. There is no restriction on which candidate key is chosen as the primary key, but it is suggested to choose the candidate key with fewer attributes.
Null values are not allowed in the primary key, hence the Not Null constraint is also part of the key constraint.
Example:
EID Name Phone
01 Bikash 6000000009
02 Paul 9000090009
01 Tuhin 9234567892
Explanation: In the above table, EID is the primary key, and the first and the last tuples have the same value in EID, i.e., 01, so the table violates the key constraint.
3. Entity Integrity Constraints
EID Name Phone
01 Bikash 9000900099
02 Paul 600000009
Explanation: In the above relation, EID is made the primary key, and the primary key can't take NULL values, but in the third tuple the primary key is NULL, so it violates the Entity Integrity constraint.
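Both the key constraint and the entity integrity constraint can be declared on the primary key column. A minimal sqlite3 sketch with a hypothetical EMP table using the EID values above; NOT NULL is written out explicitly because SQLite, for historical reasons, allows NULL in a PRIMARY KEY column that is not an INTEGER PRIMARY KEY unless NOT NULL is declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EMP (
    EID   TEXT NOT NULL PRIMARY KEY,   -- uniqueness + entity integrity
    NAME  TEXT,
    PHONE TEXT
)""")

def insert_emp(eid, name, phone):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO EMP VALUES (?, ?, ?)", (eid, name, phone))
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_emp("01", "Bikash", "6000000009"))  # True
print(insert_emp("02", "Paul", "9000090009"))    # True
print(insert_emp("01", "Tuhin", "9234567892"))   # False: EID 01 already exists (key constraint)
print(insert_emp(None, "Tuhin", "9234567892"))   # False: NULL EID (entity integrity)
```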
4. Referential Integrity Constraints
Table 1
EID Name DNO
01 Divine 12
02 Dino 22
04 Vivian 14
Table 2
DNO Place
12 Jaipur
13 Mumbai
14 Delhi
Explanation: In the above tables, DNO in Table 1 is the foreign key, and DNO in Table 2 is the primary key. DNO = 22 in the foreign key of Table 1 is not allowed because DNO = 22 is not defined in the primary key of Table 2. Therefore, the referential integrity constraint is violated here.
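The same tables can be declared with a foreign key so that the DBMS enforces referential integrity. A minimal sqlite3 sketch; the table names EMP and DEPT and the EID/NAME columns are assumptions for illustration. Besides rejecting DNO = 22, the default foreign key action also refuses to delete a department that is still referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE DEPT (                 -- Table 2: referenced relation
    DNO   INTEGER PRIMARY KEY,
    PLACE TEXT
);
CREATE TABLE EMP (                  -- Table 1: referencing relation
    EID  TEXT NOT NULL PRIMARY KEY,
    NAME TEXT,
    DNO  INTEGER REFERENCES DEPT(DNO)
);
INSERT INTO DEPT VALUES (12, 'Jaipur'), (13, 'Mumbai'), (14, 'Delhi');
INSERT INTO EMP  VALUES ('01', 'Divine', 12), ('04', 'Vivian', 14);
""")

def run(sql, params=()):
    """Execute one statement; return False if it violates a constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

print(run("INSERT INTO EMP VALUES (?, ?, ?)", ("02", "Dino", 22)))  # False: no DNO 22 in DEPT
print(run("DELETE FROM DEPT WHERE DNO = 12"))  # False: Divine still refers to DNO 12
print(run("DELETE FROM DEPT WHERE DNO = 13"))  # True: nothing refers to DNO 13
```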
Conclusion
Relational database constraints are rules in a database model that help maintain the
integrity and consistency of data. These rules include primary key constraints,
unique constraints, foreign key constraints, check constraints, default constraints, not
null constraints, multi-column constraints, etc. Relational database constraints help
keep data accurate, maintain relationships, and avoid the insertion of wrong or
inconsistent data.