DBMS NOTES
• Data independence refers to the characteristic of being able to modify the schema at one level of
the database system without altering the schema at the next higher level.
• Logical data independence refers to the characteristic of being able to change the conceptual
schema without having to change the external schema.
• Logical data independence is used to separate the external level from the conceptual view.
• If we make any changes to the conceptual view of the data, the user view of the data
is not affected.
• Physical data independence can be defined as the capacity to change the internal schema
without having to change the conceptual schema.
• If we make any changes to the storage size of the database system server, the
conceptual structure of the database is not affected.
• Physical data independence is used to separate the conceptual level from the internal level.
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.
1. Domain constraints
o A domain constraint defines the valid set of values for an attribute.
o The data types of a domain include string, character, integer, time, date, currency, etc. The
value of the attribute must belong to the corresponding domain.
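A domain constraint can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table, its columns, and the 1 to 8 semester range are assumptions, not part of these notes.

```python
import sqlite3

# Hypothetical STUDENT table; names and the 1..8 domain are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        semester INTEGER NOT NULL CHECK (semester BETWEEN 1 AND 8)
    )
    """
)

def try_insert(row):
    """Return True if the row satisfies every constraint, else False."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

in_domain = try_insert((1, "Asha", 3))       # 3 lies inside the domain 1..8
out_of_domain = try_insert((2, "Ravi", 12))  # 12 lies outside the domain
```

The CHECK clause plays the role of the domain here: the second insert is rejected because 12 is not a valid semester value.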
2. Entity integrity constraints
o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and if the
primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
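The rule that a primary key cannot be null while other fields can may be sketched as follows, again with sqlite3. The employee table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite tolerates NULL in a
# non-INTEGER primary key unless the column is declared NOT NULL.
conn.execute("CREATE TABLE employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT)")

# A null value outside the primary key is accepted.
conn.execute("INSERT INTO employee VALUES ('E1', NULL)")

try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'Jack')")
    null_key_accepted = True
except sqlite3.IntegrityError:
    null_key_accepted = False  # a null primary key is rejected

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```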
3. Referential integrity constraints
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of
Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
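A minimal sketch of this rule using sqlite3, with hypothetical department and employee tables standing in for Table 2 and Table 1. Note that SQLite only enforces foreign keys after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department (dept_id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")

def try_insert(row):
    """Return True if the employee row is accepted, else False."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

matching = try_insert((1, "Jack", 10))    # 10 exists in department
null_fk = try_insert((2, "Harry", None))  # a null foreign key is permitted
dangling = try_insert((3, "Sam", 99))     # 99 has no matching department
```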
4. Key constraints
o A key is an attribute or set of attributes used to uniquely identify an entity within its entity set.
o An entity set can have multiple keys, one of which is chosen as the primary key. A
primary key value must be unique and cannot be null in the relational table.
Attributes in DBMS
DBMS
DBMS stands for Database Management System, which is a tool or software used for the creation,
deletion, or manipulation of the database.
Attributes
In DBMS, we have entities, and each entity has properties that describe it, which are
called attributes. In relational databases, we have tables, and each column represents one
attribute of the entity, so all the entries in that column must strictly follow the definition of that
attribute. Attributes define the characteristic properties of an entity.
o Simple Attribute:
A simple attribute is also known as an atomic attribute. When an attribute cannot be divided further, it is called a
simple attribute.
For example, in a student table, the branch attribute cannot be further divided. It is called a simple or
atomic attribute because it contains only a single value that cannot be broken further.
o Composite Attribute:
Composite attributes are those that are made up of the composition of more than one attribute.
When any attribute can be divided further into more sub-attributes, then that attribute is called a
composite attribute.
For example, in a student table, we have attributes of student names that can be further broken
down into first name, middle name, and last name. So the student name will be a composite
attribute.
Another example from a personal detail table would be the attribute of address. The address can be
divided into a street, area, district, and state.
o Single-valued Attribute:
Those attributes which can have exactly one value are known as single valued attributes. They
contain singular values, so more than one value is not allowed.
For example, the DOB of a student can be a single valued attribute. Another example is gender
because one person can have only one gender.
o Multi-valued Attribute:
Those attributes which can have more than one entry or which contain more than one value are
called multi valued attributes.
In the Entity Relationship (ER) diagram, we represent the multi valued attribute by double oval
representation.
For example, one person can have more than one phone number, so that it would be a multi valued
attribute. Another example is the hobbies of a person because one can have more than one hobby.
o Derived Attribute:
Derived attributes are computed rather than stored; the attributes they are computed from are called stored
attributes. When one attribute can be derived from another attribute, it is called a derived attribute. We can
do some calculations on stored attributes to obtain derived attributes.
For example, the age of a student can be a derived attribute because we can get it by the DOB of the
student.
Another example can be of working experience, which can be obtained by the date of joining of an
employee.
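As a small sketch of a derived attribute, age can be computed from a stored date of birth instead of being stored itself. The function name and the sample dates are illustrative assumptions.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive age in completed years from the stored date of birth."""
    # Subtract one year if this year's birthday has not been reached yet.
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(before_birthday)

# Born 15 Aug 2000, evaluated on 1 Mar 2024: the 24th birthday is still ahead.
example_age = age_on(date(2000, 8, 15), date(2024, 3, 1))  # 23
```

Because the value changes over time, computing it on demand avoids storing a number that would go stale.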
o Complex Attribute:
If an attribute combines the properties of multi valued and composite attributes, then it is called
a complex attribute. That is, if an attribute is made up of more than one sub-attribute and
can hold more than one value, it is called a complex attribute.
For example, suppose a person has more than one office and each office has an address made from a street
number and city. The address is a composite attribute and the offices are a multi valued attribute, so
combining them gives a complex attribute.
o Key Attribute:
Those attributes which can uniquely identify a tuple in the relational table are called key attributes.
o Department is a single valued attribute that can have only one value.
o Name is a composite attribute because it is made up of a first name, a middle name, and a
last name attribute.
o Phone number is a multi-valued attribute because one employee can have more than one
phone number, which is represented by a double oval representation.
Keys:
o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are also used to
establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.
Types of keys:
1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity
can contain multiple keys, as we saw in the PERSON table. The key which is most suitable
from those lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys
since they are also unique.
o For each entity, the choice of primary key depends on the requirements and the developers' judgment.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o The primary key is chosen from among the candidate keys. The remaining candidate keys
are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. Other attributes that are unique
for each employee, like SSN, Passport_Number, License_Number, etc., are also candidate keys.
3. Super Key
A super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.
For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination
is also a super key.
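The difference between a super key and a candidate key can be sketched on sample data. The rows below are hypothetical, and a check on sample rows only shows uniqueness for those rows; real keys are declared for the schema as a whole.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows; two employees deliberately share a name.
rows = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Jack", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Jack", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Harry", "SSN": "333"},
]

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows):
    """Minimal super keys: super keys that contain no smaller super key."""
    names = sorted(rows[0])
    found = []
    # Smaller attribute sets are tried first, so every key in `found`
    # is already minimal when larger sets are examined.
    for size in range(1, len(names) + 1):
        for attrs in combinations(names, size):
            has_smaller_key = any(set(k) < set(attrs) for k in found)
            if is_super_key(rows, attrs) and not has_smaller_key:
                found.append(attrs)
    return found
```

Here (EMPLOYEE_ID, EMPLOYEE_NAME) passes is_super_key, but candidate_keys returns only (EMPLOYEE_ID,) and (SSN,), because the name adds nothing to the uniqueness.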
4. Foreign key
o A foreign key is a column (or set of columns) of a table that points to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department
are two different entities. So we do not store the department's information in the employee
table. Instead, we link the two tables through the primary key of one of them.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or combinations of attributes that uniquely identify each tuple
in a relation. These attributes or combinations of attributes are called the candidate keys. One
key is chosen as the primary key from these candidate keys, and the remaining candidate keys, if any,
are termed alternate keys. In other words, the total number of alternate keys is the total
number of candidate keys minus one (the primary key). An alternate key may or may not exist. If there is
only one candidate key in a relation, the relation has no alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key
is also known as Concatenated Key.
For example, in an employee relation, we assume that an employee may be assigned multiple roles
and may work on multiple projects simultaneously. So the primary key will be
composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination. These
attributes act as a composite key since the primary key comprises more than one attribute.
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are created
when a primary key is large and complex and has no relationship with many other relations. The data
values of artificial keys are usually numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the
employee relation. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.
o The ER (Entity Relationship) model develops a conceptual design for the database. It also provides a
very simple and easy to design view of data.
For example, suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes like
city, street name, pin code, etc., and there will be a relationship between them.
Components of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is represented
as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken
as entities.
a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any
key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
The attribute is used to describe the property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and its component attributes are drawn as ellipses connected to it.
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute.
A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attributes is known as a derived attribute. It is
represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute like date
of birth.
3. Relationship
A relationship is used to describe the relation between entities. A diamond (rhombus) is used to
represent the relationship.
a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known as a
one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made by one specific
scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the
right are associated with the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on
the right are associated with the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
Normalization
A large database defined as a single relation may result in data duplication. This repetition of data
may result in:
o It isn't easy to maintain and update data as it would involve searching many records in the
relation.
So to handle these problems, we should analyze and decompose the relations with redundant data
into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
The main reason for normalizing the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that help you create a
good database structure.
o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple into a
relation due to lack of data.
o Deletion Anomaly: A deletion anomaly refers to the situation where the deletion of data
results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value requires
multiple rows of data to be updated.
Normalization works through a series of stages called Normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.
Advantages of Normalization
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF,
5NF.
o Careless decomposition may lead to a bad database design, leading to serious problems.
A prime attribute is one of the attributes that make up a candidate key. In addition to
being called prime attributes, key attributes is another name for this type of attribute. A prime
attribute is present in at least one of the candidate keys.
A set of attributes that uniquely identify tuples in a table is known as a Candidate Key.
A Candidate Key is a minimal super key, that is, a super key with no redundant attributes.
They are key attributes because they can be used to uniquely identify any of the table's
records.
Non-prime attributes are those attributes of the relation that are not present in any of
the possible candidate keys of the relation.
They are also known as non-key attributes. A primary key is an attribute or group of
attributes used to uniquely identify any record in a table. The values of a primary key cannot
be duplicated.
Non-prime (non-key) attributes are not part of the primary key or of any other candidate key. Their
values may repeat across any number of rows. They are non-key attributes because they cannot
be used to uniquely identify any of the table's records.
Functional Dependency
The functional dependency is a relationship that exists between two attributes (or sets of attributes). It
typically exists between the primary key and a non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant, and the right side is known as the
dependent.
For example:
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if
we know the Emp_Id, we can tell the employee name associated with it.
Emp_Id → Emp_Name
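A functional dependency can be tested against a set of rows: it holds if rows that agree on the left side also agree on the right side. The sketch below uses hypothetical employee rows; a sample can only refute a dependency, not prove it for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs also agree on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical employee rows, for illustration only.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Jack"},
    {"Emp_Id": 2, "Emp_Name": "Jack"},
    {"Emp_Id": 3, "Emp_Name": "Harry"},
]

id_determines_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])  # True
name_determines_id = fd_holds(employees, ["Emp_Name"], ["Emp_Id"])  # False
```

The reverse direction fails because the name Jack maps to two different Emp_Id values.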
Example:
ID → Name,
Name → DOB
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
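The BCNF condition can be checked mechanically with attribute closure: X is a super key exactly when the closure of X contains every attribute of the table. The sketch below uses a hypothetical relation R(A, B, C) with A → B and B → C, which is not taken from these notes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left side is not a super key of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(schema)
    ]

# Hypothetical relation R(A, B, C) with A -> B and B -> C.
schema = {"A", "B", "C"}
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
violations = bcnf_violations(schema, fds)
```

In this sketch only B → C is reported: the closure of A is {A, B, C}, so A is a super key, while the closure of B is {B, C} and misses A.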
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID → EMP_COUNTRY
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
Candidate keys:
Now, this is in BCNF because the left side of both functional dependencies is a key.
Canonical Cover
In database management systems (DBMS), a canonical cover is a set of functional dependencies that
is equivalent to a given set of functional dependencies but is minimal in terms of the number of
dependencies. The process of finding the canonical cover of a set of functional dependencies
involves three main steps:
Reduction: The first step is to reduce the original set of functional dependencies to an
equivalent set that has the same closure as the original set, but with fewer dependencies.
This is done by removing redundant dependencies and combining dependencies that have
common attributes on the left-hand side.
Elimination: The second step is to eliminate any extraneous attributes from the left-hand
side of the dependencies. An attribute is considered extraneous if it can be removed from
the left-hand side without changing the closure of the dependencies.
Minimization: The final step is to minimize the number of dependencies by removing any
dependencies that are implied by other dependencies in the set.
Illustrative Example
Consider a set of functional dependencies: F = {A -> BC, B -> C, AB -> C}. The canonical cover (also
known as the minimal cover) is found with the following steps:
Decompose the right-hand sides. A -> BC becomes A -> B and A -> C, giving
{A -> B, A -> C, B -> C, AB -> C}.
Check AB -> C. Since A and B both appear on the left side, we first check whether A or B is
extraneous. We can reach C from B alone using B -> C, so A is extraneous and AB -> C reduces to
B -> C, which is already in the set. Therefore, we remove AB -> C.
Check each remaining functional dependency to see whether it can be derived without using it. A -> C
can be derived from A -> B and B -> C. Therefore, A -> C is redundant and we remove it.
The canonical cover is {A -> B, B -> C}.
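The procedure for the example F = {A -> BC, B -> C, AB -> C} can be sketched in Python. This is one possible implementation, not a definitive one: the function names are assumptions, and the result keeps a single attribute on each right-hand side (some textbooks additionally merge dependencies that share a left-hand side).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides so each FD has one attribute on the right.
    work = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset([attr]))
            if fd not in work:
                work.append(fd)
    # Step 2: drop extraneous attributes from left-hand sides.
    for i, (lhs, rhs) in enumerate(work):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
        work[i] = (lhs, rhs)
    # Step 3: drop FDs that the remaining ones already imply.
    result = list(dict.fromkeys(work))  # remove duplicates, keep order
    for fd in list(result):
        others = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], others):
            result = others
    return result

F = [("A", "BC"), ("B", "C"), ("AB", "C")]
cover = minimal_cover(F)  # A -> B and B -> C
```

Step 2 turns AB -> C into B -> C, which duplicates an existing dependency, and step 3 removes A -> C because it follows from A -> B and B -> C.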
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join condition
is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Result:
1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
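The natural join of the EMPLOYEE and SALARY tables, followed by a projection onto EMP_NAME and SALARY, can be sketched with plain Python lists of dictionaries. The helper names natural_join and project are illustrative, not a library API.

```python
def natural_join(r, s):
    """Combine rows of r and s that agree on all commonly named attributes."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [
        {**row_r, **row_s}
        for row_r in r
        for row_s in s
        if all(row_r[a] == row_s[a] for a in common)
    ]

def project(rows, attrs):
    """Keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]

employee = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
salary = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]

# The only common attribute is EMP_CODE, so rows are matched on it.
result = project(natural_join(employee, salary), ["EMP_NAME", "SALARY"])
```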
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information.
Example:
EMPLOYEE
FACT_WORKERS
Input:
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
EMP_NAME STREET CITY BRANCH SALARY
o Left outer join contains the set of tuples of all combinations in R and S that are equal on their
common attribute names, together with the tuples in R that have no matching tuples in S (padded with nulls).
o It is denoted by ⟕.
Input:
EMPLOYEE ⟕ FACT_WORKERS
o Right outer join contains the set of tuples of all combinations in R and S that are equal on
their common attribute names, together with the tuples in S that have no matching tuples in R (padded with nulls).
o It is denoted by ⟖.
Input:
EMPLOYEE ⟖ FACT_WORKERS
Output:
o Full outer join is like a left or right join except that it contains all rows from both tables.
o In full outer join, tuples in R that have no matching tuples in S and tuples in S that have no
matching tuples in R on their common attribute names are also included, padded with nulls.
o It is denoted by ⟗.
Input:
EMPLOYEE ⟗ FACT_WORKERS
Output:
EMP_NAME STREET CITY BRANCH SALARY
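The three outer joins differ only in which unmatched rows they keep. The sketch below shows this with one helper. The rows of the EMPLOYEE and FACT_WORKERS tables are not available in these notes, so the data here is invented purely for illustration, and None stands in for a null.

```python
def outer_join(r, s, on, keep_left=True, keep_right=True):
    """Join r and s on one shared attribute, padding unmatched rows with None."""
    r_cols, s_cols = list(r[0]), list(s[0])
    out, matched_s = [], set()
    for row_r in r:
        hit = False
        for j, row_s in enumerate(s):
            if row_r[on] == row_s[on]:
                out.append({**row_r, **row_s})
                matched_s.add(j)
                hit = True
        if not hit and keep_left:
            # Unmatched row of r: columns of s are filled with None.
            out.append({**dict.fromkeys(s_cols), **row_r})
    if keep_right:
        for j, row_s in enumerate(s):
            if j not in matched_s:
                # Unmatched row of s: columns of r are filled with None.
                out.append({**dict.fromkeys(r_cols), **row_s})
    return out

# Invented sample rows (assumption, not from the notes).
employee = [
    {"EMP_NAME": "Ram", "CITY": "Mumbai"},
    {"EMP_NAME": "Hari", "CITY": "Delhi"},
]
fact_workers = [
    {"EMP_NAME": "Ram", "SALARY": 10000},
    {"EMP_NAME": "Kim", "SALARY": 20000},
]

left = outer_join(employee, fact_workers, "EMP_NAME", keep_right=False)
right = outer_join(employee, fact_workers, "EMP_NAME", keep_left=False)
full = outer_join(employee, fact_workers, "EMP_NAME")
```

With this data, the left join keeps Hari with a null SALARY, the right join keeps Kim with a null CITY, and the full join keeps both.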
3. Equi join:
It is a form of inner join and the most common join. It is based on matched data as per the
equality condition. The equi join uses the comparison operator (=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
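The equi join of the CUSTOMER and PRODUCT tables can be sketched as follows. The join condition CLASS_ID = PRODUCT_ID is an assumption inferred from the output rows, and equi_join is an illustrative helper name.

```python
def equi_join(r, s, left_attr, right_attr):
    """Pair every row of r with every row of s where left_attr = right_attr."""
    return [
        {**row_r, **row_s}
        for row_r in r
        for row_s in s
        if row_r[left_attr] == row_s[right_attr]
    ]

customer = [
    {"CLASS_ID": 1, "NAME": "John"},
    {"CLASS_ID": 2, "NAME": "Harry"},
    {"CLASS_ID": 3, "NAME": "Jackson"},
]
product = [
    {"PRODUCT_ID": 1, "CITY": "Delhi"},
    {"PRODUCT_ID": 2, "CITY": "Mumbai"},
    {"PRODUCT_ID": 3, "CITY": "Noida"},
]

joined = equi_join(customer, product, "CLASS_ID", "PRODUCT_ID")
```

Unlike the natural join, both compared columns survive in the result because they have different names.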