Unit 1 DBMS
The second mode is to support Data Analytics, that is, the processing of data to draw conclusions, and
infer rules or decision procedures, which are then used to drive business decisions.
History of Database
Techniques for data storage and processing have evolved over the years:
1950s and early 1960s: Magnetic tapes were developed for data storage. Data processing tasks such as
payroll were automated, with data stored on tapes. Processing of data consisted of reading data from one
or more tapes and writing data to a new tape.
Late 1960s and early 1970s: Widespread use of hard disks in the late 1960s changed the scenario for
data processing greatly, since hard disks allowed direct access to data. The position of data on disk was
immaterial, since any location on disk could be accessed in just tens of milliseconds. Data were thus
freed from the tyranny of sequentiality. With the advent of disks, the network and hierarchical data
models were developed, which allowed data structures such as lists and trees to be stored on disk.
Programmers could construct and manipulate these data structures. A landmark paper by Edgar Codd in
1970 defined the relational model and nonprocedural ways of querying data in the relational model, and
relational databases were born.
Late 1970s and 1980s: By the early 1980s, relational databases had become competitive with network
and hierarchical database systems even in the area of performance. Relational databases were so easy to
use that they eventually replaced network and hierarchical databases. The 1980s also saw much research
on parallel and distributed databases, as well as initial work on object-oriented databases.
1990s: The SQL language was designed primarily for decision support applications, which are query-
intensive, yet the mainstay of databases in the 1980s was transaction processing applications, which are
update-intensive.
2000s: The types of data stored in database systems evolved rapidly during this period. Semi-structured
data became increasingly important. XML emerged as a data-exchange standard. JSON, a more compact
data-exchange format well suited for storing objects from JavaScript or other programming languages
subsequently grew increasingly important.
2010s: The limitations of NoSQL systems, such as lack of support for consistency and lack of support
for declarative querying, were found acceptable by many applications (e.g., social networks) in return
for the benefits they provided, such as scalability and availability.
File-based systems were an early attempt to computerize the manual system. A file system is a method of
organising files on a hard disk or other storage medium: it is software that manages the data files in a computer
system, arranges the files, and helps in retrieving them when required. It is compatible with different file types,
such as mp3, doc, txt, mp4, etc., and these are also grouped into directories.
The file system is a collection of data, and for any management of it the user has to write the procedures. It is
also called the traditional approach, in which a decentralized model was followed: each department stored and
controlled its own data with the help of a data processing specialist. The main role of a data processing
specialist was to create the necessary computer file structures, manage the data within those structures, and
design application programs that create reports based on file data.
In the above figure, consider an example of a student's file system. The student file will contain information
regarding the student (i.e., roll no, student name, course, etc.). Similarly, we have a subject file that contains
information about the subject, and a result file that contains the information regarding the result.
Some fields are duplicated in more than one file, which leads to data redundancy. To overcome this problem,
we need to create a centralized system, i.e. the DBMS approach.
DBMS
A database approach is a well-organized collection of data that are related in a meaningful way, which can be
accessed by different users but stored only once (centralized) in a system. The user need not write the
procedures for handling the database. The various operations performed by a DBMS are insertion, deletion,
selection, sorting, etc.
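The operations listed above (insertion, deletion, selection, sorting) can be sketched with Python's built-in sqlite3 module and an in-memory database. The student table and its sample rows are illustrative, not taken from the text:

```python
import sqlite3

# In-memory database; table and sample data are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")

# Insertion
cur.execute("INSERT INTO student VALUES (1, 'Asha', 'DBMS')")
cur.execute("INSERT INTO student VALUES (2, 'Ravi', 'OS')")

# Selection combined with sorting
names = [r[0] for r in cur.execute("SELECT name FROM student ORDER BY name")]

# Deletion
cur.execute("DELETE FROM student WHERE roll_no = 2")
remaining = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(names, remaining)  # ['Asha', 'Ravi'] 1
```

The point of the sketch is that the user issues short declarative statements; the DBMS itself carries out the storage and retrieval procedures.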
Data sharing is the primary advantage of database management systems: a DBMS allows the same data to be
shared by multiple applications and users.
4. Data Concurrency
DBMS allows multiple users to access and modify the same set of data at the same time, and reflect these
changes in real-time. The DBMS executes the actions of the program in such a way that the concurrent access is
permitted but the conflicting operations are not permitted to proceed concurrently.
Another aspect is that the DBMS allows multiple views for a single database schema i.e. offering different
interfaces for the same data according to user capabilities. DBMS provides the various concurrency control
protocols for ensuring the atomicity and serializability of the concurrent data access.
Since a database system lets multiple users access the same data from different locations at the same time, the
speed of work on the database increases.
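As a toy illustration of the idea that conflicting operations are not permitted to proceed concurrently, a lock can serialize concurrent updates to a shared data item. This is only a sketch of the principle, not a real DBMS concurrency-control protocol:

```python
import threading

# Shared "data item" updated by several concurrent workers.
balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:                      # conflicting read-modify-write
            current = balance           # operations are serialized,
            balance = current + amount  # so no update is lost

threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 40000 — every one of the 40000 updates survived
```

Without the lock, interleaved reads and writes could overwrite each other (lost updates); a DBMS prevents exactly this kind of conflict, though with far more sophisticated protocols than a single lock.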
5. Fast data Access
In a traditional file-based approach, it might take hours to look for very specific information needed in the
context of some business emergency; a DBMS reduces this time to a few seconds. This is a great advantage of
DBMS: we can write small queries that search the database for us and retrieve the information in the fastest
way possible, thanks to its built-in searching operations.
6. Data Backup and Recovery
This is another advantage of DBMS, as it provides a strong framework for data backup. Users are not required
to back up their data periodically and manually; this is taken care of automatically by the DBMS. Moreover, in
case of a server crash, the DBMS restores the database to its previous condition.
7. Data Integrity
Data integrity means that the data contained in the database is both accurate and consistent. It is essential
because a DBMS may manage multiple databases whose data is visible to multiple users; the DBMS therefore
ensures that data is consistent and correct in all databases for all users. For example, data values being entered
for storage can be checked to ensure that they fall within a specified range and are of the correct format.
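The range check described above can be sketched with a CHECK constraint; the result table and the 0–100 marks range are illustrative assumptions, not taken from the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical rule: marks must lie between 0 and 100.
cur.execute("CREATE TABLE result (roll_no INTEGER, marks INTEGER "
            "CHECK (marks BETWEEN 0 AND 100))")

cur.execute("INSERT INTO result VALUES (1, 95)")       # within range: accepted
rejected = False
try:
    cur.execute("INSERT INTO result VALUES (2, 150)")  # out of range
except sqlite3.IntegrityError:
    rejected = True                                    # the DBMS refused the bad value

stored = cur.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(rejected, stored)  # True 1
```

The invalid row never reaches storage; the integrity rule is enforced by the DBMS itself rather than by every application program separately.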
8. Data Security
DBMS systems provide a strong framework to protect data privacy and security. DBMS ensures that only
authorized users have access to data and there is a mechanism to define access privileges. The DBA who has the
ultimate responsibility for the data in the DBMS can ensure that proper access procedures are followed
including proper authentication schemas for access to the database system and additional check before
permitting access to sensitive data.
9. Data Atomicity
DBMS ensures atomicity, i.e. a transaction is performed on the database either completely or not at all. If any
transaction is partially completed, the DBMS rolls it back.
For example, if we make an online purchase, money is deducted from our account; if the purchase somehow
fails, either no money is deducted or, if it has been deducted, it is returned within a few days.
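The online-purchase scenario can be sketched as a transaction that fails midway and is rolled back; the account table, names, and amounts are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 500)")
con.commit()

try:
    # Transaction: deduct the purchase amount...
    con.execute("UPDATE account SET balance = balance - 100 WHERE id = 1")
    # ...but the purchase fails before the transaction can commit.
    raise RuntimeError("payment gateway failed")
except RuntimeError:
    con.rollback()  # the partial transaction is undone

balance = con.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 500 — no money was deducted
```

Because the deduction and the commit form one atomic unit, the failure leaves the balance exactly as it was before the transaction started.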
10. Conflict Resolution
The DBA resolves the conflicting requirements of various users and applications. The DBA chooses the best file
structure (storage) and access (time) method to get optimal performance for the response-critical applications,
while permitting less critical applications to continue to use the database with a relatively slower response.
11. Data Independence
Data independence is usually considered from two points of view: Physical Data Independence and Logical
Data Independence.
a) Physical Data Independence: It allows changes in the physical storage devices or organization of the files
to be made without requiring changes in the conceptual view or any of the external views and hence in the
application programs using the database. Thus, the files may migrate from one type of physical media to
another or the file structure may change without any need for changes in the application program.
Physical data independence refers to the immunity of the logical schema to changes in the physical model: the
logical schema stays unchanged even though changes are made to the file organization, storage structures,
storage devices or indexing strategy. Physical data independence thus deals with hiding the details of the
storage structure from user applications. The applications need not be involved with these issues, since there is
no difference in the operations carried out against the data.
b) Logical Data Independence: It implies that application programs need not be changed, if fields are added to
an existing record; nor do they have to be changed if fields not used by applications programs are deleted.
Logical data independence indicates that the conceptual schema can be changed without affecting the existing
external schemas.
A logical schema is a conceptual design of the database done on paper or a whiteboard, much like architectural
drawings for a house. The ability to change the logical schema, without changing the external schema or user
view, is called Logical Data Independence. For example, the addition or removal of new entities, attributes or
relationships to this conceptual schema should be possible without having to change existing external schemas
or rewrite existing application programs. In other words, changes to the logical schema (e.g., alterations to the
structure of the database like adding a column or other tables) should not affect the function of the application
(external views).
Data Independence is an advantage with DBMS environment since it allows for changes at one level of the
database without affecting other levels.
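Logical data independence can be sketched concretely: an application that names only the columns it needs keeps working unchanged after a new attribute is added to the conceptual schema. The student table and the added email column are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Asha')")

# The "application program": it names only the columns it uses,
# i.e. it works through its external view of the data.
def app_query():
    return con.execute("SELECT roll_no, name FROM student").fetchall()

before = app_query()
# Conceptual-schema change: a new attribute is added...
con.execute("ALTER TABLE student ADD COLUMN email TEXT")
after = app_query()  # ...and the existing application still works unchanged

print(before == after)  # True
```

Had the application used `SELECT *` instead, the schema change would have leaked into its results; naming the columns is what keeps the external view stable.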
Disadvantages of DBMS
1. Cost of software/hardware and migration: DBMS software and hardware (networking installation) cost is
high. In addition to the cost of purchasing or developing the software, the hardware has to be upgraded to allow
for the extensive programs and work spaces required for their execution and storage. An additional cost is that
of migration from a traditionally separate application environment to an integrated one.
2. Processing overhead: The DBMS incurs processing overhead for implementing security, integrity and sharing of the data.
3. Problems associated with centralization: While centralization reduces duplication, the lack of duplication
requires that the database be adequately backed up so that in the case of failure the data can be recovered.
Centralization also means that the data is accessible from a single source. This increases the potential severity of
security breaches and disruption of the operation of the organization because of downtimes and failures. The
replacement of a monolithic centralized database by a federation of independent and cooperating distributed
databases resolves some of the problems resulting from failures and downtimes.
4. Complexity of Backup and Recovery: Backup and recovery operations are fairly complex in a DBMS
environment, and this is exacerbated in a concurrent multi-user database system.
5. Setup of the database system requires more knowledge, money, skills, and time.
View Level: The highest level of abstraction describes only part of the entire database. Even though the
logical level uses simpler structures, complexity remains because of the variety of information stored in
a large database. Many users of the database system do not need all this information; instead, they need
to access only a part of the database. The view level of abstraction exists to simplify their interaction
with the system. The system may provide many views for the same database.
At the logical level, each such record is described by a type definition. The interrelationship of these record
types is also defined at the logical level; a requirement that the dept name value of an instructor record must
appear in the department table is an example of such an interrelationship.
Finally, at the view level, computer users see a set of application programs that hide details of the data types. At
the view level, several views of the database are defined, and a database user sees some or all of these views. In
addition to hiding details of the logical level of the database, the views also provide a security mechanism to
prevent users from accessing certain parts of the database. For example, clerks in the university registrar office
can see only that part of the database that has information about students; they cannot access information about
salaries of instructors.
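The registrar-clerk scenario above can be sketched with a view that exposes only the non-sensitive columns of the instructor table; the column set and sample row are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE instructor (id INTEGER, name TEXT, salary INTEGER)")
con.execute("INSERT INTO instructor VALUES (12121, 'Wu', 90000)")

# A view that hides the salary column; clerks are granted access to
# the view only, never to the base table.
con.execute("CREATE VIEW instructor_public AS SELECT id, name FROM instructor")

cols = [d[0] for d in con.execute("SELECT * FROM instructor_public").description]
print(cols)  # ['id', 'name'] — salary is not visible through the view
```

In a full DBMS, the authorization system would grant the clerk SELECT rights on `instructor_public` while denying all access to `instructor`, making the view both a simplification mechanism and a security mechanism.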
Instances and Schemas (Storage of Data)
The collection of information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema.
A database schema is related to the variable declarations (along with associated type definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables in a program at a point in time
correspond to an instance of a database schema.
A database can have three schemas:
Physical schema: It describes the database design at the physical level.
Logical schema: It describes the database design at the logical level.
Subschemas: They describe different views of the database.
Of these, the logical schema is by far the most important in terms of its effect on application programs, since
programmers construct applications by using the logical schema. The physical schema is hidden beneath the
logical schema and can usually be changed easily without affecting application programs. Application programs
are said to exhibit physical data independence if they do not depend on the physical schema and thus need not
be rewritten if the physical schema changes.
Database Languages
A database system provides a Data-Definition Language (DDL) to specify the database schema and a Data-
Manipulation Language (DML) to express database queries and updates. The data definition and data
manipulation languages are not two separate languages; instead they simply form parts of a single database
language, such as the widely used SQL language.
Data-Definition Language
We specify a database schema by a set of definitions expressed in a special language called a Data Definition
Language (DDL). It defines the schema for the database. The DDL is also used to specify the following
additional properties of the data.
a) Domain Constraints: A domain of possible values must be associated with every attribute (also called a
field), for example, integer types, character types, date/time types. Declaring an attribute to be of a particular
domain acts as a constraint on the values that it can take.
b) Referential Integrity: There are cases where we wish to ensure that a value that appears in one relation for a
given set of attributes also appears in a certain set of attributes in another relation (referential integrity). For
example, the department listed for each course must be one that actually exists. More precisely, the dept name
value in a course record must appear in the dept name attribute of some record of the department relation.
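The course/department rule above can be sketched with a foreign-key constraint (attribute names are written with underscores to form legal SQL identifiers):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
con.execute("CREATE TABLE course (course_id TEXT, "
            "dept_name TEXT REFERENCES department(dept_name))")

con.execute("INSERT INTO department VALUES ('History')")
con.execute("INSERT INTO course VALUES ('HIS-101', 'History')")      # OK: exists
violated = False
try:
    con.execute("INSERT INTO course VALUES ('BIO-101', 'Biology')")  # no such dept
except sqlite3.IntegrityError:
    violated = True  # referential integrity rejected the dangling reference

print(violated)  # True
```

The DBMS guarantees that every dept_name stored in course refers to an existing department row, so applications never observe a dangling reference.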
c) Assertions: An assertion is any condition that the database must always satisfy. Domain constraints and
referential-integrity constraints are special forms of assertions. However, there are many constraints that we
cannot express by using only these special forms. For example, “Every department must have at least three
courses offered every semester” must be expressed as an assertion.
d) Authorization: We may want to differentiate among the users as far as the type of access they are permitted
on various data values in the database. These differentiations are expressed in terms of authorization, the most
common being: read authorization, which allows reading, but not modification, of data; insert authorization,
which allows insertion of new data, but not modification of existing data; update authorization, which allows
modification, but not deletion, of data; and delete authorization, which allows deletion of data. We may assign
the user all, none, or a combination of these types of authorization.
Example:
create table department
(dept_name char (20),
building char (15),
budget numeric (12,2));
The DDL, just like any other programming language, gets as input some instructions (statements) and generates
some output. The output of the DDL is placed in the data dictionary, which contains metadata—that is, data
about data.
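As a sketch of this, SQLite exposes its data dictionary as the sqlite_master catalogue table: running the DDL statement above (with attribute names joined by underscores to form legal identifiers) and then querying sqlite_master shows the metadata the DDL produced:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The DDL statement from the text.
con.execute("""CREATE TABLE department
               (dept_name CHAR(20),
                building  CHAR(15),
                budget    NUMERIC(12,2))""")

# sqlite_master plays the role of the data dictionary: it records
# metadata ("data about data") for every object the DDL created.
name, sql = con.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(name)  # department
```

Other systems expose the same idea under different names (e.g. an information schema or system catalog); in every case, the output of the DDL is itself stored as queryable data.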
Data Manipulation Language
A Data-Manipulation Language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed
without specifying how to get those data.
Prepared By- Charu Kavadia Page13
Database Management System Unit 1
Query
A query is a statement requesting the retrieval of information. The portion of a DML that involves information
retrieval is called a query language.
Example:
select instructor.name from instructor where instructor.dept_name = 'History';
Retrieve name of instructors from instructor table where department is history.
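The query can be run against a small sample instructor table (the rows are illustrative, and the attribute is written dept_name to form a legal SQL identifier):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO instructor VALUES (?, ?)",
                [("El Said", "History"),
                 ("Wu", "Finance"),
                 ("Califieri", "History")])

# The query from the text, executed against the sample rows above.
names = [r[0] for r in con.execute(
    "SELECT instructor.name FROM instructor "
    "WHERE instructor.dept_name = 'History'")]
print(names)  # ['El Said', 'Califieri']
```

Note that the query states only what is wanted (names of History instructors), not how to scan or index the table; that choice is left to the DBMS, which is what makes SQL's query language declarative.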
Data Dictionary
We can define a data dictionary as a DBMS component that stores the definition of data characteristics and
relationships. Such “data about data” is known as metadata. The DBMS data dictionary provides the DBMS
with its self-describing characteristic.
For example, the data dictionary typically stores descriptions of all:
• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores the names,
data types, display formats, internal storage formats, and validation rules. The data dictionary tells where an
element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of the table
creator, the date of creation, access authorizations, the number of columns, and so on.
• Indexes defined for each database table. For each index, the DBMS stores at least the index name, the
attributes used, the location, specific index characteristics, and the creation date.
• Defined databases: who created each database, the date of creation, where the database is located, who the
DBA is, and so on.
• End users and the administrators of the database.
• Programs that access the database, including screen formats, report formats, application formats, SQL
queries, and so on.
• Access authorization for all users of all databases.
• Relationships among data elements: which elements are involved and whether the relationships are
mandatory or optional.
• Storage structure and access-method definition. The DBA may specify some parameters pertaining to the
physical organization of the data and the indices to be created.
• Schema and physical-organization modification. The DBA carries out changes to the schema and physical
organization to reflect the changing needs of the organization, or to alter the physical organization to improve
performance.
• Granting of authorization for data access. By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access. The authorization information is
kept in a special system structure that the database system consults whenever a user tries to access the data in
the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities are:
Periodically backing up the database onto remote servers
Ensuring that enough free disk space is available for normal operations, and upgrading disk space as
required.
Monitoring jobs running on the database.
To scale up to even larger data volumes and even higher processing speeds, parallel databases are designed to
run on a cluster consisting of multiple machines. Further, distributed databases allow data storage and query
processing across multiple geographically separated machines.
The following figure shows the architecture of applications that use databases as their backend. Database
applications can be partitioned into two or three parts, as shown in the figure.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for describing data,
data relationships, data semantics, and consistency constraints. It provides the conceptual tools for describing
the design of a database at each level of data abstraction. Based on the level of abstraction, data models are
classified into the following three types:
1) Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically
created by Business stakeholders and Data Architects. The purpose is to organize scope and define business
concepts and rules. This model is used in the requirement gathering process i.e., before the Database
Designers start making a particular database. Example- E-R Model
2) Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This
model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical
map of rules and data structures. This data model allows us to focus primarily on the design part of the
database. Example- Relational Model
3) Physical Data Model: This Data Model describes HOW the system will be implemented using a specific
DBMS system. This model is typically created by DBAs and developers. The purpose is the actual
implementation of the database. Ultimately, all data in a database is stored physically on a secondary
storage device such as discs and tapes, in the form of files, records and certain other data structures. This
model holds all the information about the format in which the files are stored, the structure of the
databases, the presence of external data structures, and their relation to each other.
Most common types of models-
1. Hierarchical Data Model - The Hierarchical Model was the first DBMS model. This model organises
the data in a hierarchical tree structure. The hierarchy starts from the root, which holds the root data,
and then expands in the form of a tree, adding child nodes to parent nodes. This model easily represents
some real-world relationships, such as the sitemap of a website. Example: We can represent the
relationship between the shoes present on a shopping website in the following way:
2. Network Model - This model is an extension of the hierarchical model. It was the most popular model
before the relational model. This model is the same as the hierarchical model; the only difference is that
a record can have more than one parent. It replaces the hierarchical tree with a graph. Example: In the
example below we can see that the node student has two parents, i.e. CSE Department and Library.
This was not possible in the hierarchical model.
3. Entity Relationship Model - The Entity–Relationship (E-R) Model is a high-level data model. This
model was designed by Peter Chen and published in a 1976 paper. It is based on a perception of a real
world that consists of a collection of basic objects, called entities, and of relationships among these
objects. While formulating a real-world scenario into the database model, the E-R model creates entity
sets, relationship sets, general attributes and constraints.
In the above diagram, the entities are Teacher and Department. The attributes of the Teacher entity are
Teacher_Name, Teacher_id, Age, Salary, and Mobile_Number. The attributes of the Department entity
are Dept_id and Dept_name. The two entities are connected using a relationship: here, each teacher
works for a department.
4. Relational Model - The Relational Model is a lower-level model. It uses a collection of tables to
represent both data and the relationships among those data. This model was initially proposed by Edgar
F. Codd in 1969. Its conceptual simplicity has led to its widespread adoption; today a vast majority of
database products are based on the relational model. Designers often formulate database schema designs
by first modelling data at a high level, using the E-R model, and then translating it into the relational
model. Example: Employee table.
Advantages of Relational Model
Simple: This model is simpler as compared to the network and hierarchical models.
Scalable: This model can be easily scaled, as we can add as many rows and columns as we want.
Structural Independence: We can make changes in the database structure without changing the way the
data is accessed. When we can make changes to the database structure without affecting the DBMS's
capability to access the data, we can say that structural independence has been achieved.
Disadvantages of Relational Model
Hardware Overheads: To hide the complexities and make things easier for the user, this model
requires more powerful hardware and data storage devices.
Bad Design: The relational model is very easy to design and use, so users don't need to know how
the data is stored in order to access it. This ease of design can lead to the development of a poor database
which would slow down as the database grows.
But all these disadvantages are minor as compared to the advantages of the relational model. These problems
can be avoided with the help of proper implementation and organization.
5. Object Oriented Data Model - This is an extension of the E-R model with notions of functions,
encapsulation, and object identity as well. This model supports a rich type system that includes
structured and collection types. Thus, in the 1980s, various database systems following the object-oriented
approach were developed. Here, the objects are nothing but the data carrying its properties. We can store
audio, video, images, etc. in the database, which was not possible in the relational model (although audio
and video can be stored in a relational database, it is advised not to). In this model, two or more objects
are connected through links, and we use these links to relate one object to other objects. Example: In the
following figure, we have two objects, Employee and Department. All the data and relationships of each
object are contained as a single unit. The attributes like Name and Job_title of the employee, and the
methods which will be performed by that object, are stored as a single object. The two objects are
connected through a common attribute, i.e. the Department_id, and the communication between these
two will be done with the help of this common id.
The Entity–Relationship (E-R) Model is a high-level data model, designed by Peter Chen and published in a
1976 paper.
The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and
relationships among these objects. It was developed to facilitate database design by allowing specification of an
enterprise schema, which represents the overall logical structure of a database.
Entity
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example,
each person in a university is an entity. An entity has a set of properties, and the values for some set of
properties must uniquely identify an entity. For instance, a person may have a person id property whose value
uniquely identifies that person.
Entity Set
An entity set is a set of entities of the same type that share the same properties, or attributes.
Example:The set of all people who are instructors at a given university can be defined as the entity set
instructor. Similarly, the entity set student might represent the set of all students in the university.
Attribute
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of
an entity set. The designation of an attribute for an entity set expresses that the database stores similar
information concerning each entity in the entity set; however, each entity may have its own value for each
attribute. Possible attributes of the instructor entity set are ID, name, dept name, and salary. Possible attributes
of the course entity set are course id, title, dept name, and credits.
Each entity has a value for each of its attributes. For instance, a particular instructor entity may have the value
12121 for ID, the value Wu for name, the value Finance for dept name, and the value 90000 for salary.
For example, an address can be defined as the composite attribute address with the component attributes street,
city, state, and postal code. Composite attributes help us group together related attributes, making the modelling
cleaner. A composite attribute may appear as a hierarchy: in the composite attribute address, its component
attribute street can be further divided into street number, street name, and apartment number. The following
figure depicts these examples of composite attributes for the student entity set.
Figure:Relationship set advisor (only some attributes of instructor and student are shown).
Relationship Sets
A relationship set is a set of relationships of the same type.
A relationship set is represented in an E-R diagram by a diamond, which is linked via lines to a number of
different entity sets (rectangles).
1. Many-to-Many cardinality (m:n)
By this cardinality constraint,
An entity in set A can be associated with any number (zero or more) of entities in set B.
An entity in set B can be associated with any number (zero or more) of entities in set A.
Example-
Here,
One student can enroll in any number (zero or more) of courses.
One course can be enrolled by any number (zero or more) of students.
2. Many-to-One cardinality (m:1)
By this cardinality constraint,
An entity in set A can be associated with at most one entity in set B.
An entity in set B can be associated with any number (zero or more) of entities in set A.
Symbol Used-
Example-
Here,
One student can enroll in at most one course.
One course can be enrolled by any number (zero or more) of students.
3. One-to-Many cardinality (1:n)
By this cardinality constraint,
An entity in set A can be associated with any number (zero or more) of entities in set B.
An entity in set B can be associated with at most one entity in set A.
Symbol Used-
Example-
Here,
One student can enroll in any number (zero or more) of courses.
One course can be enrolled by at most one student.
4. One-to-One cardinality (1:1 )
By this cardinality constraint,
An entity in set A can be associated with at most one entity in set B.
An entity in set B can be associated with at most one entity in set A.
Symbol Used-
Example-
Here,
One student can enroll in at most one course.
One course can be enrolled by at most one student.
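One way to see these cardinalities concretely is in how they map to table constraints. In the sketch below (the enrolls table is an illustrative assumption), a UNIQUE constraint on student_id enforces "at most one course per student", i.e. the many-to-one case; adding UNIQUE to course_id as well would give one-to-one, and dropping both would give many-to-many:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# m:1 — each student enrolls in at most one course, but one course
# may be taken by many students.
con.execute("""CREATE TABLE enrolls (
                 student_id INTEGER UNIQUE,  -- at most one course per student
                 course_id  TEXT)""")

con.execute("INSERT INTO enrolls VALUES (1, 'DBMS')")
con.execute("INSERT INTO enrolls VALUES (2, 'DBMS')")    # many students, one course: OK
blocked = False
try:
    con.execute("INSERT INTO enrolls VALUES (1, 'OS')")  # second course for student 1
except sqlite3.IntegrityError:
    blocked = True                                       # m:1 constraint enforced

rows = con.execute("SELECT COUNT(*) FROM enrolls").fetchone()[0]
print(blocked, rows)  # True 2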
E-R Diagram
An Entity–relationship model (E-R model) describes the structure of a database with the help of a diagram,
which is known as Entity Relationship Diagram (E-R Diagram).
Components of E-R Diagram:-
It provides a set of useful concepts that make it convenient for a developer to move from a basic set of
information to a detailed description of information that can be easily implemented in a database
system.
It describes data as a collection of entities, relationships and attributes.
It is a graphical representation of the logical structure of a database.
Advantages of E-R Model
Simple: Conceptually ER Model is very easy to build. If we know the relationship between the attributes
and the entities we can easily build the ER Diagram for the model.
Effective Communication Tool: This model is used widely by the database designers for communicating
their ideas.
Easy Conversion to any Model: This model maps well to the relational model and can be easily
converted to relational model by converting the ER model to the table. This model can also be converted
to any other model like network model, hierarchical model etc.
Disadvantages of ER Model
No industry standard for notation: There is no industry standard for developing an ER model. So one
developer might use notations which are not understood by other developers.
Hidden information: Some information might be lost or hidden in the ER model. As it is a high-level
view so there are chances that some details of information might be hidden.
Example of E-R Diagram
E-R Diagram of Library Management System
For example, student is described by its ID, name, street, and city attributes, and additionally a tot_cred (total
credits) attribute; employee is described by its ID, name, street, and city attributes, and additionally a salary
attribute. Attribute inheritance applies through all tiers of lower-level entity sets; thus, instructor and secretary,
which are subclasses of employee, inherit the attributes ID, name, street, and city from person, in addition to
inheriting salary from employee, as shown in the following figure.
For example, suppose that, after three months of employment, bank employees are assigned to one of four work
teams. We therefore represent the teams as four lower-level entity sets of the higher-level employee entity set. A
given employee is not assigned to a specific team entity automatically on the basis of an explicit defining
condition. Instead, the user in charge of this decision makes the team assignment on an individual basis. The
assignment is implemented by an operation that adds an entity to an entity set.
B) A second type of constraint relates to whether or not entities may belong to more than one lower-level entity
set within a single generalization. The lower-level entity sets may be one of the following:
• Disjoint. A disjointness constraint requires that an entity belong to no more than one lower-level entity set. In
our example, an account entity can satisfy only one condition for the account-type attribute; an entity can be
either a savings account or a checking account, but cannot be both.
• Overlapping. In overlapping generalizations, the same entity may belong to more than one lower-level entity
set within a single generalization. For example, consider the employee work team example, and assume that
certain managers participate in more than one work team. A given employee may therefore appear in more than
one of the team entity sets that are lower-level entity sets of employee. Thus, the generalization is overlapping.
As another example, suppose generalization applied to entity sets customer and employee leads to a higher-level
entity set person. The generalization is overlapping if an employee can also be a customer.
Lower-level entity overlap is the default case; a disjointness constraint must be placed explicitly on a
generalization (or specialization). We can note a disjointness constraint in an E-R diagram by adding the
word disjoint next to the triangle symbol.
C) A final constraint, the completeness constraint on a generalization or specialization, specifies whether or not
an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within the
generalization/specialization. This constraint may be one of the following:
• Total generalization or specialization. Each higher-level entity must belong to a lower-level entity set.
•Partial generalization or specialization. Some higher-level entities may not belong to any lower-level entity
set.
Partial generalization is the default. We can specify total generalization in an E-R diagram by using a double
line to connect the box representing the higher-level entity set to the triangle symbol. (This notation is similar to
the notation for total participation in a relationship.)
The account generalization is total: All account entities must be either a savings account or a checking account.
Because the higher-level entity set arrived at through generalization is generally composed of only those entities
in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually
total. When the generalization is partial, a higher-level entity is not constrained to appear in a lower-level entity
set. The work team entity sets illustrate a partial specialization. Since employees are assigned to a team only
after 3 months on the job, some employee entities may not be members of any of the lower-level team entity
sets.
4) Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships. Aggregation is an
abstraction through which relationships are treated as higher-level entities.
Aggregation is an abstraction in which relationship sets (along with their associated entity sets) are treated as
higher-level entity sets, and can participate in relationships.
For example, the Center entity and the Course entity, linked by the offers relationship, act together as a single
entity that participates in a relationship with another entity, Visitor. In the real world, if a visitor visits a
coaching center, he will never enquire only about the course or only about the center; instead, he will enquire
about both.
Keys
Keys help us to identify any row of data in a table. In a real-world application, a table could contain thousands
of records. Moreover, the records could be duplicated. Keys ensure that we can uniquely identify a table record
despite these challenges.
Keys allow us to establish relationships between tables and to identify how tables are related.
They help to enforce identity and integrity in the relationship.
Types of Keys
1) Super Key- A super key is a set of one or more attributes (columns), which can uniquely identify a row in a
table.
Example-
The above employee table has the following super keys. Each of the following attribute sets is able to uniquely
identify a row of the employee table.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
All the attributes in a super key are definitely sufficient to identify each tuple uniquely in the given relation but
all of them may not be necessary.
2) Candidate Key- A candidate key is a minimal super key with no redundant attributes. The following two sets
of super keys are chosen from the above sets, as they contain no redundant attributes.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys, as all the other sets contain redundant attributes that are not necessary
for unique identification.
All the attributes in a candidate key are sufficient as well as necessary to identify each tuple uniquely.
Removing any attribute from the candidate key fails in identifying each tuple uniquely.
The value of candidate key must always be unique.
The value of candidate key can never be NULL.
It is possible to have multiple candidate keys in a relation.
Those attributes which appear in some candidate key are called prime attributes.
All candidate keys are super keys, but not all super keys are candidate keys.
Adding zero or more attributes to a candidate key generates a super key.
The maximum possible number of candidate keys in a relation with n attributes is C(n, ⌊n/2⌋). For example, if a
relation has 5 attributes, i.e. R(A,B,C,D,E), then the maximum number of candidate keys is C(5, ⌊5/2⌋) =
C(5, 2) = 10.
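This count is an upper bound that follows from minimality: no candidate key can be a subset of another, and by Sperner's theorem the largest such family of attribute sets has size C(n, ⌊n/2⌋). A quick sanity check of the figure:

```python
from math import comb, floor

def max_candidate_keys(n):
    """Upper bound on the number of candidate keys in a relation
    with n attributes.

    Candidate keys are minimal, so no candidate key can be a subset
    of another; by Sperner's theorem the largest such family of
    attribute sets has size C(n, floor(n/2)).
    """
    return comb(n, floor(n / 2))

print(max_candidate_keys(5))  # R(A,B,C,D,E) -> C(5, 2) = 10
```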
3) Primary Key- The primary key is selected from one of the candidate keys and becomes the identifying key
of a table. It can uniquely identify any data row of the table. A primary key is a candidate key that the database
designer selects while designing the database.
In the above example, either {Emp_SSN} or {Emp_Number} can be chosen as a primary key for the table
Employee.
The value of primary key can never be NULL.
The value of primary key must always be unique.
The value of a primary key is never meant to be changed, i.e. it should not be updated.
The value of primary key must be assigned when inserting a record.
A relation is allowed to have only one primary key.
Primary keys are not necessarily to be a single attribute (column). It can be a set of more than one attributes
(columns).
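The uniqueness and non-NULL properties of a primary key can be seen in any SQL database. A minimal sketch using Python's stdlib sqlite3 driver (the SSN values and names are made up for illustration):

```python
import sqlite3

# Minimal demonstration of primary-key behavior with SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# NOT NULL is spelled out because SQLite, unlike standard SQL, would
# otherwise permit NULL in a non-integer primary-key column.
cur.execute(
    "CREATE TABLE employee (emp_ssn TEXT PRIMARY KEY NOT NULL, emp_name TEXT)")
cur.execute("INSERT INTO employee VALUES ('123-45-6789', 'Asha')")

duplicate_rejected = null_rejected = False
try:
    # Same emp_ssn again: the uniqueness property is enforced.
    cur.execute("INSERT INTO employee VALUES ('123-45-6789', 'Ravi')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
try:
    # NULL primary key: rejected by the NOT NULL property.
    cur.execute("INSERT INTO employee VALUES (NULL, 'Meera')")
except sqlite3.IntegrityError:
    null_rejected = True
```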
4) Alternate Key- Out of all candidate keys, only one gets selected as the primary key; the remaining keys are
known as alternate or secondary keys. In other words, candidate keys that are left unimplemented or unused
after implementing the primary key are called alternate keys.
For example- In the above case, if {Emp_Number}is selected as primary key, then {Emp_SSN} is Alternate
Key.
5) Foreign Key- A foreign key is an attribute in one table that refers to the primary key of another table.
Hence, the foreign key is useful in linking together two tables. Data should be entered in the foreign key column
with great care, as wrongly entered data can invalidate the relationship between the two tables. An attribute ‘X’
in a table is called a foreign key to some other attribute ‘Y’ in another table when its values are dependent on
the values of attribute ‘Y’. The attribute ‘X’ can assume only those values which are assumed by the attribute
‘Y’. Here, the relation in which attribute ‘Y’ is present is called the referenced relation, and the attribute ‘Y’
is called the referenced attribute. The relation in which attribute ‘X’ is present is called the referencing relation,
and the attribute ‘X’ is called the referencing attribute.
Example- STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in STUDENT relation shown
below.
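The referencing/referenced behavior can be sketched directly for the STUDENT / STUDENT_COURSE example. SQLite (via Python's stdlib) is used as a stand-in; note that SQLite requires foreign keys to be switched on per connection:

```python
import sqlite3

# STUD_NO in student_course (referencing relation) may only take
# values that already exist in student (referenced relation).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FKs by default
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE student_course (
    stud_no INTEGER REFERENCES student(stud_no),
    course_id INTEGER)""")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student_course VALUES (1, 101)")  # OK: 1 exists

fk_violation = False
try:
    # stud_no 99 does not exist in student, so this violates
    # referential integrity.
    conn.execute("INSERT INTO student_course VALUES (99, 101)")
except sqlite3.IntegrityError:
    fk_violation = True
```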
6) Composite Key- A composite key is a key that consists of more than one attribute. Consider, for example, a
table of orders with columns cust_Id, order_Id, product_code and product_count. None of these columns alone
can play the role of a key in this table. Column cust_Id alone cannot become a key, as the same customer can
place multiple orders and thus have multiple entries. Column order_Id alone cannot be a primary key, as the
same order can contain multiple products, so the same order_Id can be present multiple times. Column
product_code cannot be a primary key, as more than one customer can place an order for the same product.
Column product_count alone cannot be a primary key, because two orders can be placed for the same product
count. Based on this, it is safe to conclude that the key must have more than one attribute:
Key in the above table: {cust_Id, product_code}. This is a composite key, as it is made up of more than one
attribute.
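A composite key is declared by listing several columns in one PRIMARY KEY clause. A minimal sketch of the order example above (values are made up; SQLite via Python's stdlib is the stand-in DBMS):

```python
import sqlite3

# Composite key {cust_id, product_code}: neither column alone is
# unique, but the pair of values is.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    cust_id INTEGER,
    order_id INTEGER,
    product_code TEXT,
    product_count INTEGER,
    PRIMARY KEY (cust_id, product_code))""")
conn.execute("INSERT INTO orders VALUES (1, 10, 'P1', 2)")
conn.execute("INSERT INTO orders VALUES (1, 11, 'P2', 2)")  # same cust_id OK
conn.execute("INSERT INTO orders VALUES (2, 12, 'P1', 5)")  # same product OK

pair_duplicate_rejected = False
try:
    # The pair (1, 'P1') already exists, so the composite key rejects it.
    conn.execute("INSERT INTO orders VALUES (1, 13, 'P1', 9)")
except sqlite3.IntegrityError:
    pair_duplicate_rejected = True
```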
7) Partial Key- A partial key is a key that cannot uniquely identify all the records of the table. However, a
bunch of related tuples can be selected from the table using the partial key.
Example-
Here, using partial key Emp_no, we can’t identify a tuple uniquely but we can select a bunch of tuples from the
table.
8) Unique Key- Unique key is a key with the following properties-
It is unique for all the records of the table.
Once assigned, its value can’t be changed i.e. it is non-updatable.
It may have a NULL value.
Example- The best example of a unique key is the Aadhaar card number.
The Aadhaar card number is unique for all the citizens (tuples) of India (table).
If a card gets lost and a duplicate copy is issued, the duplicate copy always carries the same number as
before.
Thus, it is non-updatable.
A few citizens may not have got their Aadhaar cards, so for them its value is NULL.
Key Constraints
Constraints are the rules that are to be followed while entering data into the columns of a database table.
Constraints ensure that the data entered by the user into a column must be within the criteria specified by the
constraint.
For example, we may want to maintain only unique IDs in the employee table, or allow only ages above 18 in
the student table.
Types of Key Constraints
1) NOT NULL: It ensures that the specified column doesn’t contain a NULL value. NULL represents a record
where data may be missing or optional. Once NOT NULL is applied to a particular column, we cannot enter
NULL values into that column; the column is restricted to holding only proper, non-null values.
2) UNIQUE: It ensures unique/distinct values in the specified columns. Sometimes we need to maintain only
unique data in a column of a database table; this is possible by using a unique constraint. A unique constraint
ensures that all values in a column are distinct.
3) DEFAULT: It provides a default value for a column if none is specified. When a column is given a default
value, any row inserted without a value for that column uses the default, so we need not enter that value each
time. The default can be overridden by supplying an explicit value when inserting a row.
4) CHECK: It checks a predefined condition before inserting the data into the table. Suppose, in a real
application, we want to give access only if the age entered by the user is greater than 18; this is done at the
back-end by using a check constraint. A check constraint ensures that the data entered by the user for a column
is within the range of permitted values.
5) PRIMARY KEY: A primary key is a constraint that uniquely identifies each row of a database table by
designating one or more columns of the table as the primary key. Primary keys must contain UNIQUE values
and cannot contain NULL values.
6) FOREIGN KEY: It ensures the referential integrity of a relationship. The foreign key constraint is a column
or list of columns that points to the primary key column of another table. Its main purpose is to ensure that only
values that match the primary key column of the other table are allowed in the present table.
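The constraint types above can all be exercised in one small table. A sketch with made-up column names and values (SQLite via Python's stdlib; the default city is arbitrary):

```python
import sqlite3

# One hypothetical table exercising NOT NULL, UNIQUE, DEFAULT and CHECK.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,              -- NOT NULL
    email TEXT UNIQUE,               -- UNIQUE
    city TEXT DEFAULT 'Udaipur',     -- DEFAULT (arbitrary example value)
    age INTEGER CHECK (age >= 18))   -- CHECK""")

# No city supplied: the DEFAULT value fills it in.
conn.execute(
    "INSERT INTO student (id, name, email, age) VALUES (1, 'Asha', 'a@x.in', 20)")
city = conn.execute("SELECT city FROM student WHERE id = 1").fetchone()[0]

# Each of these statements violates one constraint and is rejected.
violations = []
for stmt in [
    "INSERT INTO student (id, name, age) VALUES (2, NULL, 20)",         # NOT NULL
    "INSERT INTO student (id, name, email) VALUES (3, 'B', 'a@x.in')",  # UNIQUE
    "INSERT INTO student (id, name, age) VALUES (4, 'C', 15)",          # CHECK
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        violations.append(stmt)
```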
Participation Constraint
Participation Constraint specifies the existence of an entity when it is related to another entity in a relationship
type. It is also called minimum cardinality constraint. This constraint specifies the number of instances of an
entity that can participate in a relationship type. There are two types of Participation constraint –
1) Total Participation
Each entity in the entity set is involved in at least one relationship in the relationship set, i.e. the number of
relationships in which every entity is involved is greater than zero.
For Example-
2) Partial Participation
An entity in the entity set may or may not participate in any relationship in the relationship set.
For example:
Weak Entity Set
The entity sets which do not have sufficient attributes to form a primary key are known as Weak Entity Sets,
and the entity sets which have a primary key are known as Strong Entity Sets.
As the weak entities do not have any primary key, they cannot be identified on their own, so they depend on
some other entity (known as the owner entity). The weak entities have a total participation constraint (existence
dependency) in their identifying relationship with the owner entity. Weak entity types have partial keys. Partial
keys are sets of attributes with the help of which the tuples of the weak entities can be distinguished and
identified.
A weak entity always has total participation, but a strong entity may not have total participation. A weak entity
depends on a strong entity to ensure its existence. Unlike a strong entity, a weak entity does not have any
primary key; it has a partial key (discriminator). A weak entity is represented by a double rectangle. The
relationship between a strong and a weak entity is represented by a double diamond.
Example
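A classic weak-entity case is a loan payment (it also appears in the banking design later in this unit): payment_number is only a partial key, so the payment table's primary key pairs the owner's key with the discriminator. A minimal sketch (names illustrative; SQLite via Python's stdlib):

```python
import sqlite3

# payment is a weak entity: it is identified only in combination with
# its owner loan, so its primary key is (loan_number, payment_number).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE loan (loan_number INTEGER PRIMARY KEY, amount REAL)")
conn.execute("""CREATE TABLE payment (
    loan_number INTEGER NOT NULL REFERENCES loan(loan_number),
    payment_number INTEGER NOT NULL,     -- partial key (discriminator)
    payment_amount REAL,
    PRIMARY KEY (loan_number, payment_number))""")
conn.execute("INSERT INTO loan VALUES (1, 5000.0)")
conn.execute("INSERT INTO loan VALUES (2, 9000.0)")
# payment_number 1 repeats across loans: allowed, because it only
# distinguishes payments within one owner loan.
conn.execute("INSERT INTO payment VALUES (1, 1, 500.0)")
conn.execute("INSERT INTO payment VALUES (2, 1, 900.0)")
rows = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```

The total participation (existence dependency) shows up as the NOT NULL foreign key: a payment cannot exist without a loan.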
Conceptual Database Design with E-R Model (High Level Conceptual Data Model)
Following figure shows a simplified overview of the Database Design Process-
The conceptual schema is a concise description of the data requirements of the users and includes detailed
descriptions of the entity types, relationships, and constraints; these are expressed using the concepts provided
by the high-level data model. Because these concepts do not include implementation details, they are usually
easier to understand and can be used to communicate with nontechnical users. The high-level conceptual
schema can also be used as a reference to ensure that all users’ data requirements are met and that the
requirements do not conflict. This approach enables database designers to concentrate on specifying the
properties of the data, without being concerned with storage and implementation details, which makes it
easier to create a good conceptual database design.
During or after the conceptual schema design, the basic data model operations can be used to specify the high-
level user queries and operations identified during functional analysis. This also serves to confirm that the
conceptual schema meets all the identified functional requirements. Modifications to the conceptual schema can
be introduced if some functional requirements cannot be specified using the initial schema.
3) The next step in database design is the actual implementation of the database, using a commercial DBMS.
Most current commercial DBMSs use an implementation data model—such as the relational (SQL) model—so
the conceptual schema is transformed from the high-level data model into the implementation data model.
This step is called logical design or data model mapping; its result is a database schema in the implementation
data model of the DBMS. Data model mapping is often automated or semi-automated within the database
design tools.
4) The last step is the physical design phase, during which the internal storage structures, file organizations,
indexes, access paths, and physical design parameters for the database files are specified. In parallel with these
activities, application programs are designed and implemented as database transactions corresponding to the
high-level transaction specifications.
Example-
Database Design for Banking Enterprise
1) Data Requirements
The initial specification of user requirements may be based on interviews with the database users, and on the
designer’s own analysis of the enterprise. The description that arises from this phase serves as the basis for
specifying the conceptual structure of the database. The major characteristics of the banking enterprise-
The bank is organized into branches. Each branch is located in a particular city and is identified by a unique
name. The bank monitors the assets of each branch.
Bank customers are identified by their customer-id values. The bank stores each customer’s name, and the
street and city where the customer lives. Customers may have accounts and can take out loans. A customer
may be associated with a particular banker, who may act as a loan officer or personal banker for that customer.
Bank employees are identified by their employee-id values. The bank administration stores the name and
telephone number of each employee, the names of the employee’s dependents, and the employee-id number of
the employee’s manager. The bank also keeps track of the employee’s start date and, thus, length of
employment.
The bank offers two types of accounts—savings and checking accounts. Accounts can be held by more than
one customer, and a customer can have more than one account. Each account is assigned a unique account
number. The bank maintains a record of each account’s balance, and the most recent date on which the
account was accessed by each customer holding the account. In addition, each savings account has an interest
rate, and overdrafts are recorded for each checking account.
A loan originates at a particular branch and can be held by one or more customers. A loan is identified by a
unique loan number. For each loan, the bank keeps track of the loan amount and the loan payments. Although
a loan payment number does not uniquely identify a particular payment among those for all the bank’s loans, a
payment number does identify a particular payment for a specific loan. The date and amount are recorded for
each payment.
2) Entity Set Designation
Our specification of data requirements serves as the starting point for constructing a conceptual schema for the
database. From the characteristics listed in previous step, we begin to identify entity sets and their attributes:
The branch entity set, with attributes branch-name, branch-city, and assets.
The customer entity set, with attributes customer-id, customer-name, customer-street, and customer-city.
A possible additional attribute is banker-name.
The employee entity set, with attributes employee-id, employee-name, telephone number, salary, and
manager. Additional descriptive features are the multivalued attribute dependent-name, the base
attribute start-date, and the derived attribute employment-length.
Two account entity sets—savings-account and checking-account—with the common attributes of
account-number and balance; in addition, savings-account has the attribute interest-rate and checking-
account has the attribute overdraft-amount.
The loan entity set, with the attributes loan-number, amount, and originating branch.
The weak entity set loan-payment, with attributes payment-number, payment date, and payment-amount.
3) Relationship Set Designation
Now we will specify the following relationship sets and mapping cardinalities. In the process, we also refine
some of the decisions we made earlier regarding attributes of entity sets.
borrower, a many-to-many relationship set between customer and loan.
loan-branch, a many-to-one relationship set that indicates in which branch a loan originated. Note that
this relationship set replaces the attribute originating branch of the entity set loan.
loan-payment, a one-to-many relationship from loan to payment, which documents that a payment is
made on a loan.
depositor, with relationship attribute access-date, a many-to-many relationship set between customer and
account, indicating that a customer owns an account.
cust-banker, with relationship attribute type, a many-to-one relationship set expressing that a customer
can be advised by a bank employee, and that a bank employee can advise one or more customers. Note
that this relationship set has replaced the attribute banker-name of the entity set customer.
works-for, a relationship set between employee entities with role indicators manager and worker; the
mapping cardinalities express that an employee works for only one manager and that a manager
supervises one or more employees. Note that this relationship set has replaced the manager attribute of
employee.
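Although mapping to tables belongs to the later logical-design step, two of the relationship sets above make the cardinality decisions concrete. A partial sketch (columns abbreviated; SQLite via Python's stdlib as a stand-in DBMS): works-for, a many-to-one relationship between employees, becomes a self-referencing foreign key, while borrower, a many-to-many relationship, becomes its own junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# works-for (m:1, employee to manager): a self-referencing foreign key.
conn.execute("""CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    employee_name TEXT,
    manager_id INTEGER REFERENCES employee(employee_id))""")
conn.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT)""")
conn.execute("CREATE TABLE loan (loan_number INTEGER PRIMARY KEY, amount REAL)")
# borrower (m:n between customer and loan): a junction table whose
# composite primary key is the pair of foreign keys.
conn.execute("""CREATE TABLE borrower (
    customer_id INTEGER REFERENCES customer(customer_id),
    loan_number INTEGER REFERENCES loan(loan_number),
    PRIMARY KEY (customer_id, loan_number))""")
table_count = len(conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall())
```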
4) E-R Diagram
Drawing on the discussions in previous section, we now present the completed E-R diagram for our example
banking enterprise. Following figure depicts the full representation of a conceptual model of a bank, expressed
in terms of E-R concepts. The diagram includes the entity sets, attributes, relationship sets, and mapping
cardinalities arrived at through the design processes of Sections 1 and 2, and refined in Section 3.
Example-
In the above diagram, Lecturer, Course and Student are entities. They are also called strong entities, as they do
not depend on other entities. The Lecturer entity has the attributes id, name, and specialty. The Course entity
has the attributes course_id and course_name. The Student entity has the attributes id and name. The Exam
entity depends on the Course entity; therefore, Exam is a weak entity. It has the attributes name, date,
starting_time and duration.
In the above diagram, follows, conducts and has are relationships of cardinality n:m, 1:m and 1:1 respectively.
When three entities participate in a relationship, it is a ternary relationship; the degree of such a relationship
is 3.
Figure : Aggregation
In the diagram above, the relationship between Center and Course, taken together, acts as an entity, which is in
a relationship with another entity, Visitor. In the real world, if a visitor or a student visits a coaching center,
he/she will never enquire about the center only or just about the course; rather, he/she will enquire about both.
Ternary Relationship
A ternary relationship is a relationship of degree three, that is, a relationship that contains three participating
entities. Cardinalities for ternary relationships can take the form 1:1:1, 1:1:M, 1:M:N or M:N:P. The
cardinality constraint of an entity in a ternary relationship is defined by a pair of entity instances associated
with the other single entity instance. For example, in a ternary relationship R(X, Y, Z) of cardinality M:N:1,
for each pair of (X, Y) there is only one instance of Z; for each pair of (X, Z) there are N instances of Y;
and for each pair of (Y, Z) there are M instances of X. For example, note the relationships (and their
consequences) in the following figure, which are represented by the following business rules:
• A DOCTOR writes one or more PRESCRIPTIONs.