Self Unit 1
UNIT 1:
What is a Database Management System?
A database-management system (DBMS) is a collection of interrelated data and a set of programs to
access those data. The collection of data, usually referred to as the database, is a collection of
related data with an implicit meaning, and it contains information relevant to an enterprise. The
primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient. By data, we mean known facts that can be recorded and that have implicit
meaning.
The management system is important because, without some kind of rules and regulations, it is not
possible to maintain the database. We have to decide which attributes should be included in a
particular table, which common attributes create a relationship between two tables, and which tables
have to be updated when a new record is inserted or deleted. These issues must be resolved by
following rules that maintain the integrity of the database.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to be
shared among several users, the system must avoid possible anomalous results. Because information
is so important in most organizations, computer scientists have developed a large body of concepts
and techniques for managing data. These concepts and techniques form the focus of this book. This
chapter briefly introduces the principles of database systems.
In short, a DBMS is a collection of programs that manages data and, at the same time, supports
different types of users in creating, managing, retrieving, updating, and storing information.
Databases touch all aspects of our lives. Some of the major areas of application are as
follows:
1. Banking
2. Airlines
3. Universities
4. Human resources
Enterprise Information
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments such
as stocks and bonds; also for storing real-time market data to enable online trading by customers
and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use databases
in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
Database Languages
A database system provides a data-definition language to specify the database
schema and a data-manipulation language to express database queries and updates. In practice, the
data-definition and data-manipulation languages are not two separate languages; instead they simply
form parts of a single database language, such as the widely used SQL language.
Data-Manipulation Language
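A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. The types of access are retrieval of information,
insertion of new information, deletion of information, and modification of stored information.
There are basically two types of DML: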
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a
user does not have to specify how to get the data, the database system has to figure out an efficient
means of accessing data. A query is a statement requesting the retrieval of information. The portion
of a DML that involves information retrieval is called a query language. Although technically
incorrect, it is common practice to use the terms query language and data manipulation language
synonymously.
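The contrast between procedural and declarative DMLs can be sketched in Python with the standard-library sqlite3 module. The instructor table and its rows are hypothetical. The procedural function spells out how to find the answer (fetch every row and test each one in a loop; the rows are still fetched with SQL, so this only imitates a procedural DML), while the declarative function states only what data are wanted and leaves the access path to the database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("Srinivasan", "Comp. Sci.", 65000),
     ("Wu", "Finance", 90000),
     ("Katz", "Comp. Sci.", 75000)],
)

def procedural_names(conn, dept):
    # Procedural style: we spell out HOW to get the data (visit every row, test each one).
    result = []
    for name, dept_name, _salary in conn.execute("SELECT * FROM instructor"):
        if dept_name == dept:
            result.append(name)
    return sorted(result)

def declarative_names(conn, dept):
    # Declarative style (SQL): we state only WHAT we want; the DBMS decides how to find it.
    rows = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name", (dept,)
    )
    return [name for (name,) in rows]

print(procedural_names(conn, "Comp. Sci."))   # ['Katz', 'Srinivasan']
print(declarative_names(conn, "Comp. Sci."))  # ['Katz', 'Srinivasan']
```

Both functions return the same answer; the declarative one is shorter because the search strategy is the database system's job.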
Data-Definition Language
The data values stored in the database must satisfy certain consistency constraints. For example,
suppose the university requires that the account balance of a department must never be negative.
The DDL provides facilities to specify such constraints. The database system checks these constraints
every time the database is updated. In general, a constraint can be an arbitrary predicate pertaining
to the database. However, arbitrary predicates may be costly to test. Thus, database systems
implement only those integrity constraints that can be tested with minimal overhead.
• Domain Constraints.
A domain of possible values must be associated with every attribute (for example, integer types,
character types, date/time types). Declaring an attribute to be of a particular domain acts as a
constraint on the values that it can take. Domain constraints are the most elementary form of
integrity constraint. They are tested easily by the system whenever a new data item is entered into
the database.
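As a minimal sketch, assuming SQLite through Python's sqlite3 module, domain constraints can be declared in the DDL with CHECK clauses. The department table, its columns, and the helper try_insert are hypothetical; the non-negative balance rule is the university example given earlier. SQLite enforces declared column types only loosely (type affinity), so the sketch makes the integer domain explicit with typeof(); other systems enforce declared types more strictly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain constraints written as CHECK clauses in the table definition (DDL).
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        budget    INTEGER NOT NULL CHECK (typeof(budget) = 'integer'),
        balance   INTEGER NOT NULL CHECK (balance >= 0)  -- must never be negative
    )
""")

def try_insert(row):
    """Return True if the row satisfies every constraint, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO department VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:  # raised when a CHECK constraint fails
        return False

print(try_insert(("Physics", 70000, 1200)))  # True
print(try_insert(("Music", "lots", 500)))    # False: budget is not an integer
print(try_insert(("History", 50000, -10)))   # False: negative balance
```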
• Referential Integrity.
There are cases where we wish to ensure that a value that appears in one relation for a given set of
attributes also appears in a certain set of attributes in another relation (referential integrity). For
example, the department listed for each course must be one that actually exists. More precisely, the
dept_name value in a course record must appear in the dept_name attribute of some record of the
department relation. Database modifications can cause violations of referential integrity. When a
referential-integrity constraint is violated, the normal procedure is to reject the action that caused
the violation.
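A minimal sketch of referential integrity, again assuming SQLite via Python's sqlite3 module; the department and course tables mirror the dept_name example, and the helper functions add_course and drop_department are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection. Both the insertion of a course for a department that does not exist and the deletion of a department that is still referenced are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        dept_name TEXT NOT NULL REFERENCES department (dept_name)
    )
""")
conn.execute("INSERT INTO department VALUES ('Comp. Sci.')")
conn.commit()

def add_course(course_id, dept_name):
    """Insert a course; return False if it would violate referential integrity."""
    try:
        with conn:
            conn.execute("INSERT INTO course VALUES (?, ?)", (course_id, dept_name))
        return True
    except sqlite3.IntegrityError:
        return False

def drop_department(dept_name):
    """Delete a department; return False if courses still refer to it."""
    try:
        with conn:
            conn.execute("DELETE FROM department WHERE dept_name = ?", (dept_name,))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_course("CS-101", "Comp. Sci."))  # True: the department exists
print(add_course("AS-999", "Astrology"))   # False: no such department
print(drop_department("Comp. Sci."))       # False: CS-101 still refers to it
```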
• Assertions.
An assertion is any condition that the database must always satisfy. Domain constraints and
referential-integrity constraints are special forms of assertions. However, there are many constraints
that we cannot express by using only these special forms. For example, “Every department must
have at least five courses offered every semester” must be expressed as an assertion. When an
assertion is created, the system tests it for validity. If the assertion is valid, then any future
modification to the database is allowed only if it does not cause that assertion to be violated.
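Standard SQL defines a CREATE ASSERTION statement, but many systems, including SQLite, do not implement it. The sketch below emulates the "at least five courses every semester" assertion in application code, assuming hypothetical department, semester, and offering tables and a hypothetical modify helper: each modification runs inside a transaction, the assertion is re-checked, and the transaction is rolled back if the assertion would be violated.

```python
import sqlite3

class AssertionViolated(Exception):
    """Raised when a modification would break the (hypothetical) assertion."""

# Any (department, semester) pair with fewer than five offered courses violates the assertion.
VIOLATIONS = """
    SELECT d.dept_name, s.semester
    FROM department AS d CROSS JOIN semester AS s
    WHERE (SELECT COUNT(*) FROM offering AS o
           WHERE o.dept_name = d.dept_name AND o.semester = s.semester) < 5
"""

def assertion_holds(conn):
    return conn.execute(VIOLATIONS).fetchone() is None

def modify(conn, sql, params=()):
    """Apply one modification; keep it only if the assertion still holds afterwards."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(sql, params)
            if not assertion_holds(conn):
                raise AssertionViolated(sql)
        return True
    except AssertionViolated:
        return False

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY);
    CREATE TABLE semester (semester TEXT PRIMARY KEY);
    CREATE TABLE offering (dept_name TEXT, course_id TEXT, semester TEXT);
    INSERT INTO department VALUES ('Comp. Sci.');
    INSERT INTO semester VALUES ('Fall');
""")
conn.executemany(
    "INSERT INTO offering VALUES ('Comp. Sci.', ?, 'Fall')",
    [(f"CS-10{i}",) for i in range(1, 6)],  # five Fall offerings: CS-101 .. CS-105
)
conn.commit()

# "Creating" the assertion: first test that the current database satisfies it.
print(assertion_holds(conn))  # True
print(modify(conn, "DELETE FROM offering WHERE course_id = ?", ("CS-105",)))  # False
print(modify(conn, "INSERT INTO offering VALUES ('Comp. Sci.', 'CS-106', 'Fall')"))  # True
```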
• Authorization.
We may want to differentiate among the users as far as the type of access they are permitted
on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows insertion of new data, but not modification
of existing data; update authorization, which allows modification, but not deletion, of data; and
delete authorization, which allows deletion of data. We may assign the user all, none, or a
combination of these types of authorization.
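The four kinds of authorization can be modeled as a lookup table of granted privileges. This is a hypothetical Python sketch: the user names, relation names, and grants are invented for illustration. In SQL systems, such privileges are usually granted and revoked with the GRANT and REVOKE statements.

```python
from enum import Enum, auto

class Privilege(Enum):
    READ = auto()    # read, but not modify
    INSERT = auto()  # add new data, but not modify existing data
    UPDATE = auto()  # modify, but not delete
    DELETE = auto()  # delete data

# Hypothetical grants: user -> relation -> set of privileges (all, none, or a combination).
grants = {
    "clerk": {"instructor": {Privilege.READ}},
    "registrar": {"takes": {Privilege.READ, Privilege.INSERT, Privilege.UPDATE}},
    "dba": {"instructor": set(Privilege), "takes": set(Privilege)},
}

def is_authorized(user, relation, privilege):
    """Allow a request only if that privilege was granted to the user on that relation."""
    return privilege in grants.get(user, {}).get(relation, set())

print(is_authorized("clerk", "instructor", Privilege.READ))    # True
print(is_authorized("clerk", "instructor", Privilege.UPDATE))  # False
print(is_authorized("registrar", "takes", Privilege.DELETE))   # False
```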
1) Query Processor:
It translates the requests (queries) received from end users, typically via an application program,
into instructions. It also executes the user requests received from the DML compiler.
--DML Compiler: It translates the DML statements of a query into low-level instructions (an evaluation
plan), so that they can be executed.
--DDL Interpreter: It interprets DDL statements and records the definitions in a set of tables
containing metadata (data about data).
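--Query Optimizer: It chooses, from among the alternative evaluation plans that produce the same
result, the plan with the lowest estimated cost.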
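One way to watch a query optimizer at work, assuming SQLite through Python's sqlite3 module, is EXPLAIN QUERY PLAN, which reports the plan SQLite chose. The employee table, the index name idx_emp_dept, and the helper plan are hypothetical, and the exact wording of the plan text varies between SQLite versions. Before an index exists, the only option is to scan the whole table; once an index on dept_id exists, the optimizer searches through it instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, dept_id INTEGER, name TEXT)")

QUERY = "SELECT name FROM employee WHERE dept_id = 7"

def plan(conn, query):
    """Return the detail strings of the plan SQLite chose for a query."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(plan(conn, QUERY))  # e.g. a SCAN of employee: no index exists yet
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept_id)")
print(plan(conn, QUERY))  # e.g. a SEARCH of employee USING INDEX idx_emp_dept
```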
2) Storage Manager: It provides the interface between the low-level data stored in the database and
the application programs and queries submitted to the system. It is also responsible for the
interaction with the file manager. The raw (low-level) data are stored on the disk using the file
system provided by the OS.
It contains the following components –
--Authorization Manager: It enforces access control (such as role-based access control), i.e., it
checks whether the particular user is privileged to perform the requested operation or not.
--Integrity Manager: It checks the integrity constraints when the database is modified.
--File Manager: It manages the file space and the data structure used to represent information in
the database.
--Buffer Manager: It is responsible for fetching data from secondary storage into main memory and
for deciding which data to cache (keep) in main memory.
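The buffer manager's job can be illustrated with a toy buffer pool in Python. This is a sketch under stated assumptions: the "disk" is a dictionary of pages, the pool holds a fixed number of pages, and eviction uses least-recently-used (LRU) replacement, one common policy. Real buffer managers also pin pages that are in use and write modified pages back to disk, which this sketch omits. The class name BufferManager is hypothetical.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool that keeps at most `capacity` disk pages in main memory (LRU)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # stand-in for secondary storage: page_id -> bytes
        self.capacity = capacity
        self.pool = OrderedDict()   # page_id -> page, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pool:             # hit: serve from main memory
            self.pool.move_to_end(page_id)   # mark as most recently used
            return self.pool[page_id]
        if len(self.pool) >= self.capacity:  # miss with a full pool: evict the LRU page
            self.pool.popitem(last=False)
        self.disk_reads += 1                 # miss: transfer the page from disk
        page = self.disk[page_id]
        self.pool[page_id] = page
        return page

disk = {i: f"page-{i}".encode() for i in range(10)}
bm = BufferManager(disk, capacity=2)
for pid in [1, 2, 1, 3, 1, 2]:
    bm.get_page(pid)
print(bm.disk_reads)  # 4
print(list(bm.pool))  # [1, 2]
```

With room for only two pages, six requests cost four disk reads: the two repeated requests for page 1 are served from memory.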
3) Disk Storage:
It contains the following components –
--Data Dictionary: It contains information about the structure of every database object; it is the
repository of the database's metadata.
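Many relational systems expose their data dictionary as tables that can be queried like any other. Assuming SQLite via Python's sqlite3 module, the catalog table sqlite_master lists every schema object, and PRAGMA table_info reports the column metadata recorded for a table. The employee table here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# sqlite_master holds one row per table, index, view, or trigger.
objects = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(objects)  # [('table', 'employee')]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
columns = [(name, col_type) for _cid, name, col_type, _nn, _dflt, _pk
           in conn.execute("PRAGMA table_info(employee)")]
print(columns)  # [('emp_id', 'INTEGER'), ('name', 'TEXT')]
```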
Data Models
->Relational Model.
The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name. Tables are also
known as relations. The relational model is an example of a record-based model.
Record-based models are so named because the database is structured in fixed-format records of
several types. Each table contains records of a particular type. Each record type defines a fixed
number of fields, or attributes. The columns of the table correspond to the attributes of the record
type. The relational data model is the most widely used data model, and a vast majority of current
database systems are based on the relational model.
->Entity-Relationship Model.
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and
relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model is widely used in database design.
Historically, the network data model and the hierarchical data model preceded the relational data
model. These models were tied closely to the underlying implementation, and complicated the task
of modeling data. As a result they are used little now, except in old database code that is still in
service in some places.
What is ER Modeling?
A graphical technique for understanding and organizing the data independent of the actual database
implementation. We need to be familiar with the following terms to go further.
Entity instance
Entity instance is a particular member of the entity type. Example for entity instance : A particular
employee
--Regular Entity
An entity which has its own key attribute is a regular entity. Example for regular entity : Employee.
--Weak entity
An entity which depends on another entity for its existence and does not have a key attribute of its
own is a weak entity. Example for a weak entity : In a parent/child relationship, a parent is
considered a strong entity and the child is a weak entity.
Attributes
Properties/characteristics which describe entities are called attributes.
Domain of Attributes
The set of possible values that an attribute can take is called the domain of the attribute. For
example, the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this
set can be termed the domain of the attribute day.
--Key attribute
The attribute (or combination of attributes) which is unique for every entity instance is called key
attribute, e.g. the employee_id of an employee, the pan-card-number of a person, etc. If the key attribute
consists of two or more attributes in combination, it is called a composite key.
--Simple attribute
If an attribute cannot be divided into simpler components, it is a simple attribute. Example for simple
attribute : employee_id of an employee.
--Composite attribute
If an attribute can be split into components, it is called a composite attribute. Example for composite
attribute : Name of the employee which can be split into First_name, Middle_name, and Last_name.
--Single-valued Attribute
If an attribute can take only a single value for each entity instance, it is a single-valued attribute.
Example for single-valued attribute : age of a student. It can take only one value for a particular
student.
--Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute.
Example for multi-valued attribute : telephone number of an employee; a particular employee may
have multiple telephone numbers.
--Stored Attribute
An attribute which needs to be stored permanently is a stored attribute. Example for stored
attribute : name of a student
--Derived Attribute
An attribute which can be calculated or derived based on other attributes is a derived
attribute. Example for derived attribute : age of employee which can be calculated from date of birth
and current date.
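Because age is derived, it is usually computed on demand from the stored date of birth instead of being stored itself, so it never goes out of date. A minimal Python sketch; the function name age_on is hypothetical.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derived attribute: age in whole years, computed from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```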
Relationship
Associations between entities are called relationships.
Example : An employee works for an organization. Here "works for" is a relationship between the
entities employee and organization. In ER modeling, the notation for a relationship is given below.
However, in ER modeling, to connect a weak entity with others, you should use the weak relationship
notation given below.
--Degree of a Relationship
Degree of a relationship is the number of entity types involved. The n-ary relationship is the general
form for degree n. Special cases are unary, binary, and ternary, where the degree is 1, 2, and 3,
respectively.
Example for ternary relationship : a customer purchases an item from a shopkeeper.
--Cardinality of a Relationship
Relationship cardinalities specify how many of each entity type is allowed. Relationships can have
four possible connectivities: one-to-one (1:1), one-to-many (1:N), many-to-one (N:1), and
many-to-many (M:N).
The minimum and maximum values of this connectivity are called the cardinality of the relationship.
Keys
Keys play an important role in the relational database. They are used to uniquely identify any record
or row of data in a table. They are also used to establish and identify relationships between tables.
Types of keys:
1. Primary key
It is the key chosen to identify one and only one instance of an entity uniquely. An entity can
contain multiple keys, as we saw in the PERSON table. The key which is most suitable among them
becomes the primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we could instead select License_Number or Passport_Number as the primary key,
since they are also unique.
For each entity, the primary key is selected based on the requirements and the developers' choice.
2. Candidate key
A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple. The
primary key is chosen from among the candidate keys; the remaining candidate keys are as strong as
the primary key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key. For example, in the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the
names of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this
combination can also be a key. The super keys would be {EMPLOYEE_ID}, {EMPLOYEE_ID,
EMPLOYEE_NAME}, etc.
4. Foreign key
A foreign key is a column (or set of columns) of a table that refers to the primary key of another table.
Every employee works in a specific department in a company, and employee and department are
two different entities. So we can't store the department's information in the employee table. That's
why we link these two tables through the primary key of one table. We add the primary key of the
DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE table. In the EMPLOYEE
table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple
in a relation. These attributes or combinations of the attributes are called the candidate keys. One
key is chosen as the primary key from these candidate keys, and the remaining candidate keys, if any
exist, are termed alternate keys. In other words, the number of alternate keys is the number of
candidate keys minus one (the primary key). An alternate key may or may not exist. If there is
only one candidate key in a relation, it does not have an alternate key.
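The definitions of super key, candidate key, and alternate key can be checked mechanically against sample rows. This hypothetical Python sketch tests whether an attribute set is a super key (no two rows agree on it) and finds candidate keys as minimal super keys; the function names and the rows are invented for illustration. One caveat: a sample instance can only show that a set is not a key, because whether it is a key depends on the meaning of the data across all legal instances.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of `attrs` in this relation instance."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys: super keys with no proper subset that is also a super key."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Supersets of a key already found are super keys, but not minimal ones.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

employee = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "PASSPORT_NUMBER": "P-11"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ravi", "PASSPORT_NUMBER": "P-22"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Asha", "PASSPORT_NUMBER": "P-33"},
]
attrs = ["EMPLOYEE_ID", "EMPLOYEE_NAME", "PASSPORT_NUMBER"]
print(is_superkey(employee, ["EMPLOYEE_ID", "EMPLOYEE_NAME"]))  # True
print(is_superkey(employee, ["EMPLOYEE_NAME"]))                 # False: two employees named Asha
print(candidate_keys(employee, attrs))  # [('EMPLOYEE_ID',), ('PASSPORT_NUMBER',)]
```

With these rows, if EMPLOYEE_ID is chosen as the primary key, the remaining candidate key, PASSPORT_NUMBER, is an alternate key.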
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This
key is also known as Concatenated Key. For example, in employee relations, we assume that an
employee may be assigned multiple roles, and an employee may work on multiple projects
simultaneously. So the primary key will be composed of all three attributes, namely Emp_ID,
Emp_role, and Proj_ID in combination. So these attributes act as a composite key since the primary
key comprises more than one attribute.
7. Artificial key
A key created from arbitrarily assigned data is known as an artificial key. These keys are created
when a primary key is large and complex and has no relationship with many other relations. The data
values of artificial keys are usually numbered in serial order. For example, the primary key
composed of Emp_ID, Emp_role, and Proj_ID is large in employee relations, so it would be better to
add a new artificial attribute to identify each tuple in the relation uniquely.
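A minimal sketch of both ideas, assuming SQLite via Python's sqlite3 module and a hypothetical assignment table with a hypothetical assign helper: assignment_id is an artificial key that SQLite numbers serially (an INTEGER PRIMARY KEY column), while a UNIQUE constraint keeps enforcing the composite key (emp_id, emp_role, proj_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assignment (
        assignment_id INTEGER PRIMARY KEY,  -- artificial key: SQLite assigns 1, 2, 3, ...
        emp_id   INTEGER NOT NULL,
        emp_role TEXT    NOT NULL,
        proj_id  INTEGER NOT NULL,
        UNIQUE (emp_id, emp_role, proj_id)  -- the composite key is still enforced
    )
""")

def assign(emp_id, emp_role, proj_id):
    """Insert an assignment; return its artificial key, or None if it is a duplicate."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (?, ?, ?)",
                (emp_id, emp_role, proj_id),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None

print(assign(101, "Developer", 7))  # 1
print(assign(101, "Tester", 7))     # 2: same employee and project, different role
print(assign(101, "Developer", 7))  # None: repeats an existing composite key value
```

Keeping the UNIQUE constraint is a design choice: the short artificial key is convenient for references from other tables, but without the constraint duplicate assignments could slip in unnoticed.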
ER Design Issues
In the previous sections of the data modeling, we learned to design an ER diagram. We also discussed
different ways of defining entity sets and relationships among them. We also understood the various
design shapes that represent a relationship, an entity, and its attributes. However, users often
misunderstand the elements and the design process of the ER diagram. This leads to a complex ER
diagram and to design issues in which the diagram does not match the real-world enterprise being
modelled. Here, we will discuss the basic design issues of an ER database schema in the
following points:
1) Use of Entity Set vs. Attributes
The use of an entity set or attribute depends on the structure of the real-world enterprise that is being
modelled and the semantics associated with its attributes. It is a mistake to use the primary key of
one entity set as an attribute of another entity set; a relationship should be used instead. It is also
a mistake to designate the primary key attributes of the related entity sets as attributes of the
relationship set, because those attributes are already implicit in the relationship set.
2) Use of Entity Set vs. Relationship Sets
It is difficult to determine whether an object is best expressed by an entity set or a relationship set.
To determine the right use, the user needs to designate a relationship set to describe an action
that occurs between the entities. If an object needs to be represented as a relationship set,
then it is better not to mix it with the entity set.
3) Use of Binary vs. n-ary Relationship Sets
Generally, the relationships described in databases are binary relationships. However,
non-binary relationships can be represented by several binary relationships. For example, we
can create a ternary relationship 'parent' that relates a child to his father and his mother.
Such a relationship can also be represented by two binary relationships, i.e., mother and father,
each relating a parent to the child. Thus, it is possible to represent a non-binary relationship
by a set of distinct binary relationships.
4) Placing Relationship Attributes
The cardinality ratios can be an effective guide in the placement of relationship
attributes. So, it is better to associate the attributes of one-to-one or one-to-many
relationship sets with one of the participating entity sets instead of with the relationship set.
The decision to place a given attribute as a relationship or entity attribute should reflect
the characteristics of the real-world enterprise that is being modelled.
Although the basic E-R concepts can model most database features, some aspects of a database may be more aptly
expressed by certain extensions to the basic E-R model. The extended E-R features are specialization,
generalization, higher- and lower-level entity sets, attribute inheritance, and aggregation.
1. Specialization
An entity set may include subgroupings of entities that are distinct in some way from other entities in the set. For
instance, a subset of entities within an entity set may have attributes that are not shared by all the entities in the
entity set. The E-R model provides a means for representing these distinctive entity groupings.
Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of
the following:
• customer
• employee
Each of these person types is described by a set of attributes that includes all the attributes of entity set person
plus possibly additional attributes. For example, customer entities may be described further by the attribute
customer-id, whereas employee entities may be described further by the attributes employee-id and salary. The
process of designating subgroupings within an entity set is called specialization. The specialization of person
allows us to distinguish among persons according to whether they are employees or customers.
2. Generalization
The refinement from an initial entity set into successive levels of entity subgroupings represents a top-
down design process in which distinctions are made explicit. The design process may also proceed in a bottom-
up manner, in which multiple entity sets are synthesized into a higher-level entity set on the basis of common
features. The database designer may have first identified a customer entity set with the attributes name, street,
city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary.
There are similarities between the customer entity set and the employee entity set in the sense that they have
several attributes in common. This commonality can be expressed by generalization, which is a containment
relationship that exists between a higher-level entity set and one or more lower-level entity sets. In our example,
person is the higher-level entity set and customer and employee are lower-level entity sets. Higher- and lower-
level entity sets also may be designated by the terms superclass and subclass, respectively. The person entity
set is the superclass of the customer and employee subclasses.
3. Attribute Inheritance
A crucial property of the higher- and lower-level entities created by specialization and generalization is attribute
inheritance. The attributes of the higher-level entity sets are said to be inherited by the lower-level entity sets. For
example, customer and employee inherit the attributes of person. Thus, customer is described by its name,
street, and city attributes, and additionally a customer-id attribute; employee is described by its name, street, and
city attributes, and additionally employee-id and salary attributes.
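Attribute inheritance resembles class inheritance in object-oriented languages. The Python dataclasses below are only an analogy, not how a DBMS stores a specialization hierarchy: Customer and Employee declare only their own attributes, yet each also carries name, street, and city from Person. The sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Person:            # higher-level entity set (superclass)
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):  # lower-level entity set: inherits name, street, city
    customer_id: str

@dataclass
class Employee(Person):  # lower-level entity set: inherits name, street, city
    employee_id: str
    salary: int

e = Employee("Hayes", "Main", "Harrison", employee_id="E-1", salary=50000)
print(isinstance(e, Person))  # True: every employee "is a" person (ISA)
print(e.city)                 # Harrison: attribute inherited from Person
```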
A lower-level entity set (or subclass) also inherits participation in the relationship sets in which its higher-level
entity (or superclass) participates. The officer, teller, and secretary entity sets can participate in the works-for
relationship set, since the superclass employee participates in the works-for relationship. Attribute inheritance
applies through all tiers of lower-level entity sets. The above entity sets can participate in any relationships in
which the person entity set participates. Whether a given portion of an E-R model was arrived at by specialization
or generalization, the outcome is basically the same:
• A higher-level entity set with attributes and relationships that apply to all of its lower-level entity sets
• Lower-level entity sets with distinctive features that apply only within a particular lower-level entity set
The figure depicts a hierarchy of entity sets. In the figure, employee is a lower-level entity set of person and a
higher-level entity set of the officer, teller, and secretary entity sets. In a hierarchy, a given entity set may be involved as
a lower-level entity set in only one ISA relationship; that is, entity sets in this diagram have only single
inheritance. If an entity set is a lower-level entity set in more than one ISA relationship, then the entity set has
multiple inheritance, and the resulting structure is said to be a lattice.