DataBase Management System - 496
Unit – II E-R model: entity, entity set, relationships & their types, mapping constraints
Extended E-R features: generalization, specialization, aggregation, E-R diagram
Unit – IV Advanced SQL: review of SQL, concept of the GROUP BY, HAVING and ORDER BY clauses,
nested queries, joins & their types, different functions of SQL:
numeric, date, data type conversion, character, and miscellaneous functions.
45, Anurag Nagar, Behind Press Complex, Indore (M.P.) Ph.: 4262100, www.rccmindore.com
DataBase Management System
B.com IV Sem. (Computer)
The main characteristics of the database approach versus the file processing approach are as follows:-
Self-describing nature of a database system
Insulation between programs and data, and data abstraction
Multiple views of the data
Sharing of data and multiuser transaction processing
Data:-
The raw facts that can be stored or recorded and that have a clear meaning are called data.
Database:-
A collection of data designed to be used by different people is called a database. It is a collection of interrelated
data stored together with controlled redundancy to serve one or more applications in an optimal fashion. A
database system is basically a computer-based record-keeping system. The collection of data, usually referred
to as the database, contains information about one particular enterprise.
Purpose of Database –
The database system should be a repository of the data needed for an organization's data processing. The data
should be accurate, private & protected from damage. It should be organized so that diverse applications with
different data requirements can use the data when needed.
Advantages of DBMS –
1) Database reduces data redundancy to a large extent.
2) Database can control inconsistency to a large extent.
3) Database facilitates sharing of data.
4) Database enforces standards.
5) Database can ensure data security & privacy.
6) Integrity can be maintained through database.
7) Conflicting requirements can be balanced through database.
Characteristics of data – The data stored in a database should have these characteristics –
1) Shared
2) Persistence
3) Validity / integrity
4) Security
5) Consistency
6) Non-redundancy
7) Data independence.
Database Users –
A primary goal of a database system is to provide an environment for retrieving information from and storing
new information into the database. There are four different types of database users, differentiated by the way
they expect to interact with the system –
1) Application programmers
2) Sophisticated users
3) Specialized users
4) Naïve users
Database Administrator –
The person who has central control over the system is called the database administrator (DBA). The functions of the DBA
include –
1) Schema definition
2) Storage structure and access method definition
3) Schema and physical organization modification
4) Granting of authorization for data access
5) Integrity constraint specification
6) Routine maintenance
The major activities, operations & services provided by a DBMS are as follows –
1) Transaction Management
2) Concurrency Control
3) Recovery Management
4) Security Management
5) Language Interface
6) Storage Management
7) Data Catalog Management.
Applications of DBMS – In today's competitive era, DBMSs are used in many different areas, including the following:
1) Banking
2) Airlines
3) Organizations
4) Universities
5) Credit Card Transactions
6) Telecommunications
7) Finance
8) Sales
9) Human Resources
10) Manufacturing etc.
Data Models – Data models are the different models that can be used to design a database. Designing a database
includes describing the data, data relationships, data semantics and consistency constraints. The various data models are
as follows –
Object Based Data Model – Object-based logical models are used to describe data at the logical and view levels.
They are characterized by the fact that they provide fairly flexible structuring capabilities and allow data constraints to be
specified explicitly. These models emphasize the fact that everything is an object having a set of attributes.
Different data models that use this characteristic are –
1) The entity relationship model
2) The object oriented model
3) The semantic data model
4) The functional data model
1) The entity relationship model – The entity relationship model revolves around three things –
a) Entity, b) Relationship & c) Attribute.
The E-R model is based on a perception of the real world as a collection of entities, each of which can be
distinguished from the others, and of the relationships that exist among these entities.
2) The object oriented model – As its name indicates, the object oriented model treats everything as an object and is
based on a collection of objects. An object contains values stored in instance variables within the object. An
object also contains bodies of code that operate on the object. These bodies of code are called
methods.
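A rough Python analogy (not an object database) of the two parts of an object just described. The Account class, its instance variables number and balance, and its deposit method are hypothetical names used only for illustration.

```python
class Account:
    """Hypothetical object: state in instance variables, behaviour in methods."""

    def __init__(self, number: str, balance: float) -> None:
        # Instance variables hold the values stored within the object.
        self.number = number
        self.balance = balance

    def deposit(self, amount: float) -> None:
        # A method: a body of code that operates on the object's own values.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


acct = Account("A-101", 500.0)
acct.deposit(250.0)
print(acct.balance)  # 750.0
```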
Limitations of DBMS –
1) High initial investment in hardware, software & training
2) Overhead of the generality that a DBMS provides for defining and processing data
3) Overhead for providing security, concurrency control & integrity functions
1) Relational Model – This is the most popular among the various record based data models. This model uses
a collection of tables to represent both data and the relationships among those data.
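A minimal sketch of the relational idea, run on SQLite through Python's built-in sqlite3 module; the notes do not name an RDBMS, so SQLite is an assumption made only so the example runs anywhere. The customer, account and depositor tables and their rows are hypothetical: two tables hold the data, and the third holds the relationship as pairs of matching key values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each table holds data about one kind of thing...
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account  (account_no  INTEGER PRIMARY KEY, balance REAL);
    -- ...and a table of matching key values represents the relationship.
    CREATE TABLE depositor (
        customer_id INTEGER REFERENCES customer(customer_id),
        account_no  INTEGER REFERENCES account(account_no),
        PRIMARY KEY (customer_id, account_no)
    );
    INSERT INTO customer  VALUES (1, 'Ravi'), (2, 'Asha');
    INSERT INTO account   VALUES (101, 500.0), (102, 900.0);
    INSERT INTO depositor VALUES (1, 101), (2, 101), (2, 102);
""")

# Joining through the relationship table answers "who holds which account?"
rows = conn.execute("""
    SELECT c.name, a.account_no
    FROM customer c
    JOIN depositor d ON d.customer_id = c.customer_id
    JOIN account a   ON a.account_no  = d.account_no
    ORDER BY c.name, a.account_no
""").fetchall()
print(rows)  # [('Asha', 101), ('Asha', 102), ('Ravi', 101)]
```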
2) Network Model – In the network model, data are represented by a collection of records, and relationships
among the data are represented by links (pointers). A pointer is a physical address that identifies
where the next record can be found on the disk.
3) Hierarchical Model – It is very similar to the network model, except that records are organized as a
collection of trees rather than arbitrary graphs.
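The difference in shape between the two models can be sketched in Python by treating each link (pointer) as a (parent, child) pair of record names. The record names and the is_hierarchical helper are hypothetical, and the check covers only the one-parent-per-record condition of a tree; as its comment says, it does not look for cycles.

```python
from collections import Counter


def is_hierarchical(links: list[tuple[str, str]]) -> bool:
    """True if every child record has at most one parent link (tree shape).

    Simplification: cycles are not checked, although a real
    hierarchical structure must not contain them either.
    """
    parents_per_child = Counter(child for _parent, child in links)
    return all(count <= 1 for count in parents_per_child.values())


# Tree: each account record hangs under exactly one customer record.
tree_links = [("Ravi", "A-101"), ("Asha", "A-102")]
# Graph: a joint account is linked from two customer records.
graph_links = [("Ravi", "A-101"), ("Asha", "A-101"), ("Asha", "A-102")]

print(is_hierarchical(tree_links))   # True
print(is_hierarchical(graph_links))  # False
```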
Physical Data Model – This model is used to describe data at the lowest level, that is, to describe the behavior of
data at the disk level: the way the data and the data relationships are maintained by storing them on the disk.
It decides the way the DBMS is going to use secondary storage devices for storing and accessing the
database.
The widely used physical data models are –
1) Unifying model
2) Frame memory model
Entity Relationship Data Model
Unit – II
The Entity Relationship Data Model was introduced in a key article by Chen (1976), in which he described the
main constructs of the E-R model: entities, relationships and their associated attributes. An Entity
Relationship Model is a detailed, logical representation of the data for an organization or a business area.
An Entity Relationship Model is normally expressed as an Entity Relationship Diagram.
Components of ER-Model:
1) Entity – An entity is a person, place, object, event or concept in the real world that is distinguishable
from all other objects.
2) Entity Sets – An entity set is a set of entities of the same type that share the same properties or
attributes.
a. Strong Entities – A strong entity set is one that exists independent of other entity sets. A
strong entity set has a primary key.
b. Weak Entities – A weak entity is an entity whose existence depends on some other
entity. A weak entity set does not have a primary key of its own.
3) Attributes – An attribute can be simply defined as a property or characteristic of an entity.
a. Simple Attribute – A simple attribute is an attribute that cannot be broken into smaller
subparts.
b. Composite Attribute – A composite attribute is an attribute that can be broken into
smaller subparts.
c. Single Valued Attribute – An attribute is said to be single valued if it can have
only one value.
d. Multivalued Attribute – An attribute is said to be multivalued if it can have
more than one value.
e. Stored Attribute – An attribute whose value is actually stored for an entity is a
stored attribute.
f. Derived Attribute – An attribute whose value is derived from stored attributes, rather than
being stored for the entity, is a derived attribute.
g. Null Attribute – An attribute that can have a null value is a null attribute.
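A minimal Python sketch that maps each kind of attribute onto a field of a hypothetical Employee record. The field names, the date of birth and the phone numbers are invented for illustration; age_on shows a derived attribute that is computed from the stored date_of_birth instead of being kept as data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Name:
    # Composite attribute: can be broken into smaller subparts.
    first: str
    last: str


@dataclass
class Employee:
    emp_id: int                                       # simple, single-valued
    name: Name                                        # composite
    date_of_birth: date                               # stored
    phones: list[str] = field(default_factory=list)   # multivalued
    middle_name: Optional[str] = None                 # may be null

    def age_on(self, today: date) -> int:
        # Derived attribute: computed from the stored date_of_birth.
        dob = self.date_of_birth
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)


e = Employee(1001, Name("Ravindra", "Agrawal"), date(1990, 6, 15),
             ["0731-111", "0731-222"])
print(e.age_on(date(2024, 6, 14)))  # 33
print(e.age_on(date(2024, 6, 15)))  # 34
```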
Employee
EmpID   Name                   DeptName        Salary
1001    Ravindra Agrawal       Finance         20000
1002    Khelan Nagar           Production      18000
1003    Himanshu Kulkarni      Personnel       25000
1004    Amol Maheshwari        Marketing       30000
1005    Ritesh Singh Chouhan   Advertisement   22000
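The Employee entity set above can be turned into a table, sketched here on SQLite through Python's sqlite3 module. The column types and the NOT NULL constraint on Name are assumptions, since the notes give only the column names and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The entity set becomes a table; each attribute becomes a column.
conn.execute("""
    CREATE TABLE Employee (
        EmpID    INTEGER PRIMARY KEY,   -- identifies each entity uniquely
        Name     TEXT NOT NULL,
        DeptName TEXT,
        Salary   INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1001, "Ravindra Agrawal", "Finance", 20000),
        (1002, "Khelan Nagar", "Production", 18000),
        (1003, "Himanshu Kulkarni", "Personnel", 25000),
        (1004, "Amol Maheshwari", "Marketing", 30000),
        (1005, "Ritesh Singh Chouhan", "Advertisement", 22000),
    ],
)
# Each row is one entity (one employee) of the entity set.
print(conn.execute("SELECT Name FROM Employee WHERE EmpID = 1003").fetchone())
# ('Himanshu Kulkarni',)
print(conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0])  # 5
```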
Relationship Sets:
A relationship is an association among several entities. Relationships are the glue that holds together the
various components of an ER Model.
A relationship set is a set of relationships of the same type. For example, in a bank any customer can
have any type of loan (business loan, personal loan, home loan) given by the bank. All the relationships
between the customers and the loans taken by them together form a relationship
set.
Degree of Relationship:
The degree of a relationship is the number of entity types that participate in that relationship. The three
most common degrees of relationship in the E-R model are unary (degree 1), binary (degree 2) and ternary (degree
3).
1) Unary Relationship: A unary relationship is a relationship between the instances of a single
entity type.
[Figure: PERSON Is Married to PERSON (one-to-one unary relationship)]
2) Binary Relationship: A binary relationship is a relationship between the instances of two entity
types and is the most common type of relationship encountered in data modeling.
[Figure: binary relationships: one-to-one; PRODUCT LINE Contains PRODUCT (one-to-many); many-to-many]
3) Ternary Relationship: A ternary relationship is a simultaneous relationship among the instances
of three entity types.
[Figure: ternary relationship involving the STUDENT and COURSE entity types]
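One common way to store relationships of each degree in tables, sketched on SQLite via Python's sqlite3 module: the unary Is Married to relationship becomes a foreign key from person back to person (UNIQUE keeps it one-to-one), and a ternary relationship becomes a table holding one key from each of the three entity types. The teacher entity type and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- Unary: PERSON Is Married to PERSON, a foreign key back into the same table.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT,
        spouse_id INTEGER UNIQUE REFERENCES person(person_id)  -- UNIQUE: one-to-one
    );
    -- Ternary: each row links one student, one course and one teacher.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY);
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY);
    CREATE TABLE teaches (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        teacher_id INTEGER REFERENCES teacher(teacher_id),
        PRIMARY KEY (student_id, course_id, teacher_id)
    );
    INSERT INTO person VALUES (1, 'Amol', NULL), (2, 'Neha', 1);
    UPDATE person SET spouse_id = 2 WHERE person_id = 1;
    INSERT INTO student VALUES (1);
    INSERT INTO course  VALUES (10);
    INSERT INTO teacher VALUES (100);
    INSERT INTO teaches VALUES (1, 10, 100);
""")

couples = conn.execute("""
    SELECT a.name, b.name FROM person a JOIN person b ON a.spouse_id = b.person_id
    ORDER BY a.person_id
""").fetchall()
print(couples)  # [('Amol', 'Neha'), ('Neha', 'Amol')]

try:
    conn.execute("INSERT INTO teaches VALUES (1, 10, 999)")  # no teacher 999
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```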
Keys:
A key is an attribute or set of attributes used to distinguish one entity from another in an entity set.
1) Super Key: A super key is a set of one or more attributes that can uniquely identify an entity in an
entity set.
2) Candidate Key: A candidate key is an attribute or set of attributes that can uniquely identify an entity
and no proper subset of which is a super key; that is, a candidate key is a minimal super key.
3) Primary Key: The primary key is the term used for the candidate key that is chosen by the
database designer as the principal means of identifying an entity.
4) Alternate Keys: Alternate key is the term used for the candidate keys that remain after
the primary key has been chosen by the database designer.
5) Foreign Key: A foreign key is an attribute or set of attributes in one relation of a database that serves
as the primary key of another relation in the same database.
6) Composite Key: A primary key that consists of more than one attribute is called composite key.
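A minimal SQLite sketch (through Python's sqlite3 module) showing how a DBMS enforces the keys defined above. The department, employee and works_on tables are hypothetical; rejected is an illustrative helper that reports whether SQLite refuses a statement with an IntegrityError. SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,     -- primary key
        dept_name TEXT NOT NULL UNIQUE     -- alternate key (another candidate key)
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    );
    CREATE TABLE works_on (
        emp_id     INTEGER REFERENCES employee(emp_id),
        project_id INTEGER,
        hours      REAL,
        PRIMARY KEY (emp_id, project_id)   -- composite key
    );
    INSERT INTO department VALUES (1, 'Finance');
    INSERT INTO employee   VALUES (1001, 1);
    INSERT INTO works_on   VALUES (1001, 7, 12.5);
""")


def rejected(sql: str) -> bool:
    """Return True if SQLite refuses the statement because of a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO department VALUES (2, 'Finance')"))  # True: alternate key repeated
print(rejected("INSERT INTO employee VALUES (1002, 99)"))         # True: no department 99
print(rejected("INSERT INTO works_on VALUES (1001, 7, 3.0)"))     # True: composite key repeated
print(rejected("INSERT INTO works_on VALUES (1001, 8, 3.0)"))     # False: new combination
```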
Mapping Constraints:
An E-R enterprise schema may define certain constraints to which the contents of a database must
conform. These constraints are termed mapping constraints.
(1) Cardinality Constraint: Cardinality Constraint specifies the number of instances of one entity that
can or must be associated with each instance of another entity.
Mapping cardinalities are most useful in describing binary relationship sets, although they can
contribute to the description of relationship sets that involve more than two entity sets.
One to One: An entity in A is associated with at most one entity in B and an entity in B is
associated with at most one entity in A.
One to Many: An entity in A is associated with any number (zero or more) of entities in B. An
entity in B, however, can be associated with at most one entity in A.
Many to One: An entity in A is associated with at most one entity in B. An entity in B, however,
can be associated with any number (zero or more) of entities in A.
Many to Many: An entity in A is associated with any number (zero or more) of entities in B and
an entity in B is associated with any number (zero or more) of entities in A.
[Figure: mapping cardinalities between entity sets A and B: (a) one to one, (b) one to many, (c) many to one, (d) many to many]
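A small Python sketch that reads the mapping cardinality off a binary relationship set given as (a, b) pairs. The function name is hypothetical, and it can only report what the current data is consistent with; the cardinality constraint itself is a rule the designer states for every possible instance.

```python
from collections import Counter


def mapping_cardinality(pairs: set[tuple[str, str]]) -> str:
    """Classify a binary relationship set of (a, b) pairs, a in A and b in B."""
    max_b_per_a = max(Counter(a for a, _ in pairs).values(), default=0)
    max_a_per_b = max(Counter(b for _, b in pairs).values(), default=0)
    a_side = "many" if max_a_per_b > 1 else "one"   # is some b linked to several a's?
    b_side = "many" if max_b_per_a > 1 else "one"   # is some a linked to several b's?
    return f"{a_side} to {b_side}"


print(mapping_cardinality({("a1", "b1"), ("a2", "b2")}))                # one to one
print(mapping_cardinality({("a1", "b1"), ("a1", "b2"), ("a2", "b3")}))  # one to many
print(mapping_cardinality({("a1", "b1"), ("a2", "b1"), ("a3", "b2")}))  # many to one
print(mapping_cardinality({("a1", "b1"), ("a1", "b2"), ("a2", "b1")}))  # many to many
```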
(2) Existence Dependencies: Another important class of constraints is existence dependencies.
Specifically, if the existence of entity x depends on the existence of entity y, then x is said to be existence
dependent on y. If y is deleted, then x must also be deleted. Entity y is said to be the dominant entity and entity x
is said to be the subordinate entity.
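In SQL, an existence dependency is commonly enforced with a foreign key declared ON DELETE CASCADE. The sketch below uses SQLite through Python's sqlite3 module; the loan (dominant) and payment (subordinate) tables are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to act on the cascade
conn.executescript("""
    -- loan is the dominant entity; payment is subordinate (existence dependent).
    CREATE TABLE loan (loan_no INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE payment (
        loan_no    INTEGER NOT NULL REFERENCES loan(loan_no) ON DELETE CASCADE,
        payment_no INTEGER,
        amount     REAL,
        PRIMARY KEY (loan_no, payment_no)
    );
    INSERT INTO loan    VALUES (17, 1000), (23, 2000);
    INSERT INTO payment VALUES (17, 1, 100), (17, 2, 100), (23, 1, 250);
""")

conn.execute("DELETE FROM loan WHERE loan_no = 17")  # delete the dominant entity
remaining = conn.execute("SELECT loan_no, payment_no FROM payment").fetchall()
print(remaining)  # [(23, 1)]: the payments of loan 17 were deleted with it
```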
Specialization
An entity set may include subgroupings of entities that are distinct in some way from other entities in
the set. For instance, a subset of entities within an entity set may have attributes that are not shared by
all the entities in the entity set. The E-R model provides a means for representing these distinctive entity
groupings.
Consider an entity set person, with attributes name, street, and city. A person may be further classified as
one of the following:
• customer
• employee
Each of these person types is described by a set of attributes that includes all the attributes of entity set
person plus possibly additional attributes. For example, customer entities may be described further by
the attribute customer-id, whereas employee entities may be described further by the attributes
employee-id and salary. The process of designating subgroupings within an entity set is called
specialization. The specialization of person allows us to distinguish among persons according to
whether they are employees or customers.
Generalization
The refinement from an initial entity set into successive levels of entity subgroupings represents a top-
down design process in which distinctions are made explicit. The design process may also proceed in a
bottom-up manner, in which multiple entity sets are synthesized into a higher-level entity set on the
basis of common features. The database designer may have first identified a customer entity set with the
attributes name, street, city, and customer-id, and an employee entity set with the attributes name,
street, city, employee-id, and salary.
There are similarities between the customer entity set and the employee entity set in the sense that they
have several attributes in common. This commonality can be expressed by generalization, which is a
containment relationship that exists between a higher-level entity set and one or more lower-level
entity sets. In our example, person is the higher-level entity set and customer and employee are lower-
level entity sets.
Higher- and lower-level entity sets also may be designated by the terms superclass and subclass,
respectively. The person entity set is the superclass of the customer and employee subclasses.
Attribute Inheritance
A crucial property of the higher- and lower-level entities created by specialization and generalization is
attribute inheritance. The attributes of the higher-level entity sets are said to be inherited by the lower-
level entity sets. For example, customer and employee inherit the attributes of person. Thus, customer is
described by its name, street, and city attributes, and additionally a customer-id attribute; employee is
described by its name, street, and city attributes, and additionally employee-id and salary attributes. A
lower-level entity set (or subclass) also inherits participation in the relationship sets in which its higher-
level entity (or superclass) participates. The officer, teller, and secretary entity sets can participate in the
works-for relationship set, since the superclass employee participates in the works-for relationship.
Attribute inheritance applies through all tiers of lower-level entity sets. The above entity sets can participate in any relationships in
which the person entity set participates.
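One common way (among several) to store a specialization in tables, sketched on SQLite via Python's sqlite3 module: the superclass table holds the shared attributes, each subclass table holds only its own, and a join restores the inherited attributes. The person_id surrogate key and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Higher-level entity set (superclass): the common attributes.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name TEXT, street TEXT, city TEXT
    );
    -- Lower-level entity sets (subclasses): only their distinctive attributes.
    CREATE TABLE customer (
        person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
        customer_id TEXT UNIQUE
    );
    CREATE TABLE employee (
        person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
        employee_id TEXT UNIQUE,
        salary      INTEGER
    );
    INSERT INTO person   VALUES (1, 'Asha', 'MG Road', 'Indore');
    INSERT INTO employee VALUES (1, 'E-7', 30000);
""")

# Joining back to person gives the employee its inherited attributes.
row = conn.execute("""
    SELECT p.name, p.street, p.city, e.employee_id, e.salary
    FROM employee e JOIN person p ON p.person_id = e.person_id
""").fetchone()
print(row)  # ('Asha', 'MG Road', 'Indore', 'E-7', 30000)
```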
Whether a given portion of an E-R model was arrived at by specialization or generalization,
the outcome is basically the same:
• A higher-level entity set with attributes and relationships that apply to all of its lower-level entity sets
• Lower-level entity sets with distinctive features that apply only within a particular lower-level entity
set
Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships. To illustrate
the need for such a construct, consider the ternary relationship works-on, which we saw earlier,
between an employee, branch, and job. Now, suppose we want to record managers for tasks performed by
an employee at a branch; that is, we want to record managers for (employee, branch, job) combinations.
Let us assume that there is an entity set manager. One alternative for representing this relationship is to
create a quaternary relationship manages between employee, branch, job, and manager. (A quaternary
relationship is required—a binary relationship between manager and employee would not permit us to
represent which (branch, job) combinations of an employee are managed by which manager.)
It appears that the relationship sets works-on and manages can be combined into one single relationship
set. Nevertheless, we should not combine them into a single relationship, since some employee, branch,
job combinations may not have a manager. There is redundant information in the resultant figure,
however, since every employee, branch, job combination in manages is also in works-on. If the manager
were a value rather than a manager entity, we could instead make manager a multivalued attribute of
the relationship works-on. But doing so makes it more difficult (logically as well as in execution cost) to
find, for example, employee-branch-job triples for which a manager is responsible. Since the manager is a
manager entity, this alternative is ruled out in any case.
The best way to model a situation such as the one just described is to use aggregation. Aggregation is an
abstraction through which relationships are treated as higher-level entities.
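A minimal relational sketch of the aggregation, on SQLite via Python's sqlite3 module: manages refers to a whole (employee, branch, job) row of works_on through a composite foreign key, so a manager can be recorded only for a combination that exists, and combinations without a manager need no row at all. The column types, the sample data and the rule of at most one manager per combination (the primary key of manages) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- works_on: the ternary relationship among employee, branch and job.
    CREATE TABLE works_on (
        employee TEXT, branch TEXT, job TEXT,
        PRIMARY KEY (employee, branch, job)
    );
    -- manages treats a works_on combination as one unit (aggregation)
    -- and relates that unit to a manager.
    CREATE TABLE manages (
        employee TEXT, branch TEXT, job TEXT,
        manager  TEXT NOT NULL,
        PRIMARY KEY (employee, branch, job),
        FOREIGN KEY (employee, branch, job)
            REFERENCES works_on (employee, branch, job)
    );
    INSERT INTO works_on VALUES ('Ravi', 'Indore', 'teller'), ('Asha', 'Bhopal', 'auditor');
    INSERT INTO manages  VALUES ('Ravi', 'Indore', 'teller', 'Meena');
""")

# Not every works_on combination needs a manager:
unmanaged = conn.execute("""
    SELECT w.employee FROM works_on w
    LEFT JOIN manages m USING (employee, branch, job)
    WHERE m.manager IS NULL
""").fetchall()
print(unmanaged)  # [('Asha',)]

# A manager can only be recorded for a combination that exists in works_on.
try:
    conn.execute("INSERT INTO manages VALUES ('Asha', 'Indore', 'teller', 'Meena')")
except sqlite3.IntegrityError:
    print("rejected: no such works_on combination")
```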
E-R Diagram:
The basic E-R model was first introduced in the mid-1970s. It has been suitable for modeling most common
business problems and has enjoyed widespread use.
The overall logical structure of a database can be expressed graphically by an E-R diagram.
UNIT-III
There are many popular RDBMSs available to work with. Some of them are as follows:-
MySQL
MS SQL Server
ORACLE
MS ACCESS
SQL:-
SQL (Structured Query Language, pronounced "ess-que-el") is a database sublanguage for querying and modifying relational
databases. It was developed by IBM Research in the mid-1970s and standardized by ANSI in 1986. SQL is used to
communicate with a database.
SQL statements are used to perform tasks such as updating data in a database or retrieving data from a
database. Some common relational database management systems that use SQL are: Oracle, Sybase,
Microsoft SQL Server, Access, Ingres, etc.
Characteristics of SQL:-
Allows users to describe the data.
Allows users to define the data in database and manipulate that data.
Allows embedding within other languages using SQL modules, libraries & pre-compilers.
Allows users to create and drop databases and tables.
Allows users to create view, stored procedure, functions in a database.
Allows users to set permissions on tables, procedures, and views
SQL Process:
SQL Functions:-
SQL has many built-in functions for performing calculations on data.
Components of SQL:-
SQL commands are instructions used to communicate with the database to perform specific tasks with
data. SQL commands can be used not only for searching the database but also to perform
various other functions: for example, you can create tables, add data to tables, modify data, drop
tables, and set permissions for users. SQL commands are grouped into four major categories depending
on their functionality:
Data Definition Language (DDL) - These SQL commands are used for creating, modifying, and
dropping the structure of database objects. The commands are CREATE, ALTER, DROP, RENAME,
and TRUNCATE.
Data Manipulation Language (DML) - These SQL commands are used for storing, retrieving,
modifying, and deleting data. These commands are SELECT, INSERT, UPDATE, and DELETE.
Transaction Control Language (TCL) - These SQL commands are used for managing changes
affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
Data Control Language (DCL) - These SQL commands are used for providing security to
database objects. These commands are GRANT and REVOKE.
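A minimal sketch of three of the four categories, using SQLite through Python's sqlite3 module in autocommit mode so that the transaction commands are issued explicitly. The account table is hypothetical. DCL is left out because SQLite has no user accounts; GRANT and REVOKE need a multi-user RDBMS such as Oracle, MySQL or SQL Server.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT are written out explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")  # DDL
conn.execute("INSERT INTO account VALUES (1, 500.0)")                            # DML

conn.execute("BEGIN")                                                            # TCL
conn.execute("UPDATE account SET balance = balance - 100 WHERE acc_no = 1")
conn.execute("SAVEPOINT before_fee")
conn.execute("UPDATE account SET balance = balance - 999 WHERE acc_no = 1")
conn.execute("ROLLBACK TO SAVEPOINT before_fee")  # undo only the work after the savepoint
conn.execute("COMMIT")                            # make the remaining change permanent

print(conn.execute("SELECT balance FROM account").fetchone()[0])  # 400.0
```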
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index
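The commands above can be tried in a short SQLite sketch through Python's sqlite3 module; object_names is a hypothetical helper that lists tables and indexes from SQLite's sqlite_master catalog. Syntax differs slightly between products: for example, MySQL writes DROP INDEX index_name ON table_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")  # CREATE TABLE
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")                        # ALTER TABLE
conn.execute("CREATE INDEX idx_student_city ON student (city)")                 # CREATE INDEX


def object_names() -> list[str]:
    """List the tables and indexes currently recorded in the schema catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'index') ORDER BY name"
    )
    return [name for (name,) in rows]


print(object_names())  # ['idx_student_city', 'student']

conn.execute("DROP INDEX idx_student_city")                                     # DROP INDEX
conn.execute("DROP TABLE student")                                              # DROP TABLE
print(object_names())  # []
```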
Operators in where clause:-
Operator   Description
=          Equal
IN         Used when you know the exact value(s) you want to return for at least one of the columns
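A minimal SQLite sketch (via Python's sqlite3 module) contrasting = with IN, using a few rows from the Employee table of Unit II.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER, Name TEXT, DeptName TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1001, "Ravindra Agrawal", "Finance"),
    (1002, "Khelan Nagar", "Production"),
    (1004, "Amol Maheshwari", "Marketing"),
])

# '=' compares with exactly one value.
eq = conn.execute("SELECT EmpID FROM Employee WHERE DeptName = 'Finance'").fetchall()
# 'IN' matches any value in an explicit list.
in_list = conn.execute(
    "SELECT EmpID FROM Employee WHERE DeptName IN ('Finance', 'Marketing') ORDER BY EmpID"
).fetchall()
print(eq)       # [(1001,)]
print(in_list)  # [(1001,), (1004,)]
```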
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
SQL UPDATE Statement
The UPDATE statement is used to update records in a table.
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
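The two statement templates above, filled in for a sketch on SQLite through Python's sqlite3 module. The ? placeholders are sqlite3's parameter style, which keeps values out of the SQL text. Leaving out the WHERE clause of UPDATE would change every row in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")

# INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
conn.execute("INSERT INTO Employee (EmpID, Name, Salary) VALUES (?, ?, ?)",
             (1002, "Khelan Nagar", 18000))

# UPDATE table_name SET column1 = value1 WHERE some_column = some_value;
cur = conn.execute("UPDATE Employee SET Salary = ? WHERE EmpID = ?", (19000, 1002))
print(cur.rowcount)  # 1 (number of rows changed)
print(conn.execute("SELECT Salary FROM Employee WHERE EmpID = 1002").fetchone()[0])  # 19000
```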
WITH GRANT OPTION - allows a user to grant access rights to other users.
SQL Operator
An operator is a reserved word or a character used primarily in an SQL statement's WHERE clause to
perform operation(s), such as comparisons and arithmetic operations.
Operators are used to specify conditions in an SQL statement and to serve as conjunctions for multiple
conditions in a statement.
Arithmetic operators
Comparison operators
Logical operators
Operators used to negate conditions
Set operators:-
SQL supports a few set operators on SQL tables. They are as follows:-
UNION
INTERSECT
MINUS (the Oracle name; standard SQL and many other RDBMSs call it EXCEPT)
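A sketch of the three set operators on SQLite through Python's sqlite3 module, with hypothetical depositor and borrower tables. SQLite, like standard SQL, spells MINUS as EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer TEXT);
    CREATE TABLE borrower  (customer TEXT);
    INSERT INTO depositor VALUES ('Asha'), ('Ravi'), ('Neha');
    INSERT INTO borrower  VALUES ('Ravi'), ('Amol');
""")


def run(sql: str) -> list[str]:
    # Sort in Python so the printed result does not depend on SQLite's row order.
    return sorted(name for (name,) in conn.execute(sql))


union     = run("SELECT customer FROM depositor UNION SELECT customer FROM borrower")
intersect = run("SELECT customer FROM depositor INTERSECT SELECT customer FROM borrower")
minus     = run("SELECT customer FROM depositor EXCEPT SELECT customer FROM borrower")
print(union)      # ['Amol', 'Asha', 'Neha', 'Ravi']
print(intersect)  # ['Ravi']
print(minus)      # ['Asha', 'Neha']
```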
UNIT IV
ORDER BY :-
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or
more columns. If neither ASC nor DESC is given, the sort is ascending; without an ORDER BY clause, the order of the rows returned is not guaranteed.
Syntax:-
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
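A minimal sketch on SQLite via Python's sqlite3 module, using the Employee data from Unit II: the first query relies on the default ascending order, the second asks for DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER, Name TEXT, DeptName TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1001, "Ravindra Agrawal", "Finance", 20000),
    (1002, "Khelan Nagar", "Production", 18000),
    (1003, "Himanshu Kulkarni", "Personnel", 25000),
    (1004, "Amol Maheshwari", "Marketing", 30000),
    (1005, "Ritesh Singh Chouhan", "Advertisement", 22000),
])

by_dept = [d for (d,) in conn.execute("SELECT DeptName FROM Employee ORDER BY DeptName")]
by_salary_desc = [s for (s,) in conn.execute("SELECT Salary FROM Employee ORDER BY Salary DESC")]
print(by_dept)         # ['Advertisement', 'Finance', 'Marketing', 'Personnel', 'Production']
print(by_salary_desc)  # [30000, 25000, 22000, 20000, 18000]
```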
GROUP BY
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data
into groups.
The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY
clause.
Syntax:
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2
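A minimal GROUP BY sketch on SQLite via Python's sqlite3 module. The rows are hypothetical: the Unit II Employee table has only one employee per department, which would make every group a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, DeptName TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("Ravi", "Finance", 20000),
    ("Asha", "Finance", 26000),
    ("Amol", "Marketing", 30000),
    ("Neha", "Production", 18000),
    ("Khelan", "Production", 19000),
    ("Meena", "Production", 21000),
])

# One output row per distinct DeptName, with aggregates computed over each group.
rows = conn.execute("""
    SELECT DeptName, COUNT(*), SUM(Salary)
    FROM Employee
    GROUP BY DeptName
    ORDER BY DeptName
""").fetchall()
print(rows)
# [('Finance', 2, 46000), ('Marketing', 1, 30000), ('Production', 3, 58000)]
```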
HAVING
The HAVING clause enables you to specify conditions that filter which group results appear in the final
results.
The WHERE clause places conditions on the selected columns, whereas the HAVING clause places
conditions on groups created by the GROUP BY clause.
Syntax
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
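The difference between WHERE and HAVING in one query, sketched on SQLite via Python's sqlite3 module with hypothetical rows: WHERE removes rows before they are grouped, HAVING removes whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, DeptName TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("Ravi", "Finance", 20000), ("Asha", "Finance", 26000),
    ("Amol", "Marketing", 30000), ("Neha", "Production", 18000),
    ("Khelan", "Production", 19000), ("Meena", "Production", 21000),
])

rows = conn.execute("""
    SELECT DeptName, AVG(Salary)
    FROM Employee
    WHERE Salary > 18000          -- filters individual rows before grouping
    GROUP BY DeptName
    HAVING COUNT(*) >= 2          -- filters whole groups after grouping
    ORDER BY DeptName
""").fetchall()
print(rows)  # [('Finance', 23000.0), ('Production', 20000.0)]
```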
Subquery
A subquery, also called an inner query or nested query, is a query within another SQL query, most often embedded within
the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to further restrict
the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the
operators like =, <, >, >=, <=, IN, BETWEEN etc.
There are a few rules that subqueries must follow:
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple columns are in the
main query for the subquery to compare its selected columns.
An ORDER BY cannot be used in a subquery, although the main query can use an ORDER BY. The
GROUP BY can be used to perform the same function as the ORDER BY in a subquery.
Subqueries that return more than one row can only be used with multiple value operators, such
as the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY, CLOB,
or NCLOB.
A subquery cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a subquery; however, the BETWEEN operator can
be used within the subquery.
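A sketch of single-row and multiple-row subqueries on SQLite via Python's sqlite3 module. The Employee and Department rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, DeptName TEXT, Salary INTEGER);
    CREATE TABLE Department (DeptName TEXT PRIMARY KEY, City TEXT);
    INSERT INTO Employee VALUES
        ('Ravi', 'Finance', 20000), ('Asha', 'Finance', 26000),
        ('Amol', 'Marketing', 30000), ('Neha', 'Production', 18000),
        ('Khelan', 'Production', 19000), ('Meena', 'Production', 21000);
    INSERT INTO Department VALUES
        ('Finance', 'Indore'), ('Marketing', 'Bhopal'), ('Production', 'Indore');
""")

# Single-row subquery: returns one value, so a comparison operator is allowed.
above_avg = conn.execute("""
    SELECT Name FROM Employee
    WHERE Salary > (SELECT AVG(Salary) FROM Employee)
    ORDER BY Name
""").fetchall()

# Multiple-row subquery: returns several values, so IN is used.
in_indore = conn.execute("""
    SELECT Name FROM Employee
    WHERE DeptName IN (SELECT DeptName FROM Department WHERE City = 'Indore')
    ORDER BY Name
""").fetchall()

print(above_avg)  # [('Amol',), ('Asha',)]
print(in_indore)  # [('Asha',), ('Khelan',), ('Meena',), ('Neha',), ('Ravi',)]
```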
1) Single Row Functions: Single row or Scalar functions return a value for every row that is processed in
a query.
2) Group Functions: These functions group the rows of data based on the values returned by the query.
This is discussed in SQL GROUP Functions. The group functions are used to calculate aggregate values
like total or average, which return just one total or one average value after processing a group of rows.
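A minimal SQLite sketch (via Python's sqlite3 module, hypothetical rows) of the difference: a single-row function returns one value per row processed, a group function one value for the whole group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [("Ravi", 20000), ("Asha", 26000), ("Amol", 30000)])

# Single-row (scalar) function: one result for every row processed.
per_row = conn.execute("SELECT UPPER(Name) FROM Employee ORDER BY Name").fetchall()
# Group (aggregate) function: one result after processing the group of rows.
per_group = conn.execute("SELECT SUM(Salary) FROM Employee").fetchall()

print(per_row)    # [('AMOL',), ('ASHA',), ('RAVI',)]
print(per_group)  # [(76000,)]
```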
o Numeric Functions: These are functions that accept numeric input and return numeric values.
o Character or Text Functions: These are functions that accept character input and can return both
character and number values.
o Date Functions: These are functions that take values that are of datatype DATE as input and return
values of datatype DATE, except for the MONTHS_BETWEEN function, which returns a number.
o Conversion Functions: These are functions that help us to convert a value in one form to another
form. For Example: a null value into an actual value, or a value from one datatype to another datatype
like NVL, TO_CHAR, TO_NUMBER, TO_DATE etc.
Numeric Functions:
Numeric functions are used to perform operations on numbers. They accept numeric values as
input and return numeric values as output. A few of the numeric functions are:
ABS (x)
CEIL (x)
FLOOR (x)
TRUNC (x, y)
ROUND (x, y)
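The function names above follow Oracle. Because SQLite's built-in set differs (for example, its round treats a negative y as 0), this sketch models the functions directly in Python with the decimal module, aiming to mirror the results Oracle documents for NUMBER values, where ROUND rounds ties away from zero. The sql_ prefix marks these as hypothetical stand-ins, not a real library.

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def _to_places(x: float, y: int, mode: str) -> Decimal:
    # Quantize to a step of 10**-y: y=1 -> 0.1, y=0 -> 1, y=-1 -> 10.
    step = Decimal(1).scaleb(-y)
    return Decimal(str(x)).quantize(step, rounding=mode)


def sql_abs(x: float) -> float:
    return abs(x)                             # ABS(x): absolute value


def sql_ceil(x: float) -> int:
    return math.ceil(x)                       # CEIL(x): smallest integer >= x


def sql_floor(x: float) -> int:
    return math.floor(x)                      # FLOOR(x): largest integer <= x


def sql_round(x: float, y: int = 0) -> Decimal:
    return _to_places(x, y, ROUND_HALF_UP)    # ROUND(x, y): ties away from zero


def sql_trunc(x: float, y: int = 0) -> Decimal:
    return _to_places(x, y, ROUND_DOWN)       # TRUNC(x, y): drop digits, toward zero


print(sql_abs(-7), sql_ceil(-2.5), sql_floor(-2.5))  # 7 -2 -3
print(sql_round(15.193, 1), sql_trunc(15.79, 1))     # 15.2 15.7
print(sql_round(15.193, -1) == 20)                   # True
```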
Character or Text Functions:
Character or text functions are used to manipulate text strings. They accept strings or characters as
input and can return both character and number values as output.
A few of the character or text functions are as given below:
LOWER (string_value)
UPPER (string_value)
INITCAP (string_value)
LTRIM (string_value, trim_text)
RTRIM (string_value, trim_text)
TRIM (trim_text FROM string_value)
SUBSTR (string_value, m, n)
LENGTH (string_value)
LPAD (string_value, n, pad_value)
RPAD (string_value, n, pad_value)
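Several of these functions exist under the same names in SQLite, so they can be tried through Python's sqlite3 module. SQLite has no built-in INITCAP, LPAD or RPAD, and its TRIM is written TRIM(string, characters) rather than TRIM(trim_text FROM string_value), so this sketch uses only the common subset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT LOWER('DataBase'),          -- 'database'
           UPPER('DataBase'),          -- 'DATABASE'
           LTRIM('xxSQLxx', 'x'),      -- 'SQLxx'
           RTRIM('xxSQLxx', 'x'),      -- 'xxSQL'
           TRIM('   SQL   '),          -- 'SQL'
           SUBSTR('DataBase', 1, 4),   -- 'Data' (start position 1, length 4)
           LENGTH('DataBase')          -- 8
""").fetchone()
print(row)  # ('database', 'DATABASE', 'SQLxx', 'xxSQL', 'SQL', 'Data', 8)
```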
Date Functions:
These are functions that take values of datatype DATE as input and return values of
datatype DATE, except for the MONTHS_BETWEEN function, which returns a number as output.
A few date functions are as given below.
ADD_MONTHS (date, n)
MONTHS_BETWEEN (x1, x2)
ROUND (x, date_format)
TRUNC (x, date_format)
NEXT_DAY (x, week_day)
LAST_DAY (x)
SYSDATE
NEW_TIME (x, zone1, zone2)
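ADD_MONTHS, LAST_DAY and NEXT_DAY are Oracle functions with no direct SQLite equivalent, so this sketch models their documented behaviour with Python's standard library. The function names mirror the SQL ones but are hypothetical Python stand-ins; the weekday argument of next_day uses Python's numbering, which is an assumption of the sketch.

```python
import calendar
from datetime import date, timedelta


def last_day(d: date) -> date:
    """LAST_DAY(d): the last date of d's month."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def add_months(d: date, n: int) -> date:
    """ADD_MONTHS(d, n): same day n months later, clamped to the month's end.

    A last-day-of-month input always gives a last-day-of-month result.
    """
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    target_last = calendar.monthrange(year, month0 + 1)[1]
    if d == last_day(d) or d.day > target_last:
        return date(year, month0 + 1, target_last)
    return date(year, month0 + 1, d.day)


def next_day(d: date, weekday: int) -> date:
    """NEXT_DAY(d, weekday): first date strictly after d falling on weekday.

    weekday uses Python's numbering (0 = Monday ... 6 = Sunday); Oracle
    takes a day name such as 'MONDAY' instead.
    """
    return d + timedelta(days=(weekday - d.weekday() - 1) % 7 + 1)


print(add_months(date(2013, 1, 31), 1))  # 2013-02-28
print(add_months(date(2013, 2, 28), 1))  # 2013-03-31
print(last_day(date(2012, 2, 10)))       # 2012-02-29
```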
Conversion Functions:
These are functions that help us to convert a value in one form to another form. For Ex: a null value into
an actual value, or a value from one datatype to another datatype like NVL, TO_CHAR, TO_NUMBER,
TO_DATE.
A few of the conversion functions available in SQL are:
TO_CHAR (x [,y])
TO_DATE (x [, date_format])
NVL (x, y)
DECODE (a, b, c, d, e, default_value)
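NVL, DECODE, TO_NUMBER and TO_CHAR are Oracle names. This sketch runs their usual portable counterparts on SQLite via Python's sqlite3 module: COALESCE for NVL, a CASE expression for DECODE, CAST for TO_NUMBER, and strftime for formatting a date as TO_CHAR would. The Employee rows and the Grade and Commission columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Commission INTEGER, Grade TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [("Ravi", None, "A"), ("Asha", 500, "B"), ("Amol", 300, "C")])

rows = conn.execute("""
    SELECT Name,
           COALESCE(Commission, 0),             -- like NVL(Commission, 0)
           CASE Grade WHEN 'A' THEN 'Senior'
                      WHEN 'B' THEN 'Middle'
                      ELSE 'Junior' END         -- like DECODE(Grade, 'A', 'Senior', 'B', 'Middle', 'Junior')
    FROM Employee ORDER BY Name
""").fetchall()
print(rows)  # [('Amol', 300, 'Junior'), ('Asha', 500, 'Middle'), ('Ravi', 0, 'Senior')]

# Like TO_NUMBER('42') and TO_CHAR(date, 'DD/MM/YYYY'):
converted = conn.execute(
    "SELECT CAST('42' AS INTEGER), strftime('%d/%m/%Y', '2013-09-09')"
).fetchone()
print(converted)  # (42, '09/09/2013')
```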
UNIT V
Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of
attributes X (written X → Y) if and only if each X value is associated with precisely one Y value.
For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of
Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold. That is, each
{Employee ID} is associated with precisely one {Employee Date of Birth}.
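The definition can be tested against a table's rows with a short Python sketch. fd_holds and the sample rows are hypothetical; as the docstring says, rows can show that an FD is violated but can never prove that it holds.

```python
def fd_holds(rows: list[dict], x: list[str], y: list[str]) -> bool:
    """Check X -> Y against the given rows: each X value must map to one Y value.

    A table instance can disprove an FD but never prove it; the FD is a
    rule about every possible instance of the table.
    """
    seen: dict[tuple, tuple] = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # the same X value appears with two different Y values
    return True


employees = [
    {"EmpID": 1001, "DOB": "1990-06-15", "Dept": "Finance"},
    {"EmpID": 1002, "DOB": "1988-01-02", "Dept": "Finance"},
    {"EmpID": 1001, "DOB": "1990-06-15", "Dept": "Audit"},  # same employee, second dept
]
print(fd_holds(employees, ["EmpID"], ["DOB"]))   # True
print(fd_holds(employees, ["EmpID"], ["Dept"]))  # False
```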
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue
of X→Y and Y→Z.
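A transitive dependency can be derived mechanically with the standard attribute-closure procedure (a textbook algorithm, not something these notes define), sketched in Python below. If the closure of X contains every attribute of the table, X is a superkey.

```python
def closure(attrs: set[str], fds: list[tuple[set[str], set[str]]]) -> set[str]:
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, its right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']: A -> C holds transitively
print(sorted(closure({"B"}, fds)))  # ['B', 'C']
```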
Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a
table implies the presence of certain other rows.
Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables
each having a subset of the attributes of T.
Normal forms (normal form – defined by – year – brief definition):
1NF – First normal form – Two versions: E.F. Codd (1970), C.J. Date (2003) – 1970 and 2003 – The domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.
2NF – Second normal form – E.F. Codd – 1971 – No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.
3NF – Third normal form – Two versions: E.F. Codd (1971), C. Zaniolo (1982) – 1971 and 1982 – Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed.
EKNF – Elementary Key Normal Form – C. Zaniolo – 1982 – Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.
BCNF – Boyce–Codd normal form – Raymond F. Boyce and E.F. Codd – 1974 – Every non-trivial functional dependency in the table is a dependency on a superkey.
4NF – Fourth normal form – Ronald Fagin – 1977 – Every non-trivial multivalued dependency in the table is a dependency on a superkey.
5NF – Fifth normal form – Ronald Fagin – 1979 – Every non-trivial join dependency in the table is implied by the superkeys of the table.
DKNF – Domain/key normal form – Ronald Fagin – 1981 – Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.
6NF – Sixth normal form – C.J. Date, Hugh Darwen, and Nikos Lorentzos – 2002 – Table features no non-trivial join dependencies at all (with reference to generalized join operator).
Modification Anomalies
Once our E-R model has been converted into relations, we may find that some relations are not properly
specified. There can be a number of problems:
Deletion Anomaly: Deleting one fact or data point from a relation results in other information
being lost.
Insertion Anomaly: Inserting a new fact or tuple into a relation requires that we have information
from two or more entities – this situation might not be feasible.
Update Anomaly: Updating one fact in a relation requires us to update multiple tuples.
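A minimal sketch of an update anomaly on SQLite through Python's sqlite3 module; the emp_dept table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized: the department's location is repeated for every employee.
conn.execute("CREATE TABLE emp_dept (emp TEXT, dept TEXT, dept_location TEXT)")
conn.executemany("INSERT INTO emp_dept VALUES (?, ?, ?)", [
    ("Ravi", "Finance", "Indore"),
    ("Asha", "Finance", "Indore"),
])

# The department moves, but only one of its rows is updated.
conn.execute("UPDATE emp_dept SET dept_location = 'Bhopal' WHERE emp = 'Ravi'")

locations = conn.execute(
    "SELECT COUNT(DISTINCT dept_location) FROM emp_dept WHERE dept = 'Finance'"
).fetchone()[0]
print(locations)  # 2: the same department now has two locations
```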
Here is a quick example to illustrate these anomalies: A company has a Purchase Order form:
Normalization Process
Relations can fall into one or more categories (or classes) called Normal Forms
Normal Form: A class of relations free from a certain set of modification anomalies.
Normal forms are given names such as:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Fifth normal form (5NF)
Domain-Key normal form (DK/NF)
These forms are cumulative: a relation in third normal form (3NF) is also in 2NF and 1NF.
The Normalization Process for a given relation consists of:
a. Specify the key of the relation.
b. Specify the functional dependencies of the relation.
Sample data (tuples) for the relation can assist with this step.
c. Apply the definition of each normal form (starting with 1NF).
d. If a relation fails to meet the definition of a normal form, change the relation (most often
by splitting the relation into two new relations) until it meets the definition.
e. Re-test the modified/new relations to ensure they meet the definitions of each normal
form.
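Steps a and b can be checked mechanically. A set of attributes is a key when its closure under the functional dependencies contains every attribute of the relation. The minimal Python sketch below implements the standard attribute-closure algorithm. The helper names, the representation of FDs as (left side, right side) set pairs, and the attribute name ClosePrice (for Close Price) are assumptions. The relation used is assumed to be the original STOCKS relation (Company, Symbol, Headquarters, Date, Close Price), with the FDs that appear later in these notes.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) >= set(relation)


def is_candidate_key(attrs, relation, fds):
    """A superkey from which no single attribute can be dropped (minimality)."""
    attrs = set(attrs)
    if not is_superkey(attrs, relation, fds):
        return False
    return all(not is_superkey(attrs - {a}, relation, fds) for a in attrs)


STOCKS = {"Company", "Symbol", "Headquarters", "Date", "ClosePrice"}
FDS = [
    ({"Symbol"}, {"Company"}),
    ({"Company"}, {"Headquarters"}),
    ({"Symbol", "Date"}, {"ClosePrice"}),
]

print(sorted(closure({"Symbol"}, FDS)))                   # ['Company', 'Headquarters', 'Symbol']
print(is_candidate_key({"Symbol", "Date"}, STOCKS, FDS))  # True
```

It is enough to test the removal of one attribute at a time. Any superset of a superkey is also a superkey, so if no subset that is one attribute smaller is a superkey, no smaller subset can be either.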
In the next set of notes, each of the normal forms will be defined along with an example of the
normalization steps.
Oracle ORCL Redwood Shores, CA 09/09/2013 24.33
STOCK_PRICES relation:
Symbol   Date         Close Price
MSFT     09/07/2013   23.96
MSFT     09/08/2013   23.93
MSFT     09/09/2013   24.01
ORCL     09/07/2013   24.27
ORCL     09/08/2013   24.14
ORCL     09/09/2013   24.33
FD1: Symbol, Date → Close Price
In checking these new relations we can confirm that they meet the definition of 1NF (each one has well-defined
unique keys) and 2NF (no partial key dependencies).
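The 2NF check looks for partial key dependencies, where a non-key attribute is determined by only part of a composite key. The minimal Python sketch below assumes that (Symbol, Date) is the only candidate key, so "non-key" means "non-prime". The helper names and the attribute name ClosePrice are assumptions. The sketch compares the original STOCKS relation with STOCK_PRICES.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, relation, fds):
    """For every proper, non-empty subset of the composite `key`, report the
    non-key attributes that subset determines by itself (a 2NF violation).
    Assumes `key` is the only candidate key, so non-key means non-prime."""
    key = set(key)
    non_key = set(relation) - key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


STOCKS = {"Company", "Symbol", "Headquarters", "Date", "ClosePrice"}
STOCKS_FDS = [
    ({"Symbol"}, {"Company"}),
    ({"Company"}, {"Headquarters"}),
    ({"Symbol", "Date"}, {"ClosePrice"}),
]
STOCK_PRICES = {"Symbol", "Date", "ClosePrice"}
STOCK_PRICES_FDS = [({"Symbol", "Date"}, {"ClosePrice"})]

print(partial_dependencies({"Symbol", "Date"}, STOCKS, STOCKS_FDS))
print(partial_dependencies({"Symbol", "Date"}, STOCK_PRICES, STOCK_PRICES_FDS))  # {}
```

In the combined relation, Symbol alone determines Company and Headquarters, which is a partial dependency. STOCK_PRICES has no partial dependencies.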
Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive
dependencies.
Consider a relation R(A, B, C) containing attributes A, B and C.
If A → B and B → C, then A → C.
Transitive Dependency: three attributes with the above dependencies, where C depends on A only indirectly, through B (and B is not a key of R).
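The 3NF condition can be tested with the usual rule. For every non-trivial FD X → Y, either X is a superkey or every attribute of Y that is not already in X is prime (part of some candidate key). The minimal Python sketch below applies that rule to the abstract R(A, B, C) with A → B and B → C. The helper names and the explicitly supplied list of candidate keys are assumptions.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(relation, fds, candidate_keys):
    """Return FDs that break 3NF: the left side is not a superkey and the
    right side contains a non-prime attribute (e.g. a transitive dependency)."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # ignore the trivial part of the FD
        if not extra:
            continue
        is_superkey = closure(lhs, fds) >= relation
        if not is_superkey and not extra <= prime:
            violations.append((lhs, extra))
    return violations


R = {"A", "B", "C"}
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(third_nf_violations(R, FDS, [{"A"}]))  # [({'B'}, {'C'})]
```

A → B passes because the closure of A is all of R. B → C is reported because B is not a superkey and C is not prime.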
Example: At CUNY:
Consider one of the new relations we created in the STOCKS example for 2nd normal form:
Company     Symbol   Headquarters
Microsoft   MSFT     Redmond, WA
Oracle      ORCL     Redwood Shores, CA
The functional dependencies we can see are:
FD1: Symbol → Company
FD2: Company → Headquarters
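so therefore, by transitivity:
Symbol → Headquarters
This is a transitive dependency: Headquarters depends on the key Symbol only through the non-key attribute Company. To remove it, the relation is split into (Symbol, Company) and (Company, Headquarters). The second relation is: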
Company     Headquarters
Microsoft   Redmond, WA
Oracle      Redwood Shores, CA
Again, each of these new relations should be checked to ensure they meet the definition of 1NF, 2NF and
now 3NF.
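The 3NF decomposition of this example into (Symbol, Company) and (Company, Headquarters) can be checked on the sample rows. Project the relation onto the two attribute sets, then natural-join the projections on Company. The shared attribute Company determines Headquarters, so the join should give back exactly the original rows (a lossless-join decomposition). The minimal Python sketch below does this. `project`, `natural_join` and `as_set` are hypothetical helpers, not a library API.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs`, removing duplicate tuples."""
    seen = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in sorted(seen)]


def natural_join(left, right):
    """Join two lists of dicts on all attribute names they share."""
    common = set(left[0]) & set(right[0]) if left and right else set()
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Compare relations as sets of tuples, ignoring row order."""
    return {frozenset(row.items()) for row in rows}


companies = [
    {"Company": "Microsoft", "Symbol": "MSFT", "Headquarters": "Redmond, WA"},
    {"Company": "Oracle", "Symbol": "ORCL", "Headquarters": "Redwood Shores, CA"},
]

symbol_company = project(companies, ["Symbol", "Company"])
company_hq = project(companies, ["Company", "Headquarters"])
rejoined = natural_join(symbol_company, company_hq)

print(as_set(rejoined) == as_set(companies))  # True: no tuples lost or invented
```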
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a candidate key.
Recall that not all determinants are keys.
Determinants that are keys are called candidate keys.
Eventually, one candidate key is selected as the primary key of the relation.
Consider the following example:
o Funds consist of one or more Investment Types.
o Funds are managed by one or more Managers.
o Investment Types can have one or more Managers.
o Managers only manage one type of investment.
Relation: FUNDS (FundID, InvestmentType, Manager)
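The BCNF rule says that every determinant must be a superkey. The minimal Python sketch below checks that rule for FUNDS. From the last bullet, it assumes that the relevant FD is Manager → InvestmentType; the FD list and helper names are assumptions. (FundID, Manager) is a key, but the closure of Manager alone is only {Manager, InvestmentType}. Manager is therefore a determinant that is not a candidate key, and FUNDS is not in BCNF.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the determinants (left sides of non-trivial FDs) that are not superkeys."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= relation:
            bad.append(lhs)
    return bad


FUNDS = {"FundID", "InvestmentType", "Manager"}
# Assumed from "Managers only manage one type of investment".
FDS = [({"Manager"}, {"InvestmentType"})]

print(closure({"FundID", "Manager"}, FDS) >= FUNDS)  # True: (FundID, Manager) is a key
print(bcnf_violations(FUNDS, FDS))                   # [{'Manager'}]
```

Run on a relation (Manager, InvestmentType) alone, the same check reports no violations, because Manager is a key of that smaller relation.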
Book example:
A student has one or more majors.
A student participates in one or more activities.
StudentID Major Activities
100 CIS Baseball
100 CIS Volleyball
100 Accounting Baseball
100 Accounting Volleyball
200 Marketing Swimming
FD1: StudentID →→ Major
FD2: StudentID →→ Activities
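In this three-attribute table, StudentID →→ Major holds (and therefore StudentID →→ Activities also holds) when each student's rows contain every combination of that student's majors with that student's activities. The minimal Python sketch below tests this on the sample rows. `mvd_holds` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import product

rows = [
    (100, "CIS", "Baseball"),
    (100, "CIS", "Volleyball"),
    (100, "Accounting", "Baseball"),
    (100, "Accounting", "Volleyball"),
    (200, "Marketing", "Swimming"),
]


def mvd_holds(rows):
    """Check StudentID ->> Major (and hence StudentID ->> Activities)
    for 3-tuples (student, major, activity)."""
    pairs = defaultdict(set)
    for student, major, activity in rows:
        pairs[student].add((major, activity))
    for combos in pairs.values():
        majors = {m for m, _ in combos}
        activities = {a for _, a in combos}
        # Every major must appear with every activity of the same student.
        if combos != set(product(majors, activities)):
            return False
    return True


print(mvd_holds(rows))      # True
print(mvd_holds(rows[:3]))  # False: (100, Accounting, Volleyball) is missing
```

The second call shows why the table must keep every combination. If one row is dropped, the independence of majors and activities is no longer represented.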
999   Scudder Global Fund   Municipal Bonds
999   Scudder Global Fund   Dreyfus Short-Intermediate Municipal Bond Fund
888   Kaufmann Fund         T. Rowe Price Emerging Markets Bond Fund
A few characteristics:
No regular functional dependencies.
All three attributes taken together form the key.
The latter two attributes are independent of one another.
Insertion anomaly: a stock fund cannot be added without also adding a bond fund (or a NULL value in its place). All
combinations must always be maintained to preserve the meaning.
Stock Fund and Bond Fund are each multivalued dependent on PortfolioID:
PortfolioID →→ Stock Fund
PortfolioID →→ Bond Fund
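Moving to 4NF means splitting the relation along these multivalued dependencies into (PortfolioID, Stock Fund) and (PortfolioID, Bond Fund). The minimal Python sketch below uses only the portfolio rows visible above. The helper names and the added fund "NEW STOCK FUND (hypothetical)" are placeholders, not source data. After the split, joining the two projections on PortfolioID rebuilds every original combination. A stock fund can also be recorded without a bond fund, which removes the insertion anomaly.

```python
# Rows visible in the portfolio table: (PortfolioID, Stock Fund, Bond Fund).
portfolio = {
    (999, "Scudder Global Fund", "Municipal Bonds"),
    (999, "Scudder Global Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (888, "Kaufmann Fund", "T. Rowe Price Emerging Markets Bond Fund"),
}


def decompose(rows):
    """Split along PortfolioID ->> Stock Fund | Bond Fund (4NF decomposition)."""
    stock = {(pid, s) for pid, s, _ in rows}
    bond = {(pid, b) for pid, _, b in rows}
    return stock, bond


def join(stock, bond):
    """Natural join of the two projections on PortfolioID."""
    return {(p1, s, b) for p1, s in stock for p2, b in bond if p1 == p2}


stock_funds, bond_funds = decompose(portfolio)
print(join(stock_funds, bond_funds) == portfolio)  # True: nothing lost

# Insertion no longer needs a bond fund (no NULL); placeholder fund name.
stock_funds.add((888, "NEW STOCK FUND (hypothetical)"))
print(len(stock_funds))  # 3
```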
Intra-relation rules
However, this does not include time-dependent constraints.
Domain: The physical (data type, size, NULL values) and semantic (logical) description of what values
an attribute can hold.
There is no known algorithm for converting a relation directly into DK/NF.
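The definition of a domain given above has a physical part (data type, size, NULL values) and a semantic rule. Both parts can be expressed as a small validator. In the minimal Python sketch below, the `Domain` class and its fields are hypothetical. The sample StudentID domain (a positive integer of at most 3 digits, NULL not allowed) is also an assumption, chosen only to mirror the two parts of the definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Domain:
    """Hypothetical domain: physical part (type, size, NULLs) plus a semantic rule."""
    name: str
    py_type: type
    max_length: Optional[int] = None  # size limit on the printed value
    nullable: bool = False
    rule: Callable[[Any], bool] = lambda v: True  # semantic (logical) check

    def accepts(self, value: Any) -> bool:
        if value is None:
            return self.nullable
        if not isinstance(value, self.py_type):
            return False
        if self.max_length is not None and len(str(value)) > self.max_length:
            return False
        return self.rule(value)


student_id = Domain("StudentID", int, max_length=3, rule=lambda v: v > 0)

print(student_id.accepts(100))   # True
print(student_id.accepts(None))  # False: NULL not allowed
print(student_id.accepts(1000))  # False: longer than 3 digits
print(student_id.accepts(-5))    # False: fails the semantic rule
```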
What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the
normalization process: eliminating redundant data (for example, storing the same data in more than
one table) and ensuring data dependencies make sense (only storing related data in a table). Both of
these are worthy goals as they reduce the amount of space a database consumes and ensure that data is
logically stored.
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines
and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business
requirements. However, when variations take place, it's extremely important to evaluate any possible
ramifications they could have on your system and account for possible inconsistencies. That said, let's
explore the normal forms.
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more
requirement:
Meet all the requirements of the third normal form.
Every determinant must be a candidate key.