
RDBMS Unit - Ii 2023

The document discusses database integrity, normalization, and file organization in database management systems. It defines referential integrity and different types of integrity constraints. It also explains problems caused by redundancy in databases like insertion, deletion, and update anomalies. The document then discusses different types of normalization forms including 1st, 2nd, 3rd normal forms and BCNF. It also defines functional dependencies and different types of functional dependencies. Finally, it provides examples to explain concepts like entity integrity, referential integrity, redundancy problems, and different types of functional dependencies.

Uploaded by

serboyka18

RDBMS UNIT - II Ca iii SEM

____________________________________________________
DATABASE INTEGRITY AND NORMALISATION: Relational Database Integrity - The Keys
- Referential Integrity - Entity Integrity - Redundancy and Associated Problems – Single Valued
Dependencies – Normalisation - Rules of Data Normalisation - The First Normal Form - The
Second Normal Form - The Third Normal Form - Boyce Codd Normal Form - Attribute
Preservation - Lossless-join Decomposition - Dependency Preservation.

File Organisation : Physical Database Design Issues - Storage of Database on Hard Disks - File
Organisation and Its Types - Heap files (Unordered files) - Sequential File Organisation - Indexed
(Indexed Sequential) File Organisation - Hashed File Organisation - Types of Indexes - Index and
Tree Structure - Multi-key File Organisation - Need for Multiple Access Paths - Multi-list File
Organisation - Inverted File Organisation.

Q1) What are Integrity Constraints? Explain the types of Integrity Constraints.
A) Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other operations are performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.

Types of Integrity Constraint

1. Entity Integrity Constraints
2. Referential Integrity Constraints

1. Entity integrity constraints

o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation; if the primary key has a null value, we can't identify those rows.
o A table can contain null values in fields other than the primary key.
Example:
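As a minimal sketch of the entity integrity rule, the following uses Python's built-in sqlite3 module with a hypothetical student table (the table and column names are assumptions for illustration, not taken from the notes). Note that SQLite only rejects a null in a non-integer primary key when NOT NULL is declared explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table. NOT NULL is stated explicitly because SQLite, for
# historical reasons, otherwise tolerates NULL in a TEXT primary key.
conn.execute(
    "CREATE TABLE student (roll_no TEXT PRIMARY KEY NOT NULL, name TEXT, phone TEXT)"
)

# A null in a non-key field is allowed.
conn.execute("INSERT INTO student VALUES ('S1', 'Asha', NULL)")


def try_insert_null_key(connection):
    """Try to insert a row whose primary key is null and report the outcome."""
    try:
        connection.execute("INSERT INTO student VALUES (NULL, 'Ravi', '99')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


null_key_result = try_insert_null_key(conn)
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(null_key_result, row_count)
```

Only the row with a real key value is stored; the row with a null key is refused by the database.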

pg. 1 SHWETA K/SWAPNA A


2. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
o Under referential integrity, if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be present in Table 2.

Example:
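The rule can be sketched with Python's built-in sqlite3 module. The department and employee tables below are hypothetical, chosen only for illustration; SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite

# Hypothetical tables: employee.dept_id refers to department.dept_id.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " emp_name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")


def try_insert(emp_id, name, dept_id):
    """Insert an employee row and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, dept_id))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "existing department": try_insert(1, "Asha", 10),
    "null department": try_insert(2, "Ravi", None),
    "missing department": try_insert(3, "Mina", 99),
}
print(results)
```

A foreign key value that is present in the referenced table, or null, is accepted; a value with no matching primary key is rejected.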

Q2) Explain the Problem of redundancy in Database in detail.


A) Redundancy means having multiple copies of the same data in the database. This problem arises when a database is not normalized. Suppose a table of student details has the attributes: student Id, student name, college name, college rank, course opted.

As can be observed, the values of the attributes college name, college rank, and course are repeated, which can lead to problems. Problems caused by redundancy are: Insertion anomaly, Deletion anomaly, and Updation anomaly.
1. Insertion Anomaly –
If the details of a student whose course has not yet been decided have to be inserted, then insertion will not be possible until the course is decided for the student.
This problem happens when the insertion of a data record is not possible without adding some additional unrelated data to the record.
2. Deletion Anomaly –
If the details of the students in this table are deleted, then the details of the college will also get deleted, which should not happen.
This anomaly happens when deletion of a data record results in losing some unrelated information that was stored as part of the record that was deleted from a table.
3. Updation Anomaly –
If the rank of the college changes, then the change has to be made in every row that stores it, which is time-consuming and computationally costly.

If the update does not occur at all places, then the database will be in an inconsistent state.
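The updation and deletion anomalies can be sketched in a few lines of Python. The rows below are hypothetical sample data for the student table described above, held as plain dictionaries rather than in a real database.

```python
# Hypothetical rows: the college name and rank are repeated for every student.
students = [
    {"student_id": 1, "name": "Asha", "college": "ABC College", "college_rank": 5, "course": "BCA"},
    {"student_id": 2, "name": "Ravi", "college": "ABC College", "college_rank": 5, "course": "BCA"},
    {"student_id": 3, "name": "Mina", "college": "ABC College", "college_rank": 5, "course": "BSc"},
]


def ranks_recorded(rows, college):
    """Return the set of distinct ranks stored for one college."""
    return {row["college_rank"] for row in rows if row["college"] == college}


# Updation anomaly: the rank changes, but only one of the three copies is updated.
students[0]["college_rank"] = 3
inconsistent_ranks = ranks_recorded(students, "ABC College")

# Deletion anomaly: deleting every student row also deletes all trace of the college.
remaining = [row for row in students if row["student_id"] not in (1, 2, 3)]
college_still_known = any(row["college"] == "ABC College" for row in remaining)

print(inconsistent_ranks, college_still_known)
```

After the partial update the table holds two different ranks for the same college, which is exactly the inconsistent state described above.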

Q3) What is Functional Dependency? Explain types of functional dependency?


A) A functional dependency is a relationship between attributes.
In a functional dependency, the value of one attribute can be obtained from the value of another attribute.

It typically exists between the primary key and a non-key attribute within a table.

X → Y

The left side of an FD is known as the determinant; the right side is known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Emp_Id Emp_Name Emp_Address

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:


Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id
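Whether a given set of rows is consistent with a functional dependency can be checked mechanically. The sketch below assumes hypothetical employee rows; note that sample data can only disprove a dependency, since a dependency is a rule about all possible rows, not just the ones currently stored.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        left = tuple(row[attr] for attr in determinant)
        right = tuple(row[attr] for attr in dependent)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample rows for the employee table.
employees = [
    {"Emp_Id": "E1", "Emp_Name": "Harry", "Emp_Address": "Delhi"},
    {"Emp_Id": "E2", "Emp_Name": "George", "Emp_Address": "Pune"},
    {"Emp_Id": "E3", "Emp_Name": "Harry", "Emp_Address": "Agra"},
]

id_determines_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_id = fd_holds(employees, ["Emp_Name"], ["Emp_Id"])
print(id_determines_name, name_determines_id)
```

Here Emp_Id → Emp_Name is consistent with the data, while Emp_Name → Emp_Id is not, because two different employees share the name Harry.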

Types of functional dependency:

There are mainly six types of Functional Dependency in DBMS:
A. Multivalued Dependency
B. Trivial Functional Dependency
C. Non-Trivial Functional Dependency
D. Transitive Dependency
E. Fully Functional Dependency
F. Partial Functional Dependency
A)Multivalued Dependency in DBMS
A multivalued dependency occurs when a single table contains two or more multivalued attributes that are independent of each other. A multivalued dependency is a full constraint between two sets of attributes in a relation: it requires that certain tuples be present in the relation. Consider the following example.
Example:

Car_model Maf_year Color

H001 2017 Metallic

H001 2017 Green

H005 2018 Metallic

H005 2018 Blue

H010 2015 Metallic

H033 2012 Gray

In this example, maf_year and color are independent of each other but dependent on car_model. These two columns are therefore said to be multivalued dependent on car_model.
This dependence can be represented like this:
car_model →→ maf_year
car_model →→ color
B)Trivial Functional Dependency in DBMS
A functional dependency is called trivial if the dependent attributes are already included in the determinant.
So, X -> Y is a trivial functional dependency if Y is a subset of X.
For example:

Emp_id Emp_name

AS555 Harry

AS811 George

AS999 Kevin

Consider this table with two columns, Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency, as Emp_id is a subset of {Emp_id, Emp_name}.
C)Non-Trivial Functional Dependency in DBMS
A functional dependency A -> B is non-trivial when B is not a subset of A.

Company CEO Age

Microsoft Satya Nadella 51

Google Sundar Pichai 46

Apple Tim Cook 57

Example:
{Company} -> {CEO} (if we know the Company, we know the CEO's name)
CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
D)Transitive Dependency in DBMS
A transitive dependency is a functional dependency that is formed indirectly by two other functional dependencies. Let's understand with the following example.
Example:

Company CEO Age

Microsoft Satya Nadella 51

Google Sundar Pichai 46

Alibaba Jack Ma 54

{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold. That makes sense, because if we know the company name, we can know its CEO's age.

A transitive functional dependency exists when changing a non-key column might cause another non-key column to change.

Consider the table 1. Changing the non-key column Full Name may change Salutation.
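The transitive step in the Company, CEO, Age example can be derived mechanically by computing an attribute closure, i.e. everything a set of attributes determines under the given dependencies. This is a minimal sketch; the function name `closure` and the representation of dependencies as pairs of sets are choices made here for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If the whole left side is already known, the right side follows.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


# The two dependencies stated in the example above.
fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]

company_closure = closure({"Company"}, fds)
print(sorted(company_closure))
```

Age appears in the closure of Company even though no dependency states Company -> Age directly, which is what makes the dependency transitive.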

E)Fully Functional Dependency

In a full functional dependency, an attribute or set of attributes is determined by the whole of another attribute set, and not by any proper subset of it. If a relation R has attributes X, Y, Z with the dependencies X->Y and X->Z, then those dependencies are fully functional, because the single attribute X has no smaller part that could determine Y or Z.
F) Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on a part of the composite key rather than on the whole key. If a relation R has attributes X, Y, Z, where X and Y together form the composite key and Z is a non-key attribute, then X->Z is a partial functional dependency.
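A partial dependency can be looked for by testing every proper subset of the composite key against each non-key attribute. The sketch below uses hypothetical sample rows for R(X, Y, Z) with composite key (X, Y); as with any check on sample data, it can only show which dependencies the stored rows are consistent with.

```python
from itertools import combinations


def determines(rows, left, right):
    """Return True if rows that agree on `left` always agree on the attribute `right`."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in left)
        if seen.setdefault(key, row[right]) != row[right]:
            return False
    return True


def partial_dependencies(rows, composite_key, non_key_attrs):
    """List (key_part, attribute) pairs where part of the key determines the attribute."""
    found = []
    for size in range(1, len(composite_key)):
        for key_part in combinations(composite_key, size):
            for attr in non_key_attrs:
                if determines(rows, key_part, attr):
                    found.append((key_part, attr))
    return found


# Hypothetical rows for R(X, Y, Z) with composite key (X, Y).
rows = [
    {"X": 1, "Y": "a", "Z": 10},
    {"X": 1, "Y": "b", "Z": 10},
    {"X": 2, "Y": "a", "Z": 20},
]

partials = partial_dependencies(rows, ("X", "Y"), ["Z"])
print(partials)
```

In this data Z depends on X alone, so X -> Z is reported as a partial dependency, while Y alone does not determine Z.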

v.IMP Q4) What is Normalization? Write about types of Normalization.

https://www.youtube.com/watch?v=ABwD8IYByfk
A)
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update, and deletion anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in database tables.

There are four types of normal forms:

A)First Normal Form (1NF)

o A relation (table) is in 1NF if every attribute contains only atomic (single) values.
o It states that an attribute of a table cannot hold multiple values. It must hold only a single value.
o First normal form disallows multi-valued attributes, composite attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
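The conversion above can be sketched in Python: each row holding several phone numbers is split into one row per number. The rows are the ones from the EMPLOYEE table; storing the multi-valued field as a comma-separated string and the function name `to_first_normal_form` are assumptions made for illustration.

```python
# EMPLOYEE rows before normalization; EMP_PHONE may hold several values.
employees = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam", "EMP_PHONE": "7390372389, 8589830302", "EMP_STATE": "Punjab"},
]


def to_first_normal_form(rows, multi_valued_attr):
    """Emit one row per single value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued_attr].split(","):
            # Copy the row, replacing the multi-valued field with one atomic value.
            flat.append({**row, multi_valued_attr: value.strip()})
    return flat


flat_rows = to_first_normal_form(employees, "EMP_PHONE")
for row in flat_rows:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

The result has five rows, each with exactly one phone number, matching the 1NF table shown above.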

B)Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
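The decomposition is a pair of projections of the TEACHER table, with duplicate rows removed. A minimal sketch, with the rows held as tuples in the column order TEACHER_ID, SUBJECT, TEACHER_AGE (the helper name `project` is chosen here for illustration):

```python
# TEACHER rows as (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]


def project(rows, positions):
    """Keep only the given column positions, dropping duplicate rows but keeping order."""
    projected_rows = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in projected_rows:
            projected_rows.append(projected)
    return projected_rows


teacher_detail = project(teacher, [0, 2])   # TEACHER_ID, TEACHER_AGE
teacher_subject = project(teacher, [0, 1])  # TEACHER_ID, SUBJECT

print(teacher_detail)
print(teacher_subject)
```

Each teacher's age is now stored once in TEACHER_DETAIL instead of once per subject taught.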

C)Third Normal Form (3NF)


o A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a non-prime attribute on the key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation in 2NF has no transitive dependency for its non-prime attributes, then it is in third normal form.
o Rule 1- Be in 2NF
o Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to divide our table.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>
table, with EMP_ZIP as a Primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

There are no transitive functional dependencies, and hence our table is in 3NF
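One way to see that nothing was lost by the split is to join the two tables back on EMP_ZIP and compare with the original rows. The sketch below uses the data from the tables above; zip codes are kept as strings so that leading zeros survive, and the helper name `join_on_zip` is an assumption.

```python
# EMPLOYEE rows as (EMP_ID, EMP_NAME, EMP_ZIP).
employee = [
    (222, "Harry", "201010"),
    (333, "Stephan", "02228"),
    (444, "Lan", "60007"),
    (555, "Katharine", "06389"),
    (666, "John", "462007"),
]

# EMPLOYEE_ZIP rows keyed by EMP_ZIP, giving (EMP_STATE, EMP_CITY).
employee_zip = {
    "201010": ("UP", "Noida"),
    "02228": ("US", "Boston"),
    "60007": ("US", "Chicago"),
    "06389": ("UK", "Norwich"),
    "462007": ("MP", "Bhopal"),
}


def join_on_zip(employee_rows, zip_lookup):
    """Rebuild EMPLOYEE_DETAIL rows by looking up each employee's zip code."""
    return [
        (emp_id, name, zip_code) + zip_lookup[zip_code]
        for emp_id, name, zip_code in employee_rows
    ]


employee_detail = join_on_zip(employee, employee_zip)
for row in employee_detail:
    print(row)
```

The join reproduces the five original EMPLOYEE_DETAIL rows, so the decomposition is lossless for this data.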

D)Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:


1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

The candidate key is {EMP_ID, EMP_DEPT} together.

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional dependencies is a key.
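The BCNF test can be written as a short check: a dependency violates BCNF when it is non-trivial and its left side does not determine every attribute of the table. This is a sketch under the assumption that each decomposed table is checked only against the dependencies that apply to it; the function names are illustrative.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


def bcnf_violations(table_attributes, fds):
    """Return the non-trivial dependencies whose left side is not a super key."""
    violations = []
    for left, right in fds:
        trivial = set(right) <= set(left)
        if not trivial and closure(left, fds) != set(table_attributes):
            violations.append((left, right))
    return violations


fd_country = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd_dept = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
original_violations = bcnf_violations(original, [fd_country, fd_dept])

# After decomposition, each table is checked against its own dependencies.
emp_country_ok = bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country]) == []
emp_dept_ok = bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept]) == []
mapping_ok = bcnf_violations({"EMP_ID", "EMP_DEPT"}, []) == []

print(len(original_violations), emp_country_ok, emp_dept_ok, mapping_ok)
```

Both dependencies violate BCNF in the original EMPLOYEE table, and none does in the three decomposed tables.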

Q5) Write about physical database design issues.

A) Database design involves the process of logical design with the help of E-R diagrams, normalisation, etc., followed by the physical design.
The key issues in Physical Database Design are:

• The purpose of physical database design is to translate the logical description of data into the technical specifications for storing and retrieving data for the DBMS.
• The goal is to create a design for storing data that will provide adequate performance and ensure database integrity, security and recoverability.
Some of the basic inputs required for Physical Database Design are:
• Normalised relations
• Attribute definitions
• Data usage: entered, retrieved, deleted, updated
• Requirements for security, backup, recovery, retention, integrity
• DBMS characteristics.
• Performance criteria such as response time requirement with respect to volume estimates.
The issues relating to the Design of the Physical Database Files

A physical file is a file as stored on the disk. The main issues relating to physical files are:
• Constructs to link two pieces of data:
   o Sequential storage.
   o Pointers.
• File Organisation: How are the files arranged on the disk?
• Access Method: How can the data be retrieved based on the file organisation?

Storage of Database on Hard Disks

A file organisation refers to the organisation of the data of a file into records, blocks, and access structures; this includes the way records and blocks are placed on the storage medium and interlinked. An access method, on the other hand, is the way the data can be retrieved based on the file organisation.

Databases are mostly stored persistently on magnetic disks for the reasons given below:
• Databases are very large and may not fit completely in main memory.
• The data must be stored permanently on non-volatile storage and made accessible to users with the help of front-end applications.
• Primary storage is very expensive; storing data on disk makes the cost of storage per unit of data substantially lower.

Each hard drive is usually composed of a set of disk platters. Each disk platter has a layer of magnetic material deposited on its surface.
The entire disk can contain a large amount of data, which is organised into smaller packages called BLOCKS (or pages). Block sizes vary from system to system; in this discussion, one block is taken to be 1 KB of data (= 1024 bytes).
A block is the smallest unit of data transfer between the hard disk and the main memory of the computer.
Each block therefore has a fixed, assigned address. Typically, the computer processor will submit a read/write request, which includes the address of the block and the address in RAM of the memory area, called a buffer (or cache), where the data must be stored or taken from. The processor then reads and modifies the buffer data as required and, if required, writes the block back to the disk. Let us see how the tables of the database are stored on the hard disk.

How are tables stored on Disk?

Each record of a table can contain a different amount of data. This is because in some records, some attribute values may be 'null', or some attributes may be of type varchar(), so each record may hold a string of a different length as the value of that attribute.
Therefore, the record is stored with each attribute separated from the next by a special ASCII character called a field separator. Of course, each block may hold many records. Each record is separated from the next by another special ASCII character called the record separator.
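The separator scheme can be sketched as follows. The notes do not say which characters are used, so this sketch assumes the ASCII unit separator (0x1F) between fields and the ASCII record separator (0x1E) between records, and stores a null as an empty field; all three choices are assumptions.

```python
FIELD_SEP = "\x1f"   # assumed field separator (ASCII unit separator)
RECORD_SEP = "\x1e"  # assumed record separator (ASCII record separator)


def pack_block(records):
    """Pack variable-length records into one string, as they might sit in a block."""
    return RECORD_SEP.join(
        FIELD_SEP.join("" if value is None else str(value) for value in record)
        for record in records
    )


def unpack_block(block):
    """Split a packed block back into records and fields (all fields come back as text)."""
    return [record.split(FIELD_SEP) for record in block.split(RECORD_SEP)]


block = pack_block([(14, "John", "UP"), (20, "Harry", None)])
records = unpack_block(block)
print(records)
```

Because the separators mark where each field and record ends, records of different lengths can share a block without padding.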

File Organization
Q6) What is a File?
A) A file is a collection of records related to each other. The file size is limited by the size of memory and storage medium.
There are two important features of a file:
1. File Activity: File activity specifies the percentage of records that are actually processed in a single run.
2. File Volatility: File volatility describes how frequently records are changed. A highly volatile file is handled more efficiently on disk than on tape.

v.IMP Q7) What is file organization? Explain types of file organization in detail.

A) File Organization in DBMS: A database contains a huge amount of data, which is stored in physical memory in the form of files. A file is a set of multiple records stored in binary format.

In the database management system, the file organization describes the logical relationship among
the various stored records. In simple words, we can say that this technique defines how the file
records are mapped onto disk blocks.

File Organization is also defined as storing the files in a specific order.

Objectives of File Organization

Following are a few objectives of database file organization:

• Records should be read/retrieved/accessed as fast as possible.
• Any user should be able to perform operations such as insert, update, and delete on the records easily and quickly.
• The storage cost should be minimal, so the information should be stored efficiently.
• Operations should not introduce duplicate copies of records.

Types of File Organization

Following are the various methods used to organize files in a database management system:

1. Sequential File Organization
2. Indexed File Organization
3. Heap File Organization
4. Hash File Organization

1)Sequential File Organization

It is a method in which the records are stored one after another on disk. It is the simplest method of file organization.

This file organization may arrange the records in either ascending or descending order of the key column. When the records are sorted in a specific order, the binary search technique can be used to reduce the time taken to search the file.

Following are the two different ways of implementing this method:

a)Pile File Method

In this sequential file organization method, the records are stored in the same sequence in which they are inserted into the database tables. This method is very simple.

When a user inserts a new record, the record is placed at the end of the file. If we delete or update a record, the record is first searched for in the blocks of memory. Once it is found, it is marked for deletion, and, for an update, the new version of the record is entered.

Insertion of New record using Pile File Method:

Suppose four records are already stored in the sequence, and we want to insert a new record (R4). Then R4 will be placed at the end of the sequence.

b) Sorted File Method

In this sequential organization method, the records are sorted based on the key attribute or another key when they are entered into the database system.

Insertion of New record using Sorted File Method:

Suppose five records are already stored in a sorted manner, and you want to enter a new record (R4) between the existing records. It is first placed at the end of the file, and then the file is sorted again into the specified sequence.
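Both methods can be sketched with Python lists standing in for files. The keys and record names are hypothetical. One difference from the description above: instead of appending and re-sorting, the sketch uses the standard bisect module to place the record directly at its sorted position, which gives the same final order.

```python
import bisect

# Pile file: records are kept in arrival order, so a new record goes at the end.
pile_file = [(1, "R1"), (6, "R6"), (3, "R3"), (5, "R5")]
pile_file.append((4, "R4"))

# Sorted file: records are kept in ascending order of the key.
sorted_file = [(1, "R1"), (2, "R2"), (3, "R3"), (5, "R5"), (6, "R6")]


def insert_sorted(file_records, key, record):
    """Insert the record so that the file stays ordered by key."""
    bisect.insort(file_records, (key, record))


def binary_search(file_records, key):
    """Find a record by key using binary search; return None if it is absent."""
    # (key,) sorts just before any (key, record) tuple with the same key.
    position = bisect.bisect_left(file_records, (key,))
    if position < len(file_records) and file_records[position][0] == key:
        return file_records[position][1]
    return None


insert_sorted(sorted_file, 4, "R4")
found = binary_search(sorted_file, 4)
missing = binary_search(sorted_file, 9)
print(pile_file[-1], [key for key, _ in sorted_file], found, missing)
```

In the pile file R4 ends up last; in the sorted file it lands between R3 and R5, and binary search can then locate it without scanning every record.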

Advantages of Sequential File Organization

Following are the benefits or advantages of sequential file organization:

• It is a fast and efficient method for processing huge amounts of data.
• This method does not require much effort to store the data in the database.
• It is mainly used for generating reports and calculating statistical data.
• Storing the files in this method is cheaper.

Disadvantages of Sequential File Organization

Following are the two limitations or disadvantages of sequential file organization:

• The sorted file method of sequential file organization is inefficient because it takes more space and time for sorting the records.
• It is a time-consuming process.

2)Indexed Sequential Access Method (ISAM):

ISAM is an advanced sequential file organization. In this method, records are stored in the file using the primary key. In an indexed sequential access file, records are stored on a direct access device such as a magnetic disk and can be accessed directly by primary key. An index value is generated for each primary key and mapped to the record.

This index contains the address of the record in the file.

If a record has to be retrieved based on its index value, the address of the data block is fetched and the record is retrieved from memory.
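The lookup path can be sketched with a dictionary playing the role of the index and a list of lists playing the role of the data blocks. The keys, names, and block layout are hypothetical; a real ISAM file keeps both structures on disk.

```python
# Data blocks, each holding (primary_key, record) pairs in key order.
data_blocks = [
    [(101, "Asha"), (102, "Ravi")],   # block 0
    [(103, "Mina"), (104, "John")],   # block 1
]

# The index maps each primary key to the address (block number) of its record.
index = {
    key: block_no
    for block_no, block in enumerate(data_blocks)
    for key, _ in block
}


def fetch(primary_key):
    """Look up the block address in the index, then read the record from that block."""
    block_no = index.get(primary_key)
    if block_no is None:
        return None
    for key, record in data_blocks[block_no]:
        if key == primary_key:
            return record
    return None


def range_fetch(low, high):
    """Return the records whose keys fall in [low, high], in key order."""
    return [fetch(key) for key in sorted(index) if low <= key <= high]


print(fetch(103), fetch(999), range_fetch(102, 103))
```

Only the one block named by the index is read for a single-key lookup, and because the index is ordered by key, a range of keys can be served as well.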

Pros of ISAM:

• In this method, each index entry holds the address of its record's data block, so searching for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key values, we can retrieve the data for a given range of values. In the same way, a partial value can also be easily searched, i.e., the student names starting with 'JA' can be easily found.

Cons of ISAM

• This method requires extra space on the disk to store the index values.
• When new records are inserted, the files have to be reorganised to maintain the sequence.
• When a record is deleted, the space used by it needs to be released. Otherwise, the performance of the database will slow down.

3)Heap File Organization

Heap file organization is the simplest and most basic type of file organization. The heap file is sometimes also called the unordered file. This type of organization works with blocks of data, and a new record is inserted in the last page of the file. It does not require any sorting or ordering of the records.

If there is insufficient space in the last data block, a new data block is added to the file, and the record is inserted into that data block. This makes the insertion of records very efficient.

As there is no particular ordering of the field values, a linear search must be performed to access records in the file. The linear search reads the blocks of the file one by one until the data is found.

In the heap file organization, each record has a unique ID, and every page or data block of the file is of the same size.

If we want to delete a record from the file, the required record has to be accessed and marked as deleted, and then the block is written back to the disk. The space that held the deleted record is not used again.

Insert New Record using Heap File Organization

Suppose the three records are already stored in a heap, and we want to insert the new
record Record2 in that heap.

Let’s suppose that Data Block 2 is full. Then the Record2 will be added in any one of the data
blocks selected by the database system. Let’s say Data Block 1.

If we want to update, search, or delete a record in the heap file, we have to read the file from the start until the required record is found. If the database contains a huge amount of data, these operations take a lot of time, because the records are not sorted or stored in any particular order.
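A heap file can be sketched as a list of fixed-capacity blocks. The block capacity of two records and the class name `HeapFile` are assumptions for illustration; this sketch always appends to the last block, adding a new block when the last one is full.

```python
BLOCK_CAPACITY = 2  # assumed number of records that fit in one block


class HeapFile:
    """Unordered file: inserts go to the last block, searches scan from the start."""

    def __init__(self):
        self.blocks = [[]]

    def insert(self, record_id, data):
        # Add a new block when the last one has no space left.
        if len(self.blocks[-1]) >= BLOCK_CAPACITY:
            self.blocks.append([])
        self.blocks[-1].append((record_id, data))

    def search(self, record_id):
        """Linear search; returns (data, number_of_blocks_read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for rid, data in block:
                if rid == record_id:
                    return data, blocks_read
        return None, len(self.blocks)


heap = HeapFile()
for rid, data in [(5, "R5"), (1, "R1"), (3, "R3"), (2, "R2")]:
    heap.insert(rid, data)

print(heap.search(2), heap.search(9))
```

Insertion never looks at existing records, which is why it is cheap, while a search for a record that is absent has to read every block.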

Advantages of Heap File Organization

• For small database systems, users can access the records faster than with sequential file organization.
• It is a simple file organization method.
• It is the best method for loading a large amount of data into the database at one time.

Disadvantages of Heap file Organization

• It is not efficient for large database systems because this method takes more time to perform operations on the data.
• The main disadvantage of this file organization is the problem of unused blocks of memory.

4)Hash File Organization

In DBMS, hashing is a technique to directly locate the desired data on the disk without using an index structure. Hashing is used to store and retrieve items in a database because it is faster to find an item using the shorter hashed key than using its original value.

Data is stored in data blocks whose addresses are generated by applying a hash function. The memory location where these records are stored is known as a data block or data bucket.

This file organization uses the hash function for calculating the block addresses.

The output of the hash function provides the disk location where the data is actually stored.

If the hash function is computed on a non-key field, that field is called the hash column; if it is computed on the key column, that column is called the hash key.
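The address calculation can be sketched with a simple modulo hash function over a numeric hash key. The bucket count of four, the key values, and the choice of modulo are assumptions for illustration. Each bucket in this sketch can hold several records, so two keys that hash to the same address are both kept.

```python
BUCKET_COUNT = 4  # assumed number of data buckets

buckets = [[] for _ in range(BUCKET_COUNT)]


def bucket_address(hash_key):
    """Hash function: map a numeric key to a bucket address."""
    return hash_key % BUCKET_COUNT


def insert(hash_key, record):
    """Store the record in the bucket given by the hash function."""
    buckets[bucket_address(hash_key)].append((hash_key, record))


def lookup(hash_key):
    """Go straight to the computed bucket and scan only that bucket."""
    for key, record in buckets[bucket_address(hash_key)]:
        if key == hash_key:
            return record
    return None


insert(103, "Areena")
insert(107, "Ravi")    # 103 and 107 both map to bucket 3
insert(100, "Mina")

print(bucket_address(103), bucket_address(107), lookup(103), lookup(107), lookup(999))
```

A lookup touches a single bucket regardless of file size, which is the source of the speed claimed for hashed files; it also shows why range queries gain nothing, since neighbouring keys land in unrelated buckets.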

Advantages of Hash File Organization

• Users can access the record at a fast speed because the address of the block is known by the
hash function.
• It is the best method for online transactions like ticket booking, online banking, etc.

Disadvantages of Hash File Organization

• There is a greater chance of losing data. For example, in the employee table, when the hash field is Employee_Name and there are two records with the same name, 'Areena', the same address is generated for both. In such a case, unless the collision is handled, the older record will be overwritten by the newer one.
• This method of file organization is not suitable when we are searching for a given range of data, because each record in the database file is stored at an effectively random address. So, in this case, searching for records is not efficient.
• If we search on columns which are not hash columns, the search cannot compute the data address, because the address is derived only from the hash columns.

Q8) What is Indexing? Explain types of indexes.

Video : https://www.youtube.com/watch?v=krrSzX7q30c

A) Indexing is a data structure technique which allows you to quickly retrieve records from a database file. An index is a small table having only two columns. The first column comprises a copy of the primary or candidate key of a table. The second column contains a set of pointers holding the address of the disk block where that specific key value is stored.

An index -

• Takes a search key as input


• Efficiently returns a collection of matching records(data reference)

Index structure:

Indexes can be created using some database columns.

o The first column of the index is the search key, which contains a copy of the primary key or candidate key of the table. The values of the primary key are stored in sorted order so that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where the value of the particular key can be found.

Indexing is defined based on its indexing attributes. Indexing can be of the following types −
• Primary Index − A primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.

Ordered Indexing or primary index is of two types −

• Dense Index
• Sparse Index

Dense Index

In a dense index, there is an index record for every search key value in the database. This makes
searching faster but requires more space to store the index records themselves. Each index record
contains a search key value and a pointer to the actual record on the disk.

Sparse Index

In a sparse index, index records are not created for every search key. Each index record contains
a search key and a pointer to the data on the disk. To search for a record, we first find the
index record with the largest search key that does not exceed the key we want and follow its
pointer to that location in the data file. If the record we are looking for is not at that
location, the system searches sequentially from there until the desired record is found.
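
A sparse index lookup can be sketched as follows. The block contents and the rule of one index entry per block (holding the first key of that block) are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Data file: blocks of records sorted on the search key (sample values).
blocks = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]
# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [block[0][0] for block in blocks]

def lookup(search_key):
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(sparse_index, search_key) - 1
    if pos < 0:
        return None  # smaller than every key in the file
    # Sequential search inside the block the index entry points to.
    for key, value in blocks[pos]:
        if key == search_key:
            return value
    return None
```

A dense index would instead hold nine entries, one per record, and skip the sequential step at the cost of a larger index.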

• Secondary Index − A secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or from a non-key field with duplicate values.


• Clustering Index − A clustering index is defined on an ordered data file. The data file is
ordered on a non-key field. In some cases, the index is created on non-primary key
columns which may not be unique for each record. In such cases, in order to identify
the records faster, two or more columns are grouped together to get unique values
and an index is created out of them. This method is known as the clustering index.

Multilevel Index

Index records comprise search-key values and data pointers. A multilevel index is stored on the
disk along with the actual database files. As the size of the database grows, so does the size of
the indices. It is important to keep the index records in main memory so as to speed up search
operations. If a single-level index is used, a large index cannot be kept in memory, which leads
to multiple disk accesses.


Multi-level Index helps in breaking down the index into several smaller indices in order to make
the outermost level so small that it can be saved in a single disk block, which can easily be
accommodated anywhere in the main memory.
https://www.youtube.com/watch?v=c3CrNZaReNM
https://www.youtube.com/watch?v=KnXohGgIpQU
B-TREE:
A B-tree index (the name is commonly read as “balanced tree”) is a type of index that can be
created in relational databases.
A B-tree index works by creating a series of nodes in a hierarchy. It is often compared to a tree,
which has a root, several branches, and many leaves.
For example, the steps to find the record with an ID of 109 in a three-level index would be:

1. Start at the root node and go to the first level.
2. Find the node that covers the range of values that spans the value 109 (for example,
a range of 100 to 200).
3. Move to the second level from this node.
4. Find the node that covers the range of values that spans the value 109 on the second level
(for example, 100 to 120).
5. Move to the third level from this node.
6. Find the record that has an ID of 109.

Each step looks up a range of values and moves down to the next level. This is repeated a few
times until the correct row is found.
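
The lookup steps above can be sketched as a recursive search. This is a minimal illustration, not a database implementation: the `Node` class, the sample keys, and the `"row <key>"` strings standing in for stored rows are all hypothetical.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys held in this node
        self.children = children or []  # len(keys) + 1 children unless a leaf
        self.rows = {k: f"row {k}" for k in keys}  # stand-in for stored rows

def search(node, key, level=1):
    """Return (row, levels_visited); row is None if the key is absent."""
    pos = bisect_left(node.keys, key)
    if pos < len(node.keys) and node.keys[pos] == key:
        return node.rows[key], level
    if not node.children:
        return None, level
    # Keys smaller than node.keys[pos] live in the child at position pos.
    return search(node.children[pos], key, level + 1)

# A small three-level tree with every leaf at the same depth.
left = Node([100, 120], [Node([50, 80]), Node([105, 109, 115]), Node([130, 150])])
right = Node([300], [Node([220, 250]), Node([320, 400])])
root = Node([200], [left, right])
```

Searching for 109 visits the root, then the node covering 100 to 120, then the leaf that holds 109, which matches the three levels described in the steps.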

A B-tree stores data such that each node contains keys in ascending order. Each key has references
to two child nodes: the keys in the left child are less than the current key, and the keys in the
right child are greater than the current key. If a single node has “n” keys, then it can have at
most “n+1” child nodes.

B+ TREE:
A B+ tree is a balanced multiway search tree (not a binary tree) that follows a multi-level index
format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf
nodes remain at the same height, and is thus balanced. Additionally, the leaf nodes are linked
using a linked list; therefore, a B+ tree can support random access as well as sequential access.

Structure of B+ Tree

Every leaf node is at equal distance from the root node. A B+ tree is of the order n where n is fixed
for every B+ tree.

Internal nodes −

• Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
• At most, an internal node can contain n pointers.
Leaf nodes −

• Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
• At most, a leaf node can contain n record pointers and n key values.
• Every leaf node contains one block pointer P to point to next leaf node and forms a linked
list.
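
The linked leaf level is what makes sequential access cheap. The sketch below shows a range scan that follows the next-leaf pointers. The `Leaf` class and the sample keys are hypothetical, and for brevity the scan starts at the first leaf, whereas a real B+ tree would first descend from the root to the leaf containing the lower bound.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted key values stored in this leaf
        self.next = None   # block pointer P to the next leaf

def link(leaves):
    """Chain the leaves into a linked list and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(start_leaf, low, high):
    """Collect keys in [low, high] by following next-leaf pointers."""
    found = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return found  # keys are sorted, so the scan can stop here
            if key >= low:
                found.append(key)
        leaf = leaf.next
    return found

first = link([Leaf([5, 10]), Leaf([15, 20]), Leaf([25, 30])])
```

For example, `range_scan(first, 10, 25)` walks across all three leaves without going back up to any internal node.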

B+ Tree Insertion

• B+ trees are filled from bottom and each entry is done at the leaf node.
• If a leaf node overflows −
o Split node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o First i entries are stored in one node.
o Rest of the entries (i+1 onwards) are moved to a new node.
o ith key is duplicated at the parent of the leaf.
• If a non-leaf node overflows −
o Split node into two parts.
o Partition the node at i = ⌈(m+1)/2⌉.
o Entries up to i are kept in one node.
o Rest of the entries are moved to a new node.
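
The leaf-split rule can be sketched as below. The notes do not define m, so this sketch assumes that m is the maximum number of entries a leaf can hold and that an overflowing leaf therefore holds m + 1 entries. It also assumes the convention that keys less than or equal to the copied-up key are routed to the left node. `split_leaf` is a hypothetical helper name.

```python
def split_leaf(entries):
    """Split an overflowing leaf of m + 1 sorted keys (m = assumed leaf capacity)."""
    m = len(entries) - 1
    i = (m + 1) // 2       # partition point, floor((m + 1) / 2)
    left = entries[:i]     # first i entries stay in the old node
    right = entries[i:]    # entries i+1 onwards move to a new node
    separator = left[-1]   # i-th key, duplicated (not moved) at the parent
    return left, right, separator
```

With a capacity of 4, inserting a fifth key gives i = 2, so two entries stay and three move to the new leaf. Many textbooks instead copy up the first key of the new right node; both conventions work as long as searches use the matching comparison.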

B+ Tree Deletion

• B+ tree entries are deleted at the leaf nodes.
• The target entry is searched for and deleted.
o If the key also appears in an internal node, delete it there and replace it with the entry
from the left position.
• After deletion, the node is tested for underflow.
o If underflow occurs, redistribute entries from the node to its left.
• If redistribution from the left is not possible, then
o Redistribute from the node to its right.
• If redistribution is not possible from the left or from the right, then
o Merge the node with the node to its left or right.

Multi-key File Organization


Q9) What is Multi key File Organization?
A) The ability to search on many keys is enabled by building multiple index files (multikey file
organisation) “on top of” the data file. The physical database then consists of one or more data
files and many index files, and each data file contains either one or several record types. Each
index file supports access by a particular field or group of fields.

There are numerous techniques that have been used to implement multikey file organisation.
Most of the approaches are based on building indexes to provide direct access by key value. In this
section, we will discuss two approaches for providing additional access paths into a file of data
records.

• Multilist file organisation
• Inverted file organisation

Multilist File Organisation
The basic approach to providing the linkage between an index and the file of data records is called
multilist organisation. A multilist file maintains an index for each secondary key. Instead of
listing every primary key related to a secondary key value, the index entry for that value contains
only one primary key value, that of the first such record. That record is linked to the other
records in the data file that contain the same secondary key value.

Physical Address   A/C No   Name   Amount   A/C Type
1                  1111     ABC    500      01
2                  2574     XYZ    2000     02
3                  2389     STU    3000     03
4                  3000     KBC    4000     01
5                  2494     YQR    800      01
6                  3678     SPZ    500      02

Multilist file for secondary key A/C Type

A/C Type   Pointer
01
02
03
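
The linkage can be sketched using the six records from the table above. As an assumption for illustration, the index here stores the physical address of the first record for each A/C type (a primary key value would serve equally well), and the per-record links are held in a separate `next_link` dictionary rather than inside the records. The function names are hypothetical.

```python
# (physical address, A/C No, name, amount, A/C type) from the table above
records = [
    (1, 1111, "ABC", 500, "01"),
    (2, 2574, "XYZ", 2000, "02"),
    (3, 2389, "STU", 3000, "03"),
    (4, 3000, "KBC", 4000, "01"),
    (5, 2494, "YQR", 800, "01"),
    (6, 3678, "SPZ", 500, "02"),
]

def build_multilist(records):
    """Return (index, next_link).

    index holds one entry per A/C type: the address of the first record.
    next_link chains each record to the next record with the same A/C type.
    """
    index, next_link, last_seen = {}, {}, {}
    for address, _acno, _name, _amount, ac_type in records:
        next_link[address] = None
        if ac_type in last_seen:
            next_link[last_seen[ac_type]] = address  # extend the chain
        else:
            index[ac_type] = address                 # head of a new chain
        last_seen[ac_type] = address
    return index, next_link

def addresses_for(ac_type, index, next_link):
    """Follow the chain for one A/C type and list the record addresses."""
    out, address = [], index.get(ac_type)
    while address is not None:
        out.append(address)
        address = next_link[address]
    return out

index, next_link = build_multilist(records)
```

The index stays small (one pointer per distinct A/C type), but finding all records of a type means following the chain through the data file.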

Inverted File Organisation

In inverted file organisation, a linkage is provided between an index and the file of data records. A
key’s inverted index contains all of the values that the key presently has in the records of the data
file. Each key-value entry in the inverted index points to all of the data records that have the
corresponding value. Inverted files represent one extreme of file organisation in which only the
index structures are important. The records themselves may be stored in any way (sequentially
ordered by primary key, random, linked ordered by primary key etc.).

Sequential data file sorted on primary key A/C No

Physical Address   A/C No   Name   Amount   A/C Type
1                  1111     ABC    500      01
2                  2574     XYZ    2000     02
3                  2389     STU    3000     03
4                  3000     KBC    4000     01
5                  2494     YQR    800      01
6                  3678     SPZ    500      02

Inverted index file for secondary key A/C Type

A/C Type   Physical Address
01         1, 4, 5
02         2, 6
03         3
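
The inverted index above can be rebuilt from the data file with a few lines of code. This is only a sketch: the tuple layout of `records` and the function name `build_inverted_index` are choices made for illustration.

```python
from collections import defaultdict

# (physical address, A/C No, name, amount, A/C type) from the table above
records = [
    (1, 1111, "ABC", 500, "01"),
    (2, 2574, "XYZ", 2000, "02"),
    (3, 2389, "STU", 3000, "03"),
    (4, 3000, "KBC", 4000, "01"),
    (5, 2494, "YQR", 800, "01"),
    (6, 3678, "SPZ", 500, "02"),
]

def build_inverted_index(records):
    """Map every A/C type value to the addresses of all records having it."""
    inverted = defaultdict(list)
    for address, _acno, _name, _amount, ac_type in records:
        inverted[ac_type].append(address)
    return dict(inverted)

inverted_index = build_inverted_index(records)
```

Unlike the multilist organisation, every matching address is available from the index entry itself, so no chain has to be followed through the data file.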

Both inverted files and multilist files have:

• An index for each secondary key.
• An index entry for each distinct value of the secondary key.

In both organisations:

• The index may be tabular or tree-structured.
• The entries in an index may or may not be sorted.
• The pointers to data records may be direct or indirect.

Q10) Explain Attribute Preservation with example.


Attribute Preservation
A relation schema R is decomposed into two or more relation schemas to eliminate the anomalies
contained in the original schema R. Every attribute of R must appear in at least one of the
decomposed schemas, and when information is retrieved from the decomposed relations it must give
the same set of tuples as the original relation.
There are 2 types of preservation
• Lossless-join decomposition
• Dependency preservation
Lossless-join decomposition
A decomposition is called a lossless-join decomposition if the original relation can be recovered
from the set of decomposed relations by a series of joins, so that the joined result contains
exactly the same data as the original relation.
• The decomposition of R into R1 and R2 is lossless with respect to F if the natural join of the
decomposed relations is equal to the original relation,
o i.e. R1 ⋈ R2 = R.
• The decomposition is a lossless-join decomposition of R if at least one of the following
functional dependencies holds in F+:
➢ R1 ∩ R2 → R1
➢ R1 ∩ R2 → R2

Example:-
Consider a relation SKILL with the attributes Worker ID, Skill Type and Bonus Rate,
where Worker ID is the primary key. So the functional dependency is Worker ID → Skill Type, Bonus Rate.
Skill
Worker ID   Skill Type   Bonus Rate

Every non-key attribute should depend only on the key attribute.
But the functional dependency
Skill Type → Bonus Rate is also possible,
which is a dependency between two non-key attributes and violates this rule.
To resolve the situation, we need to decompose the SKILL relation (table).
Worker
Worker ID   Skill Type
Skill
Bonus Rate   Worker ID
The two relations share Worker ID, which is the key of both, so the common attribute determines
every attribute of each relation and the decomposition is lossless.
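
The lossless-join test for a two-way decomposition can be sketched with an attribute-closure routine. It uses the standard condition that the common attributes R1 ∩ R2 must functionally determine all of R1 or all of R2. The function names `closure` and `is_lossless` and the attribute spellings are hypothetical choices for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) determines all of R1 or all of R2."""
    reach = closure(r1 & r2, fds)
    return r1 <= reach or r2 <= reach

# FDs of the SKILL relation from the example above.
fds = [
    ({"WorkerID"}, {"SkillType", "BonusRate"}),
    ({"SkillType"}, {"BonusRate"}),
]
worker = {"WorkerID", "SkillType"}
skill = {"WorkerID", "BonusRate"}
```

Here `is_lossless(worker, skill, fds)` is true because the shared attribute WorkerID determines every attribute. A split into (WorkerID, BonusRate) and (SkillType, BonusRate) would fail the test, since BonusRate alone determines nothing else.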

Dependency Preserving Decomposition

Suppose a relation R with a set of functional dependencies F is decomposed into a set of
relations R1, R2, R3, …, Rn with functional dependencies F1, F2, F3, …, Fn respectively.
The decomposition is said to be dependency-preserving if every dependency in the original relation
is preserved by at least one decomposed relation or can be derived from the dependencies of the
decomposed relations, i.e. if
F+ = (F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+
where F+ is the closure of the set of dependencies F, and (F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ is the closure
of the union of the FDs of the decomposed relations.
Consider a relation R (A, B, C) with FDs F = {A → B, B → C}, decomposed into the relations
R1 (A, B) and R2 (B, C).
Now check whether the decomposition is dependency preserving or not.
For that, check which FDs hold in the decomposed relations.
R1 (A, B): F1+ = {A → A, A → B, B → B}
R2 (B, C): F2+ = {B → B, B → C, C → C}
Dropping the trivial FDs from F1+ and F2+ leaves
F1 = {A → B} and F2 = {B → C}
F1 ∪ F2 = {A → B, B → C} = F, so F+ = (F1 ∪ F2)+
and the decomposition is dependency preserving.
Example:-
Let a relation SHIPPING with attributes Ship, Capacity, Date, Cargo, Value and FDs
{Ship → Capacity,
Ship, Date → Cargo,
Cargo, Capacity → Value} be decomposed into relations R1, R2, R3.
Ship   Capacity   Date   Cargo   Value

R1 (Ship, Capacity) with the FD F1 = {Ship → Capacity}
R2 (Ship, Date, Cargo) with the FD F2 = {Ship, Date → Cargo}
R3 (Capacity, Value, Cargo) with the FD F3 = {Cargo, Capacity → Value}
From the above example, the union of the FDs of the decomposed relations is equal to the FDs of
the original relation.
Therefore, the decomposition is a dependency preserving decomposition.
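
The check can also be automated. The sketch below uses the usual closure-based test: for each FD X → Y, it grows X using only what can be derived inside the individual decomposed relations and then checks whether Y is reached. The function names `closure` and `preserves` and the attribute spellings are hypothetical choices for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, decomposition):
    """True if every FD in fds is enforceable using the decomposed relations only."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for rel in decomposition:
                # Attributes of this relation derivable from what we have so far.
                gained = closure(result & rel, fds) & rel
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True

shipping_fds = [
    ({"Ship"}, {"Capacity"}),
    ({"Ship", "Date"}, {"Cargo"}),
    ({"Cargo", "Capacity"}, {"Value"}),
]
shipping_parts = [
    {"Ship", "Capacity"},
    {"Ship", "Date", "Cargo"},
    {"Capacity", "Value", "Cargo"},
]
```

Both examples in this section pass the test. By contrast, splitting R (A, B, C) with {A → B, B → C} into (A, B) and (A, C) fails, because B → C can no longer be checked inside any single relation.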
