
Unit 6 File Organization - Prof Gauri Y Gunjal

Uploaded by

Gauri Khotele
[UNIT – VI]

FILE ORGANIZATION

Prof. Gauri Y. Gunjal

Prof. Gauri Y Gunjal 1


INTRODUCTION
• Operations on data apply to data items stored in main memory.
• However, the data is not always available in main memory.
• There are two main reasons for this.

1. A program may be larger than the available memory, or it may require
data that cannot fit in main memory at once.
2. Main memory loses its data once the program terminates or the
power supply is switched off, yet data may need to be kept from
one execution of a program to the next.

• For these reasons, data should be stored in external
memory. The place that usually holds the data is a file on the
disk.
CONCEPTS OF FIELDS, RECORDS AND FILES
• Field: It is the smallest unit for storing data, also known as an attribute or column.
• A field has two properties: type and size.
• Type specifies the data type, and size specifies the capacity of the field to store data.
• For example, an address can be of type character with a size given in number of characters.
• Record: It is a collection of related fields, also known as a tuple or row.
• For example, an employee record may consist of the fields EmployeeId, Name, Address, City, etc.
• File: It is a set of related records, also known as a relation or table.
• A file is identified by properties such as file name, size and location.
• A file can be a text file or a binary file. A text file stores numbers as a sequence of characters, whereas
a binary file stores numbers in binary format.
• A file can contain any number of records.
• For example, a file containing the records of the employees in an organization.


CONCEPTS OF FIELDS, RECORDS AND FILES (Contd.)
• File Organization: A file has two facets: logical and physical.
• A logical file is a set of records, whereas a physical file shows how the records are
physically stored on the disk. File organization refers to the physical representation
of a file.
• Key: It is an attribute that uniquely identifies the records of a file.
• It contains unique values, which can be used to distinguish one record from
another in a file.
• For example, the field EmployeeId can be taken as the key for an employee file, since it
distinguishes one record from another.
• Page: A file is loaded into main memory to perform operations such as insertion,
modification and deletion on it. If the file is too large, it is decomposed
into equal-size pages; the page is the unit of exchange between the disk and
main memory.
• Index: It is a pointer to a record in a file, which provides efficient and fast
access to records.
Logical vs. Physical Organization of Data

■ logical organization
◻ the abstract way that the computer program is able to access
the data
◻ use of logical structures (e.g. linked lists)
■ physical organization
◻ the actual physical structure of data in memory
◻ i.e. what the sequence of bits looks like in memory


Definitions
■ database
◻ collection of related files
■ file
◻ collection of related records
■ record
◻ collection of related fields (e.g. Name, Age)
■ key field
◻ uniquely identifies a record (e.g. UserID)
File
■ A file is an external collection of related data
treated as a unit.
■ Files are stored in auxiliary/secondary
storage devices.
◻ Disk
◻ Tapes
■ A file is a collection of data records with
each record consisting of one or more
fields.
Access Methods



Types of File Organization

■ There are three types of file organization:
1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access file organization


Assignment # 11 - sequential file

11. A department maintains student information. The file
contains roll number, name, division and address.
Allow the user to add and delete student information and to
display the information of a particular student. If the record
of the student does not exist, an appropriate message is
displayed; if it does, the system displays the
student details. Use a sequential file to maintain the data.


FILE ORGANIZATION
• Arrangement of the records in a file plays a significant role in accessing
them. Moreover, proper organization of files on disk helps in accessing the
file records efficiently.
• There are various methods (known as file organization) of organizing the
records in a file while storing a file on disk.

(1) Sequential File Organization
(2) Random File Organization
(3) Indexed Sequential File Organization
(4) Multi-key File Organization and Access Methods


Basics (General Idea)
■ Records are stored at different places (different indices or
locations)
■ To find a record, we need to know its location
■ We can search for the record
OR
■ Jump to its location directly (if location is known)
OR
■ A combination of jumping and searching



Fixed-length vs. Variable-length Records

■ Fixed-length
◻ each record has a set size
◻ can be used with direct access file organization
■ access is based on math calculations, so the record size must be fixed
■ Variable-length
◻ each record has a variable size
◻ can be used with sequential file organization
■ records are read one after another or located through an index, so the size does not matter
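
The "math calculation" for fixed-length records can be sketched as below. This is only an illustration: the 64-byte record size, the zero-based record numbering and the names kRecordSize and recordOffset are assumptions, not taken from the slides.

```cpp
#include <cstddef>

// Assumed fixed record size in bytes (hypothetical value for illustration).
constexpr std::size_t kRecordSize = 64;

// Byte offset of the n-th record (zero-based) in a file of fixed-length
// records: every record occupies exactly kRecordSize bytes, so the offset
// is a simple multiplication.
constexpr std::size_t recordOffset(std::size_t recordNumber) {
    return recordNumber * kRecordSize;
}
```

With variable-length records no such formula exists, which is why they need sequential reading or an index.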



Sequential Files



Sequential file organization- concept and
primitive operations
■ Storing records in sorted order in contiguous blocks within files on tape or
disk is called sequential access file organization.
■ In sequential access file organization, all records are stored in
a sequential order. The records are arranged in the ascending
or descending order of a key field.
■ Sequential file search starts from the beginning of the file and
the records can be added at the end of the file.
■ In sequential file, it is not possible to add a record in the
middle of the file without rewriting the file.



Sequential-access File

[Figure: a sequential-access file, such as one on a tape drive; cp = current position]

For sequential files, access is always sequential.


Sequential file

■ Sequential file –
records can only be accessed sequentially,
one after another, from beginning to end.



Processing records in a sequential file
While Not EOF
{
Read the next record
Process the record
}
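
The loop above might look as follows in C++. This is a minimal sketch, assuming a text file with one record per line; the function name processSequentially is hypothetical, and "processing" is reduced to counting the records.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads a text file one line (record) at a time, from beginning to end,
// and returns how many records were processed.
std::size_t processSequentially(const std::string& path) {
    std::ifstream in(path);
    std::size_t count = 0;
    std::string record;
    // std::getline fails at end of file (or if the file could not be
    // opened), which ends the loop: this plays the role of "While Not EOF".
    while (std::getline(in, record)) {
        ++count;  // "Process the record": here we only count it.
    }
    return count;
}
```

Testing the read itself, rather than calling eof() before reading, avoids processing a phantom record after the last line.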



Sequential file organization
■ Advantages of sequential file
◻ It is simple to program and easy to design.
◻ It makes the best use of storage space.
■ Disadvantages of sequential file
◻ Processing a sequential file is time consuming.
◻ It has high data redundancy.
◻ Random searching is not possible.


Primitive Operations



Advantages
◼ Simple file design
◼ Very efficient when most of the records must
be processed e.g. Payroll
◼ Very efficient if the data has a natural order
◼ Can be stored on inexpensive devices like
magnetic tape.



Disadvantages
◼ Entire file must be processed even if a single record
is to be searched.
◼ Transactions have to be sorted before processing
◼ Overall processing is slow, because you have to go
through each record until you get to the one you
want!



Applications
■ Applications –
that need to access all records from beginning to
end.
◻ Personal information
■ Because you have to process each record,
sequential access is more efficient and easier than
random access.

■ Sequential file is not efficient for random access.



File Handling through C++ Classes

■ In C++, files are mainly handled using three classes,
fstream, ifstream and ofstream, available in the <fstream> header file.
■ ofstream: Stream class to write to files
■ ifstream: Stream class to read from files
■ fstream: Stream class to both read from and write to
files.
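
A small sketch of the first two classes in use is given below. The helper names writeLine and readFirstLine are hypothetical and chosen only for illustration.

```cpp
#include <fstream>
#include <string>

// Writes one line of text to a file using ofstream.
bool writeLine(const std::string& path, const std::string& text) {
    std::ofstream out(path);  // opens for writing, truncating old content
    if (!out) return false;   // the file could not be opened
    out << text << '\n';
    return static_cast<bool>(out);
}

// Reads the first line of a file using ifstream.
// Returns an empty string if the file cannot be opened or is empty.
std::string readFirstLine(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Both streams close their file automatically when the object goes out of scope, so no explicit close() call is needed here.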

This C++ program shows how to read an employee's
details from the keyboard using a class and an object, and then write that
object into a file.
• The program uses the following file stream (file handling) functions:
• file_stream_object.open() - to open the file
• file_stream_object.close() - to close the file
• file_stream_object.write() - to write an object into the file
• file_stream_object.read() - to read an object from the file



Random Access or
Direct Access



Assignment # 12 - direct access file

12. Implementation of a direct access file: insertion
and deletion of a record from a direct access file


Hashing Files



Direct Access File- Concepts and Primitive
operations
• A direct access file is also known as a random access or relative file
organization.
• In a direct access file, all records are stored on a direct access storage
device (DASD), such as a hard disk. The records are placed at random
locations throughout the file.
• The records do not need to be in sequence because they are
updated directly and rewritten back in the same location.
• This file organization is useful for immediate access to large amounts
of information. It is used in accessing large databases.
• It is also called hashing.


Hashing File Organization
• A hash function is computed on a chosen attribute (the key) of each
record. The result of the hash function specifies in which block of the
file the record should be placed.

• Unlike sequential file, records in this file organization are not stored
sequentially.
• Instead, each record is mapped to an address on disk on the basis of
its key value. One such technique for this mapping of record to an
address is called hashing.



Direct Access File Organization
• Record address is derived/calculated with math
• No need to search through an index
• Example:
Record Address = UserID MOD 8 + SSN MOD 3

Record Address = UserID%8 + SSN%3

• This math operation is called “key hashing” or “hashing”
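
The example formula can be written out as a small function. This is a sketch under stated assumptions: both keys are non-negative integers, the function names are hypothetical, and collisions (two records hashing to the same address) are not handled.

```cpp
// Record address from the example formula:
//   UserID MOD 8 + SSN MOD 3
// For non-negative inputs the result lies in the range 0..9.
constexpr long recordAddress(long userId, long ssn) {
    return userId % 8 + ssn % 3;
}

// Byte position of that address in the file, assuming fixed-length
// records of recordSize bytes (an assumption, not stated on the slide).
constexpr long slotOffset(long address, long recordSize) {
    return address * recordSize;
}
```

For example, UserID 21 and SSN 10 give 21 % 8 + 10 % 3 = 5 + 1 = 6, so the record goes to address 6 without any search.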



Direct Access File
• Advantages of direct access file organization
  • It supports online transaction processing (OLTP) systems, such as an
    online railway reservation system.
  • Sorting of the records is not required.
  • It accesses the desired records immediately.
  • It updates several files quickly.
  • It has better control over record allocation.

• Disadvantages of direct access file organization
  • It does not provide a backup facility.
  • It is expensive.
  • It uses storage space less efficiently than a sequential file.


Indexed Files



Indexes
• Terminology
• Primary index (one for each file)
• Secondary index for unique field or non-unique field
(several for each file)
• Clustering index for clustering attribute (non-key field
or non-unique field)
• Sparse index for some of the search key values
• Dense index for every search key value
• Types
• Linked list
• Inverted file
• Indexed sequential
• B+-tree
Indexing Technique
• To improve the query response time of a sequential
file

• An index is a set of <key, address> pairs



Mapping in an indexed file

■ To access a record in a file randomly,
you need to know the address of the record.
■ An index file can relate the key to the record address.


Partially-Indexed Sequential Files

Key   Record Address
A     1
B     6
C     11
D     16

Records: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12


Fully Indexed Files
Key   Record Address
a     4
b     7
c     5
d     3
e     12
m     9
n     10
p     2
s     11
t     6
z     1


Indexed files
■ An indexed file is made of a data file, which is a sequential file,
and an index.
■ Index – a small file with only two fields:
◻ The key of the sequential file
◻ The address of the corresponding record on the disk.
■ To access a record in the file:
1. Load the entire index file into main memory.
2. Search the index file to find the desired key.
3. Retrieve the address of the record.
4. Retrieve the data record (using the address).
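
The four steps can be sketched in C++ as follows. This is an illustration under assumptions: the index is already loaded into a std::map, each record is one line of the data file, and the names Index and fetchRecord are hypothetical.

```cpp
#include <fstream>
#include <map>
#include <string>

// Step 1 (assumed done): the index is held in main memory as
// key -> byte offset of the record in the data file.
using Index = std::map<std::string, std::streamoff>;

// Looks up `key`, seeks to the stored address and reads one line as the
// record. Returns false if the key is absent or the read fails.
bool fetchRecord(std::ifstream& data, const Index& index,
                 const std::string& key, std::string& record) {
    auto it = index.find(key);                 // step 2: search the index
    if (it == index.end()) return false;
    data.clear();                              // reset any earlier EOF state
    data.seekg(it->second, std::ios::beg);     // step 3: use the address
    return static_cast<bool>(std::getline(data, record));  // step 4
}
```

Only the small index is searched; the data file is touched once, at the exact position of the wanted record.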

■ Inverted file –
you can have more than one index, each with a different key.



Logical view of an indexed file



Index Classification
• Primary vs. secondary: If the search key contains the primary
key, the index is called a primary index.
• Clustered vs. unclustered: If the order of the data records is
the same as, or 'close to', the order of the data entries, the index is
called a clustered index.
• Alternative 1 (the data entries are the actual data records) implies clustered, but not vice versa.
• A file can be clustered on at most one search key.
• The cost of retrieving data records through an index varies
greatly based on whether the index is clustered or not!


Clustered vs. Unclustered Index

[Figure: two panels, CLUSTERED and UNCLUSTERED; in each, the data entries of the index file point to the data records of the data file]


Index Classification (Contd.)
• Dense vs. Sparse: If there is at least one data entry per search
key value (in some data record), then dense.
• Alternative 1 always leads to dense index.
• Every sparse index is clustered!
• Sparse indexes are smaller.

[Figure: a data file with a sparse index on Name and a dense index on Age]

Data File (Name, Age, third field)
Ashby, 25, 3000
Basu, 33, 4003
Bristow, 30, 2007
Cass, 50, 5004
Daniels, 22, 6003
Jones, 40, 6003
Smith, 44, 3000
Tracy, 44, 5004

Sparse Index on Name: Ashby, Cass, Smith
Dense Index on Age: 22, 25, 30, 33, 40, 44, 44, 50


Multilevel Index
• If primary index does not fit in memory, access becomes expensive.
• Solution: treat primary index kept on disk as a sequential file and
construct a sparse index on it.
• outer index – a sparse index of primary index
• inner index – the primary index file
• If even outer index is too large to fit in main memory, yet another
level of index can be created, and so on.
• Indices at all levels must be updated on insertion or deletion from the
file.
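
A two-level lookup can be sketched as follows. This is an illustration only: integer keys, an inner index held in a sorted vector, a block size greater than zero and the function names are all assumptions, and a real system would read the inner blocks from disk.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Inner index: sorted keys, conceptually stored on disk in blocks of
// blockSize entries. Outer index: the first key of each block, that is,
// a sparse index over the inner index. blockSize must be greater than 0.
std::vector<int> buildOuterIndex(const std::vector<int>& inner,
                                 std::size_t blockSize) {
    std::vector<int> outer;
    for (std::size_t i = 0; i < inner.size(); i += blockSize) {
        outer.push_back(inner[i]);
    }
    return outer;
}

// Returns the position of key in the inner index, or -1 if it is absent.
long lookup(const std::vector<int>& inner, const std::vector<int>& outer,
            std::size_t blockSize, int key) {
    // The last outer entry whose key is <= the search key selects the block.
    auto it = std::upper_bound(outer.begin(), outer.end(), key);
    if (it == outer.begin()) return -1;  // smaller than every indexed key
    std::size_t block = static_cast<std::size_t>(it - outer.begin()) - 1;
    std::size_t first = block * blockSize;
    std::size_t last = std::min(first + blockSize, inner.size());
    // Only this one block of the inner index has to be searched.
    auto blockBegin = inner.begin() + static_cast<std::ptrdiff_t>(first);
    auto blockEnd = inner.begin() + static_cast<std::ptrdiff_t>(last);
    auto hit = std::lower_bound(blockBegin, blockEnd, key);
    if (hit == blockEnd || *hit != key) return -1;
    return static_cast<long>(hit - inner.begin());
}
```

The outer index is small enough to stay in memory, so each lookup needs to fetch only a single block of the inner index.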



Multilevel Index (Cont.)



Indexed sequential file organization-concept
• An indexed sequential access file combines sequential file and
direct access file organization.
• In an indexed sequential access file, records are stored on a
direct access device, such as a magnetic disk, ordered by a primary key.
• The file can have multiple keys, which may be alphanumeric. The key by which
the records are ordered is called the primary key.
• The data can be accessed either sequentially or randomly using the
index. The index is stored in a file and read into memory when the file
is opened.


Indexed sequential file organization
• Advantages of indexed sequential access file organization
  • Both sequential and random file access are possible.
  • It accesses the records very fast if the index table is properly organized.
  • The records can be inserted in the middle of the file.
  • It provides quick access for sequential and direct processing.
  • It reduces the degree of the sequential search.
• Disadvantages of indexed sequential access file organization
  • It requires unique keys and periodic reorganization.
  • It takes a longer time to search the index for data access or
    retrieval.
  • It requires more storage space.
  • It is expensive because it requires special software.
  • It is less efficient in the use of storage space as compared to other file organizations.


Advantages

◼ Provides flexibility for users who need both types of access
with the same file
◼ Faster than sequential


Disadvantages

◼ Extra storage space for the index is required,
just like in a book: your text book would be 372 pages
without the index (go on, check!) but is 380 pages with the
index.


B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential
files.

• Disadvantage of indexed-sequential files
  • Performance degrades as the file grows, since many overflow blocks get created.
  • Periodic reorganization of the entire file is required.
• Advantage of B+-tree index files:
  • The tree automatically reorganizes itself with small, local
    changes in the face of insertions and deletions.
  • Reorganization of the entire file is not required to maintain
    performance.
• (Minor) disadvantage of B+-trees:
  • Extra insertion and deletion overhead, space
    overhead.


Example of B+-Tree


B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following
properties:

• All paths from root to leaf are of the same length.
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
• A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
• Special cases:
  • If the root is not a leaf, it has at least 2 children.
  • If the root is a leaf (that is, there are no other nodes in the tree), it can have
    between 0 and (n–1) values.
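
The occupancy bounds are easy to check with a little integer arithmetic. The sketch below assumes n is a positive integer (the maximum number of children per node); the function names are hypothetical.

```cpp
// Smallest integer >= a / b, for a >= 0 and b > 0.
constexpr int ceilDiv(int a, int b) {
    return (a + b - 1) / b;
}

// Minimum number of children of a node that is neither root nor leaf.
constexpr int minChildren(int n) {
    return ceilDiv(n, 2);
}

// Minimum number of values in a leaf that is not the root.
constexpr int minLeafValues(int n) {
    return ceilDiv(n - 1, 2);
}
```

For n = 4, an internal node has between 2 and 4 children and a leaf holds between 2 and 3 values; for n = 5, the figures are 3 to 5 children and 2 to 4 values.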



Multi-Indexed File



Factors affecting the File Organization



Factors involved in selecting the File Organization



Files using C++



File I/O Classes



Primitive Functions

Linked Organization- multi list files



Multilist files



Multilist file



coral rings



~ End ~
