
Topic 3.6: Review of Terms - Physical and Logical Records, Blocking Factor, Pinned and Unpinned Organization, Heap Files, Indexing

Module 3
What We’ll Cover in This Session:

1. Introduction to Physical and Logical Records


● Definitions and Differences
2. Blocking Factor
● What is it? Why is it important?
3. Pinned vs. Unpinned Organization
● Characteristics and Use Cases
4. Heap Files
● Structure and Behavior
5. Indexing
● Overview and Importance
6. Recap and Key Takeaways
Physical vs. Logical Records

Definition:
● Logical Record: A record as perceived by the user or application (e.g., a row in a table).
● Physical Record: How the logical record is stored on disk (e.g., blocks or pages).

Example:
A logical record might be a single employee record, while the physical record could store multiple employee records in a block on disk.
Three Record Storage Formats

(a) A fixed-length record with six fields and size of 71 bytes.
(b) A record with two variable-length fields and three fixed-length fields.
(c) A variable-field record with three types of separator characters.
Blocking Factor

Definition:
● The blocking factor (bfr) is the number of logical records that can fit into a single physical block (or page) on disk.
Formula:
Blocking Factor = ⌊Block Size / Record Size⌋

Importance:
● Maximizes disk space utilization.
● Reduces I/O operations by reading/writing multiple records at once.

Example:
If the block size is 4 KB and each record is 1 KB, the blocking factor is 4 (4 records per block).
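The arithmetic can be checked with a short Python sketch (the function and variable names here are illustrative, not from the slides):

import math

def blocking_factor(block_size: int, record_size: int) -> int:
    """Number of whole records that fit in one block (unspanned)."""
    return block_size // record_size  # floor: a partial record does not fit

def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks required to store num_records at bfr records per block."""
    return math.ceil(num_records / bfr)

bfr = blocking_factor(4096, 1024)   # 4 KB block, 1 KB record
print(bfr)                          # 4 records per block
print(blocks_needed(10_000, bfr))   # 2500 blocks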
Pinned vs. Unpinned Organization

Pinned Organization:
● Records are fixed in specific locations on disk.
● Useful for systems requiring predictable access patterns.
Unpinned Organization:
● Records can move around on disk (e.g., during reorganization).
● More flexible but may require additional overhead to track record locations.
Pinned vs. Unpinned (Buffer Management & Memory Pages)

● Pinned Pages: A page (block of data) in memory is pinned when the database system prevents it from being removed or swapped out because it is actively being used.
○ Example: A query that scans a table might pin pages in memory to ensure they stay available while the query executes.
○ Once the query is done, the pages are unpinned, meaning they can be replaced by other data if needed.
● Unpinned Pages: A page is unpinned when the database allows it to be removed from memory.
○ Example: If a page is no longer needed, it is unpinned, and the buffer manager can evict it to free space.

🔹 Use Case: Used in buffer pool management to decide when to keep or remove data from memory.
Spanned vs. Unspanned (Storage Allocation & Records)

● Spanned Records: A record is spanned when it crosses multiple pages because it is too large to fit in a single page.
○ Example: A large row (e.g., a long text or blob field) that doesn't fit in one page will be split across multiple pages.
○ The database uses pointers to link the fragments of the record across different pages.
● Unspanned Records: A record is unspanned when it fits entirely within a single page.
○ Example: A small row that can be stored in one page is an unspanned record.
○ This makes retrieval faster since no extra page reads are needed.

🔹 Use Case: Used in disk storage and file organization to decide whether records are stored within one page or across multiple pages.
Operations on Files

File operations are categorized into:

● Retrieval Operations: Locate and examine records without modifying them.
● Update Operations: Modify records by insertion, deletion, or modification.

Basic File Operations

1. Open: Prepares the file for reading/writing.
2. Reset: Moves the file pointer to the beginning.
3. Find (Locate): Searches for the first matching record.
4. Read (Get): Retrieves the current record.
5. FindNext: Locates the next matching record.
6. Delete: Removes the current record.
7. Modify: Updates field values of the current record.
8. Insert: Adds a new record.
9. Close: Completes access and releases resources.

Record-at-a-Time vs. Set-at-a-Time Operations

● Record-at-a-time operations: Apply to single records (e.g., Find, Read, Modify).
● Set-at-a-time operations: Work on multiple records at once:
○ FindAll: Retrieves all records matching a condition.
○ Find n: Locates n matching records.
○ FindOrdered: Retrieves records in a specific order.
○ Reorganize: Sorts and optimizes file records.

File Organization vs. Access Method

● File Organization: How records are structured and stored (ordering, hashing, indexing).
● Access Method: The set of operations available for accessing and managing records.
● Some access methods are specific to certain file organizations.
Choosing an Efficient File Organization

● Different file organizations optimize different operations:


○ If most searches are based on Ssn, sorting records by Ssn or using an index is ideal.
○ If paychecks are grouped by department, ordering by department and name is
preferable.
● Compromise is needed when multiple access patterns exist.
Heap Files

Files of Unordered Records (Heap Files)

● The simplest and most basic type of file organization
● Records are inserted in the order they arrive
● New records are added at the end of the file
● Also known as a heap or pile file
Characteristics of Heap Files
● No specific order for storing records
● Efficient insertion of new records (appended at the end)
● Linear search required for retrieval
● Often used with secondary indexes to improve search performance
● Commonly used for temporary data storage
Inserting Records in Heap Files

Steps:
1. Copy the last block of the file into a buffer
2. Add the new record to the buffer
3. Rewrite the block back to disk
4. Update the file header with the new last block address

✅ Fast and efficient insertion process
Searching Records in Heap Files

Process:
● Linear search is required (an expensive operation)
● On average, (b/2) blocks must be read before finding a record
● If no match is found, all b blocks must be searched
● Search time increases as the file grows

⚠ Inefficient for large datasets
Deleting Records in Heap Files

Two methods:

1. Physical Deletion
○ Find the block containing the record
○ Remove the record and rewrite the block to disk
○ Leaves unused space in blocks (wasted storage)
2. Logical Deletion (Using a Deletion Marker)
○ Use an extra byte/bit as a deletion marker
○ Mark records as deleted instead of removing them
○ Search programs ignore marked records

🔁 Requires periodic reorganization to reclaim space
Reorganizing Heap Files

● Compacts the file by removing deleted records
● Accesses blocks sequentially to repack records
● Restores the file to optimal storage efficiency
● Alternative: Reuse deleted space for new records (requires bookkeeping)

✅ Improves storage efficiency and search performance
Heap Files with Fixed vs. Variable-Length Records

Fixed-Length Records:

● Simple to manage
● Direct access using record position

Variable-Length Records:

● May require deleting and reinserting modified records
● Complicated space management

🛠 Both spanned and unspanned storage methods can be used
Direct Access in Heap Files

● Used in fixed-length record files with contiguous allocation
● Records are numbered 0, 1, 2, ..., r-1
● Each block contains bfr (blocking factor) records
● The i-th record is located in:
○ Block ⌊i / bfr⌋
○ Record position (i mod bfr) within that block

⚡ Enables quick access based on position but not by search conditions
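A minimal Python sketch of this computation, assuming fixed-length records and contiguous allocation (names are illustrative):

def locate(i: int, bfr: int) -> tuple[int, int]:
    """Return (block number, position within block) for the i-th record."""
    block = i // bfr        # ⌊i / bfr⌋
    offset = i % bfr        # i mod bfr
    return block, offset

# With bfr = 4: record 9 lives in block 2, at position 1 within that block.
print(locate(9, 4))  # (2, 1)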


Heap Files

Definition:
● A heap file is an unsorted collection of records stored on disk.
● Records are inserted without any specific order.
Characteristics:
● Simple to implement.
● Fast for insertions since no sorting or indexing is required.
● Slow for searches since records must be scanned sequentially.
Example:
A heap file might store employee records in the order they were added, regardless of
their IDs or names.
Summary (Heap Files)

✅ Heap Files Pros:

● Simple and easy to implement
● Efficient insertion
● Useful for temporary storage

⚠ Heap Files Cons:

● Slow retrieval (linear search)
● Wasted space due to deletions
● Requires periodic reorganization

🔗 Used with secondary indexes to improve performance
Ordered Records (Sorted Files)

Introduction to Ordered Records (Sorted Files)

● Definition: A file where records are physically ordered based on an ordering field. Also known as a sequential file.
● If the ordering field is a key field (unique per record), it is called an ordering key.
Benefits of Ordered Files

● Efficient reading: No sorting needed when reading records in order.
● Faster search: Binary search improves retrieval speed.
● Efficient access: The next record is often in the same block, reducing disk I/O.
Binary Search in Ordered Files

● Searching based on an ordering key field enables binary search.
● Instead of scanning the entire file, only about log₂(b) block accesses are needed for a file of b blocks.
● Algorithm: Divide the file blocks into halves, check the midpoint, and adjust the search range accordingly.
Algorithm - Binary Search on an Ordering Key

1. Set lower (l) and upper (u) bounds on block numbers.
2. While (l ≤ u):
○ Read the middle block.
○ Compare key values in the block.
○ Adjust l and u accordingly.
3. If found, return the record; else, conclude it is not found.
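A minimal Python sketch of this block-level binary search, modeling each block as a sorted list of keys (one list access stands in for one disk read; all names are illustrative):

def binary_search_blocks(blocks: list[list[int]], k: int):
    l, u = 0, len(blocks) - 1
    while l <= u:
        mid = (l + u) // 2
        block = blocks[mid]              # one "disk read" per iteration
        if k < block[0]:                 # k precedes this block's range
            u = mid - 1
        elif k > block[-1]:              # k follows this block's range
            l = mid + 1
        else:                            # k falls within this block's range
            return (mid, block.index(k)) if k in block else None
    return None

blocks = [[2, 5, 8], [12, 15, 19], [22, 30, 31]]
print(binary_search_blocks(blocks, 15))  # (1, 1): block 1, position 1
print(binary_search_blocks(blocks, 16))  # None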
Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the ordering key field.


Limitations of Ordered Files

● Slow insertions: Maintaining order requires shifting records, increasing disk I/O.
● Expensive deletions: Removing a record creates gaps, requiring file reorganization.
● Limited flexibility: Only optimized for searches on the ordering field.
Managing Insertions & Deletions

● Insertion:
○ Records must be placed in their correct position.
○ Often requires shifting half of the records on average.
● Deletion:
○ Can leave empty spaces (fragmentation).
○ Using deletion markers and periodic reorganization helps.

Overflow Files for Optimization

● Solution for slow insertions: Use an overflow file


○ New records are added to an unordered overflow file.
○ Periodic merging with the ordered file restores order.
● Trade-offs: Faster insertions but increases search complexity.
Comparison of Search Methods
Hashing Techniques
1. Internal Hashing
2. External Hashing for Disk Files
3. Hashing Techniques That Allow Dynamic File Expansion
a. Extendible Hashing
b. Dynamic Hashing
c. Linear Hashing

Hashing

Introduction to Hashing

● Hashing is a technique for fast record retrieval using a hash function.


● Converts a search key into a bucket address where the record is stored.
● Used for both internal and external file storage.

Internal vs. External Hashing

● Internal Hashing: Uses an array-based hash table in memory.
● External Hashing: Uses disk-based buckets for better storage management.
● Key Concept: The hashing field is used to determine the data location.
Hash Functions

● The hash function h(K) = K mod M maps a key K to an integer in the range 0 to M−1.
● Must ensure uniform distribution to avoid clustering.
● Example: Hashing a student ID to find their record.
Internal hashing data structures.
(a) Array of M positions for use in internal hashing.
(b) Collision resolution by chaining records.
The diagram represents internal hashing using two structures:

● (a) An array for internal hashing (direct mapping)
● (b) Collision resolution using chaining (linked overflow area)
a) Internal Hashing Array (Figure 17.8 (a)):

● This is a fixed-size table (array) with M positions (or slots) used for hashing data.
● Each row consists of:
○ Data fields (e.g., Name, SSN, Job, Salary).
○ Each position can store a record directly.

🔑 How Hashing Works:

● A hash function h(K) maps the key K to an index in the table (from 0 to M−1).
● Example: h(SSN) = SSN % M → Maps the SSN to a table slot.

⚡ Example:

If M = 10 and SSN = 12345, the record would be stored at:
h(12345) = 12345 % 10 = 5 → Position 5 in the table.
b) Collision Resolution by Chaining

Collision:

● A collision occurs when two different keys map to the same index.
● Example: Both h(12345) and h(54325) might map to index 5.

Chaining Mechanism:

● Uses an overflow space to store collided records.
● Each slot in the table has an overflow pointer:
○ Points to the next record in the overflow space (linked list).
○ If there's no overflow, the pointer is -1 (null).

🛠 How Overflow Works:

● When a collision happens:
1. The collided record is stored in the overflow space.
2. The original slot's overflow pointer is updated to point to the overflow location.
3. A linked list is formed through pointers for all records that hash to the same index.
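The chaining mechanism can be sketched in Python, mirroring the figure's layout of M main slots plus a linked overflow area, with -1 as the null pointer (slots hold bare keys here instead of full records; all names are illustrative):

M = 10
main = [None] * M          # main array: (key, overflow_ptr) or None
overflow = []              # overflow area: list of (key, overflow_ptr)

def insert(key: int) -> None:
    i = key % M                       # h(K) = K mod M
    if main[i] is None:
        main[i] = (key, -1)           # slot free: store directly
        return
    overflow.append((key, -1))        # collision: store in overflow area
    new_pos = len(overflow) - 1
    k, ptr = main[i]
    if ptr == -1:
        main[i] = (k, new_pos)        # first overflow for this slot
        return
    while overflow[ptr][1] != -1:     # walk the chain to its end
        ptr = overflow[ptr][1]
    overflow[ptr] = (overflow[ptr][0], new_pos)

for key in (12345, 54325, 77775):     # all hash to index 5
    insert(key)
print(main[5])     # (12345, 0): chain starts at overflow position 0
print(overflow)    # [(54325, 1), (77775, -1)]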
Collision Resolution Techniques in Hashing

Open Addressing

● If a collision occurs (i.e., two different records get the same hash address), the program looks for the next available position in a sequential manner.
● This process continues until an empty space is found.
● A common technique under open addressing is linear probing, where the next available slot is checked one by one.

Chaining

● Instead of looking for a new location in the main hash table, extra overflow locations are maintained.
● Each record slot has a pointer field that links to an overflow record if a collision occurs.
● This creates a linked list of records that share the same hash address.

Multiple Hashing

● If the first hash function leads to a collision, a second hash function is applied to find an alternative address.
● If that also results in a collision, either a third hash function is applied or open addressing is used to resolve it.
● This method reduces clustering and improves efficiency.
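A minimal Python sketch of open addressing with linear probing (it assumes no deletions, which would require tombstone markers; names are illustrative):

M = 7
table = [None] * M

def insert(key: int) -> int:
    """Store key at h(key) or the next free slot, wrapping around."""
    for step in range(M):
        i = (key % M + step) % M      # probe sequence: h(k), h(k)+1, ...
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("hash table full")

def search(key: int):
    for step in range(M):
        i = (key % M + step) % M
        if table[i] is None:          # empty slot: key cannot be further on
            return None
        if table[i] == key:
            return i
    return None

print(insert(10))  # 10 % 7 = 3 -> slot 3
print(insert(17))  # 17 % 7 = 3 -> collision, probe to slot 4
print(search(17))  # 4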
External Hashing for Disk Files

Figure 17.9: Matching bucket numbers to disk block addresses.
External Hashing for Disk Files

1⃣ Bucket Number:

● Represents a logical position in the hash table.
● Each bucket corresponds to an entry produced by a hash function:
○ Example: h(Key) = Key % M
● The hash function maps keys to buckets in the range 0 to M−1.

2⃣ Block Address on Disk:

● Each bucket is mapped to a specific block address on the disk.
● The block address indicates the physical location on the disk where the corresponding data is stored.
● This mapping ensures that each bucket directly points to a specific block for fast access.

3⃣ Disk Blocks:

● A disk block is the smallest unit of data read from or written to a disk.
● Each block contains:
○ Multiple records or entries.
○ Overflow pointers if records exceed the block size.
External Hashing for Disk Files

How Disk Block Mapping Works:

1. Inserting a Record:
○ The system computes the hash value using a function like h(Key) = Key % M.
○ The result points to a bucket number.
○ The bucket number is mapped to a block address on the disk.
2. Data Retrieval:
○ To retrieve a record:
■ Use the same hash function to determine the bucket number.
■ Follow the mapping to quickly locate the corresponding block address.
■ Retrieve the record directly from the disk block.
3. Handling Collisions:
○ If multiple records map to the same block:
■ Overflow records are stored in linked blocks (chaining).
■ Pointers in the block direct the system to overflow areas.
External Hashing for Disk Files

● Assume M = 4 (4 buckets).
● Hash function: h(Key) = Key % 4

Inserting Records:

● Record A → Key = 8 → h(8) = 0 → Stored in bucket 0, mapped to Block Address 0.
● Record B → Key = 10 → h(10) = 2 → Stored in bucket 2, mapped to Block Address 2.
● Record C → Key = 14 → h(14) = 2 → Collision occurs → Stored in an overflow block linked to Block Address 2.
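The M = 4 example can be simulated with a small Python sketch, assuming a bucket capacity of one record per main block so that Record C overflows (names are illustrative):

M, CAPACITY = 4, 1                     # capacity 1 forces Record C to overflow
blocks = {b: [] for b in range(M)}     # bucket number -> main block
overflow = {b: [] for b in range(M)}   # bucket number -> overflow chain

def insert(name: str, key: int) -> None:
    b = key % M                        # h(Key) = Key % M
    if len(blocks[b]) < CAPACITY:
        blocks[b].append((name, key))      # fits in the main block
    else:
        overflow[b].append((name, key))    # goes to a chained overflow block

for name, key in [("A", 8), ("B", 10), ("C", 14)]:
    insert(name, key)

print(blocks)    # A in bucket 0, B in bucket 2
print(overflow)  # C in bucket 2's overflow chain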
Figure 17.10: Handling overflow for buckets by chaining.
External Hashing - Explanation of the Diagram:

1. Main Buckets:

● The hash table is divided into buckets (like Bucket 0, Bucket 1, Bucket 2, etc.).
● Each bucket can hold one or more records (or pointers to the records).
● For example:
○ Bucket 0 holds two records: 340 and 460.
○ Bucket 1 holds three records: 321, 761, and 91.

2. Overflow Buckets:

● When a bucket is full and more elements hash to the same bucket, overflow occurs.
● Instead of overwriting or rejecting the record, an overflow chain (linked list) is created.
● The additional records are stored in overflow buckets, and pointers link them sequentially. For example:
○ In Bucket 1, after adding 321, 761, and 91, there's still more data (981 and 182) that needs to be stored.
○ These extra elements are placed in overflow buckets linked to Bucket 1.

3. Chaining (Linked List Structure):

● Overflow records (981 and 182) are stored in a linked list attached to the corresponding bucket.
● If an overflow bucket becomes full, another overflow bucket can be linked, forming a chain (as shown with record 652 for Bucket 2).

4. NULL Pointers:

● When there are no further overflow records, the chain ends with a NULL pointer indicating the end of the overflow list.

Advantages of Chaining for Collision Resolution:

● Efficient storage: Allows dynamic handling of overflow without the need for a fixed-size bucket.
● Simplified insertions: New records can be easily appended to the end of the chain.
● Minimizes rehashing: No need for rehashing, as records overflow naturally using linked lists.

Disadvantages:

● Slower searches: In the worst case (when many records hash to the same bucket), searching becomes linear, as it needs to traverse the entire chain.
● Extra memory usage: Additional pointers (overhead) are needed for chaining.
Extendible Hashing

Structure of the extendible hashing scheme.
Extendible Hashing

1⃣ Directory:
● The directory acts as a table that maps binary hash prefixes to the corresponding data buckets.
● Each entry in the directory corresponds to a binary combination of bits (in this case, 3 bits):
○ 000, 001, 010, 011, 100, 101, 110, 111
● The directory points to the correct bucket based on the first d bits (where d is the global depth).

2⃣ Global Depth (d):

● The global depth (d = 3) indicates how many bits from the hash value are used to index into the directory.
● With d = 3, there are 2^3 = 8 directory entries (ranging from 000 to 111).
Extendible Hashing

3. Local Depth (d'):

● Each bucket has its own local depth (d'), which represents how many bits are used to distinguish
records within that specific bucket.
● If a bucket overflows:
○ If d' < d: Only the bucket splits, and a new bit is used to differentiate records.
○ If d' = d: The directory size doubles, increasing the global depth (d).

4⃣ Data File Buckets:

● Each bucket stores actual records whose hash values match the corresponding binary prefix:
○ E.g., The bucket for 000 stores all records whose hash values start with 000.
● Buckets can hold multiple records until a threshold is reached (bucket overflow).
● When a bucket overflows, it either splits or triggers an expansion of the directory.
Extendible Hashing

How Insertion Works:

1. Insertion of a Record:
○ Suppose a record’s hash value is 101110.
○ The system uses the first 3 bits (101) because the global depth (d) is 3.
○ The directory entry for 101 will point to the appropriate bucket for insertion.
2. Handling Overflow:
○ If the corresponding bucket overflows:
■ Case 1: If d' < d, split the bucket and increase its local depth by 1.
■ Case 2: If d' = d, double the directory size and increment the global depth (d =
4).
3. Directory Doubling:
○ Doubling the directory adds one more bit to all directory entries.
○ This creates new pointers for the newly created buckets after splitting.
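A simplified Python sketch of this insert/split/double logic follows. It indexes the directory with the d low-order bits of the key rather than the high-order prefix bits shown in the figure, uses a bucket capacity of 2, and omits the recursive re-split a full implementation would need if a freshly split bucket overflows again; all names are illustrative.

BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys: list[int] = []

global_depth = 1
directory = [Bucket(1), Bucket(1)]     # entries share buckets via references

def bits(key: int, d: int) -> int:
    return key & ((1 << d) - 1)        # use the d low-order bits as the prefix

def insert(key: int) -> None:
    global global_depth, directory
    b = directory[bits(key, global_depth)]
    if len(b.keys) < BUCKET_CAPACITY:
        b.keys.append(key)
        return
    if b.local_depth == global_depth:  # case d' = d: double the directory
        directory = directory + directory
        global_depth += 1
    # case d' < d (possibly after doubling): split only the full bucket
    b.local_depth += 1
    new_b = Bucket(b.local_depth)
    for i, entry in enumerate(directory):       # re-point half the entries
        if entry is b and (i >> (b.local_depth - 1)) & 1:
            directory[i] = new_b
    old = b.keys + [key]
    b.keys = []
    for k in old:                      # redistribute using one more bit
        directory[bits(k, global_depth)].keys.append(k)

for k in [5, 10, 15, 20, 7]:
    insert(k)
print(global_depth, [sorted(b.keys) for b in directory])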
Dynamic Hashing

Structure of the dynamic hashing scheme.
Dynamic Hashing

1⃣ Directory Structure (Binary Tree Representation):

● The directory is structured as a binary tree of nodes.
● Each internal node represents a decision point based on bits (0 or 1) from the hash value.
● Paths from the root to a leaf correspond to binary prefixes of the hash value.

Node Types:
● Internal Directory Node: Represents a branching decision based on bits (shown as circles in the figure).
● Leaf Directory Node: Points directly to the buckets (shown as rectangles).

2⃣ Data File Buckets:

● Each leaf node points to a bucket in which data records are stored.
● Records are placed into buckets based on the prefix of their hash values.
○ Example: A record with hash 001011 will go to the bucket where the hash starts with 001.
Dynamic Hashing

3⃣ Splitting Buckets:

● When a bucket becomes full:
○ A new internal node is created, and the bucket splits.
○ The binary tree grows deeper at that specific branch by considering more bits from the hash.
● Only a part of the tree expands when needed, ensuring efficient memory usage.
Dynamic Hashing

🔄 How Dynamic Hashing Works:

1. Inserting a Record:
○ The hash value of the record is computed.
○ The system follows the binary path:
■ At each node, based on the bit (either 0 or 1), it follows the corresponding branch.
■ When a leaf node is reached, the record is placed into the associated bucket.
2. Handling Overflow (Bucket Splitting):
○ If a bucket overflows:
■ A new internal node is introduced.
■ The records are redistributed into two new buckets based on an additional bit from their hash values.
3. Directory Expansion:
○ Unlike extendible hashing, the directory does not double entirely.
○ Only the required part of the tree expands, leading to more efficient space utilization.
Handling Collisions

● Collision: When two records hash to the same address.
● Techniques to resolve:
○ Open Addressing: Find the next available slot.
○ Chaining: Use linked lists for overflow.
○ Multiple Hashing: Use a secondary hash function.

External Hashing (Disk Storage)

● Uses buckets instead of individual memory addresses.
● Each bucket holds multiple records.
● Uses a directory to map hash values to disk locations.
Static vs. Dynamic Hashing

● Static Hashing: Fixed number of buckets.
○ Issue: Cannot efficiently handle a growing number of records.
● Dynamic Hashing: Adapts to changing data sizes.
○ Uses Extendible Hashing and Linear Hashing.

Extendible Hashing

● Uses a directory that expands as more records are added.
● Key idea: Increase the number of bits used in the hash.
● Benefits:
○ No need to reorganize all records when a bucket overflows.
○ Efficient searching.

Linear Hashing

● Buckets split dynamically based on data growth.
● Uses progressive expansion rather than directory doubling.
● Reduces large-scale reorganization costs.

Comparison of Hashing Methods
Summary & Conclusion

● Hashing provides efficient record access.
● Collision resolution is crucial for performance.
● Extendible and Linear Hashing adapt well to data growth.
● Hashing is used in database indexing, memory allocation, and file storage.
Practical Exercise

Task 1:
Calculate the blocking factor for a system where:
● Block size = 8 KB
● Record size = 2 KB
Hint:
Use the formula:
Blocking Factor = Block Size / Record Size
Task 2:
Explain why heap files are inefficient for large-scale databases with frequent search operations.
Hint:
Consider the sequential scanning process and lack of ordering.
Recap and Key Takeaways 3.6

What We Learned Today:


1. Logical Records represent data from the user's perspective, while Physical Records focus on storage efficiency.
2. Blocking Factor determines how many records fit into a block, optimizing disk usage.
3. Pinned Organization fixes record locations, while Unpinned Organization allows flexibility.
4. Heap Files store records without order, making them simple but inefficient for searches.
5. Indexing improves query performance by organizing data for faster retrieval.
Next Steps:
● Explore advanced indexing techniques like B-Trees and hashing in future sessions.

Questions?
Topic 3.7: Single-Level Indices, Numerical Examples
Module 3
What We’ll Cover in This Session:

1. Introduction to Indexing
● Why do we need indices?
2. Types of Single-Level Indices
● Primary Index
● Secondary Index
3. Numerical Examples
● Calculating Index Size and Search Efficiency
4. Practical Exercises
5. Recap and Key Takeaways
What is Indexing?

Definition:
● An index is a data structure that improves the speed of data retrieval operations on a database table.
● It works like an index in a book, allowing quick access to specific data without scanning the entire table.

Why Use Indexing?

● Faster query performance.
● Enables efficient range queries and sorting.
Drawbacks:
● Increases storage requirements.
● Slows down insertions and updates due to index maintenance.
Indexing

Definition:
● Indexing is a technique used to improve the speed of data retrieval operations on a database.
● An index is a data structure (e.g., B-tree, hash table) that maps keys to record locations.
Types of Indexes:

1. Primary Index: Built on the primary key.
2. Secondary Index: Built on non-primary key columns.
3. Clustered Index: Determines the physical order of data.
4. Non-Clustered Index: Stores a separate structure pointing to data.
Benefits of Indexing:
● Faster query performance.
● Enables efficient range queries and sorting.
Drawbacks:
● Increases storage requirements.
● Slows down insertions and updates due to index maintenance.
Single-Level Indices

Definition:
● A single-level index is an index that uses a single level of entries to map keys to record locations.
● It is simpler than multi-level indices but may not scale well for very large datasets.
Types of Single-Level Indices:
1. Primary Index: Built on the primary key of a table.
● Assumes records are stored in sorted order by the primary key.
2. Secondary Index: Built on non-primary key columns.
● Can be created on unsorted data.
Primary Index

Definition:
● A primary index is built on the primary key of a table.
● It assumes that records are stored in sorted order by the primary key.
Structure:
● Each entry in the index contains:
● Key value (e.g., primary key).
● Pointer to the block where the record is stored.
Advantages:
● Efficient for range queries.
● Reduces the number of disk accesses.
Example: Suppose we have a table with 1000 records sorted by EmployeeID. The primary index would contain one entry per block, each holding the EmployeeID of the block's first record and a pointer to that block.
Secondary Index

Definition:

● A secondary index is built on non-primary key columns.
● It allows indexing on fields other than the primary key.

Structure:
● Each entry in the index contains:
● Key value (e.g., a secondary column like LastName).
● Pointer to the record location.

Advantages:
● Enables fast searches on non-primary key columns.
● Useful for tables with multiple search criteria.

Disadvantages:
● Requires additional storage.
● May slow down insertions and updates.
Example: Suppose we want to index employees by LastName. The secondary index would contain one entry per record, each holding a LastName value and a pointer to the corresponding record.
Numerical Example - Secondary Index

Scenario:
The same file (10,000 records) has a secondary index on LastName, with each index entry being 20 bytes. Calculate:
1. Number of index entries.
2. Size of the secondary index.
Solution:
1. Number of Index Entries:

Since the secondary index has one entry per record:

Number of Entries = Total Records = 10,000

2. Size of Secondary Index:

Index Size = Number of Entries × Entry Size = 10,000 × 20 = 200,000 bytes ≈ 200 KB
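A short Python sketch of this sizing arithmetic; the 4 KB block size used for the block count is an assumption, since the example does not state one:

import math

records    = 10_000
entry_size = 20            # bytes per secondary-index entry
block_size = 4096          # assumed block size (not given in the example)

entries      = records                   # dense index: one entry per record
index_size   = entries * entry_size      # 200,000 bytes ≈ 200 KB
per_block    = block_size // entry_size  # 204 entries fit per index block
index_blocks = math.ceil(entries / per_block)

print(index_size, index_blocks)  # 200000 50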
Practical Exercise

Task 1:
A file contains 5000 records, each of size 200 bytes. The block size is 4 KB. Calculate:
1. Blocking factor.
2. Number of blocks needed to store the file.
3. Size of the primary index if each index entry is 12 bytes.
Task 2:
If the same file has a secondary index on DepartmentID, with each index entry being 18 bytes, calculate the size of the
secondary index.
Topic 3.8: Multi-Level Indices, Numerical Examples
What We’ll Cover in This Session:
1. Introduction to Multi-Level Indices
● Why do we need multi-level indices?
2. Structure of Multi-Level Indices
● How they work and their advantages
3. Numerical Examples
● Calculating index levels and search efficiency
4. Practical Exercises
5. Recap and Key Takeaways
What are Multi-Level Indices?

Definition:
● A multi-level index is an indexing technique that uses multiple levels of indices to map keys to record locations.
● It is used to overcome the limitations of single-level indices when dealing with very large datasets.

Why Use Multi-Level Indices?
● Reduces the number of disk accesses required for searching.
● Scales better than single-level indices for large datasets.
Key Characteristics:
● The top level contains pointers to the next level.
● The bottom level points to actual data blocks.
Structure of Multi-Level Indices

Explanation:

A multi-level index is like a tree structure where each level reduces the search space.
● The top level (root) points to intermediate levels.
● The bottom level points to data blocks.

Example:
Suppose we have a file with 1 million records. A multi-level index would have a small root level pointing to intermediate index blocks, which in turn point to the first-level index blocks covering the data.
Advantages of Multi-Level Indices

1. Efficient Search:
● Reduces the number of disk accesses by narrowing down the search space at each level.
2. Scalability:
● Handles very large datasets more effectively than single-level indices.
3. Flexibility:
● Can be combined with other indexing techniques like B-Trees.
Disadvantages:
● Increased storage requirements due to multiple levels.
● More complex to implement and maintain.
Numerical Example - Multi-Level Index

Scenario:
A file contains 1,000,000 records, each of size 200 bytes. The block size is 4 KB. Each index entry is 12 bytes. Calculate:
1. Blocking factor for data blocks.
2. Number of blocks needed to store the file.
3. Number of levels in the multi-level index.
Solution:
1. Blocking Factor:
Blocking Factor = ⌊Block Size / Record Size⌋ = ⌊4096 / 200⌋ = 20 records per block

2. Number of Blocks:
Number of Blocks = Total Records / Blocking Factor = 1,000,000 / 20 = 50,000 blocks
Numerical Example - Multi-Level Index (Continued)

Scenario:
3. Number of levels in the multi-level index.
Solution:
Number of Levels in the Multi-Level Index:
Each index block can hold:
Entries per Index Block = ⌊Block Size / Entry Size⌋ = ⌊4096 / 12⌋ = 341 entries

Bottom Level: Points to 50,000 data blocks. Requires:
Blocks in Bottom Level = ⌈50,000 / 341⌉ = 147 blocks

Intermediate Level: Points to 147 blocks. Requires:
Blocks in Intermediate Level = ⌈147 / 341⌉ = 1 block
● Top Level (Root): Points to 1 block.
Total Levels: 3 (Root → Intermediate → Bottom).
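A short Python sketch of the repeated division behind these level counts; the loop yields the 147-block and single-block index levels, which the slide then counts, together with the root, as three levels:

import math

block_size, record_size, entry_size = 4096, 200, 12
num_records = 1_000_000

bfr         = block_size // record_size     # 20 records per data block
data_blocks = math.ceil(num_records / bfr)  # 50,000 data blocks
fan_out     = block_size // entry_size      # 341 entries per index block

level_sizes, blocks = [], data_blocks
while blocks > 1:
    blocks = math.ceil(blocks / fan_out)    # index the level below
    level_sizes.append(blocks)

print(level_sizes)  # [147, 1]: the 147-block and single-block index levels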
Search Efficiency with Multi-Level Indices

Scenario:
How many disk accesses are required to retrieve a record using a multi-level index?
Explanation:
● Each level reduces the search space.
● For the previous example:
● Access the root block (1 access).
● Access the intermediate block (1 access).
● Access the bottom-level block (1 access).
● Access the data block (1 access).
Total Disk Accesses: 4
Comparison with Single-Level Index:
● A single-level index would require accessing all 50,000 blocks in the worst case.
Practical Exercise

Task 1:
A file contains 2,000,000 records, each of size 150 bytes. The block size is 8 KB. Each
index entry is 16 bytes. Calculate:
1. Blocking factor for data blocks.
2. Number of blocks needed to store the file.
3. Number of levels in the multi-level index.
Task 2:
If the same file uses a single-level index, how many disk accesses are required in the
worst case? Compare it with the multi-level index.
Recap and Key Takeaways

What We Learned Today:


1. Multi-Level Indices reduce the number of disk accesses by organizing data into hierarchical levels.
2. They scale better than single-level indices for very large datasets.
3. Numerical Examples demonstrate how multi-level indices improve search efficiency.
4. Multi-level indices trade off increased storage for faster retrieval.
Next Steps:
● Explore advanced indexing techniques like B-Trees and hashing in future sessions.

Questions?
Topic 3.9: B-Trees and B+Trees (Structure Only, Algorithms Not Required)
What We'll Cover in This Session:

1. Introduction to B-Trees
● What are B-Trees?
● Key Characteristics
2. Structure of B-Trees
● Nodes, Keys, and Pointers
3. Introduction to B+Trees
● What are B+Trees?
● Differences from B-Trees
4. Practical Examples
5. Recap and Key Takeaways
What are B-Trees?

Definition:
● A B-Tree is a self-balancing tree data structure that maintains sorted data and allows efficient insertion, deletion, and search operations.
● It is widely used in databases and file systems for indexing large datasets.
Key Characteristics:

1. Balanced Structure: All leaf nodes are at the same level.
2. Multi-Way Search: Each node can have multiple keys and pointers.
3. Efficient Disk Access: Minimizes the number of disk I/O operations.
Applications:
● Database indexing.
● File system organization.
(a) A node in a B-tree with q – 1 search values.

A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.
Structure of B-Trees

Explanation:
● A B-Tree consists of nodes, each containing keys and pointers.
● Internal Nodes: Contain keys and pointers to child nodes.
● Leaf Nodes: Contain actual data or pointers to data blocks.
Example of a B-Tree Node:
| Pointer 1 | Key 1 | Pointer 2 | Key 2 | Pointer 3 |

Rules for B-Trees:
1. Each node can have up to m children (where m is the order of the tree).
2. Each internal node has between ⌈m/2⌉ and m children, i.e., between ⌈m/2⌉ − 1 and m − 1 keys (the root may have fewer).
3. All leaf nodes are at the same level.
Example of a B-Tree

Scenario:

A B-Tree of order 3 (each node can have up to 3 children).

Explanation:
● Root node contains keys [10, 20].
● Left child contains keys [5, 7].
● Middle child contains key [15].
● Right child contains keys [25, 30].
Search Process:
To search for key 15:
1. Start at the root [10, 20].
2. Compare 15 with 10 and 20.
3. Move to the middle child [15].
A B-Tree is a specialized m-way tree designed to optimize data access, especially on disk-based storage systems.

● In a B-Tree of order m, each node can have up to m children and m-1 keys, allowing it to efficiently manage large datasets.
● The value of m is decided based on disk block and key sizes.
● One of the standout features of a B-Tree is its ability to store a significant number of keys within a single node, including large key values. This significantly reduces the tree's height, hence reducing costly disk operations.
● B-Trees allow faster data retrieval and updates, making them an ideal choice for systems requiring efficient and scalable data management.
● By maintaining a balanced structure at all times, B-Trees deliver consistent and efficient performance for critical operations such as search, insertion, and deletion.
What are B+Trees?

Definition:
● A B+Tree is a variation of the B-Tree optimized for range queries and sequential access.
● Unlike B-Trees, all data is stored in the leaf nodes, and internal nodes only contain keys for navigation.

Key Characteristics:
1. Leaf Nodes: Contain all the data and are linked together for sequential access.
2. Internal Nodes: Act as indexes to guide searches.
3. Efficient Range Queries: Leaf nodes are connected, enabling fast traversal.
Applications:
● Database indexing (e.g., MySQL uses B+Trees for clustered indexes).
● File systems.
-
-

↑ -
-

-
-
-
--- -
-

T
&
10M
-
es-24
/ L

-
pointe
Need for a B-Tree

The B-Tree data structure is essential for several reasons: it stays balanced as data is inserted and deleted, keeps its height small even for very large datasets, and packs many keys into each node so that costly disk I/O operations are minimized.
Searching in a B-Tree
● Searching in a B-Tree is similar to searching in a Binary Search Tree (BST).
● Let the key to be searched be k.

Search Algorithm:

1. Start from the root and recursively traverse down.
2. For every visited non-leaf node:
○ If the current node contains k, return the node.
○ Otherwise, determine the appropriate child to traverse:
■ This is the child just before the first key greater than k.
3. If we reach a leaf node and don't find k, return NULL.

Example:
Input: Search 120 in a given B-Tree.
Output: If found, return the node; otherwise, return NULL.

Key Insights:

● Recursive approach: Similar to searching a BST.
● Optimized search at each level.
● Separation values (keys) guide the search direction.
● If k is out of range, it must be in a different branch.
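A minimal Python sketch of this recursive search, using an illustrative node layout of sorted keys plus a parallel children list (not the textbook's exact record format):

from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)  # empty at leaves

def search(node: Node, k: int):
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1                               # find the first key >= k
    if i < len(node.keys) and node.keys[i] == k:
        return node                          # k found in this node
    if not node.children:
        return None                          # reached a leaf without finding k
    return search(node.children[i], k)       # descend just before that key

root = Node([10, 20], [Node([5, 7]), Node([15]), Node([25, 30])])
print(search(root, 15).keys)  # [15]
print(search(root, 16))       # None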
B+ Tree Data Structure

B+ Tree is a variation of the B-Tree, designed for efficient indexing and searching.
Key Characteristics:
○ Data pointers are stored only at leaf nodes.
○ Leaf nodes contain every value of the search field along with a pointer to the actual data (record or block).
○ Internal nodes do not store actual data; they guide the search process.
○ Some values from leaf nodes are repeated in internal nodes for navigation.
○ Leaf nodes are linked together to provide ordered access to data.

Advantages of B+ Tree:

✅ Efficient range queries due to linked leaf nodes.
✅ Faster sequential access compared to B-Trees.
✅ Used in database indexing and file systems.

📌 Key Insight:

● Internal nodes help in searching, but actual data is always stored in leaf nodes.
● This structure makes B+ Trees more efficient for large databases! 🚀
Features of B+ Trees
1⃣ Balanced:
● Self-balancing structure.
● Automatically adjusts when data is added or removed.
● Ensures consistent search time regardless of tree size.
2⃣ Multi-Level Structure:
● Root node at the top, internal nodes in between, leaf nodes at the bottom.
● Leaf nodes store actual data.
3⃣ Ordered:
● Maintains sorted order of keys, making range queries efficient.
4⃣ High Fan-out:
● Each node can have many child nodes.
● Reduces tree height, improving search and indexing speed.
5⃣ Cache-Friendly:
● Optimized for modern CPU caching mechanisms.
● Improves data retrieval performance.
6⃣ Disk-Oriented:
● Used in database indexing and file systems.
● Efficient for disk-based storage and retrieval.
✅ B+ Trees are widely used in databases, file systems, and indexing due to their efficiency in searching, sorting, and storing data! 🚀
Why Use B+ Trees?

✅ Efficient Disk Access
● Minimizes I/O operations for faster data retrieval.
● Ideal for storage systems with slower data access.

✅ Balanced Structure
● Ensures predictable performance for a variety of tasks.
● Self-balancing structure helps with efficient searches.

✅ Optimized for Range Queries
● Leaf nodes are linked, making range-based queries faster.
● Used in database indexing and file systems.
B+ Tree vs. B-Tree
Example of a B+Tree

Scenario:

A B+Tree of order 3 (each node can have up to 3 children).

Explanation:
● Root node contains keys [10, 20].
● Leaf nodes [5, 7], [15], and [25, 30] store all the data.
● Leaf nodes are linked sequentially for range queries.
Search Process:
To search for key 15:
1. Start at the root [10, 20].
2. Compare 15 with 10 and 20.
3. Move to the middle child [15].
Range Query Example:
To retrieve all keys between 10 and 25:
1. Start at the leaf node containing 10.
2. Traverse the linked list of leaf nodes until reaching 25.
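A minimal Python sketch of the range query over the linked leaf level. A real query would first descend from the root to the leaf containing the lower bound; here we simply start at the first leaf, and all names are illustrative:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list[int]
    next: Optional["Leaf"] = None   # sibling pointer for sequential access

def range_query(first_leaf: Leaf, lo: int, hi: int) -> list[int]:
    out, leaf = [], first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out          # keys are sorted: we can stop early
            if k >= lo:
                out.append(k)
        leaf = leaf.next            # follow the leaf chain
    return out

l3 = Leaf([25, 30])
l2 = Leaf([15], l3)
l1 = Leaf([5, 7], l2)
print(range_query(l1, 10, 25))  # [15, 25]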
Differences Between B-Trees and B+Trees
Practical Exercise

Task 1:
Draw a B-Tree of order 3 for the following keys:
[5, 10, 15, 20, 25, 30].
Task 2:
Draw a B+Tree of order 3 for the same set of keys. Highlight the differences
between the two structures.
Recap and Key Takeaways

What We Learned Today:


1. B-Trees are balanced, multi-way search trees used for efficient indexing.
2. B+Trees are optimized for range queries and sequential access by storing all data in leaf
nodes.
3. Both structures minimize disk I/O operations, making them ideal for databases and file
systems.
4. Understanding their structure helps us design efficient indexing mechanisms.
Next Steps:
● Explore how B+Trees are implemented in real-world databases like MySQL.
Topic 3.10 Extendible Hashing
What We’ll Cover in This Session:

1. Introduction to Hashing
● What is Hashing?
● Why Extendible Hashing?
2. Structure of Extendible Hashing
● Directory and Buckets
● Key Characteristics
3. How Extendible Hashing Works
● Splitting Buckets
● Expanding the Directory
4. Practical Examples
5. Recap and Key Takeaways
What is Hashing?

Definition:
● Hashing is a technique used to map keys to specific locations (buckets) in a hash table.
● It enables fast data retrieval by using a hash function to compute the location of a record.
Why Use Hashing?
● Provides constant-time search performance (O(1) on average).
● Efficient for equality-based queries (e.g., "Find record with key = X").
Limitations of Static Hashing:
● Fixed number of buckets leads to overflow when the dataset grows.
● Poor performance due to collisions in large datasets.
Solution:
● Extendible Hashing dynamically adjusts the hash table size to handle growing datasets.
What is Extendible Hashing?

Definition:
● Extendible Hashing is a dynamic hashing technique that allows the hash table to grow or shrink
as needed.
● It uses a directory structure to manage buckets and ensures efficient storage utilization.
Key Characteristics:
1. Directory: A table of pointers to buckets.
2. Buckets: Store actual records or pointers to records.
3. Dynamic Growth: Buckets split when they overflow, and the directory expands as needed.
Applications:
● Database indexing.
● File systems.

Structure of Extendible Hashing

Components:
1. Directory:
● An array of pointers to buckets.
● Each entry corresponds to a prefix of the hash value.
2. Buckets:
● Store records or pointers to records.
● Each bucket has a fixed capacity (e.g., can hold up to n records).
Example Diagram:

Key Concepts:
● Global Depth: Number of bits used to index the directory.
● Local Depth: Number of bits used to identify a bucket.
How Extendible Hashing Works

Step-by-Step Process:
1. Hash Function:
● Compute the hash value of a key.
● Use the first few bits (determined by the global depth) to locate the bucket.
2. Bucket Overflow:
● If a bucket exceeds its capacity, it splits into two buckets.
● The directory is updated to point to the new buckets.
3. Directory Expansion:
● If all buckets at a given depth are full, the directory doubles in size.
● The global depth increases by 1.
Example Scenario:
● Insert keys 5, 10, 15, 20 into an initially empty hash table.
● Show how buckets split and the directory expands as more keys are added.
Example of Extendible Hashing

Initial State:
● Global Depth = 1.
● Directory:
| Prefix 0 | -> Bucket A
| Prefix 1 | -> Bucket B

Insert Key 5:
● Hash value of 5: 01.
● Insert into Bucket B.
Insert Key 10:
● Hash value of 10: 10.
● Insert into Bucket B.
Bucket B Overflows:
● Split Bucket B into two buckets (B1 and B2).
● Update the directory:
| Prefix 00 | -> Bucket A
| Prefix 01 | -> Bucket B1
| Prefix 10 | -> Bucket B2
| Prefix 11 | -> Bucket B2
Advantages of Extendible Hashing

1. Dynamic Growth:
● Handles growing datasets without wasting space.
2. Efficient Search:
● Constant-time search performance (O(1) on average).
3. Collision Handling:
● Splits buckets to resolve collisions dynamically.
Disadvantages:
● Increased complexity due to directory management.
● Potential for directory expansion overhead.
Practical Exercise

Task 1:
Given the following keys: [5, 10, 15, 20, 25, 30] , simulate the process of inserting them into an
extendible hash table. Assume each bucket can hold up to 2 keys.
Task 2:
Draw the final state of the directory and buckets after all keys are inserted. Highlight any bucket splits
or directory expansions.
Recap and Key Takeaways 3.10

What We Learned in this session:


1. Extendible Hashing is a dynamic hashing technique that adjusts the hash table size as needed.
2. It uses a directory to manage buckets and ensures efficient storage utilization.
3. Buckets split when they overflow, and the directory expands to accommodate more buckets.
4. Extendible Hashing provides constant-time search performance for equality-based queries.
Next Steps:
● Explore other hashing techniques like Linear Hashing or compare Extendible Hashing with
B+Trees.
Topic 3.11: Indexing on Multiple Keys – Grid Files
What We'll Cover in This Session 3.11:

1. Introduction to Multi-Key Indexing
● Why Index on Multiple Keys?
2. What are Grid Files?
● Structure and Key Characteristics
3. How Grid Files Work
● Partitioning Data into Cells
● Handling Overflows
4. Practical Examples
5. Recap and Key Takeaways
What is Multi-Key Indexing?

Definition:
● Multi-key indexing allows efficient querying on multiple attributes (keys) simultaneously.
● It is particularly useful for range queries or multi-dimensional data.
Why Use Multi-Key Indexing?
● Enables fast retrieval for queries involving multiple attributes (e.g., "Find employees with salary > 50,000 AND age < 30").
● Commonly used in spatial databases, GIS systems, and multi-dimensional datasets.
Challenges:
● Traditional single-key indexes (e.g., B-Trees) are inefficient for multi-dimensional data.
● Grid files provide a solution by partitioning data into cells based on multiple keys.
What are Grid Files?

Definition:
● A grid file is a data structure used for indexing multi-dimensional data.
● It divides the data space into a grid of cells, where each cell corresponds to a range of values for each key.
Key Characteristics:
1. Partitioning:
● Each dimension (key) is divided into intervals.
● The intersection of intervals forms a grid of cells.
2. Dynamic Growth:
● Cells can split dynamically to handle overflows.
3. Efficient Range Queries:
● Queries involving ranges on multiple keys can be resolved by examining relevant cells.
Structure of Grid Files

Components:
1. Grid Directory:
● Maps cells to buckets (storage locations).
● Each cell corresponds to a specific range of values for each key.
2. Buckets:
● Store actual records or pointers to records.
● Each bucket has a fixed capacity.

Example Diagram:

Key 1 (Salary): [0-50k, 50k-100k]
Key 2 (Age): [0-30, 30-60]

Grid Cells:
| Salary 0-50k, Age 0-30 | Salary 50k-100k, Age 0-30 |
| Salary 0-50k, Age 30-60 | Salary 50k-100k, Age 30-60 |
How Grid Files Work

Step-by-Step Process:
1. Partitioning the Data Space:
● Divide each key's range into intervals.
● Form a grid of cells by combining intervals from all keys.
2. Mapping Records to Cells:
● Each record is mapped to a cell based on its key values.
● For example, an employee with Salary = 60,000 and Age = 25 would map to the cell [50k-100k, 0-30].
3. Handling Overflows:
● If a cell's bucket overflows, the cell splits into smaller sub-cells.
● The grid directory is updated to reflect the new structure.
Example Scenario (a sketch of the cell-mapping step follows below):
● Insert records into a grid file with two keys: Salary and Age.
● Show how cells split when buckets overflow.
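The record-to-cell mapping can be sketched in Python using linear scales, one sorted list of interval boundaries per attribute (boundary values and names are illustrative):

import bisect

salary_bounds = [50_000]            # splits Salary into [0-50k) and [50k-100k]
age_bounds    = [30]                # splits Age into [0-30) and [30-60]

def cell(salary: int, age: int) -> tuple[int, int]:
    """Linear scales: map each attribute value to its interval number."""
    return (bisect.bisect_right(salary_bounds, salary),
            bisect.bisect_right(age_bounds, age))

grid: dict[tuple[int, int], list] = {}   # grid directory: cell -> bucket

for salary, age in [(40_000, 25), (60_000, 28), (45_000, 27)]:
    grid.setdefault(cell(salary, age), []).append((salary, age))

print(grid)
# {(0, 0): [(40000, 25), (45000, 27)], (1, 0): [(60000, 28)]}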
Example of Grid Files

Scenario:
A dataset contains employees with attributes Salary and Age.
Initial Grid:
● Salary intervals: [0-50k, 50k-100k].
● Age intervals: [0-30, 30-60].

Insert Records:
1. Employee 1: Salary = 40,000, Age = 25 → Cell [0-50k, 0-30].
2. Employee 2: Salary = 60,000, Age = 28 → Cell [50k-100k, 0-30].
3. Employee 3: Salary = 45,000, Age = 27 → Cell [0-50k, 0-30].
Bucket Overflow:

Cell [0-50k, 0-30] overflows.
Split the cell into sub-cells: [0-25k, 0-30] and [25k-50k, 0-30].
Advantages of Grid Files

1. Efficient Multi-Key Queries:
● Handles queries involving multiple keys efficiently.
2. Dynamic Growth:
● Cells split dynamically to accommodate growing datasets.
3. Range Queries:
● Supports range queries on multiple dimensions.

Disadvantages:
● Increased complexity due to dynamic splitting.
● Potential for uneven distribution of data across cells.
Practical Exercise

Task 1:
Given the following records:
● Record 1: Salary = 30,000, Age = 22
● Record 2: Salary = 55,000, Age = 28
● Record 3: Salary = 40,000, Age = 25
● Record 4: Salary = 60,000, Age = 35
Simulate the process of inserting these records into a grid file. Assume initial intervals:
● Salary: [0-50k, 50k-100k].
● Age: [0-30, 30-60].
Task 2:
Draw the final state of the grid after all records are inserted. Highlight any cell splits.
Recap and Key Takeaways 3.11

What We Learned in this session:


1. Grid Files are used for indexing multi-dimensional data.
2. They partition the data space into a grid of cells based on key intervals.
3. Cells split dynamically to handle overflows, ensuring efficient storage
utilization.
4. Grid files support efficient range queries on multiple keys.
Next Steps:
● Explore other multi-dimensional indexing techniques like R-Trees or
KD-Trees.

Questions?
PREVIOUS YEAR UNIVERSITY QUESTION PAPERS

UNIVERSITY QUESTIONS 2021 July

UNIVERSITY QUESTIONS 2023 June

QP 2024 December

ii. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index on Ssn. Calculate the number of first-level index entries and the number of first-level index blocks.

iii. Calculate the number of levels needed if we make it into a multilevel index.

b) What is a grid file? What are its advantages and disadvantages?
SOLUTIONS

Illustrate the structure of B-Tree and B+ Tree and explain how they are different.

B-Tree Structure:

A B-Tree is a self-balancing search tree that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is commonly used in databases and file systems.

Characteristics of B-Tree:

● Each node can have multiple children.
● A node with n keys has n+1 children.
● Keys in a node are sorted.
● Internal nodes contain keys and data.
● The tree remains balanced after insertion and deletion.

A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.
(a) A node in a B-tree with q – 1 search values.
SOLUTIONS
B+ Tree Structure:

A B+ Tree is an extension of B-Tree where all values (data) are stored in leaf nodes, and
internal nodes only store keys.

Characteristics of B+ Tree:

● Internal nodes do not store data, only keys.


● All data records are at the leaf level.
● Leaves are linked to form a linked list (allowing fast sequential access).
SOLUTIONS

What are the different types of single-level ordered indices? Explain.

A single-level ordered index is an auxiliary data structure that helps in faster retrieval of records by keeping sorted references to data entries.

Types of Single-Level Ordered Indices:

1. Primary Index:
○ Used for a sorted file where records are arranged sequentially based on a key field.
○ One index entry per block of data.
○ Example: A database table where students are sorted by roll number.
2. Clustering Index:
○ Used when records are clustered on a non-key field.
○ Multiple records can have the same index key.
○ Example: Employees grouped by department.
3. Secondary Index:
○ Used when the data is not stored in a sorted manner.
○ Can be created on non-primary key attributes.
○ Example: Index on a "Salary" column in an employee table.
Differentiate between static hashing and dynamic hashing.

Hashing is a technique to map keys to a fixed-size table for fast retrieval.

Static Hashing:

● The size of the hash table is fixed.
● The hash function does not change over time.
● Collisions are handled using chaining or open addressing.
● Example: Hash tables used in memory-based applications.

Dynamic Hashing:

● The size of the hash table grows or shrinks dynamically.
● Uses extendible hashing or linear hashing.
● Reduces collisions by allowing buckets to expand.
● Example: Databases and file systems.
Write short notes on Nested Queries.

A nested query (or subquery) is a query inside another SQL query. It is used to retrieve data dynamically based on results from another query.

Example of a Nested Query: Here, the inner query calculates the average salary, and the outer query selects employees earning above this average.

SELECT name FROM Employees
WHERE salary > (SELECT AVG(salary) FROM Employees);
Write short notes on Nested Queries (Continued)

Types of Nested Queries:

1. Scalar Subqueries: Returns a single value.

SELECT name FROM Employees WHERE department_id = (SELECT department_id FROM Departments WHERE name = 'HR');

2. Multi-Row Subqueries: Returns multiple values using IN, ANY, or ALL.

SELECT name FROM Employees WHERE salary IN (SELECT salary FROM Employees WHERE department_id = 2);
Write short notes on Nested Queries (Continued)

3. Correlated Subqueries: The inner query depends on the outer query.

SELECT name FROM Employees e WHERE salary > (SELECT AVG(salary) FROM Employees WHERE department_id = e.department_id);

Nested queries enhance query flexibility but can be performance-heavy. Optimizations like indexing and JOIN usage can improve efficiency.
Solution 16 a [2023 June]

To calculate the record size R in bytes, we sum the sizes of all the fields in a single record.

Given Data:

Each employee record consists of the following fields:

1. Name = 30 bytes
2. SSN (Social Security Number) = 9 bytes
3. Department Code = 9 bytes
4. Address = 40 bytes
5. Phone = 10 bytes
6. Birth Date = 8 bytes
7. Sex = 1 byte
8. Job Code = 4 bytes
9. Salary = 4 bytes
10. Deletion Marker = 1 byte (used for deletion marking)
Solution 16 a [2023 June] (Continued)

Record size R = 30 + 9 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 1 = 116 bytes.
Solution 16 a [2023 June] (Continued)

ii. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index on Ssn. Calculate the number of first-level index entries and the number of first-level index blocks.

(ii) Constructing a Primary Index on SSN:

A primary index is built on the key field (SSN in this case), with one index entry for each block of the data file (the block anchor).
(iii) Calculating the Number of Levels for a Multilevel Index
To construct a multilevel index, we treat the first-level index as a file and build an
index on it.

● First level: 938 blocks (from part ii).
● Second level: ceil(938 / 32) = 30 blocks.
● Third level: ceil(30 / 32) = 1 block.

Since the top level fits in a single block, no further levels are needed. Thus, we need
three levels (first-level, second-level, and third-level) to index the records efficiently.
(b) Grid File: Definition, Advantages, and Disadvantages
What is a Grid File?

A grid file is a type of multi-attribute index structure used in databases to efficiently
support range queries and multidimensional data retrieval. It organizes data into a grid
based on multiple attributes.

It consists of:

● Grid Directory: A table-like structure dividing data into grid cells based on attribute
values.
● Linear Scales: One for each attribute, mapping data values into grid cell coordinates.
● Data Buckets: Store actual records.
(b) Grid File: Definition, Advantages, and Disadvantages
Advantages of Grid Files

1. Efficient Multidimensional Indexing – Handles queries on multiple attributes efficiently.
2. Faster Search – Reduces search space using a grid structure.
3. Dynamic Growth – Adapts well when data size increases.
4. Direct Access – Uses direct pointers to locate records without full scans.

Disadvantages of Grid Files

1. High Storage Overhead – The grid directory consumes additional storage.
2. Performance Degrades with High Dimensions – Works well for 2D or 3D data, but struggles with higher
dimensions (curse of dimensionality).
3. Splitting Complexity – When a data bucket overflows, splitting and restructuring can be complex.
4. Less Efficient for Skewed Data – If data distribution is uneven, some grid cells may be overloaded,
affecting performance.
Final Answers
(i) Record size R = 116 bytes
(ii) First-level index blocks = 938 blocks
(iii) Number of levels for multilevel index = 3 levels
(b) Grid File: A multidimensional index with fast search but high overhead for
large dimensions.
1. Retrieve the names of employees along with their department
names
To get employee names and their department names, we need to join the
Employees and Departments tables using the department_id field.

SELECT e.employee_name, d.department_name
FROM Employees e
JOIN Departments d ON e.department_id = d.department_id;

Explanation:

● We select employee_name from the Employees table.
● We join the Departments table using department_id, which is the foreign key in
Employees.
● This query fetches the employee's name along with their respective department name.
2. Find the total salary expenditure per department

To calculate the total salary expenditure per department, we need to group employees
based on department_id and sum their salaries.

SELECT d.department_name, SUM(e.salary) AS total_salary_expenditure
FROM Employees e
JOIN Departments d ON e.department_id = d.department_id
GROUP BY d.department_name;

Explanation:

● The SUM(e.salary) function calculates the total salary expenditure for each
department.
● We use GROUP BY d.department_name to group employees by department.
● The result shows each department's name along with the total salary expenditure.
3. List employees who are currently assigned to a project
To find employees currently assigned to a project, we need to use the Assignments table. The
employees should have a record in this table.

SELECT DISTINCT e.employee_name
FROM Employees e
JOIN Assignments a ON e.employee_id = a.employee_id;

Explanation:

● The Assignments table contains records of employee-project assignments.
● We join Employees with Assignments on employee_id.
● The DISTINCT keyword ensures that each employee appears only once, even if assigned to
multiple projects.
4. Find the average salary of employees in projects that started after 22/11/2022

To calculate the average salary of employees in projects that started after 22/11/2022, we
need to:

1. Identify projects that started after 22/11/2022 in the Projects table.
2. Find employees assigned to those projects using Assignments.
3. Compute the average salary of those employees.

SELECT AVG(e.salary) AS average_salary
FROM Employees e
JOIN Assignments a ON e.employee_id = a.employee_id
JOIN Projects p ON a.project_id = p.project_id
WHERE p.start_date > '2022-11-22';
4. Find the average salary of employees in projects that started after 22/11/2022
(Continued)
Explanation:

● The WHERE p.start_date > '2022-11-22' condition filters projects that started after 22nd
November 2022.
● We join Employees, Assignments, and Projects tables to find employees assigned to these projects.
● The AVG(e.salary) function calculates the average salary of these employees.
(a) Calculating the Blocking Factor
Given Data:

● Block size (B) = 256 bytes
● Record size (R) = 40 bytes

Blocking factor = floor(B / R) = floor(256 / 40) = 6 records per block.
5. Illustrate the Concept of Trigger in SQL with an Example (UQ)
What is a Trigger?

A trigger in SQL is a special kind of stored procedure that automatically executes when a specific event occurs
in a table. Triggers are mainly used for:

● Maintaining data integrity


● Automating tasks like logging or notifications
● Enforcing business rules

Example of an SQL Trigger

Let’s consider an Employees table where we want to log any salary updates in a separate Salary_Audit table.

Step 1: Create the Employees Table
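A minimal sketch of this table, assuming only the three columns used later in this example:

CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    salary DECIMAL(10,2)  -- the column monitored by the trigger below
);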


Step 2: Create the Salary_Audit Table

CREATE TABLE Salary_Audit (
    audit_id INT PRIMARY KEY AUTO_INCREMENT,
    employee_id INT,
    old_salary DECIMAL(10,2),
    new_salary DECIMAL(10,2),
    change_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (employee_id) REFERENCES Employees(employee_id)
);
Step 3: Create a Trigger to Log Salary Changes
DELIMITER //

CREATE TRIGGER after_salary_update


AFTER UPDATE ON Employees
FOR EACH ROW
BEGIN
IF OLD.salary <> NEW.salary THEN
INSERT INTO Salary_Audit (employee_id, old_salary, new_salary)
VALUES (OLD.employee_id, OLD.salary, NEW.salary);
END IF;
END;

//

DELIMITER ;
Explanation:

● This trigger executes after an UPDATE on the Employees table.


● It checks if the salary has changed (OLD.salary <> NEW.salary).
● If the salary is updated, it inserts a log record into the Salary_Audit table.
Step 4: Test the Trigger
-- Insert an employee
INSERT INTO Employees (employee_id, employee_name, salary)
VALUES (1, 'John Doe', 50000);

-- Update salary (Trigger will activate)


UPDATE Employees SET salary = 55000 WHERE employee_id = 1;

-- Check the Salary_Audit table


SELECT * FROM Salary_Audit;
6. Compare DDL and DML with an Example
What is DDL (Data Definition Language)?

DDL commands are used to define and modify database schema and structure. Common DDL commands:

● CREATE → Creates tables, views, databases


● ALTER → Modifies the structure of a table
● DROP → Deletes a table or database
● TRUNCATE → Removes all records from a table
6. Compare DDL and DML with an Example
Example of DDL
-- Creating a table (DDL)

CREATE TABLE Students (

student_id INT PRIMARY KEY,

name VARCHAR(100),

age INT

);

-- Altering the table to add a new column

ALTER TABLE Students ADD COLUMN grade VARCHAR(5);


What is DML (Data Manipulation Language)?

DML commands deal with data within tables. Common DML commands:

● INSERT → Adds new records


● UPDATE → Modifies existing records
● DELETE → Removes records
Example of DML
-- Inserting a record (DML)

INSERT INTO Students (student_id, name, age, grade)

VALUES (1, 'Alice', 20, 'A');

-- Updating a record (DML)

UPDATE Students SET grade = 'A+' WHERE student_id = 1;

-- Deleting a record (DML)

DELETE FROM Students WHERE student_id = 1;


5. Difference Between WHERE and HAVING Clause with Example
WHERE Clause

● Used to filter rows before grouping operations (GROUP BY).


● Works on individual rows of a table.
● Cannot be used with aggregate functions (SUM(), COUNT(), etc.).

HAVING Clause

● Used to filter groups after the GROUP BY operation.


● Works on grouped data.
● Can be used with aggregate functions (SUM(), AVG(), etc.).
Using WHERE Clause:

Retrieve employees in the IT department.

SELECT * FROM Employees

WHERE department = 'IT';


Using HAVING Clause:

Find departments where the total salary exceeds 100000.

SELECT department, SUM(salary) AS total_salary

FROM Employees

GROUP BY department

HAVING SUM(salary) > 100000;


6. Difference Between Hash Indexes and B+ Tree Indexes
1. Hash Indexes

● Uses hash functions to map keys to specific locations.


● Provides fast lookup for exact matches (= operator).
● Not efficient for range queries (BETWEEN, <, >).
● Data is not stored in sorted order.
● Used in hash-based indexing systems (e.g., NoSQL databases).

Example of Hash Index Use Case


CREATE INDEX emp_hash_index ON Employees(employee_id) USING HASH;

● If we search WHERE employee_id = 3, the hash function quickly locates the record.
2. B+ Tree Indexes

● Uses a balanced tree structure where data is stored in sorted order.


● Supports range queries efficiently (BETWEEN, <, >).
● Used in relational databases for indexing.
● Internal nodes store keys while leaf nodes store actual records.
● Provides better performance for ordered scans.

Example of B+ Tree Index Use Case


CREATE INDEX emp_btree_index ON Employees(salary);

● If we search WHERE salary BETWEEN 50000 AND 70000, the B+ Tree efficiently
retrieves the range.
5. Role of Triggers in SQL Databases
What is a Trigger?

A trigger in SQL is a special type of stored procedure that is automatically executed when a specific event occurs in a database. These
events include INSERT, UPDATE, DELETE operations on a table.

Role of Triggers in SQL Databases

1. Enforcing Business Rules


○ Example: Ensuring that employee salaries do not go below a minimum value.
2. Maintaining Data Integrity
○ Example: Automatically updating the last_modified_date column when a record is updated.
3. Automating Auditing & Logging
○ Example: Storing changes in a log table for tracking.
4. Synchronizing Tables
○ Example: If data is modified in a main table, triggers ensure changes reflect in related tables.
5. Restricting Invalid Transactions
○ Example: Preventing deletion of a customer if they have pending orders.
5. Role of Triggers in SQL Databases
Example of a Trigger

This trigger ensures that an employee’s salary cannot be set below 30,000.

DELIMITER //

CREATE TRIGGER prevent_low_salary
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
IF NEW.salary < 30000 THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'Salary must be at least 30,000';
END IF;
END;
//

DELIMITER ;

What Happens?

● If someone tries INSERT INTO Employees (name, salary) VALUES ('John', 25000);, the
trigger prevents it.
6. Difference Between Correlated and Non-Correlated Nested
Queries
1. Correlated Nested Queries

● The inner query depends on the outer query for each row processed.
● The inner query executes once per outer row, making it less efficient.

Example of a Correlated Nested Query

Find employees who earn more than the average salary of their department.

SELECT employee_name, salary, department_id
FROM Employees e1
WHERE salary > (
SELECT AVG(salary)
FROM Employees e2
WHERE e1.department_id = e2.department_id
);

The inner query runs for each employee, calculating the department’s average salary.
It depends on the outer query (e1.department_id = e2.department_id).
2. Non-Correlated Nested Queries

● The inner query runs independently of the outer query.


● The inner query executes only once and returns a result used by the outer query.

Example of a Non-Correlated Nested Query

Find employees who earn more than the company’s average salary.

SELECT employee_name, salary
FROM Employees
WHERE salary > (
SELECT AVG(salary) FROM Employees
);

The inner query calculates the average salary once.
The outer query uses this value to filter employees.
Thank You
