
Dsa Mock Insem Question Bank

The document explains file concepts in C++, including file opening modes and inverted files, which map content to locations for efficient searching. It also covers external sorting methods for large data sets, factors affecting file organization, and various indexing techniques. Additionally, it details B+ trees, their structure, and differences from B trees, along with step-by-step construction examples for B+ trees of different orders.

Q1 What is a file? List the different file opening modes in C++. Explain the
concept of inverted files.

What is a File?
In computing, a file is a collection of related data stored on a non-volatile
storage device such as a hard drive or SSD. In C++, files provide permanent
storage of data, unlike variables, which lose their values when a program
ends.
Files are handled using file streams, provided by the <fstream> header.

File Opening Modes in C++


In C++, you open files using objects like ifstream, ofstream, or fstream,
along with file modes to specify how the file should be accessed.

Common File Modes:

Mode          Description
ios::in       Open file for reading
ios::out      Open file for writing (existing content is discarded unless
              combined with ios::in or ios::app)
ios::app      Append: every write goes to the end of the file
ios::ate      Seek to the end of the file immediately after opening
ios::trunc    Truncate file (delete contents if it exists)
ios::binary   Open file in binary mode (no text translation)

🛠 Example:
fstream file;
file.open("data.txt", ios::in | ios::out);  // read + write; open fails if data.txt does not exist
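The effect of the modes is easiest to see on a scratch file. The following is a minimal sketch, not a definitive recipe: the function name demoFileModes and the file path passed to it are assumptions made for illustration. It truncates the file, appends to it, and reads it back.

```cpp
#include <fstream>
#include <string>

// Writes one line (truncating any old content), appends a second line,
// then reads the file back and returns its lines joined with ';'.
// The function name and the path are illustrative only.
std::string demoFileModes(const std::string& path) {
    {
        std::ofstream out(path, std::ios::out | std::ios::trunc);
        out << "first\n";
    }  // stream is closed when it goes out of scope
    {
        std::ofstream app(path, std::ios::app);  // existing content is kept
        app << "second\n";
    }
    std::ifstream in(path, std::ios::in);
    std::string line, all;
    while (std::getline(in, line)) all += line + ";";
    return all;
}
```

Calling demoFileModes on the same path twice gives the same result, because the first open uses ios::trunc and discards whatever the previous call left behind.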

Inverted Files

What is an Inverted File?


An inverted file (or inverted index) is a data structure used to map
content (like words) to their locations in a file or database.
It is widely used in search engines, databases, and information retrieval
systems.

Key Concept:
Instead of storing documents with their contents, it stores a dictionary of
words and for each word, a list of documents (or positions) where it
appears.

Example:
Documents:
Doc1: "apple banana mango"
Doc2: "banana apple"
Doc3: "mango fruit"
Inverted File:

Word      Document(s)
apple     Doc1, Doc2
banana    Doc1, Doc2
mango     Doc1, Doc3
fruit     Doc3
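The table above can be built mechanically. Here is a minimal sketch, assuming whitespace-separated words and document names given as strings; the names InvertedIndex and buildIndex are made up for this illustration.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// word -> list of document names containing it (illustrative type name)
using InvertedIndex = std::map<std::string, std::vector<std::string>>;

// docs holds (document name, document text) pairs.
InvertedIndex buildIndex(
    const std::vector<std::pair<std::string, std::string>>& docs) {
    InvertedIndex index;
    for (const auto& doc : docs) {
        std::istringstream words(doc.second);
        std::string w;
        while (words >> w) {
            std::vector<std::string>& postings = index[w];
            // Record each document once, even if the word repeats in it.
            if (postings.empty() || postings.back() != doc.first)
                postings.push_back(doc.first);
        }
    }
    return index;
}
```

A search for a word is then a single map lookup instead of a scan over every document.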
Q2 Define external sort with an example.

External sorting is a class of algorithms used to sort data that is too large
to fit into main memory (RAM) and must reside in external storage (like a
hard drive).
It is typically used when dealing with huge files, like in database systems or
big data processing.

Why Not Use Normal Sorting?


Traditional algorithms like QuickSort or MergeSort work well in-memory. But
when the dataset is larger than RAM, we need to sort it in chunks, using disk
I/O efficiently, which is where external sorting comes in.

Example: External Merge Sort


Let’s say we need to sort 1 GB of data, but we only have 100 MB of RAM.
Step-by-Step:
1. Divide into Chunks:
Break the 1 GB file into 10 chunks of 100 MB each.
2. Sort Each Chunk:
Load each chunk into RAM one at a time, sort it using MergeSort or
QuickSort, and save the sorted chunks back to disk as temporary
files.
3. Merge Sorted Chunks:
Use a k-way merge algorithm to merge the 10 sorted chunks into a
single sorted file.
o You read a small buffer from each chunk into memory
o Repeatedly pick the smallest element from the buffers and write it
to the output file
Example (Simple Version)
Original File (on disk):
F, A, Z, B, D, M, E, C
Step 1: Split into 2 chunks (say RAM can hold 4 items)
• Chunk 1: F, A, Z, B → sort → A, B, F, Z
• Chunk 2: D, M, E, C → sort → C, D, E, M
Write both to disk as temp files.
Step 2: Merge Chunks
Merge A, B, F, Z and C, D, E, M →
Final sorted file: A, B, C, D, E, F, M, Z
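The two phases can be sketched in code. This is only a simulation under stated assumptions: the sorted runs are kept in in-memory vectors that stand in for the temporary files on disk, and the names externalMergeSort and ramItems are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// file: the unsorted data; ramItems: how many items fit in "RAM" at once.
std::vector<char> externalMergeSort(const std::vector<char>& file,
                                    std::size_t ramItems) {
    if (ramItems == 0) ramItems = 1;  // guard against an endless loop

    // Phase 1: cut the input into chunks and sort each chunk (a "run").
    std::vector<std::vector<char>> runs;
    for (std::size_t i = 0; i < file.size(); i += ramItems) {
        std::size_t end = std::min(file.size(), i + ramItems);
        std::vector<char> run(file.begin() + i, file.begin() + end);
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));  // stands in for a temp file
    }

    // Phase 2: k-way merge. The min-heap holds the next unread item of
    // each run together with the index of the run it came from.
    using Item = std::pair<char, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> pos(runs.size(), 0);
    for (std::size_t r = 0; r < runs.size(); ++r)
        heap.push({runs[r][0], r});  // runs are never empty here

    std::vector<char> out;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        out.push_back(top.first);
        std::size_t r = top.second;
        if (++pos[r] < runs[r].size()) heap.push({runs[r][pos[r]], r});
    }
    return out;
}
```

With the input F, A, Z, B, D, M, E, C and ramItems = 4, phase 1 produces the two runs shown above and phase 2 merges them. A real implementation would read each run through a small buffer rather than hold it entirely in memory.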

Applications of External Sort:


• Database management systems
• Sorting logs, big text files
• Data warehousing
• External memory algorithms in Big Data
Q3 Write short notes on: i) Factors affecting the file organization ii) Indexed
sequential files iii) Indexing technique
i) Factors Affecting the File Organization
File organization refers to the way data is stored and accessed in a file. The
choice depends on several factors:
1. Access Method:
o Sequential or random access?
o Fast read/write needed?
2. File Size:
o Larger files may need indexing or hashing for performance.
3. Update Frequency:
o How often is data added, deleted, or modified?
4. Redundancy and Duplication:
o Some organizations reduce duplication better than others.
5. Search Efficiency:
o Is fast searching a requirement?
6. Data Volatility:
o How often does the structure of the data change?
7. Storage Medium:
o Hard drive, SSD, or cloud? Each affects access speed.

ii) Indexed Sequential Files


This is a hybrid file organization method that combines:
• Sequential access (like in a sorted file)
• Indexed access (like in a database)
Structure:
• A main data file sorted by key
• An index file with key-pointer pairs
Example:
If you're storing student records sorted by roll number, the index might look
like:
Index:
1001 → block 1
1050 → block 2
1100 → block 3

Main File:
[1001, 1002, ..., 1049], [1050, ..., 1099], ...
Advantages:
• Faster than pure sequential search
• Easier to update than fully indexed structures
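A lookup in such a file takes two steps: find the block through the index, then search inside that block only. The sketch below assumes the index is a std::map from the first key of each block to a 0-based block number (the example above counts blocks from 1); findBlock is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Returns the 0-based number of the block that contains key, or -1 if the
// key is not in the file. Each block is sorted by key.
long findBlock(const std::map<int, std::size_t>& index,
               const std::vector<std::vector<int>>& blocks, int key) {
    auto it = index.upper_bound(key);    // first index entry with key > target
    if (it == index.begin()) return -1;  // target is below every indexed key
    --it;                                // last index entry with key <= target
    const std::vector<int>& block = blocks[it->second];
    bool found = std::binary_search(block.begin(), block.end(), key);
    return found ? static_cast<long>(it->second) : -1;
}
```

Only one block is searched per lookup, which is why this is faster than scanning the whole sorted file.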

iii) Indexing Technique


Indexing is a technique used to improve searching speed in files or
databases by maintaining a separate index structure.
Types of Indexing:
1. Primary Index
o Built on the key field that determines the order of the file, usually a
unique field (like student ID)
2. Secondary Index
o Built on a field other than the ordering key, often non-unique (like city)
3. Dense Index
o One index entry per record
4. Sparse Index
o One index entry per block/page
5. Multilevel Index
o Index on top of another index (like a hierarchy)
Example:
Index:
"apple" → Line 10
"banana" → Line 50
"grape" → Line 90
You can quickly jump to data without scanning the whole file.
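A secondary index on a non-unique field can be sketched with std::multimap, which allows several entries per key. The Student record, the city values used with it, and both function names are hypothetical and only serve the illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Student {  // hypothetical record layout
    int id;
    std::string city;
};

// city -> position of the record in the data file (one entry per record)
std::multimap<std::string, std::size_t> buildCityIndex(
    const std::vector<Student>& records) {
    std::multimap<std::string, std::size_t> index;
    for (std::size_t i = 0; i < records.size(); ++i)
        index.emplace(records[i].city, i);
    return index;
}

// Collects the ids of all students in a city without scanning the file.
std::vector<int> idsInCity(
    const std::vector<Student>& records,
    const std::multimap<std::string, std::size_t>& index,
    const std::string& city) {
    std::vector<int> ids;
    auto range = index.equal_range(city);
    for (auto it = range.first; it != range.second; ++it)
        ids.push_back(records[it->second].id);
    return ids;
}
```

Because there is one index entry per record, this is also a dense index; a sparse index would keep one entry per block instead.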
Q4 Define sequential file organization. Give its advantages and
disadvantages.

Sequential File Organization


In Sequential File Organization, records are stored one after another in a
specific order, usually based on a key field (like Roll Number, ID, etc.).
• New records are appended at the end of the file; if the file must stay in
key order, inserting anywhere else means rewriting the file.
• To access a specific record, the system must read from the beginning
and search in sequence.

Example:
Let's say you are storing student records sorted by Roll Number:
101 John
102 Alice
103 Bob
104 Zara
If you want to find Roll No. 103, the system reads:
• 101 → 102 → 103 (stop here)
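The search above can be sketched as a loop that counts how many records it reads. The in-memory vector stands in for the file, and sequentialFind is an illustrative name, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Scans records in stored order. Returns how many records were read;
// name receives the matching record's name, or is left empty on a miss.
std::size_t sequentialFind(
    const std::vector<std::pair<int, std::string>>& file, int roll,
    std::string& name) {
    name.clear();
    std::size_t reads = 0;
    for (const auto& rec : file) {
        ++reads;
        if (rec.first == roll) {
            name = rec.second;
            return reads;
        }
        if (rec.first > roll) break;  // sorted by key: it cannot appear later
    }
    return reads;
}
```

The number of reads grows with the position of the record, which is the reason random access is slow in large sequential files.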
Advantages of Sequential File Organization

Advantage                    Description
Simple Design                Easy to understand and implement
Efficient for Full Read      Fast when accessing all records in order
Minimal Storage Overhead     No indexing or extra data structures needed
Good for Batch Processing    Ideal for tasks like payroll, billing, report
                             generation

Disadvantages of Sequential File Organization

Disadvantage                 Description
Slow Random Access           Must search from start → slow for large files
Difficult to Update          Inserting in order requires rewriting the file
No Flexibility               Cannot handle dynamic data well
Deletion is Inefficient      Requires shifting or marking deleted records
                             manually

Use Cases:
• Payroll systems
• Monthly billing systems
• Bank transaction logs
• Historical data processing
Q5 What is a B+ tree? Give the structure of its internal node. What is the
difference between a B tree and a B+ tree?

What is a B+ Tree?
A B+ Tree is a balanced search tree used for storing large amounts of
sorted data and allows efficient searching, insertion, and deletion.
It is widely used in:
• Databases
• File systems
• Indexing structures

Key Properties of B+ Tree:


• All values (data records) are stored only at the leaf level.
• Internal nodes store keys only, used for navigation.
• Leaves are linked together for fast range queries.
• Tree remains balanced (all leaves are at the same level).

Structure of an Internal Node (for order m):


An internal node of a B+ Tree contains:
• Up to m – 1 keys
• Up to m pointers to child nodes
Example (Order 4 B+ Tree internal node):
   | K1 | K2 | K3 |
  /     |    |     \
P1     P2    P3     P4
Where:
• K1 < K2 < K3
• P1 → subtree with keys < K1
• P2 → subtree with keys between K1 and K2
• etc.
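Choosing which pointer to follow is a small binary search over the keys of the node. The sketch below assumes that a search key equal to a separator Ki goes to the subtree on its right, which is the usual B+ tree convention when separators are copies of leaf keys. InternalNode and childSlot are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct InternalNode {
    std::vector<int> keys;  // K1 < K2 < ... (at most m - 1 of them)
    // A real node would also store the m child pointers P1..Pm.
};

// Returns the 0-based slot of the child to follow: 0 means P1, 1 means P2,
// and so on. It equals the number of keys that are <= the search key.
std::size_t childSlot(const InternalNode& node, int key) {
    auto it = std::upper_bound(node.keys.begin(), node.keys.end(), key);
    return static_cast<std::size_t>(it - node.keys.begin());
}
```

For a node with keys 10, 20, 30, a search for 5 follows P1, a search for 15 follows P2, and a search for 30 or anything larger follows P4.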

Difference Between B Tree and B+ Tree

Feature             B Tree                               B+ Tree
Data Storage        Stored in internal + leaf nodes      Stored only in leaf nodes
Search Efficiency   Slower range search                  Faster range search due to linked leaves
Leaf Link           Leaves are not linked                Leaves are linked for traversal
Traversal           Can stop at internal node            Must go to leaf for actual data
Used In             Indexes dominated by single-key      Databases and file systems,
                    lookups                              especially with range queries

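Example:
Suppose we insert keys 10, 20, 30, 40, 50 in that order. With order 3, every
node holds at most 2 keys.
B Tree (Order 3):
       [20, 40]
      /    |    \
  [10]   [30]   [50]
Each key appears exactly once; 20 and 40 (with their data) live in the
internal node.
B+ Tree (Order 3), splitting an overfull leaf [a, b, c] into [a] and [b, c]
and copying b up:
           [30]
          /    \
      [20]      [40]
     /    \    /    \
  [10]  [20] [30]  [40, 50]

• All data is in leaf nodes; the internal keys 20, 30, 40 are only copies
used for navigation
• Leaves are linked: [10] ↔ [20] ↔ [30] ↔ [40, 50]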
Q6 Build B+ tree of order 3 for the following data: F, S, Q, K, C, L, H, T, V, W, M,
R

Given Data:
F, S, Q, K, C, L, H, T, V, W, M, R

B+ Tree of Order 3: What Does It Mean?


• Maximum 2 keys per internal node (order − 1)
• Maximum 3 children per internal node
• Leaf nodes can hold 2 keys
• All data is stored in the leaf nodes
• Leaf nodes are linked
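We'll insert in alphabetical order for clarity. Note that the shape of a B+
tree depends on the insertion order, so inserting in the given order (F, S,
Q, ...) generally produces a different but equally valid tree.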
Sorted Input:
C, F, H, K, L, M, Q, R, S, T, V, W
Step-by-Step Insertion

Step 1: Insert C, F
Leaf: [C, F]

Step 2: Add H → [C, F, H] → Exceeds leaf capacity → Split


• [C] and [F, H]
• Promote middle key F to parent
     [F]
    /   \
 [C]    [F, H]

Step 3: Add K → into right leaf → [F, H, K] → Split


• Split [F, H, K] → [F] and [H, K]
• Promote middle H to root (already has F)
Now root has 2 keys: [F, H]
     [F, H]
    /   |   \
 [C]   [F]   [H, K]
Step 4: Insert L → into [H, K] → becomes [H, K, L] → Split
• [H] and [K, L] → promote K
Now root has: [F, H, K] → Exceeds capacity → Split root
Promote H, new root becomes [H]
          [H]
        /     \
     [F]       [K]
    /   \     /   \
 [C]   [F]  [H]   [K, L]

Step 5: Insert M → goes to [K, L] → becomes [K, L, M] → Split


• [K], [L, M] → promote L to [K]
Update internal node [K] to [K, L]
          [H]
        /     \
     [F]       [K, L]
    /   \     /   |   \
 [C]   [F]  [H]  [K]  [L, M]

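Continue Inserting: Q, R, S
• Q goes to [L, M] → [L, M, Q] → Split → [L], [M, Q] → promote M
o Internal node [K, L] becomes [K, L, M] → Exceeds capacity → Split into [K]
and [M], promote L
o Root [H] becomes [H, L]
• R goes to [M, Q] → [M, Q, R] → Split → [M], [Q, R] → promote Q
o Internal node [M] becomes [M, Q]
• S goes to [Q, R] → [Q, R, S] → Split → [Q], [R, S] → promote R
o Internal node [M, Q] becomes [M, Q, R] → Split into [M] and [R], promote Q
o Root [H, L] becomes [H, L, Q] → Split into [H] and [Q], promote L to a new
root [L]

Final Inserts: T, V, W
• T goes to [R, S] → [R, S, T] → Split → [R], [S, T] → promote S
o Internal node [R] becomes [R, S]
• V goes to [S, T] → [S, T, V] → Split → [S], [T, V] → promote T
o Internal node [R, S] becomes [R, S, T] → Split into [R] and [T], promote S
o Its parent [Q] becomes [Q, S]
• W goes to [T, V] → [T, V, W] → Split → [T], [V, W] → promote V
o Internal node [T] becomes [T, V]

Final B+ Tree (Structure Only)

                        [L]
               /                  \
            [H]                    [Q, S]
          /     \             /      |       \
       [F]      [K]        [M]      [R]       [T, V]
      /  \     /  \       /  \     /  \      /   |   \
   [C]  [F]  [H]  [K]  [L]  [M]  [Q]  [R]  [S]  [T]  [V, W]
(Note: Leaf nodes are linked left to right)

Final Leaf Nodes (in order, linked):

[C] ↔ [F] ↔ [H] ↔ [K] ↔ [L] ↔ [M] ↔ [Q] ↔ [R] ↔ [S] ↔ [T] ↔ [V, W]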
Q7 Construct the B+ Tree of order 4 for the following data: 1, 4, 7, 10, 17, 21,
31, 25, 19, 20, 28, 42.
Let’s construct a B+ Tree of order 4 using the following keys:

Input:
1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42

B+ Tree of Order 4:
• Max 3 keys in an internal node
• Max 4 children per internal node
• Max 3 keys per leaf node
• All actual data is stored in leaves
• Leaf nodes are linked

Step-by-Step Construction
We'll insert the keys in the given order and split as needed.

Step 1: Insert 1, 4, 7
Leaf node: [1, 4, 7] — no split needed

Step 2: Insert 10
Leaf: [1, 4, 7, 10] → 4 keys → Split
• Split into: [1, 4] and [7, 10]
• Promote 7 to parent
       [7]
      /   \
 [1, 4]   [7, 10]

Step 3: Insert 17
Goes to [7, 10] → becomes [7, 10, 17] → OK

Step 4: Insert 21
Leaf: [7, 10, 17, 21] → split into [7, 10] and [17, 21]
Promote 17
Update root to: [7, 17]
        [7, 17]
       /   |    \
 [1, 4] [7, 10] [17, 21]

Step 5: Insert 31
Goes to [17, 21] → becomes [17, 21, 31] → OK

Step 6: Insert 25
Leaf [17, 21, 31] → becomes [17, 21, 25, 31] → split
→ [17, 21], [25, 31] → promote 25 to parent
Now parent [7, 17] becomes [7, 17, 25]
          [7, 17, 25]
        /    |     |    \
 [1, 4] [7, 10] [17, 21] [25, 31]

Step 7: Insert 19
Goes to [17, 21] → becomes [17, 19, 21] — OK

Step 8: Insert 20
Leaf [17, 19, 21] → becomes [17, 19, 20, 21] → split
→ [17, 19], [20, 21] → promote 20
Now parent [7, 17, 25] becomes [7, 17, 20, 25] → 4 keys → exceeds capacity →
needs split
Split [7, 17, 20, 25] → [7] and [20, 25] → promote 17 to new root
                [17]
              /      \
          [7]          [20, 25]
         /   \        /    |    \
  [1,4] [7,10]  [17,19] [20,21] [25,31]

Step 9: Insert 28
Goes to [25, 31] → becomes [25, 28, 31] → OK

Step 10: Insert 42
Leaf: [25, 28, 31] → becomes [25, 28, 31, 42] → split
→ [25, 28], [31, 42] → promote 31
Now [20, 25] becomes [20, 25, 31]

Final B+ Tree Structure:

                [17]
              /      \
          [7]          [20, 25, 31]
         /   \        /    |     |    \
  [1,4] [7,10]  [17,19] [20,21] [25,28] [31,42]
• All data is stored in the leaf nodes
• Leaf nodes are linked like:
[1,4] ↔ [7,10] ↔ [17,19] ↔ [20,21] ↔ [25,28] ↔ [31,42]
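The insertion procedure used in these constructions can be sketched in code. This is a minimal insert-only sketch under stated assumptions, not a reference implementation. The class and member names (BPlusTree, insertInto, leafChain, rootKeys) are invented here, the order is assumed to be at least 3, and std::make_unique requires C++14. An overfull leaf keeps its lower half and copies the first key of the new right leaf up. An overfull internal node moves its lower-middle key up, which matches the root split shown for order 4.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Insert-only B+ tree sketch. A node may hold at most order - 1 keys.
template <class Key>
class BPlusTree {
    struct Node {
        bool leaf = true;
        std::vector<Key> keys;
        std::vector<Node*> kids;  // children (internal nodes only)
        Node* next = nullptr;     // right sibling (leaves only)
    };

public:
    explicit BPlusTree(std::size_t order) : order_(order), root_(make(true)) {}

    void insert(const Key& key) {
        Key up{};
        Node* right = insertInto(root_, key, up);
        if (right) {  // the root itself split: grow the tree by one level
            Node* r = make(false);
            r->keys = {up};
            r->kids = {root_, right};
            root_ = r;
        }
    }

    std::string rootKeys() const { return show(root_); }

    // Leaves from left to right, following the sibling links.
    std::string leafChain() const {
        const Node* n = root_;
        while (!n->leaf) n = n->kids.front();
        std::string out;
        for (; n; n = n->next) out += (out.empty() ? "" : " ") + show(n);
        return out;
    }

private:
    Node* make(bool leaf) {
        pool_.push_back(std::make_unique<Node>());
        pool_.back()->leaf = leaf;
        return pool_.back().get();
    }

    static std::string show(const Node* n) {
        std::ostringstream s;
        s << '[';
        for (std::size_t i = 0; i < n->keys.size(); ++i)
            s << (i ? "," : "") << n->keys[i];
        s << ']';
        return s.str();
    }

    // Inserts key below n. If n splits, returns the new right sibling and
    // stores the separator for the parent in up; otherwise returns nullptr.
    Node* insertInto(Node* n, const Key& key, Key& up) {
        auto at = std::upper_bound(n->keys.begin(), n->keys.end(), key);
        std::size_t i = static_cast<std::size_t>(at - n->keys.begin());
        if (n->leaf) {
            n->keys.insert(at, key);
            if (n->keys.size() < order_) return nullptr;  // still fits
            std::size_t mid = n->keys.size() / 2;
            Node* r = make(true);
            r->keys.assign(n->keys.begin() + mid, n->keys.end());
            n->keys.resize(mid);
            r->next = n->next;
            n->next = r;
            up = r->keys.front();  // copied up: the key stays in the leaf
            return r;
        }
        Key childUp{};
        Node* split = insertInto(n->kids[i], key, childUp);
        if (!split) return nullptr;
        n->keys.insert(n->keys.begin() + i, childUp);
        n->kids.insert(n->kids.begin() + i + 1, split);
        if (n->keys.size() < order_) return nullptr;
        std::size_t mid = (n->keys.size() - 1) / 2;
        Node* r = make(false);
        up = n->keys[mid];  // moved up: the key leaves this level
        r->keys.assign(n->keys.begin() + mid + 1, n->keys.end());
        r->kids.assign(n->kids.begin() + mid + 1, n->kids.end());
        n->keys.resize(mid);
        n->kids.resize(mid + 1);
        return r;
    }

    std::size_t order_;
    std::vector<std::unique_ptr<Node>> pool_;  // owns every node
    Node* root_;
};
```

Creating BPlusTree<int> with order 4, inserting 1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42 in that order, and calling leafChain() walks the same linked leaves as the chain shown above. Other split conventions (for example, promoting the upper-middle key of an internal node) give different but equally valid trees.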
