DBMS NOTES
● What is a Database? A database is an organized collection of data, typically stored and accessed
electronically from a computer system. Think of it like a super-organized digital filing cabinet where data is
stored in tables, and you can easily retrieve or update it.
○ Example: A university database might store student info (name, ID, grades), course details, and
faculty records.
● Database Users:
○ End Users: People who interact with the database via applications (e.g., students checking grades
online).
○ Database Administrators (DBAs): Manage the database, ensuring security, backups, and
performance.
○ Application Developers: Write programs that interact with the database.
○ Database Designers: Create the structure (schema) of the database.
● Key Point for Exam: Know the roles of different users and how they interact with the database system.
● Data Integrity: Ensures data is accurate and consistent (e.g., no duplicate student IDs).
● Data Independence: Changes to the database structure (e.g., adding a new table) don’t affect the
applications using it.
● Concurrency Control: Allows multiple users to access the database simultaneously without conflicts.
● Data Security: Restricts unauthorized access (e.g., only faculty can update grades).
● Query Processing: Efficiently retrieves data using languages like SQL.
● Backup and Recovery: Protects data from failures (e.g., restoring data after a crash).
● Exam Tip: Be ready to list 4–5 characteristics with a brief explanation or example for each.
● DBMS Architecture:
○ Three-Schema Architecture:
1. External Schema (View Level): How users see the data (e.g., a student sees only their
grades).
2. Conceptual Schema (Logical Level): The overall structure of the database (e.g., tables for
students, courses).
3. Internal Schema (Physical Level): How data is stored on disk (e.g., file formats, indexes).
○ Purpose: Separates user views from physical storage for flexibility and security.
● Data Independence:
○ Logical Data Independence: Changes to the conceptual schema (e.g., adding a new table) don’t
affect external views.
○ Physical Data Independence: Changes to storage (e.g., switching to a new hard drive) don’t affect
the conceptual schema.
● Exam Tip: You might get a question asking you to explain the three-schema architecture with a diagram or
example. Practice sketching it!
4. Data Models, Schemas & Instances
● We covered this under Concepts and Architecture, but it’s worth noting:
○ DBMS architecture ensures modularity and abstraction.
○ Data independence is critical for maintaining flexibility in large systems.
○ Example: If a university adds a new column to the "Students" table (e.g., "Email"), the application
students use to check grades doesn’t break (logical independence).
● Database Languages:
○ DDL (Data Definition Language): Defines the database structure (e.g., CREATE TABLE Students).
○ DML (Data Manipulation Language): Manipulates data (e.g., INSERT INTO Students VALUES (101,
'John', 3.5)).
○ DCL (Data Control Language): Defines access permissions (e.g., GRANT SELECT ON Students TO
faculty).
○ TCL (Transaction Control Language): Manages transactions (e.g., COMMIT, ROLLBACK).
● Interfaces:
○ Query Interfaces: Tools like SQL command-line or GUI (e.g., MySQL Workbench).
○ Application Interfaces: APIs like JDBC/ODBC for developers.
○ User Interfaces: Web or mobile apps for end users.
● Exam Tip: Know the difference between DDL, DML, DCL, and TCL with examples of commands.
● ER Model: A visual way to design databases using entities, attributes, and relationships.
○ Entities: Objects like Student, Course.
○ Attributes: Properties like Student ID, Course Name.
○ Relationships: Connections like "Student enrolls in Course."
○ Example ER Diagram:
■ Entity: Student (Attributes: ID, Name, GPA).
■ Entity: Course (Attributes: Code, Title).
■ Relationship: Enrolls (Student takes Course).
● Key Symbols:
○ Rectangle: Entity.
○ Oval: Attribute.
○ Diamond: Relationship.
● Exam Tip: Practice drawing an ER diagram for a simple scenario (e.g., library system with Books, Borrowers,
and Loans).
8. Enhanced ER Concepts
● Specialization/Generalization:
○ Generalization: Combining similar entities into a general entity (e.g., Student and Faculty generalized
into Person).
○ Specialization: Breaking an entity into specific subtypes (e.g., Person specialized into Undergraduate
and Graduate).
○ Represented by an ISA relationship in ER diagrams.
● Aggregation:
○ Treating a relationship as an entity to simplify complex relationships.
○ Example: If Students work on Projects under Advisors, the "Works On" relationship can be
aggregated into an entity to capture additional attributes.
● Mapping ER Model to Relational Model:
○ Entities become tables (e.g., Student entity → Students table).
○ Attributes become columns.
○ Relationships:
■ One-to-One: Merge tables or use a foreign key.
■ One-to-Many: Add a foreign key on the "many" side (e.g., if each Student belongs to one
Department, the Students table has a DeptID column).
■ Many-to-Many: Create a junction table (e.g., Student_Course table with StudentID and
CourseID).
● Exam Tip: Be able to convert a simple ER diagram into relational tables. Practice with 1:1, 1:N, and M:N
relationships.
● SQL Basics:
○ DDL (Data Definition Language):
■ CREATE TABLE Students (ID INT, Name VARCHAR(50), GPA FLOAT);
■ ALTER TABLE Students ADD Email VARCHAR(100);
■ DROP TABLE Students;
○ DML (Data Manipulation Language):
■ INSERT INTO Students VALUES (101, 'John', 3.5);
■ UPDATE Students SET GPA = 3.7 WHERE ID = 101;
■ DELETE FROM Students WHERE ID = 101;
■ SELECT * FROM Students WHERE GPA > 3.0;
○ DCL (Data Control Language):
■ GRANT SELECT ON Students TO faculty;
■ REVOKE INSERT ON Students FROM public;
● Views:
○ Virtual tables based on a query.
○ Example: CREATE VIEW HighGPA AS SELECT Name, GPA FROM Students WHERE GPA > 3.5;
○ Use: Simplifies queries, enhances security.
● Indexes:
○ Improve query performance by creating a lookup structure.
○ Example: CREATE INDEX idx_student_id ON Students(ID);
○ Trade-off: Faster searches, but slower inserts/updates.
● Exam Tip: Memorize at least one example for each SQL command type (CREATE, INSERT, GRANT, etc.).
1. Practice ER Diagrams: Draw ER diagrams for scenarios like a hospital, library, or e-commerce system.
Convert them to relational tables.
2. SQL Queries: Write SQL commands for DDL, DML, DCL, and constraints. Use a tool like MySQL or SQLite
to test them.
3. Flashcards: Create flashcards for terms like schema, instance, data independence, and SQL commands.
4. Past Papers: If you have access to previous exams, practice those questions to get a feel for the format.
5. Focus on Weak Areas: If ER modeling or SQL constraints feel tricky, work through extra examples and
practice problems on those topics.
Question: Design an ER diagram for a library system with Books, Members, and Loans. Convert it to a relational
model and write SQL to create the tables with appropriate constraints.
Answer:
● ER Diagram:
○ Entity: Book (Attributes: ISBN, Title, Author)
○ Entity: Member (Attributes: MemberID, Name, Email)
○ Entity: Loan (Attributes: LoanID, LoanDate, ReturnDate)
○ Relationship: Borrows (Member borrows Book)
● Relational Model:
○ Books (ISBN, Title, Author)
○ Members (MemberID, Name, Email)
○ Loans (LoanID, MemberID, ISBN, LoanDate, ReturnDate)
■ Foreign Keys: MemberID references Members(MemberID), ISBN references Books(ISBN)
● SQL:
CREATE TABLE Books (
ISBN VARCHAR(13) PRIMARY KEY,
Title VARCHAR(100) NOT NULL,
Author VARCHAR(50)
);
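The answer above creates only the Books table. As a minimal sketch of the full relational model, the script below builds all three tables in an in-memory SQLite database through Python's sqlite3 module. The column types for Members and Loans, the NOT NULL, UNIQUE, and CHECK constraints, and the sample rows are assumptions added for illustration; only the table names, columns, and foreign keys come from the model above.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Books (
    ISBN   VARCHAR(13) PRIMARY KEY,
    Title  VARCHAR(100) NOT NULL,
    Author VARCHAR(50)
);
CREATE TABLE Members (
    MemberID INTEGER PRIMARY KEY,
    Name     VARCHAR(50) NOT NULL,   -- NOT NULL is an assumption
    Email    VARCHAR(100) UNIQUE     -- UNIQUE is an assumption
);
CREATE TABLE Loans (
    LoanID     INTEGER PRIMARY KEY,
    MemberID   INTEGER NOT NULL REFERENCES Members(MemberID),
    ISBN       VARCHAR(13) NOT NULL REFERENCES Books(ISBN),
    LoanDate   DATE NOT NULL,
    ReturnDate DATE,                 -- NULL while the book is still out
    CHECK (ReturnDate IS NULL OR ReturnDate >= LoanDate)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO Books VALUES ('9780000000001', 'Sample Title', 'Sample Author')")
conn.execute("INSERT INTO Members VALUES (1, 'Alice', 'alice@example.com')")
conn.execute("INSERT INTO Loans VALUES (1, 1, '9780000000001', '2024-01-10', NULL)")

# A loan for a member that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO Loans VALUES (2, 99, '9780000000001', '2024-01-11', NULL)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print("Foreign key violation rejected:", fk_rejected)
```

The same CREATE TABLE statements should carry over to MySQL with little change, although each system has its own type names and foreign key defaults.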
The Relational Model is the foundation of most modern DBMSs, organizing data into tables (relations) with rows
(tuples) and columns (attributes).
● Key Concepts:
○ Relation: A table (e.g., Students with columns ID, Name, GPA).
○ Tuple: A row in the table (e.g., (101, 'John', 3.5)).
○ Attribute: A column in the table (e.g., GPA).
○ Domain: The set of allowed values for an attribute (e.g., GPA is a float between 0 and 4.0).
○ Primary Key: Uniquely identifies each tuple (e.g., ID).
○ Foreign Key: Links one table to another (e.g., CourseID in Enrollments references Courses).
Example:
Students Table:
ID | Name | GPA
----+-------+-----
101 | John | 3.5
● Exam Tip: Be ready to define terms like tuple, attribute, and domain, and explain how they fit into the
relational model.
● Domain Constraints: Attributes must follow their defined domain (e.g., GPA must be a float).
● Key Constraints: Key values must be unique, so no two tuples may share the same primary key (e.g., ID in Students).
● Entity Integrity: Primary key values cannot be null.
● Referential Integrity: Foreign key values must match a primary key in the referenced table or be null (e.g.,
CourseID in Enrollments must exist in Courses).
● Check Constraints: Custom rules (e.g., GPA >= 0 AND GPA <= 4.0).
Example:
CREATE TABLE Enrollments (
StudentID INT,
CourseID INT,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(ID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
● Exam Tip: Practice writing CREATE TABLE statements with constraints and identifying violations (e.g.,
inserting a null primary key).
3. Relational Algebra
Relational Algebra is a theoretical query language used to manipulate and retrieve data from relational databases.
It uses operators to build queries.
● Basic Operators:
○ Select (σ): Filters rows based on a condition.
■ Example: σ_{GPA > 3.5}(Students) returns students with GPA > 3.5.
○ Project (π): Selects specific columns.
■ Example: π_{Name, GPA}(Students) returns only the Name and GPA columns.
○ Union (∪): Combines rows from two relations (removes duplicates).
■ Example: Students ∪ Faculty combines their rows if compatible.
○ Difference (−): Returns rows in one relation but not another.
■ Example: Students − Graduates returns non-graduate students.
○ Cartesian Product (×): Combines all rows from two relations.
■ Example: Students × Courses pairs every student with every course.
○ Join (⨝): Combines rows based on a condition.
■ Example: Students ⨝_{Students.ID = Enrollments.StudentID} Enrollments.
● Additional Operators:
○ Natural Join: Joins tables on common attributes.
○ Division: Finds values in one relation that match all values in another.
○ Rename (ρ): Renames a relation or attribute for clarity.
● Example: To find names of students with GPA > 3.5 enrolled in a course:
π_{Name}(σ_{GPA > 3.5}(Students) ⨝_{Students.ID = Enrollments.StudentID} Enrollments)
● Exam Tip: Practice writing relational algebra expressions for queries like “Find students enrolled in a specific
course.” Know the symbols and their purposes.
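To make the operators concrete, here is a small Python sketch that models a relation as a list of dictionaries and implements select, project, and theta join as plain functions. The function names and the sample rows are illustrative assumptions. The last three lines evaluate the example expression for students with GPA > 3.5 who are enrolled in a course.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """π: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

def theta_join(left, right, condition):
    """⨝: pair tuples from both relations that satisfy the condition."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]

# Made-up sample data.
students = [
    {"ID": 101, "Name": "John", "GPA": 3.5},
    {"ID": 102, "Name": "Alice", "GPA": 3.8},
    {"ID": 103, "Name": "Bob", "GPA": 3.9},
]
enrollments = [
    {"StudentID": 101, "CourseID": 1},
    {"StudentID": 102, "CourseID": 1},
    {"StudentID": 102, "CourseID": 2},
]

# π_{Name}(σ_{GPA > 3.5}(Students) ⨝ Enrollments)
high = select(students, lambda t: t["GPA"] > 3.5)
joined = theta_join(high, enrollments, lambda s, e: s["ID"] == e["StudentID"])
names = project(joined, ["Name"])
print(names)
```

Alice matches two enrollment rows, so the join yields two tuples, and the projection collapses them into one because relations are sets.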
4. Relational Calculus
Relational Calculus is a non-procedural query language that describes what data to retrieve, not how to retrieve it.
It’s more declarative than relational algebra.
● Types:
○ Tuple Relational Calculus (TRC):
■ Uses tuples and conditions to define queries.
■ Example: { t | t ∈ Students ∧ t.GPA > 3.5 } returns tuples t from Students where GPA > 3.5.
○ Domain Relational Calculus (DRC):
■ Uses attribute domains instead of tuples.
■ Example: { <Name, GPA> | <ID, Name, GPA> ∈ Students ∧ GPA > 3.5 }.
● Key Difference:
○ Relational Algebra: Procedural (step-by-step operations).
○ Relational Calculus: Declarative (describes the result).
● Exam Tip: Understand the difference between algebra and calculus. TRC questions are more common, so
practice writing TRC queries for simple conditions.
5. SQL – Functions (Aggregate Functions)
Aggregate Functions perform calculations on a set of values and return a single value.
Example:
SELECT AVG(GPA), MAX(GPA)
FROM Students
WHERE Major = 'CS';
● Exam Tip: Combine aggregate functions with GROUP BY (covered later) for questions like “Find average
GPA per department.”
6. Built-in Functions
SQL provides built-in functions for numeric, date, and string operations.
● Numeric Functions:
○ ROUND(number, decimals): Rounds a number.
■ Example: SELECT ROUND(GPA, 1) FROM Students; (e.g., 3.56 → 3.6).
○ ABS(number): Returns absolute value.
○ CEIL(number), FLOOR(number): Rounds up or down.
● Date Functions:
○ CURRENT_DATE: Returns today’s date.
■ Example: SELECT CURRENT_DATE;.
○ DATEDIFF(date1, date2): Days between two dates (MySQL syntax; other systems differ, e.g., SQL Server takes a date part as its first argument).
■ Example: SELECT DATEDIFF(ReturnDate, LoanDate) FROM Loans;.
○ EXTRACT(unit FROM date): Extracts part of a date (e.g., year).
■ Example: SELECT EXTRACT(YEAR FROM LoanDate) FROM Loans;.
● String Functions:
○ CONCAT(str1, str2): Concatenates strings.
■ Example: SELECT CONCAT(FirstName, ' ', LastName) AS FullName FROM Students;.
○ UPPER(str), LOWER(str): Changes case.
■ Example: SELECT UPPER(Name) FROM Students;.
○ LENGTH(str): Returns string length.
■ Example: SELECT LENGTH(Name) FROM Students;.
● Exam Tip: Memorize 2–3 examples per function type. Be ready to use them in SELECT queries.
7. Set Operations
● Exam Tip: Practice identifying correlated vs. non-correlated subqueries. Correlated subqueries are often
slower, because the inner query may run once per outer row, but they are powerful for complex conditions.
● GROUP BY: Groups rows with the same values into summary rows (used with aggregate functions).
Example: Find average GPA by major:
SELECT Major, AVG(GPA)
FROM Students
GROUP BY Major;
● HAVING: Filters groups based on a condition (like WHERE for GROUP BY).
● ORDER BY: Sorts results (ASC for ascending, DESC for descending).
● Exam Tip: Remember: WHERE filters rows before grouping, HAVING filters groups after grouping.
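The difference between WHERE and HAVING is easiest to see in one query that uses both. The sketch below runs such a query on an in-memory SQLite database. The Major column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (ID INT, Name VARCHAR(50), GPA FLOAT, Major VARCHAR(20))")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        (101, "John", 3.5, "CS"),
        (102, "Alice", 3.9, "CS"),
        (103, "Bob", 2.0, "Math"),
        (104, "Eve", 3.0, "Math"),
        (105, "Dan", 1.5, "Physics"),
    ],
)

query = """
SELECT Major, AVG(GPA) AS AvgGPA
FROM Students
WHERE GPA >= 2.0          -- row filter, applied before grouping
GROUP BY Major
HAVING AVG(GPA) > 3.0     -- group filter, applied after grouping
ORDER BY AvgGPA DESC
"""
rows = conn.execute(query).fetchall()
for major, avg_gpa in rows:
    print(major, round(avg_gpa, 2))
```

WHERE removes Dan's row before any group is formed, so Physics never becomes a group. HAVING then drops Math, whose average of 2.5 is not above 3.0, and only CS remains.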
● Types of Joins:
○ INNER JOIN: Returns rows where the condition matches in both tables.
■ Example: SELECT S.Name, E.CourseID FROM Students S INNER JOIN Enrollments E ON
S.ID = E.StudentID;.
○ LEFT OUTER JOIN: Returns all rows from the left table, with matching rows from the right (nulls if no
match).
○ RIGHT OUTER JOIN: Returns all rows from the right table, with matching rows from the left.
○ FULL OUTER JOIN: Returns all rows from both tables, with nulls where there’s no match.
○ NATURAL JOIN: Joins on common column names (less common).
● Exam Tip: Practice writing INNER and LEFT JOIN queries. Be able to explain the difference with a Venn
diagram or example.
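A two-row example is enough to show how INNER JOIN and LEFT OUTER JOIN differ. The sketch below uses an in-memory SQLite database with made-up data in which Alice has no enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (ID INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Enrollments (StudentID INT, CourseID INT);
INSERT INTO Students VALUES (101, 'John'), (102, 'Alice');
INSERT INTO Enrollments VALUES (101, 1);
""")

# INNER JOIN keeps only students that have a matching enrollment.
inner_rows = conn.execute("""
    SELECT S.Name, E.CourseID
    FROM Students S INNER JOIN Enrollments E ON S.ID = E.StudentID
    ORDER BY S.ID
""").fetchall()

# LEFT OUTER JOIN keeps every student; missing enrollments show up as NULL (None).
left_rows = conn.execute("""
    SELECT S.Name, E.CourseID
    FROM Students S LEFT OUTER JOIN Enrollments E ON S.ID = E.StudentID
    ORDER BY S.ID
""").fetchall()

print(inner_rows)  # [('John', 1)]
print(left_rows)   # [('John', 1), ('Alice', None)]
```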
11. EXISTS, ANY, ALL
Example: Find students with GPA greater than at least one CS major:
SELECT Name
FROM Students
WHERE GPA > ANY (SELECT GPA FROM Students WHERE Major = 'CS');
● Exam Tip: Practice rewriting ANY and ALL queries using MIN or MAX for clarity.
● Types of Views:
○ Simple Views: Based on a single table, updatable.
○ Complex Views: Involve joins, aggregates, or subqueries, usually not updatable.
○ Materialized Views: Physically store data for performance (not always updatable).
● Exam Tip: Know how to create and query a view. Be ready to explain when a view is updatable.
Example:
SAVEPOINT save1;
INSERT INTO Students VALUES (103, 'Bob', 3.2);
ROLLBACK TO save1;
Example:
BEGIN TRANSACTION;
INSERT INTO Students VALUES (104, 'Eve', 3.9);
SAVEPOINT save1;
UPDATE Students SET GPA = 4.0 WHERE ID = 104;
ROLLBACK TO save1; -- Undoes the UPDATE
COMMIT; -- Saves the INSERT
● Exam Tip: Understand the flow of a transaction and how SAVEPOINT works.
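The second example above can be run as written against SQLite, which accepts BEGIN, SAVEPOINT, ROLLBACK TO, and COMMIT. This sketch assumes Python's sqlite3 module and sets isolation_level=None so that the module does not open transactions on its own.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT statements below are sent to SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Students (ID INT PRIMARY KEY, Name VARCHAR(50), GPA FLOAT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Students VALUES (104, 'Eve', 3.9)")
conn.execute("SAVEPOINT save1")
conn.execute("UPDATE Students SET GPA = 4.0 WHERE ID = 104")
conn.execute("ROLLBACK TO save1")   # undoes only the UPDATE
conn.execute("COMMIT")              # makes the INSERT permanent

gpa = conn.execute("SELECT GPA FROM Students WHERE ID = 104").fetchone()[0]
print(gpa)  # 3.9
```

The row for Eve survives with its original GPA, which shows that ROLLBACK TO rewinds to the savepoint without ending the transaction.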
Question: Write a SQL query to find the names of students who have a GPA greater than the average GPA of all
Computer Science majors. Also, express this query in relational algebra.
Answer:
SQL:
SELECT Name
FROM Students
WHERE GPA > (SELECT AVG(GPA) FROM Students WHERE Major = 'CS');
Relational Algebra:
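π_{Name}(σ_{GPA > avg_gpa}(Students))
where avg_gpa = γ_{AVG(GPA)}(σ_{Major = 'CS'}(Students)). Here γ is the aggregation operator of extended
relational algebra; projection (π) alone cannot compute AVG.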
Unit 3: Relational Database Design
A functional dependency is a constraint where one set of attributes (A) uniquely determines another set (B), written
as A → B. For every value of A, there’s exactly one corresponding value of B. FDs are the foundation of
normalization and help identify keys and dependencies in a relation.
● Types of FDs:
○ Trivial FD: B is a subset of A (e.g., {StudentID, Name} → Name). Always true.
○ Non-trivial FD: B is not a subset of A (e.g., StudentID → Name).
○ Partial Dependency: A non-prime attribute depends on part of a candidate key.
○ Transitive Dependency: A non-prime attribute depends on a key only indirectly, through another
non-prime attribute (e.g., StudentID → Dept and Dept → DeptHead give StudentID → DeptHead).
● Closure of Attributes: To find all attributes determined by a set A (denoted A⁺), use Armstrong’s Axioms:
○ Reflexivity: If B ⊆ A, then A → B.
○ Augmentation: If A → B, then A ∪ C → B ∪ C.
○ Transitivity: If A → B and B → C, then A → C.
○ Derived rules: Union (A → B, A → C implies A → BC), Decomposition (A → BC implies A → B, A →
C).
Exam Tip: Practice computing closures to identify candidate keys (minimal attribute sets whose closure includes all
attributes in the relation; a non-minimal set with that property is only a superkey).
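Computing a closure is a simple fixed-point loop: keep applying every FD whose left side is already covered until nothing new is added. The sketch below is one way to write it, with a brute-force candidate key search on top. The function names are illustrative, and the sample relation borrows the FDs {StudentID, CourseID → Instructor, CourseID → Dept} from these notes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Return every minimal attribute set whose closure is all_attrs."""
    keys = []
    # Trying smaller sets first guarantees that any key found is minimal.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is only a superkey
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys

R = {"StudentID", "CourseID", "Instructor", "Dept"}
FDS = [
    ({"StudentID", "CourseID"}, {"Instructor"}),
    ({"CourseID"}, {"Dept"}),
]

print(closure({"CourseID"}, FDS))   # CourseID and Dept
print(candidate_keys(R, FDS))       # the single key {StudentID, CourseID}
```

The key search tries every subset, so it is exponential in the number of attributes. That is fine for exam-sized relations but not for wide tables.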
Normalization
Normalization decomposes a relation into smaller relations to eliminate redundancy and anomalies (insertion,
deletion, update) while preserving data and dependencies. Each normal form builds on the previous one.
Example: A table {StudentID, Name, Courses} where Courses contains {“DBMS, OS”} violates 1NF. Fix by splitting
into rows:
Before:
StudentID | Name  | Courses
----------+-------+---------
101       | Alice | DBMS, OS
After:
StudentID | Name  | Course
----------+-------+-------
101       | Alice | DBMS
101       | Alice | OS
2NF (Second Normal Form)
● Requirement: In 1NF, and no partial dependency (non-prime attributes depend on the entire candidate key,
not part of it).
● Example: Table {StudentID, CourseID, Instructor, Dept} with FDs {StudentID, CourseID → Instructor,
CourseID → Dept}. Candidate key: {StudentID, CourseID}. CourseID → Dept is a partial dependency (Dept
depends only on CourseID). Decompose:
○ R1: {CourseID, Dept}
○ R2: {StudentID, CourseID, Instructor}
3NF (Third Normal Form)
● Requirement: In 2NF, and no transitive dependency (non-prime attributes don’t depend on other non-prime
attributes).
● Example: Table {StudentID, Dept, DeptHead} with FDs {StudentID → Dept, Dept → DeptHead}. DeptHead
depends on the key StudentID only transitively, through Dept. Decompose:
○ R1: {Dept, DeptHead}
○ R2: {StudentID, Dept}
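A lossless-join decomposition ensures that joining the decomposed tables reconstructs the original relation exactly.
A decomposition of R into R1 and R2 is lossless if the common attributes R1 ∩ R2 functionally determine all of R1 or
all of R2 (i.e., they form a superkey of at least one of them). For more than two tables, use the chase algorithm.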
Example: For R(A, B, C) with FD {A → B}, decompose into R1(A, B) and R2(A, C). Since R1 ∩ R2 = {A} and A → B,
the join on A preserves all tuples.
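For a split into two tables, the standard lossless-join test is that the closure of the common attributes R1 ∩ R2 must contain all of R1 or all of R2. The sketch below applies that test to the R(A, B, C) example. The function names are illustrative, and decompositions into more than two tables would need the chase algorithm instead.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2)+ must cover R1 or R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

FDS = [({"A"}, {"B"})]

# R1(A, B), R2(A, C): common attribute A determines all of R1.
good_split = is_lossless({"A", "B"}, {"A", "C"}, FDS)
# R1(A, B), R2(B, C): common attribute B determines neither table.
bad_split = is_lossless({"A", "B"}, {"B", "C"}, FDS)

print(good_split, bad_split)  # True False
```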
Dependency preservation ensures that every FD can be enforced within the decomposed tables, without joining them
first. Check that the closure of the FDs that hold in the decomposed tables covers the original FDs.
Example: For R(A, B, C) with FDs {A → B, B → C}, decompose into R1(A, B) and R2(B, C). FDs are preserved
because {A → B} is in R1, {B → C} is in R2, and together they cover all FDs.
Practice Problem: Normalize the table R(A, B, C, D) with FDs {A → B, B → C, C → D} to BCNF, ensuring lossless
join and dependency preservation.
Exam Tip: Practice normalizing step-by-step and checking properties. Expect questions like “Normalize this table” or
“Is this decomposition lossless?”
4NF (Fourth Normal Form)
● Requirement: In BCNF, and no non-trivial multi-valued dependencies (MVDs) except those whose left side is a
superkey. An MVD A ↠ B holds if, for each value of A, the set of B values is independent of the remaining attributes.
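5NF (Fifth Normal Form)
● Requirement: In 4NF, and every non-trivial join dependency (JD) is implied by the candidate keys. A JD holds
when a relation can be decomposed into smaller projections and joined back losslessly. 5NF is rarely needed in
practice; it removes the redundancy that can remain after 4NF.
● Example: A table {Supplier, Part, Project} under the rule that a supplier supplies a part to a project whenever
the supplier supplies that part, the project uses that part, and the supplier supplies that project. Decompose into
the projections {Supplier, Part}, {Part, Project}, and {Supplier, Project}.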
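DKNF (Domain-Key Normal Form)
● Requirement: Every constraint is a logical consequence of domain constraints and key constraints. Largely
theoretical and hard to achieve, but a relation in DKNF has no insertion or deletion anomalies.
● Example: A table with a cross-attribute rule (e.g., “salary must be positive unless employee is intern”) violates
DKNF, because the rule cannot be expressed as a domain or key constraint unless the table is restructured.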
Practice Problem: For {Student, Course, Book} with MVD Student ↠ Course, decompose to 4NF.
Exam Tip: Focus on 4NF with MVDs. 5NF/DKNF are less common but understand their purpose.
3. Properties of Transactions
A transaction is a sequence of operations (e.g., read, write) treated as a single unit. Transactions ensure database
reliability via ACID properties:
● Atomicity: All operations complete, or none do. Example: A bank transfer debits one account and credits
another; if it fails, both are rolled back.
● Consistency: Database remains in a valid state (e.g., foreign key constraints hold).
● Isolation: Transactions don’t interfere (partial changes aren’t visible).
● Durability: Committed changes persist, even after a crash.
Transaction States
Practice Problem: Draw the transaction state diagram and explain how a failed transaction is handled.
Exam Tip: Memorize ACID and states. Be ready to explain with real-world examples (e.g., online shopping).
Example: Schedule: T1: R(A), T2: W(A), T2: W(B), T1: W(B).
Practice Problem: Is T1: R(A), T2: W(A), T1: W(B), T2: R(B) conflict serializable?
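Conflict serializability is usually tested with a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same item, different transactions, at least one write). The schedule is conflict serializable exactly when the graph has no cycle. The sketch below applies this to the example schedule T1: R(A), T2: W(A), T2: W(B), T1: W(B). The tuple encoding of a schedule and the function names are assumptions made for illustration.

```python
def precedence_edges(schedule):
    """schedule is a list of (transaction, op, item) with op 'R' or 'W'."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in done)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))

schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "W", "B"), ("T1", "W", "B")]
print(sorted(precedence_edges(schedule)))   # [('T1', 'T2'), ('T2', 'T1')]
print(is_conflict_serializable(schedule))   # False
```

The conflict on A forces T1 before T2, while the conflict on B forces T2 before T1, so no serial order satisfies both.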
Locking Techniques
Example: T1: S(A), R(A), X(B), W(B); T2: X(A), W(A). T1 and T2 follow 2PL, ensuring serializability.
Timestamp Ordering
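Example: T1 (TS=10): R(A); T2 (TS=20): W(A). T1’s read of A is allowed only if no transaction with a timestamp greater than 10 has already written A; otherwise T1 is rolled back and restarted.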
Granularity
Recoverable Schedules
Practice Problem: In 2PL, show how T1: R(A), W(B) and T2: W(A), R(B) avoid conflicts.
Exam Tip: Compare 2PL vs. timestamp ordering. Practice identifying recoverable schedules.
Example: T1: X(A), wants B; T2: X(B), wants A. Wait-for graph: T1 → T2, T2 → T1 (cycle). Abort T2.
Practice Problem: Draw a wait-for graph for T1: X(A), T2: X(B), T1: wants B, T2: wants A.
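Deadlock detection amounts to finding a cycle in the wait-for graph. The sketch below builds the graph for the example above (T1 holds A and wants B, T2 holds B and wants A) and returns the transactions on a cycle. The dictionary layout and function name are illustrative assumptions, and choosing which transaction to abort is left to a separate victim-selection policy.

```python
def find_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it waits on.

    Returns the transactions forming a cycle, or None if there is no deadlock.
    """
    def dfs(node, path):
        if node in path:
            return path[path.index(node):]      # cycle found
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

holds = {"A": "T1", "B": "T2"}   # item -> transaction holding its exclusive lock
wants = {"T1": "B", "T2": "A"}   # transaction -> item it is waiting for

# Ti waits for Tj when Ti wants an item that Tj currently holds.
wait_for = {
    txn: {holds[item]}
    for txn, item in wants.items()
    if holds.get(item) not in (None, txn)
}

cycle = find_deadlock(wait_for)
print(wait_for)   # {'T1': {'T2'}, 'T2': {'T1'}}
print(cycle)      # ['T1', 'T2']
```

Aborting any transaction on the cycle breaks the deadlock; the example above picks T2. The path-based search is kept simple and may revisit nodes, which is acceptable for small graphs.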
7. Recovery Techniques
Recovery ensures consistency after failures.
● Log-Based Recovery:
○ Logs store before/after images.
○ Undo: Rollback uncommitted changes.
○ Redo: Reapply committed changes.
● Checkpoints: Periodic snapshots to reduce recovery time.
● Backup: Full/incremental backups for catastrophic failures.
● Recovery from Catastrophic Failures: Restore backup, apply logs.
Example: Log: <T1, A, 100, 200>. If T1 fails, undo by restoring A=100. If T1 commits, redo A=200.
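The undo and redo rules can be sketched as two passes over the log. The code below is a simplified illustration, assuming log records of the form (write, transaction, item, old value, new value) plus commit markers. It ignores checkpoints and compensation records, and the record layout and function name are hypothetical.

```python
def recover(db, log):
    """Bring db to a consistent state using an undo/redo log.

    log holds ('write', txn, item, old, new) and ('commit', txn) records.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass: scan backward and restore old values of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            db[item] = old

    # Redo pass: scan forward and reapply new values of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return db

# T1 wrote A (100 -> 200) but never committed: the change is undone.
failed = recover({"A": 200}, [("write", "T1", "A", 100, 200)])

# T1 committed, but the new value had not reached disk: the change is redone.
after_commit = recover({"A": 100}, [("write", "T1", "A", 100, 200), ("commit", "T1")])

print(failed)        # {'A': 100}
print(after_commit)  # {'A': 200}
```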
8. Database Programming
● Control Structures: PL/SQL supports:
○ IF-THEN-ELSE: IF salary < 50000 THEN bonus := 1000; END IF;
○ Loops: FOR i IN 1..10 LOOP ... END LOOP;
● Exception Handling:
○ BEGIN ... EXCEPTION WHEN NO_DATA_FOUND THEN ... END;
Stored Procedures:
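CREATE PROCEDURE UpdateSalary(empID IN NUMBER, amount IN NUMBER) AS
BEGIN
    -- Add the given amount to one employee's salary.
    UPDATE Employees SET salary = salary + amount WHERE ID = empID;
    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        ROLLBACK;  -- undo the partial update
        RAISE;     -- re-raise so the caller sees the error
END;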
Triggers:
CREATE TRIGGER LogSalaryChange
AFTER UPDATE OF salary ON Employees
FOR EACH ROW
BEGIN
INSERT INTO SalaryLog(empID, oldSalary, newSalary)
VALUES (:OLD.ID, :OLD.salary, :NEW.salary);
END;
Secondary storage devices (e.g., hard disk drives (HDDs), solid-state drives (SSDs)) store database data
persistently. Unlike main memory (RAM), secondary storage is non-volatile but slower, so efficient organization and
access methods are critical.
● Key Characteristics:
○ Blocks: Data is stored in fixed-size blocks (e.g., 4KB or 8KB). A block is the unit of transfer between
disk and memory.
○ Access Time: Includes seek time (moving disk head to track), rotational latency (waiting for sector
to rotate under head), and transfer time (reading/writing data). SSDs are faster due to no mechanical
parts.
○ I/O Cost: Measured in block accesses, as these dominate query performance.
● Example: A database table with 1 million records, each 200 bytes, stored on a disk with 4KB blocks. Each
block holds 4096 / 200 ≈ 20 records. To read 100 records, you need 100 / 20 = 5 block accesses (assuming
records are contiguous).
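The arithmetic in this example generalizes to a two-line calculation. The sketch below assumes unspanned records (a record never crosses a block boundary) and contiguous storage; the function names are illustrative.

```python
def records_per_block(block_size, record_size):
    """Blocking factor: whole records that fit in one block (unspanned)."""
    return block_size // record_size

def blocks_needed(num_records, block_size, record_size):
    """Block accesses needed to read num_records contiguous records."""
    bfr = records_per_block(block_size, record_size)
    return -(-num_records // bfr)   # ceiling division without floats

bfr = records_per_block(4096, 200)
blocks_for_query = blocks_needed(100, 4096, 200)
blocks_for_table = blocks_needed(1_000_000, 4096, 200)

print(bfr)                # 20
print(blocks_for_query)   # 5
print(blocks_for_table)   # 50000
```

A full scan of the one-million-record table therefore costs 50,000 block accesses, which is why indexes matter for selective queries.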
Exam Tip: Understand block-based storage and how to calculate block accesses for queries. Be ready to explain
why SSDs are faster than HDDs.
2. Operations on Files
Files store database records, and operations include insertion, deletion, modification, and retrieval. The efficiency
of these operations depends on the file organization.
● Common Operations:
○ Insert: Add a new record. May require shifting records or appending.
○ Delete: Remove a record. May mark as deleted (logical deletion) or reorganize file (physical deletion).
○ Modify: Update a record’s fields.
○ Retrieve: Fetch records by key (exact match) or range (e.g., all records with age > 30).
● Challenges:
○ Insertion/deletion in sorted files is costly due to shifting records.
○ Retrieval is slow without indexes.
Example: In a file with records {ID: 1, Name: Alice}, {ID: 2, Name: Bob}, inserting {ID: 3, Name: Charlie} is simple
(append), but deleting {ID: 1} may require marking or reorganizing.
Exam Tip: Know the cost of operations in different file organizations (heap, sorted, hashed). Expect questions like
“How many block accesses to insert a record?”
3. File Organizations
Heap Files
Sorted Files
Hashed Files
● Structure: Records stored based on a hash function applied to a key (e.g., hash(ID) = block number).
● Pros: Fast exact-match retrieval (O(1) block accesses on average).
● Cons: Poor for range queries (hash scatters records). Collisions require resolution (e.g., chaining or open
addressing).
● Use Case: High-speed lookups (e.g., primary key access).
● Example: A table with StudentID hashed to blocks. hash(101) = block 3 directly retrieves the record.
Practice Problem: Compare the number of block accesses to retrieve a record with ID = 50 in a heap file vs. a
hashed file (assume 1000 records, 10 records per block).
Exam Tip: Be ready to compare file organizations for specific operations (e.g., “Which is best for range queries?”).
Practice calculating I/O costs.
4. Indexing
Indexes are data structures that speed up retrieval by providing quick access to records based on key values. They
store key-pointer pairs, where the pointer references a disk block or record.
Single-Level Indexes
● Primary Index: Built on the primary key of a sorted file. Each entry maps a key to a block.
○ Structure: Sparse (one entry per block, not per record).
○ Example: A sorted file with 1000 records, 10 records per block (100 blocks). The primary index has
100 entries, each pointing to a block’s first key.
○ Search: Binary search on index (O(log m), where m is number of blocks), then 1 block access.
● Clustering Index: Built on a non-key attribute that orders the file (e.g., DepartmentID if records are sorted by
department).
● Secondary Index: Built on a non-ordering attribute (e.g., Name in a file sorted by ID). Dense (one entry per
record).
○ Example: A secondary index on Name maps each name to a record’s disk address. Requires more
storage but supports queries on non-key attributes.
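A sparse primary index can be sketched as a sorted list of each block's first key, searched with binary search. The example below mirrors the 1000-record, 100-block file described above, using Python lists in place of disk blocks. The names and the in-memory layout are assumptions for illustration.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 10

# Sorted file: keys 1..1000 packed into 100 blocks of 10 records each.
blocks = [
    list(range(start, start + RECORDS_PER_BLOCK))
    for start in range(1, 1001, RECORDS_PER_BLOCK)
]

# Sparse primary index: one entry per block, holding the block's first key.
index_keys = [block[0] for block in blocks]

def lookup(key):
    """Return (block_number, found) for the given key."""
    # Binary search for the last index entry whose key is <= the search key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None, False            # key is smaller than every indexed key
    return pos, key in blocks[pos]    # one data block access

print(len(index_keys))   # 100
print(lookup(50))        # (4, True)
print(lookup(1001))      # (99, False)
```

Only the block that could contain the key is read, so the cost is the binary search on the index plus one data block access.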
Multi-Level Indexes
● Structure: A hierarchy of indexes (index on an index) to reduce search time for large indexes.
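● Example: A primary index with 10,000 entries is too large to fit in memory. Create a second-level index on
the first-level index. With fan-out f (index entries per block), a search reads about log_f(n) index blocks, one
per level, instead of the log_2 of the number of index blocks that a binary search needs.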
● Use Case: Large databases where single-level indexes are too big.
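The number of levels in a multi-level index follows from the fan-out, meaning how many index entries fit in one block. The sketch below counts levels by repeatedly indexing the level beneath until a single block remains. The fan-out of 100 is an assumed value, not a figure from these notes.

```python
def index_levels(num_entries, fan_out):
    """Number of index levels needed until the top level fits in one block."""
    levels = 1
    blocks = -(-num_entries // fan_out)      # blocks in the first-level index
    while blocks > 1:
        # Build the next level: one entry per block of the level beneath it.
        blocks = -(-blocks // fan_out)
        levels += 1
    return levels

print(index_levels(10_000, 100))      # 2
print(index_levels(1_000_000, 100))   # 3
```

A lookup reads one block per index level plus one data block, so with this fan-out the 10,000-entry index costs 3 block accesses in total.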
● B-Tree:
○ A balanced, multi-way search tree where each node holds multiple keys and pointers.
○ Properties:
■ All leaves at the same level.
■ A node with k keys has k+1 pointers.
■ Minimum and maximum keys per node (e.g., order m means max m-1 keys).
○ Operations:
■ Search: O(log n) disk accesses.
■ Insert/Delete: Split or merge nodes to maintain balance.
○ Example: A B-tree of order 3 (max 2 keys per node). To find key 50, traverse root to leaf, accessing
2–3 disk blocks.
● B+ Tree:
○ A variant of B-tree optimized for databases.
○ Differences:
■ Only leaf nodes store data pointers; internal nodes store keys for navigation.
■ Leaf nodes linked sequentially for efficient range queries.
○ Advantages:
■ Better for range queries (sequential leaf access).
■ Less storage for internal nodes (no data pointers).
○ Example: A B+ tree with keys {10, 20, 30} in leaves, linked for range query 10 ≤ key ≤ 25.
Practice Problem: In a B+ tree of order 4 (max 3 keys per node), insert keys {10, 20, 30, 15, 25}. Show the tree
structure.
Exam Tip: Understand the structure of B+ trees and how they support range queries. Practice inserting keys and
calculating search costs (e.g., “How many disk accesses to find a key?”).
● Key Features:
○ Objects: Data and methods (behavior) stored together (e.g., a Student object with attributes name, ID
and method calculateGPA()).
○ Classes and Inheritance: Objects belong to classes, which can inherit properties (e.g., GradStudent
inherits from Student).
○ Encapsulation: Data and methods bundled, with access control (e.g., private attributes).
○ Polymorphism: Methods can behave differently based on object type.
○ Complex Data Types: Support for nested objects, arrays, and references (e.g., a Course object with
a list of Student objects).
○ Persistence: Objects stored permanently, with query capabilities.
class Student {
    String name;
    int ID;
    List<Course> enrolledCourses;
}
Query: “Find all students with GPA > 3.5” directly accesses the calculateGPA method.
● Advantages:
○ Natural modeling for complex data (e.g., multimedia, CAD).
○ Faster for object-based applications (no impedance mismatch with OOP languages).
● Disadvantages:
○ Complex querying compared to SQL.
○ Less mature than relational DBMS.
Practice Problem: Design an OODBMS schema for a library system with Book, Author, and Borrower classes,
including inheritance (e.g., TextBook inherits from Book).
Exam Tip: Compare OODBMS vs. RDBMS (e.g., “Why use OODBMS for a CAD system?”). Be ready to define OOP
concepts in a database context.
● Key Features:
○ Data Distribution: Data split across sites via:
■ Fragmentation: Divide tables (e.g., horizontal: rows split by condition; vertical: columns split).
■ Replication: Copies of data at multiple sites for availability.
○ Transparency: Users see a single logical database (hides distribution details).
■ Location Transparency: No need to know where data is stored.
■ Replication Transparency: No need to know about copies.
○ Distributed Query Processing: Queries optimized across sites, minimizing data transfer.
○ Distributed Transactions: Ensure ACID properties across sites using protocols like Two-Phase
Commit (2PC):
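■ Prepare Phase: The coordinator asks every site to vote; each site replies ready-to-commit or abort.
■ Commit Phase: The coordinator tells all sites to commit only if every vote was ready; otherwise all sites abort.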
● Example: A bank with branches in New York and London. Customer data is fragmented:
○ New York: Customers with ID < 1000.
○ London: Customers with ID ≥ 1000.
○ A query for a customer’s balance is routed to the appropriate site.
● Advantages:
○ Localized access reduces latency.
○ Scalability and fault tolerance via replication.
● Disadvantages:
○ Complex query optimization and transaction management.
○ Network latency and reliability issues.
● Key Challenges:
○ Concurrency Control: Use distributed locking or timestamp ordering.
○ Deadlock Detection: Global wait-for graphs across sites.
○ Recovery: Ensure consistency after site failures (e.g., via logs and 2PC).
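The Two-Phase Commit protocol listed under Distributed Transactions can be sketched in a few lines. This is a toy model under strong assumptions: sites never crash, messages are never lost, and nothing is written to a log, all of which a real implementation must handle. The Site class and its can_commit flag are hypothetical stand-ins for a site's local decision.

```python
class Site:
    """Hypothetical participant; can_commit simulates its local vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote yes only if the local part of the transaction can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(sites):
    # Phase 1 (prepare): the coordinator collects a vote from every site.
    votes = [site.prepare() for site in sites]
    # Phase 2 (commit): commit only on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for site in sites:
            site.commit()
        return "committed"
    for site in sites:
        site.abort()
    return "aborted"

ok = two_phase_commit([Site("New York"), Site("London")])
failed_sites = [Site("New York"), Site("London", can_commit=False)]
failed = two_phase_commit(failed_sites)

print(ok)                                   # committed
print(failed)                               # aborted
print([s.state for s in failed_sites])      # ['aborted', 'aborted']
```

A single no vote aborts the transaction at every site, which is how 2PC keeps atomicity across sites.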
Practice Problem: For a DDBMS with two sites, explain how a query “SELECT * FROM Customers WHERE
balance > 1000” is processed if data is horizontally fragmented by region.
Exam Tip: Understand fragmentation, replication, and 2PC. Be ready to explain transparency types or compare
DDBMS vs. centralized DBMS.