DBMS
Internal Schema (Physical Level):
This schema describes how data is physically stored in the database, including file
structures, indexing, and storage details. It focuses on optimizing storage and
retrieval efficiency and on the physical management of data. The internal schema is
hidden from users and application developers.
Conceptual Schema (Logical Level):
This is the central level that defines the overall structure of the database in
terms of entities, relationships, and constraints, without focusing on physical
storage. It represents the entire database’s logical view, covering what data is
stored and the relationships between the data elements. The conceptual schema
provides a unified view of data to database administrators and developers,
abstracting away the complexities of physical storage.
External Schema (View Level):
This is the highest level, defining how individual users or user groups see the
data. A database can have many external schemas, each presenting only the portion
of the database relevant to a particular user while hiding the rest.
Data Models:
Data models are conceptual frameworks that describe how data is organized, stored,
and managed within a database system. They define the structure of the data,
relationships between data elements, and constraints, offering a blueprint for
designing databases.
Network Data Model:
Similar to the hierarchical model, but allows each child to have multiple parents.
It uses a graph structure, making it more flexible for many-to-many relationships.
Example: Early telecommunications and banking systems.
Relational Data Model:
Represents data in tables (relations), where each row is a record and each column
is an attribute. Relationships are established through foreign keys.
Example: SQL-based databases like MySQL, PostgreSQL.
Object-Oriented Data Model:
Stores data as objects that encapsulate both attributes (data) and methods
(behavior), supporting inheritance and complex data types.
Example: ObjectDB, db4o.
1. Generalization:
Generalization is the process of combining two or more lower-level entities into a
higher-level, more generalized entity. It highlights common features shared by
multiple entities.
Example: In a university database, entities like Professor and Student can be
generalized into a higher-level entity Person, since both share common attributes
like name, address, and ID. This reduces redundancy by consolidating shared
attributes in a general entity.
2. Specialization:
Specialization is the opposite of generalization, where a higher-level entity is
divided into two or more specialized lower-level entities based on unique
attributes or behaviors.
Example: The Person entity can be specialized into Student and Professor, where
Student has additional attributes like course_enrolled, and Professor has
attributes like department and salary.
3. Aggregation:
Aggregation is a concept where a relationship itself is treated as a higher-level
entity. It is useful when entities and relationships need to be abstracted together
to represent more complex associations.
Example: In a project management database, a Works_On relationship between Employee
and Project can be aggregated into a higher-level entity like Project_Assignment,
showing how multiple employees contribute to various projects.
Q-4 Write a comparison between the Relational, Object-Oriented, and Semi-Structured
models.
1. Relational Model:
Structure: Data is organized in tables (relations) with rows (tuples) and columns
(attributes). Each table has a primary key to uniquely identify rows, and foreign
keys to establish relationships.
Data Flexibility: Less flexible, requires predefined schemas. Data must adhere to
strict tabular format.
Query Language: Uses SQL for querying and managing data.
Use Case: Best for applications with structured, consistent data and well-defined
relationships (e.g., banking, HR systems).
Example: MySQL, PostgreSQL.
2. Object-Oriented Model:
Structure: Data is stored as objects, which encapsulate both data (attributes) and
behaviors (methods). Supports inheritance, polymorphism, and encapsulation.
Data Flexibility: More flexible as objects can represent complex data and
relationships. It aligns closely with object-oriented programming.
Query Language: Often uses OQL (Object Query Language) or proprietary languages.
Use Case: Suitable for complex applications involving multimedia, CAD, and real-
time systems.
Example: ObjectDB, db4o.
3. Semi-Structured Model:
Structure: Data is stored without a rigid schema, often in hierarchical or tree-
like formats (e.g., XML, JSON). Attributes can vary across entries.
Data Flexibility: Highly flexible, accommodating irregular or incomplete data.
Query Language: Uses languages like XPath or XQuery.
Use Case: Ideal for web data, document-based systems, and applications where schema
flexibility is needed.
Example: MongoDB, CouchDB (NoSQL databases).
Q-5 Explain the DBA along with its roles and responsibilities.
A Database Administrator (DBA) is responsible for installing, configuring,
securing, and maintaining the database system so that it remains available,
consistent, and performant. Key responsibilities include:
Database Design:
Design and develop the database structure, including tables, indexes, and
relationships, based on application needs. DBAs may also assist in database
normalization and schema refinement.
Database Security:
Control user access through authentication, roles, and privileges, protecting
sensitive data from unauthorized use.
Backup and Recovery:
Perform regular database backups and establish disaster recovery plans. In the
event of data loss, the DBA restores databases to minimize downtime and data loss.
Performance Tuning:
Monitor and optimize database performance through indexing, query optimization, and
resource allocation (e.g., CPU, memory). This ensures efficient data retrieval and
system responsiveness.
Data Integrity and Quality:
Enforce data integrity rules (constraints, triggers) to ensure the accuracy and
consistency of stored data.
Monitoring and Maintenance:
Regularly monitor the database system’s health, perform updates, patches, and
resolve any issues (e.g., database crashes, slowdowns).
An Entity-Relationship (ER) diagram graphically represents the logical structure
of a database.
Key Components:
Entities: Represent real-world objects or concepts (e.g., Employee, Department).
They are depicted as rectangles.
Attributes: Characteristics of entities (e.g., Employee has attributes like name,
ID, and salary). Attributes are shown as ovals.
Relationships: Define how entities are connected (e.g., an Employee works for a
Department). Relationships are shown as diamonds and can be one-to-one, one-to-
many, or many-to-many.
Example:
Consider a university database with entities Student and Course, attributes such as
Student_ID, Name, and Course_Name, and an Enrolls relationship connecting students
to the courses they take (a many-to-many relationship).
ASSIGNMENT- II
The Relational Data Model is a framework for organizing data into tables (also
called relations), where each table consists of rows and columns. It is one of the
most widely used models in database management systems due to its simplicity,
flexibility, and efficiency. In this model, data is stored in tabular format, and
relationships between tables are established using keys.
Key Components:
Relation (Table): A table is a collection of data organized into rows (records) and
columns (attributes). Each row represents a unique instance of the data.
Attributes (Columns): Define the characteristics or properties of the data stored
in a table. Each column has a specific data type.
Tuples (Rows): Represent individual records in the table.
Primary Key: A unique identifier for each row in a table.
Foreign Key: A key used to establish relationships between tables, connecting a
primary key from one table to another.
Example:
Consider a database for a school system with two tables: Students and Courses.
The Students table has columns like Student_ID (Primary Key), Name, and Age.
The Courses table has columns like Course_ID (Primary Key), Course_Name, and
Student_ID (Foreign Key referencing Students).
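A minimal SQL sketch of these two tables (column types and sizes are assumptions
for illustration):
CREATE TABLE Students (
    Student_ID INT PRIMARY KEY,   -- uniquely identifies each student
    Name       VARCHAR(100),
    Age        INT
);

CREATE TABLE Courses (
    Course_ID   INT PRIMARY KEY,  -- uniquely identifies each course
    Course_Name VARCHAR(100),
    Student_ID  INT,
    FOREIGN KEY (Student_ID) REFERENCES Students(Student_ID)  -- links a course row to a student
);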
Q-2 Explain the concept of keys. Explain candidate keys and super keys in detail
with examples.
In the Relational Data Model, keys are used to uniquely identify records in a table
and establish relationships between tables. They ensure data integrity and avoid
duplication.
Key Types:
Primary Key: A column (or combination of columns) that uniquely identifies each row
in a table. It cannot be NULL and must contain unique values.
Foreign Key: A column in one table that references the primary key in another
table, creating a relationship between the two tables.
Unique Key: Ensures all values in a column are unique; unlike the primary key, it
can accept NULL values (SQL Server permits a single NULL, while MySQL, PostgreSQL,
and Oracle allow multiple NULLs in a unique column).
Concepts of Super Key and Candidate Key:
Super Key:
A super key is any combination of attributes that can uniquely identify a row in a
table. It may contain more attributes than necessary.
Example: In a table Students (Student_ID, Name, Email), both Student_ID and
(Student_ID, Email) can be super keys because they uniquely identify each student.
However, (Student_ID, Email) has redundant attributes.
Candidate Key:
A candidate key is a minimal super key, meaning it contains only the essential
attributes needed to uniquely identify a record. There can be multiple candidate
keys, but none of them will have unnecessary attributes.
Example: In the same Students table, Student_ID and Email are candidate keys
because each can uniquely identify a student without extra attributes.
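A hedged sketch of how these candidate keys might be declared: one candidate key
is chosen as the primary key and the other is enforced with a UNIQUE constraint
(column types are assumptions):
CREATE TABLE Students (
    Student_ID INT PRIMARY KEY,              -- chosen candidate key (primary key)
    Name       VARCHAR(100),
    Email      VARCHAR(255) UNIQUE NOT NULL  -- alternate candidate key
);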
Q-3 Illustrate integrity and key constraints with suitable examples.
Integrity constraints are rules that ensure the accuracy and consistency of data in
a database. They enforce data validity and prevent invalid data from being stored.
Key constraints are a subset of integrity constraints that define how keys behave
in a relational database. Together, they play a crucial role in maintaining data
integrity.
Entity Integrity:
Ensures that every table has a primary key, and that key must contain unique, non-
null values.
Example: In a Students table, the Student_ID column (primary key) must have a
unique value for every student, and it cannot be NULL.
Referential Integrity:
Ensures that foreign keys correctly reference primary keys in other tables,
maintaining relationships between tables.
Example: If a Courses table has a foreign key Student_ID that references the
Student_ID in the Students table, it guarantees that no course can be assigned to a
non-existent student. If a student is deleted, the database will either restrict
the deletion or cascade the changes.
Key Constraints:
Primary Key Constraint:
A column (or a combination of columns) designated as the primary key must be unique
and cannot contain NULL values.
Example: The Employee_ID column in an Employees table ensures each employee is
uniquely identifiable.
Foreign Key Constraint:
Ensures that the values in a foreign key column correspond to valid values in the
referenced table’s primary key.
Example: In an Orders table, the Customer_ID foreign key must reference a valid
Customer_ID in the Customers table.
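A minimal sketch of this Orders/Customers relationship with an explicit delete
rule (table and column names are assumptions; ON DELETE RESTRICT as supported in
MySQL/PostgreSQL):
CREATE TABLE Customers (
    Customer_ID INT PRIMARY KEY,
    Name        VARCHAR(100)
);

CREATE TABLE Orders (
    Order_ID    INT PRIMARY KEY,
    Customer_ID INT NOT NULL,
    -- reject deletion of a customer who still has orders
    FOREIGN KEY (Customer_ID) REFERENCES Customers(Customer_ID) ON DELETE RESTRICT
);

-- Fails: customer 999 does not exist, violating referential integrity
INSERT INTO Orders (Order_ID, Customer_ID) VALUES (1, 999);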
Operations:
While relational calculus itself does not define operations like relational
algebra, it uses logical operations (AND, OR, NOT) and quantifiers (∃, ∀) to
express conditions. The primary goal is to focus on the “what” of data retrieval,
making it a powerful tool for defining queries in a more declarative manner.
ASSIGNMENT- III
Reduce Data Redundancy:
Redundant data leads to unnecessary storage consumption and can cause data
inconsistency. For example, if the same customer’s information is stored in
multiple tables, updating it in one place but not the other leads to mismatched
data. Normalization ensures each fact is stored in one place, avoiding duplication.
Improve Data Integrity:
Normalization enforces well-defined relationships and constraints between tables,
keeping stored data accurate and consistent.
Simplify Queries and Improve Efficiency:
With normalized tables, complex queries are easier to write and maintain. By
reducing redundant data, the database becomes more efficient, as fewer data
manipulations are needed, and search operations can be optimized.
Avoid Data Anomalies:
Unnormalized tables suffer from insertion, update, and deletion anomalies, where
adding, changing, or removing one fact unintentionally affects others;
normalization eliminates these by separating data into well-structured relations.
Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form
(3NF). Both are used in the process of database normalization to eliminate
redundancy and prevent update anomalies, but BCNF is stricter than 3NF in dealing
with certain types of functional dependencies.
A relation is in BCNF if:
It is in 3NF.
For every non-trivial functional dependency (A → B), the left-hand side (A) is a
super key (i.e., A uniquely identifies every tuple in the table).
BCNF resolves a limitation of 3NF: a relation can satisfy 3NF yet still contain
redundancy when a determinant is not a super key, which typically happens with
composite or overlapping candidate keys. BCNF removes such anomalies by requiring
every determinant to be a super key.
A relation is in 3NF if:
It is in 2NF.
For every non-trivial functional dependency (A → B), either A is a super key, or B
is a prime attribute (part of some candidate key).
Difference Between BCNF and 3NF:
3NF allows a functional dependency whose determinant is not a super key, provided
the dependent attribute is prime (part of some candidate key), which is a more
lenient condition.
BCNF requires all determinants (left side of dependencies) to be super keys,
eliminating certain anomalies that 3NF may not handle.
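A hedged illustration of a BCNF violation and its repair, using an assumed
Enrollment(student, course, instructor) table where each instructor teaches exactly
one course (instructor → course) but instructor is not a super key:
-- In 3NF but not BCNF: instructor -> course, yet instructor is not a key.
CREATE TABLE Enrollment (
    student    VARCHAR(50),
    course     VARCHAR(50),
    instructor VARCHAR(50),
    PRIMARY KEY (student, course)
);

-- BCNF decomposition: every determinant is now a key.
CREATE TABLE Instructor_Course (
    instructor VARCHAR(50) PRIMARY KEY,  -- enforces instructor -> course
    course     VARCHAR(50)
);

CREATE TABLE Student_Instructor (
    student    VARCHAR(50),
    instructor VARCHAR(50),
    PRIMARY KEY (student, instructor)
);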
Definition:
A functional dependency is represented as X → Y, where X (the determinant)
functionally determines Y (the dependent): whenever two tuples agree on the values
of X, they must also agree on the values of Y.
Example:
Consider a table Students with attributes Student_ID, Name, and Age. The dependency
Student_ID → Name holds, because each Student_ID is associated with exactly one
Name.
Q-4 Compute the closure of the following set F of functional dependencies for
relation schema R(A, B, C, D, E), where F = {A → BC, CD → E, B → D, E → A}, and
list the candidate keys for R.
Step 1: Compute A⁺. Start with {A}; A → BC adds B and C; B → D adds D; CD → E adds
E. So A⁺ = {A, B, C, D, E}.
From A⁺ = {A, B, C, D, E}, we see that A alone can determine all attributes of R.
Therefore, A is a candidate key.
Step 2: Check for Other Candidate Keys
Since E → A, E⁺ = {E, A, B, C, D} = R, so E is also a candidate key.
B⁺ = {B, D} and C⁺ = {C}, so neither B nor C alone is a key. However,
(BC)⁺ = {B, C, D, E, A} = R and (CD)⁺ = {C, D, E, A, B} = R, and no proper subset
of either is a key, so BC and CD are candidate keys.
No other minimal combination determines all of R, so the candidate keys of R are
A, E, BC, and CD.
i) Lossless Decomposition:
Lossless decomposition is a property of database decomposition that ensures no data
is lost when a relation (table) is divided into two or more sub-relations. In other
words, after decomposing a table, the original relation can be perfectly
reconstructed by joining the sub-relations. Lossless decomposition is crucial for
preserving the integrity and completeness of data during normalization.
Example: A table R(A, B, C) is decomposed into two tables R1(A, B) and R2(B, C). If
the natural join of R1 and R2 reproduces the original R without spurious or missing
tuples, the decomposition is lossless; this is guaranteed when the shared attribute
B is a key of R1 or of R2.
ii) Dangling Tuples:
Dangling tuples refer to records in one table that have no corresponding matching
records in a related table when foreign key constraints are not enforced or
properly maintained. This situation can occur in database joins, leading to
incomplete or incorrect query results.
Example: An Orders row whose Customer_ID no longer exists in the Customers table is
a dangling tuple; joining Orders with Customers would silently drop that order.
iii) Multivalued Dependency (MVD):
A multivalued dependency X →→ Y holds when, for each value of X, the associated set
of Y values is independent of the remaining attributes.
Example: In a table Movie(Title, Actor, Genre), there can be multiple actors for a
single movie title, and multiple genres, leading to the MVDs Title →→ Actor and
Title →→ Genre. MVDs are used in Fourth Normal Form (4NF) to remove the redundancy
associated with such cases.
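A brief sketch of the 4NF decomposition this suggests, splitting the independent
multivalued facts into separate tables (names and types are assumptions):
-- Movie(Title, Actor, Genre) decomposed into 4NF: one table per independent fact
CREATE TABLE Movie_Actors (
    Title VARCHAR(100),
    Actor VARCHAR(100),
    PRIMARY KEY (Title, Actor)
);

CREATE TABLE Movie_Genres (
    Title VARCHAR(100),
    Genre VARCHAR(50),
    PRIMARY KEY (Title, Genre)
);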
How it Works:
When a query is submitted to a database management system (DBMS), the query
optimizer generates multiple possible execution plans. Each execution plan
represents a different strategy to retrieve the required data, involving different
paths like table scans, index lookups, or joins. The optimizer evaluates these
plans based on factors such as estimated I/O and CPU cost, table sizes, available
indexes, and data-distribution statistics, and selects the plan with the lowest
estimated cost.
ASSIGNMENT- IV
Q-1 Discuss the basic concepts of transaction and write ACID properties.
Example:
A bank transfer transaction might involve two operations: debiting one account and
crediting another. Both operations must either succeed together or fail together to
ensure data consistency.
ACID Properties:
The ACID properties (Atomicity, Consistency, Isolation, and Durability) are key
principles that ensure reliable transaction processing in a database.
Atomicity:
Ensures that all operations within a transaction are completed successfully or none
are applied. If any part of the transaction fails, the entire transaction is rolled
back.
Example: In a money transfer, either both debit and credit happen, or neither
happens.
Consistency:
Ensures that a transaction takes the database from one consistent state to another.
The integrity of the database must be maintained after the transaction.
Example: Bank account balances must remain valid after a transfer.
Isolation:
Ensures that concurrently executing transactions do not interfere with one another;
the intermediate state of one transaction is invisible to others, so the outcome is
as if the transactions ran one at a time.
Example: Two simultaneous transfers on the same account never see each other’s
partial updates.
Durability:
Guarantees that once a transaction is committed, its results are permanent, even in
the event of system failure.
Example: After a bank transfer is completed, the changes remain saved even if the
system crashes.
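A minimal sketch of the bank-transfer transaction discussed above, assuming a
hypothetical Accounts(account_id, balance) table; atomicity means both UPDATEs take
effect or neither does:
BEGIN;  -- start the transaction (START TRANSACTION in MySQL)
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;  -- debit
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;  -- credit
COMMIT; -- make both changes permanent together
-- On any error, ROLLBACK would undo both updates.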
Log Records:
The log file contains entries for every transaction operation, including
timestamps, data modifications, and transaction start/commit/abort actions. This
provides a comprehensive record of all changes made.
Write-Ahead Logging (WAL):
Before any changes are applied to the database, the corresponding log entry is
written first. This ensures that in case of a crash, the system can use the log to
replay or undo operations as needed.
Recovery Process:
Undo: If a transaction is aborted, the recovery process uses the log to revert
changes made by that transaction, ensuring that no partial changes persist in the
database.
Redo: After a crash, the system can replay the log entries for committed
transactions to restore the database to its most recent state.
Locking techniques are essential mechanisms in database management systems used for
concurrency control, ensuring that multiple transactions can operate on the
database without interfering with one another. Locks prevent conflicts by
regulating access to data items, thereby maintaining consistency and integrity
during simultaneous transactions.
Types of Locks:
Shared Lock (S Lock):
Allows a transaction to read a data item. Multiple transactions can hold shared
locks on the same item at the same time, but no transaction may write to the item
while any shared lock is held.
Example: Transactions T1 and T2 can both hold shared locks on the same row and read
it concurrently.
Exclusive Lock (X Lock):
Grants a transaction exclusive access to a data item for both reading and writing.
Only one transaction can hold an exclusive lock on a data item at any given time.
Example: If Transaction T1 holds an exclusive lock, no other transaction can
acquire any lock (shared or exclusive) on that data item until T1 releases the
lock.
Locking Protocols:
Two-Phase Locking (2PL):
Transactions acquire locks in the growing phase and release them in the shrinking
phase. Once a transaction releases a lock, it cannot acquire any new locks; this
guarantees conflict-serializable schedules, although deadlocks can still occur.
Strict Two-Phase Locking (S2PL):
A stricter version of 2PL in which transactions hold their locks until they commit
or abort, ensuring serializability and also avoiding cascading rollbacks.
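A small sketch of explicit row locking inside a transaction (SELECT ... FOR UPDATE
as in Oracle/MySQL/PostgreSQL; the Accounts table is an assumption):
BEGIN;
-- Acquire an exclusive row lock: other writers block until COMMIT
SELECT balance FROM Accounts WHERE account_id = 1 FOR UPDATE;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
COMMIT;  -- releases the lock, as strict 2PL requires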
Q-5 Discuss the strict two-phase locking protocol and the timestamp-based protocol.
Strict Two-Phase Locking (S2PL)
S2PL builds on basic two-phase locking, in which every transaction passes through
two phases:
Growing Phase: A transaction can acquire locks but cannot release any.
Shrinking Phase: A transaction releases locks and cannot acquire new ones.
Under the strict variant, a transaction additionally holds its locks until it
commits or aborts.
Key Features:
Serializability: S2PL guarantees serializability, meaning the execution of
concurrent transactions is equivalent to some serial execution.
Deadlock Handling: S2PL does not prevent deadlocks outright, but it simplifies
deadlock detection because the order of lock acquisition is well-defined.
Time-Stamp Based Protocol
Time-stamp based protocols use timestamps to manage concurrency. Each transaction
is assigned a unique timestamp upon its initiation, which determines the order of
transaction execution. The key components include:
TS(T): the timestamp of transaction T, assigned when T starts.
Read timestamp RTS(X): the largest timestamp of any transaction that has read data
item X.
Write timestamp WTS(X): the largest timestamp of any transaction that has written
X.
A read or write is allowed only if it does not violate the timestamp order;
otherwise, the offending transaction is rolled back and restarted with a new
timestamp.
Q-6 How is concurrency achieved? Explain the protocols used to maintain
concurrency.
Concurrency control allows multiple transactions to execute simultaneously while
preserving consistency. Common protocols include:
Lock-Based Protocols:
Two-Phase Locking (2PL): Transactions acquire locks on data items before accessing
them. It has two phases: a growing phase, where locks are acquired, and a shrinking
phase, where locks are released. This method ensures serializability but may lead
to deadlocks.
Strict Two-Phase Locking (S2PL): A variant of 2PL where transactions hold locks
until they complete, preventing the release of locks before finishing.
Timestamp-Based Protocols:
Each transaction is assigned a unique timestamp, and transactions are executed
based on these timestamps. This ensures that conflicting operations adhere to a
predetermined order, allowing the system to maintain consistency without the need
for locking.
Optimistic Concurrency Control:
Transactions execute without restrictions, but before committing, the system checks
for conflicts. If a conflict is detected, the transaction is rolled back and must
be re-executed.
To test whether a schedule is conflict-serializable, construct a precedence graph:
Create Nodes: For each transaction in the schedule, create a node in the graph.
Add Edges:
For every pair of conflicting operations (e.g., reads and writes) between
transactions:
If transaction T1 executes an operation before transaction T2, add a directed edge
from T1 to T2.
Conflicting operations are those that access the same data item, where at least one
is a write.
Check for Cycles:
If the precedence graph contains no cycle, the schedule is conflict-serializable,
and a valid serial order can be obtained by topologically sorting the graph; if it
contains a cycle, the schedule is not conflict-serializable.
ASSIGNMENT- V
Example:
class Example {
int value;
Example() {
this(0); // Calls the parameterized constructor
}
Example(int value) {
this.value = value;
}
}
Example:
SELECT
CASE
WHEN score >= 90 THEN 'A'
WHEN score >= 80 THEN 'B'
ELSE 'C'
END AS grade
FROM students;
Types of Joins:
1. Inner Join:
Returns only the rows that have matching values in both tables.
2. Left (Outer) Join:
Returns all rows from the left table together with matching rows from the right
table; unmatched right-side columns are NULL.
3. Right (Outer) Join:
Returns all rows from the right table together with matching rows from the left
table; unmatched left-side columns are NULL.
4. Full (Outer) Join:
Returns all rows from both tables, matching them where possible and filling the
rest with NULLs.
5. Cross Join:
Returns the Cartesian product of both tables, combining all rows from the first
table with all rows from the second.
SQL examples for these joins are sketched below.
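Hedged SQL sketches of some of these joins, assuming two tables
Employees(emp_id, name, dept_id) and Departments(dept_id, dept_name):
-- 1. Inner join: only employees with a matching department
SELECT e.name, d.dept_name
FROM Employees e
INNER JOIN Departments d ON e.dept_id = d.dept_id;

-- 2. Left join: all employees, NULL dept_name where unmatched
SELECT e.name, d.dept_name
FROM Employees e
LEFT JOIN Departments d ON e.dept_id = d.dept_id;

-- 5. Cross join: every employee paired with every department
SELECT e.name, d.dept_name
FROM Employees e
CROSS JOIN Departments d;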
Q-3 Explain triggers and assertions, illustrating each with a proper query.
Triggers:
A trigger is a set of instructions automatically executed in response to certain
events on a specified table. These events can include INSERT, UPDATE, or DELETE
operations. Triggers are used for enforcing business rules, auditing changes, or
maintaining derived data.
Assertions:
An assertion is a schema-level constraint declared with CREATE ASSERTION ... CHECK
(condition), enforcing a condition that may span several tables. Although
assertions are part of the SQL standard, most commercial DBMSs (including Oracle
and MySQL) do not implement them, so the same rules are usually enforced with
triggers or CHECK constraints. Sketches of both follow.
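A minimal sketch of an auditing trigger (Oracle PL/SQL syntax) and a standard-SQL
assertion; the Employees and Employees_Audit tables and their columns are
assumptions for illustration:
CREATE OR REPLACE TRIGGER audit_salary_change
AFTER UPDATE OF salary ON Employees
FOR EACH ROW
BEGIN
-- Record the old and new salary whenever a salary changes
INSERT INTO Employees_Audit (emp_id, old_salary, new_salary, changed_on)
VALUES (:OLD.id, :OLD.salary, :NEW.salary, SYSDATE);
END;
/

-- Standard-SQL assertion (not accepted by Oracle or MySQL; shown for comparison):
CREATE ASSERTION non_negative_salaries
CHECK (NOT EXISTS (SELECT * FROM Employees WHERE salary < 0));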
(i) Sequence
In Oracle, a sequence is a database object that generates a series of unique
numbers, often used for creating primary key values. Sequences help ensure that no
two rows have the same identifier, making them crucial for maintaining data
integrity. You can define parameters such as the starting value, increment, and
maximum limit.
Example:
CREATE SEQUENCE emp_seq
START WITH 1
INCREMENT BY 1
NOCACHE;
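Typical usage of this sequence when inserting rows (assuming a hypothetical
Employees(id, name) table):
INSERT INTO Employees (id, name)
VALUES (emp_seq.NEXTVAL, 'Alice');  -- NEXTVAL fetches the next unique number

SELECT emp_seq.CURRVAL FROM dual;   -- CURRVAL returns the value just generated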
(ii) Triggers
Triggers are procedural code automatically executed in response to specific events
on a table or view, such as INSERT, UPDATE, or DELETE. They are used to enforce
business rules, validate input, and maintain audit trails.
Example:
CREATE OR REPLACE TRIGGER salary_check
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
IF :NEW.salary < 0 THEN
RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative.');
END IF;
END;
(iii) SQL*Loader
SQL*Loader is Oracle's utility for bulk-loading data from external files into
database tables, driven by a control file such as the following:
LOAD DATA
INFILE 'data.csv'
INTO TABLE Employees
FIELDS TERMINATED BY ','
(id, name, salary)
1. Tablespaces:
A tablespace is a logical storage unit in Oracle that groups related logical
structures. It contains segments that store data. Each database has at least one
tablespace, often named SYSTEM, which holds essential system data. Other
tablespaces can be created for user data, indexes, etc. (A sample tablespace
definition is sketched after this list.)
2. Segments:
Each tablespace contains segments, which are the next level of organization. A
segment is a set of extents that stores data for a specific logical structure, such
as a table or an index.
3. Extents:
An extent is a contiguous block of data blocks in the storage space. When a segment
needs more space, Oracle allocates additional extents.
4. Data Blocks:
The smallest unit of storage in Oracle, a data block contains rows of a table or
index entries. The size of a data block can be configured during database creation.
5. Schemas:
A schema is a collection of database objects owned by a single user, including
tables, views, indexes, and procedures. Each user has its own schema.
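A minimal sketch of creating the user tablespace mentioned in item 1 (file name and
sizes are assumptions):
CREATE TABLESPACE users_ts
DATAFILE 'users_ts01.dbf' SIZE 100M  -- physical file backing the tablespace
AUTOEXTEND ON NEXT 10M;              -- grow the file automatically when full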
Dedicated Server
In a dedicated server architecture, each client connection to the Oracle database
is assigned a separate server process. This means that for every user or
application that connects to the database, a new dedicated process is created.
Advantages:
Isolation and Predictability: Each session has its own process, so sessions are
well isolated from one another and performance is predictable.
Disadvantages:
Resource Intensive: This approach consumes more memory and CPU resources,
especially with many concurrent users, as each connection requires its own server
process.
Multi-Threaded Server (MTS)
In a multi-threaded server architecture, multiple client connections share a pool
of server processes. Instead of creating a dedicated server process for each
connection, MTS allows multiple connections to be serviced by fewer server
processes.
Advantages:
Efficiency: Reduces resource usage as multiple client connections share the same
server processes, leading to better scalability for high-concurrency environments.
Lower Overhead: It minimizes the overhead associated with managing numerous
separate processes.
Disadvantages:
Complexity and Contention: Shared-server configurations are more complex to
configure and tune, and under heavy load the shared processes can become a
bottleneck, delaying long-running requests.