1. Transaction
A transaction is a sequence of operations performed as a single logical unit of work. It may involve
INSERT, UPDATE, DELETE, or SELECT operations.
Example: Transferring money from one account to another involves two operations: debiting one
account and crediting another.
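The debit/credit example can be sketched with SQLite. This is a minimal illustration, not a production pattern; the `accounts` table and `transfer` helper are invented for the example.

```python
import sqlite3

# A money transfer as a single atomic unit of work.
# isolation_level=None gives manual transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

def transfer(src, dst, amount):
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        (bal,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                              (src,)).fetchone()
        if bal < 0:                      # consistency rule: no overdraft
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.execute("COMMIT")           # both updates become durable together
    except Exception:
        conn.execute("ROLLBACK")         # neither update survives

transfer(1, 2, 30)    # succeeds
transfer(1, 2, 500)   # fails midway and is rolled back
balances = [b for (b,) in conn.execute(
    "SELECT balance FROM accounts ORDER BY id")]
```

After the failed second transfer, the balances are exactly what the first transfer left: the debit that ran before the failure was undone by the rollback.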
2. ACID Properties
A. Atomicity
Ensures that either all operations of a transaction are completed or none are.
Example: If a money transfer fails midway, all changes made so far are rolled back.
B. Consistency
Ensures that a transaction takes the database from one valid state to another, preserving all integrity constraints.
Example: After a transfer, the total of the two account balances is unchanged.
C. Isolation
Transactions are executed independently of each other, preventing intermediate states from being
visible to other transactions.
Example: Two users transferring money simultaneously won't affect each other's transactions.
D. Durability
Once a transaction is committed, its changes are permanently stored, even in case of system
failure.
Example: After a successful payment, the database ensures the payment record persists.
3. Transaction States
Printed using Save ChatGPT as PDF, powered by PDFCrowd HTML to PDF API. 1/18
Transactions go through the following states:
Active: Transaction is executing.
Partially Committed: All operations are complete, but changes are not yet durable.
Committed: Changes are permanent.
Failed: Transaction cannot proceed due to an error.
Aborted: Transaction is rolled back, and all changes are undone.
4. Concurrency Control
Concurrency issues arise when multiple transactions execute simultaneously. To handle this:
Locking Mechanisms:
Shared Lock (Read)
Exclusive Lock (Write)
Concurrency Control Protocols:
Two-Phase Locking (2PL)
Timestamp Ordering
5. Recovery
In case of system failure, the DBMS uses logs and recovery techniques to maintain consistency:
Log-based recovery: undo and redo logging of changes.
Checkpointing: periodic snapshots that shorten recovery time.
6. Schedules
A schedule is the chronological order in which operations of concurrent transactions execute. Serial schedules run transactions one after another; non-serial schedules interleave them and must be checked for serializability.
7. Deadlock
Occurs when two or more transactions wait for each other to release resources. Deadlock handling
techniques:
Prevention (e.g., wait-die, wound-wait)
Detection (cycles in the wait-for graph)
Recovery (abort a victim transaction)
These concepts ensure efficient, secure, and reliable transaction processing in DBMS.
Serializability
Serializability in DBMS
Serializability is a fundamental concept in concurrency control within DBMS. It ensures that the outcome
of executing multiple transactions concurrently is equivalent to the outcome of executing them
serially (one after the other), preserving the consistency of the database.
Types of Serializability
1. Conflict Serializability
A schedule is conflict-serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations.
Example:
T1: R(A), W(A)
T2: R(A), W(A)
If the schedule is: T1: R(A) → T1: W(A) → T2: R(A) → T2: W(A), every conflicting pair occurs with T1 first, so the schedule is conflict-equivalent to the serial order T1 → T2 and is conflict-serializable.
By contrast, the interleaving T1: R(A) → T2: R(A) → T1: W(A) → T2: W(A) is not conflict-serializable: R1(A) before W2(A) requires T1 → T2, while R2(A) before W1(A) requires T2 → T1, a cycle.
2. View Serializability
A schedule is view-serializable if it is view-equivalent to some serial schedule: each transaction reads the same values (same reads-from relationships), and the final write on each data item is the same. Every conflict-serializable schedule is view-serializable, but not vice versa.
Testing Serializability
Conflict serializability is tested using a precedence (serialization) graph:
Nodes represent transactions.
Edges represent dependencies between transactions (e.g., T1 → T2 if T1's write is read or
overwritten by T2).
If the graph is acyclic, the schedule is conflict-serializable, and a topological order of the graph gives an equivalent serial schedule.
Example: For the schedule R1(A) → W2(A) → W1(A), R1(A) before W2(A) gives the edge T1 → T2, and W2(A) before W1(A) gives T2 → T1; the cycle means the schedule is not conflict-serializable.
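The precedence-graph test can be sketched in a few lines of Python. The schedule encoding (a list of transaction/operation/item triples) and the function names are illustrative, not a standard API.

```python
# Build a precedence graph from a schedule and test it for cycles.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            # Conflicting operations: different transactions, same item,
            # at least one write -> edge from the earlier transaction.
            if ti != tj and xi == xj and "W" in (oi, oj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # absent = unvisited, 1 = on DFS stack, 2 = finished

    def dfs(u):
        state[u] = 1
        for v in graph[u]:
            if state.get(v) == 1 or (state.get(v) is None and dfs(v)):
                return True
        state[u] = 2
        return False

    return any(state.get(u) is None and dfs(u) for u in graph)

# R1(A) W2(A) W1(A): edges T1->T2 and T2->T1 form a cycle.
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
# R1(A) W1(A) R2(A) W2(A): only T1->T2, so acyclic.
good = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
cyclic_bad = has_cycle(precedence_graph(bad))
cyclic_good = has_cycle(precedence_graph(good))
```

An acyclic graph means the schedule is conflict-serializable; a topological sort of its nodes yields an equivalent serial order.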
Importance of Serializability
Serializability is the standard correctness criterion for concurrency control: it guarantees that interleaved execution leaves the database in a state some serial execution could have produced.
Conflict vs. View Serializability
Both conflict serializability and view serializability are methods for determining whether a schedule
is equivalent to a serial schedule. However, they differ in approach and strictness.
Conflict Serializability
Key Points:
1. Conflicting Operations:
Two operations conflict if:
They belong to different transactions.
They access the same data item.
At least one of the operations is a write.
in table
Example:
Schedule:
1. T1: R(A), W(A)
2. T2: R(A), W(A)
Conflict-Serializability: in the interleaving R1(A), R2(A), W1(A), W2(A), the conflicts R1(A)–W2(A) and R2(A)–W1(A) point in opposite directions, so the precedence graph is cyclic and the schedule is not conflict-serializable.
View-Serializability: if T2 reads A written by T1 and the final writes match a serial schedule, the schedule may still be view-serializable.
concurrency control
Concurrency control in DBMS ensures that multiple transactions can execute simultaneously without
interfering with each other, maintaining data consistency, isolation, and integrity. It resolves issues
like data conflicts, deadlocks, and anomalies while adhering to the ACID properties.
Objectives of Concurrency Control
1. Ensure Data Consistency: Prevent inconsistent database states due to concurrent transactions.
2. Maintain Isolation: Ensure transactions do not affect one another’s intermediate states.
3. Increase Throughput: Maximize the number of transactions processed in a given time.
4. Prevent Anomalies:
Dirty Reads: Reading uncommitted changes.
Non-repeatable Reads: Different values read for the same data in a transaction.
Lost Updates: Overwriting updates from concurrent transactions.
Concurrency Problems
1. Dirty Reads: Reading data that is modified but not yet committed.
2. Lost Updates: Two transactions simultaneously update a data item, and one overwrites the other.
3. Non-repeatable Reads: A value read multiple times in a transaction differs due to another
transaction's updates.
4. Phantom Reads: A transaction reads a set of rows, and another transaction inserts or deletes
rows, causing a different result upon re-reading.
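The lost-update anomaly in the list above can be reproduced in a few lines, with plain Python variables standing in for two interleaved transactions (no real DBMS involved; this is purely illustrative):

```python
# Two transactions read the same balance, each adds to it, and the
# later write silently overwrites the earlier one.
db = {"A": 100}

t1_read = db["A"]        # T1 reads 100
t2_read = db["A"]        # T2 reads 100, before T1 writes back
db["A"] = t1_read + 10   # T1 writes 110
db["A"] = t2_read + 20   # T2 writes 120 -- T1's update is lost

lost_update_result = db["A"]   # 120, not the 130 a serial order would give
```

Any serial order of the two transactions would yield 130; the interleaving loses T1's increment, which is exactly what locking or validation protocols prevent.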
Concurrency Control Techniques
1. Lock-Based Protocols
Shared (S) Lock: Allows multiple transactions to read but not write a data item.
Exclusive (X) Lock: Allows only one transaction to read/write the data item.
Types:
Two-Phase Locking (2PL):
Growing Phase: Transaction acquires all locks.
Shrinking Phase: Transaction releases locks.
Ensures serializability but can lead to deadlocks.
Strict 2PL: Holds all locks until the transaction commits or aborts.
Rigorous 2PL: Holds all locks until the transaction finishes.
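The shared/exclusive compatibility rule a lock manager enforces can be sketched as follows. The `LockTable` class and its method names are illustrative, not a real DBMS API:

```python
# Shared (S) locks are compatible with each other;
# an exclusive (X) lock is compatible with nothing.
class LockTable:
    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False (caller must wait)."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if mode == "S" and held_mode == "S":
            holders.add(txn)              # multiple readers may share
            return True
        if holders == {txn}:              # lone holder may upgrade S -> X
            self.locks[item] = (mode if mode == "X" else held_mode, holders)
            return True
        return False                      # conflict: would have to wait

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
r1 = lt.request("T1", "A", "S")   # granted: first reader
r2 = lt.request("T2", "A", "S")   # granted: S is compatible with S
r3 = lt.request("T3", "A", "X")   # denied: X conflicts with held S locks
```

In a real system the denied request would enqueue the transaction and wake it when the conflicting locks are released; that queueing is omitted here.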
2. Timestamp-Based Protocols
Each transaction receives a unique timestamp at start; conflicting operations must execute in timestamp order, and a transaction whose operation arrives too late is aborted and restarted.
3. Optimistic (Validation-Based) Protocols
Assumes conflicts are rare and allows transactions to proceed without locking.
Phases:
1. Read Phase: Transactions read data and make changes locally.
2. Validation Phase: Checks for conflicts before committing.
3. Write Phase: Commits changes if validation succeeds.
4. Multiversion Concurrency Control (MVCC)
Maintains multiple versions of data items to allow simultaneous reads and writes.
Transactions read the version of the data item that was committed before their start time.
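A toy sketch of MVCC version selection; the version list and the timestamps are made up purely for illustration:

```python
# Each item keeps a list of (commit_timestamp, value) versions; a
# transaction reads the newest version committed at or before its start.
versions = {"A": [(5, 100), (12, 130), (20, 150)]}  # sorted by commit ts

def mvcc_read(item, start_ts):
    visible = [(ts, v) for ts, v in versions[item] if ts <= start_ts]
    return max(visible)[1] if visible else None

read_at_15 = mvcc_read("A", 15)   # sees the version committed at ts 12
read_at_30 = mvcc_read("A", 30)   # sees the latest version, ts 20
```

Because readers never block writers (a writer simply appends a new version), MVCC is well suited to read-heavy workloads, at the cost of storing and eventually garbage-collecting old versions.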
5. Deadlock Handling
Deadlock Prevention: Enforces rules to avoid circular wait conditions (e.g., timeout, wait-die,
wound-wait schemes).
Deadlock Detection: Periodically checks for cycles in the wait-for graph.
Deadlock Recovery: Aborts one or more transactions to resolve the deadlock.
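The wait-die prevention scheme mentioned above reduces to a single timestamp comparison; a sketch (the function name is illustrative):

```python
# Wait-die: timestamps act as priorities (smaller = older). An older
# transaction may wait for a younger one; a younger transaction that
# requests a lock held by an older one "dies" (aborts and restarts
# later with its original timestamp).
def wait_die(requester_ts, holder_ts):
    """Decide the fate of a conflicting lock request."""
    return "wait" if requester_ts < holder_ts else "die"

old_vs_young = wait_die(requester_ts=3, holder_ts=9)   # older waits
young_vs_old = wait_die(requester_ts=9, holder_ts=3)   # younger dies
```

Since waits only ever go from older to younger transactions, a circular wait (and hence a deadlock) can never form.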
Performance Trade-offs
Lock-Based Protocols: High consistency but may lead to deadlocks and reduced parallelism.
Optimistic Protocols: High parallelism but potential rollback overhead.
MVCC: Handles read-heavy workloads efficiently but requires more storage.
Key Concepts
Granularity of Locking: Level at which locks are applied (e.g., row, page, table).
Isolation Levels:
Read Uncommitted: Least strict; allows dirty reads.
Read Committed: Prevents dirty reads.
Repeatable Read: Prevents non-repeatable reads.
Serializable: Strictest; prevents all anomalies.
Concurrency control is critical for maintaining a balance between performance and consistency in a
multi-user environment.
Recovery with Concurrent Transactions
Recovery in a concurrent transaction environment ensures that the database remains consistent after
failures, such as system crashes or transaction aborts, while allowing multiple transactions to execute
simultaneously. Recovery mechanisms focus on maintaining atomicity and durability, even when
transactions interact.
Key Concepts
1. Transaction States:
Active: Transaction is executing.
Partially Committed: All operations are complete, but changes are not yet durable.
Committed: Transaction is successfully completed, and changes are permanent.
Failed: Transaction cannot proceed due to an error.
Aborted: Transaction is rolled back, and all changes are undone.
2. Logs:
Recovery relies on logs, which record details about each transaction.
Write-Ahead Logging (WAL):
Log entries must be written to stable storage before the actual data is updated.
Types of log entries:
UNDO Log: Used to reverse changes made by aborted or incomplete transactions.
REDO Log: Used to reapply changes made by committed transactions.
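The write-ahead discipline — append the log record first, then change the data — can be sketched as follows, with in-memory structures standing in for stable storage:

```python
# WAL sketch: the log record (with old and new values) is appended
# BEFORE the data item itself is changed.
log = []            # stands in for the stable-storage log
data = {"A": 50}    # stands in for the database

def wal_update(txn, item, new_value):
    # Old value enables UNDO; new value enables REDO. The append must
    # reach stable storage before the in-place update below.
    log.append((txn, item, data[item], new_value))
    data[item] = new_value

log.append(("T1", "START"))
wal_update("T1", "A", 70)
log.append(("T1", "COMMIT"))
```

If a crash happened between the append and the update, recovery would find the record and know both how to undo (restore 50) and how to redo (reapply 70) the change.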
Recovery Techniques
1. Checkpointing
Periodically saves the current state of the database and logs to reduce recovery overhead.
Steps:
1. Suspend new transactions momentarily.
2. Write all modified data and logs to disk.
3. Record a checkpoint entry in the log.
2. Recovery Phases
Analysis Phase:
Identifies active, committed, and aborted transactions at the time of failure.
Determines dirty pages (pages in memory but not yet written to disk).
Redo Phase:
Reapplies changes from the log for committed transactions to ensure durability.
Starts from the checkpoint.
Undo Phase:
Rolls back changes of uncommitted transactions to maintain atomicity.
Processes log entries in reverse order.
Challenges in Recovery with Concurrent Transactions
1. Interleaving of Operations:
Operations from multiple transactions are interleaved, complicating recovery.
Logs must preserve the order of operations for correct recovery.
2. Dirty Reads and Cascading Aborts:
If a transaction reads uncommitted changes of another transaction, a failure may require
rolling back multiple transactions (cascading aborts).
Prevented by strict protocols like Strict Two-Phase Locking (2PL).
3. Concurrency Control During Recovery:
Recovery must handle concurrent transactions without introducing inconsistencies.
Transactions accessing the database during recovery may encounter delays.
Example of Recovery
Given Log:
```
START T1
WRITE T1 A=50
START T2
WRITE T2 B=30
COMMIT T1
WRITE T2 C=40
CRASH
```
Recovery Process:
1. Analysis:
Transactions active at crash: T2.
Committed transactions: T1.
2. Redo:
Apply changes of T1 (committed).
3. Undo:
Revert changes made by T2 (active).
To simplify recovery, use cascadeless schedules, where transactions are only allowed to read
committed data, preventing cascading aborts.
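The analysis/redo/undo walk over the crash log above can be sketched as follows; the tuple encoding of log records is illustrative:

```python
# Log records: ("START", T), ("WRITE", T, item, value), ("COMMIT", T).
log = [("START", "T1"), ("WRITE", "T1", "A", 50),
       ("START", "T2"), ("WRITE", "T2", "B", 30),
       ("COMMIT", "T1"), ("WRITE", "T2", "C", 40)]   # then: CRASH

def recover(log):
    # Analysis: classify transactions at the time of the crash.
    started = {r[1] for r in log if r[0] == "START"}
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    active = started - committed
    # Redo: forward pass, reapply writes of committed transactions.
    db = {}
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            db[rec[2]] = rec[3]
    # Undo: writes of active transactions are simply never applied here;
    # a real system would restore before-images in reverse log order.
    return active, committed, db

active, committed, db = recover(log)
```

Running this on the log yields T1 committed (its write to A survives) and T2 active (its writes to B and C are undone), matching the three-step recovery process above.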
Recovery with concurrent transactions combines logging, checkpointing, and concurrency control to
ensure data consistency and durability, even in complex multi-transaction scenarios.
oodbms vs dbms
| Aspect | OODBMS | RDBMS |
| --- | --- | --- |
| Complex Data Support | Handles complex data types like multimedia, hierarchies, and user-defined data types easily. | Less suited for complex data types; best for structured data. |
| Data Relationships | Relationships are represented using object references. | Relationships are represented using foreign keys. |
| Inheritance | Supports class inheritance, enabling object reuse and extension. | No direct support for inheritance; must be modeled manually. |
| Flexibility | Highly flexible for dynamic and complex applications. | Less flexible for hierarchical or non-tabular data. |
| Performance | Efficient for applications with complex data and frequent object navigation. | Performs better for structured data and transactional systems. |
| Schema Evolution | Supports easier schema evolution without requiring extensive restructuring. | Schema changes are more complex and often require data migration. |
| Maturity | Relatively newer and less mature compared to RDBMS. | Established and mature technology with extensive industry support. |
Key Takeaways:
Choose an OODBMS when the application is built around complex, nested, or user-defined objects (e.g., CAD, multimedia).
Choose an RDBMS when:
You need robust query capabilities using SQL.
Industry standards, maturity, and interoperability are priorities.
fragmentation
Fragmentation in DBMS
Fragmentation in DBMS refers to dividing a database into smaller, logical pieces called fragments.
These fragments are distributed across different sites in a distributed database system to improve
performance, availability, and manageability. Each fragment can be stored and accessed
independently.
Types of Fragmentation
1. Horizontal Fragmentation:
Divides a table into subsets of rows (tuples) based on a condition.
Each fragment contains a subset of rows that satisfy a specific predicate.
All fragments share the same schema (column structure).
Example:
Table: `Employee(ID, Name, Dept, Salary)`
Condition: `Dept = "HR"` and `Dept = "Finance"`
Fragment 1: Employees in HR.
Fragment 2: Employees in Finance.
2. Vertical Fragmentation:
Divides a table into subsets of columns (attributes), along with a key attribute to ensure
reconstruction.
Each fragment contains some attributes of the table.
Example:
Table: `Employee(ID, Name, Dept, Salary)`
Fragment 1: `(ID, Name)`
Fragment 2: `(ID, Dept, Salary)`
Reconstruction: Join fragments on the key attribute (`ID`).
3. Hybrid (Mixed) Fragmentation:
Combines horizontal and vertical fragmentation.
A table is first horizontally fragmented, and each horizontal fragment is then vertically
fragmented (or vice versa).
Example:
Horizontal fragmentation by `Dept`.
Vertical fragmentation of HR department into `(ID, Name)` and `(ID, Salary)`.
4. Derived Fragmentation:
Fragments of one table are derived based on the fragmentation of another table.
Ensures referential integrity when there are relationships between tables.
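A sketch of horizontal and vertical fragmentation of the `Employee` table from the examples above, including reconstruction by joining the vertical fragments on `ID`. The row data is made up for illustration:

```python
employees = [
    {"ID": 1, "Name": "Asha",  "Dept": "HR",      "Salary": 50000},
    {"ID": 2, "Name": "Ravi",  "Dept": "Finance", "Salary": 60000},
    {"ID": 3, "Name": "Meera", "Dept": "HR",      "Salary": 55000},
]

# Horizontal: subsets of rows selected by a predicate on Dept.
hr      = [r for r in employees if r["Dept"] == "HR"]
finance = [r for r in employees if r["Dept"] == "Finance"]

# Vertical: subsets of columns, each carrying the key ID
# so the original rows can be reconstructed.
frag1 = [{"ID": r["ID"], "Name": r["Name"]} for r in employees]
frag2 = [{"ID": r["ID"], "Dept": r["Dept"], "Salary": r["Salary"]}
         for r in employees]

# Reconstruction: join the vertical fragments on ID.
by_id = {r["ID"]: r for r in frag2}
rebuilt = [{**r, **by_id[r["ID"]]} for r in frag1]
```

The join recovers the original table exactly, which is the reconstruction property a correct vertical fragmentation must satisfy.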
Advantages of Fragmentation
1. Improved Performance:
Queries are executed on relevant fragments, reducing data transfer and processing time.
2. Increased Parallelism:
Multiple fragments can be processed simultaneously across different sites.
3. Better Resource Utilization:
Allows data to be stored closer to where it is frequently accessed.
4. Enhanced Availability:
If one fragment or site fails, others remain accessible.
5. Scalability:
Fragments can be added or distributed across new nodes as the system grows.
Disadvantages of Fragmentation
1. Complexity:
Query processing becomes more complex, requiring reassembly of fragmented data.
2. Overhead:
Managing and maintaining fragments requires additional resources.
3. Data Redundancy:
Fragments may need to be replicated for fault tolerance, increasing storage needs.
4. Reconstruction:
Joining fragments to satisfy global queries may incur additional costs.
Correctness Rules for Fragmentation
1. Completeness: Ensure that all data can be accessed through the fragments.
2. Reconstruction: Ensure that the original table can be reconstructed without loss of information.
3. Disjointness: Fragments should not overlap unless redundancy is necessary (e.g., replication).
Fragmentation plays a crucial role in distributed databases, enhancing system efficiency while posing
challenges in query optimization and data integrity.
multimedia
Multimedia refers to the integration of multiple forms of content, such as text, images, audio, video,
graphics, and animation, into a cohesive digital experience. It is widely used in various fields, including
entertainment, education, advertising, and communication.
Components of Multimedia
1. Text: Written words such as headings, paragraphs, and captions.
2. Images/Graphics: Visual representations, including photographs, drawings, and charts.
3. Audio: Sound, including music, voice, or sound effects.
4. Video: Moving visual media, often combined with audio.
5. Animation: Simulated motion or transitions between states.
6. Interactive Content: User-engaged elements like clickable links, forms, or virtual reality
environments.
A Multimedia Database Management System (MMDBMS) is designed to store, retrieve, and manage
multimedia content effectively.
Applications of Multimedia
1. Entertainment:
Streaming services like Netflix, YouTube, or Spotify.
Gaming and virtual reality.
2. Education:
E-learning platforms with video tutorials, interactive graphics, and animations.
3. Marketing and Advertising:
Multimedia-rich advertisements, banners, and promotional videos.
4. Healthcare:
Storing medical imaging data like X-rays and MRIs.
Interactive tools for patient education.
5. Communication:
Video conferencing, podcasts, and multimedia messaging.
Challenges of Multimedia
1. Large Storage Requirements:
Multimedia files are typically large and require efficient storage solutions.
2. High Bandwidth Demand:
Streaming and downloading multimedia content requires high-speed networks.
3. Data Integration:
Combining multimedia content with structured data in traditional databases.
4. Content Retrieval:
Advanced techniques are needed for effective search and retrieval.
5. Standardization:
Handling diverse file formats and codecs.
Technologies and Standards
1. File Formats:
Images: JPEG, PNG, GIF.
Audio: MP3, WAV, AAC.
Video: MP4, AVI, MKV.
2. Compression Standards:
Images: JPEG, PNG.
Video: MPEG, H.264, H.265.
3. Tools and Platforms:
Editing: Adobe Photoshop, Premiere Pro, Audacity.
Delivery: Content delivery networks (e.g., Akamai).
Conclusion
Multimedia integrates diverse forms of content to enhance user experience. In computing, effective
management of multimedia requires specialized tools and technologies to address challenges like large
data sizes, retrieval complexities, and high bandwidth needs.
log
Log in DBMS
A log in a Database Management System (DBMS) is a crucial component used for maintaining the
consistency, durability, and recoverability of a database. It records all the operations and transactions
performed on the database, ensuring that the system can recover from failures while adhering to ACID
properties.
Purpose of Logs
1. Durability: Ensures committed transactions are not lost in case of system failures.
2. Atomicity: Allows rollback of incomplete or failed transactions.
3. Recovery: Facilitates recovery of the database to a consistent state after a crash or error.
4. Concurrency: Helps manage concurrent transactions and resolve conflicts.
Types of Logging
1. Write-Ahead Logging (WAL):
Log records are flushed to stable storage before the corresponding data pages are written.
2. Undo Logging:
Logs old values of the data items being updated, so that incomplete transactions can be rolled back.
Example:
```
<T1, Start>
<T1, A, 50> // Before change, A = 50
<T1, Commit>
```
3. Redo Logging:
Logs new values of the data items being updated, ensuring that committed transactions can
be redone if required.
Example:
```
<T1, Start>
<T1, A, 70> // After change, A = 70
<T1, Commit>
```
4. Undo-Redo Logging:
Combines both undo and redo logs to handle partial or failed transactions and ensure
recovery.
Logging Process
1. Start of Transaction:
Log entry records the transaction's beginning.
Example: `<T1, Start>`.
2. During Transaction:
Operations are logged as they are performed.
Example: `<T1, A, Old=50, New=70>`.
3. Commit/Abort:
A log entry indicates the transaction's outcome.
Example:
Commit: `<T1, Commit>`
Abort: `<T1, Abort>`
4. Checkpoint:
A checkpoint log periodically records the state of active transactions and dirty pages in
memory, reducing recovery time.
Example: `<Checkpoint, T1, T2>`.
Advantages of Logging
1. Crash Recovery:
Helps restore the database to a consistent state after a crash.
2. Modest Overhead:
Because log writes are sequential appends, logging is relatively cheap, though high-transaction workloads can still feel its cost.
3. Support for Concurrency:
Ensures correctness even in concurrent transaction execution.
Recovery Using Logs
1. Analysis Phase:
Identifies active, committed, and aborted transactions at the time of failure.
2. Redo Phase:
Reapplies committed changes using redo logs to ensure durability.
3. Undo Phase:
Rolls back uncommitted changes using undo logs to maintain atomicity.
Challenges in Logging
1. Storage Overhead:
Log files can grow significantly, requiring periodic management.
2. Complexity in Implementation:
Requires careful synchronization and error handling.
3. Performance Impact:
Frequent logging operations may affect throughput in high-transaction environments.
Logs are essential for ensuring the reliability and robustness of database systems, particularly in
environments where data integrity is critical.
2 phase locking and variants
Two-Phase Locking (2PL) is a concurrency control protocol that ensures serializability in transaction
management. The protocol governs the acquisition and release of locks in a transaction to avoid
conflicts between concurrent transactions.
In 2PL, each transaction goes through two distinct phases:
1. Growing Phase: A transaction can acquire locks but cannot release any.
2. Shrinking Phase: A transaction can release locks but cannot acquire any new locks.
The key principle is that once a transaction releases a lock, it can no longer acquire new locks, ensuring
that all its data access operations are completed before any locks are released.
Phases of 2PL
1. Growing Phase:
A transaction can acquire any number of locks, but it cannot release any locks during this
phase.
2. Shrinking Phase:
Once the transaction releases a lock, it enters the shrinking phase.
During this phase, no new locks can be acquired, but it can continue releasing locks.
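The two-phase rule can be enforced mechanically: once a transaction has released any lock, a further acquisition is an error. A sketch (the `TwoPhaseTxn` class is illustrative):

```python
# A transaction wrapper that enforces the two-phase rule.
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True      # first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")       # growing phase: any number of locks
t.unlock("A")     # first release: shrinking phase begins
try:
    t.lock("C")   # illegal under 2PL
    violation = False
except RuntimeError:
    violation = True
```

Strict 2PL corresponds to never calling `unlock` at all until commit or abort, at which point every held lock is released at once.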
Types of 2PL
1. Basic 2PL: Follows the two phases as described; guarantees serializability but is prone to deadlocks and cascading aborts.
2. Strict 2PL: Holds all exclusive locks until the transaction commits or aborts, preventing cascading aborts.
3. Rigorous 2PL: Holds all locks (shared and exclusive) until the transaction commits or aborts.
4. Conservative (Static) 2PL: Acquires all locks before execution begins, avoiding deadlocks at the cost of reduced concurrency.
Advantages of 2PL
1. Guarantees conflict-serializable schedules.
2. Simple to implement with a lock manager.
3. Strict and rigorous variants also prevent cascading aborts, simplifying recovery.
Disadvantages of 2PL
1. Deadlocks: Basic 2PL can lead to deadlocks, where two or more transactions are waiting for each
other to release resources. This situation requires deadlock detection and resolution techniques.
2. Reduced Concurrency: The protocol may reduce the system's overall concurrency, especially when
transactions lock multiple resources and hold locks for long durations.
3. Overhead: The management of locks introduces some system overhead, especially in high-
transaction environments.
4. Starvation: In the case of deadlock resolution, some transactions may be delayed indefinitely
(starvation), especially if the priority of transaction management isn't well-handled.
Example
Consider the following two transactions, T1 and T2, accessing resources A, B, and C.
With Strict 2PL, T1 would not release any locks until it commits, making it easier to ensure consistency.
Conclusion
Two-Phase Locking is a widely used concurrency control protocol to ensure serializability in database
systems. While it ensures correctness and consistency of data, it requires careful handling of deadlocks
and system performance, especially in highly concurrent environments.