Database Questions

The document discusses key concepts related to database locking mechanisms and transaction processing. It defines shared/exclusive locks and intent locks, as well as protocols for acquiring locks. Deadlock detection and resolution techniques are also covered. The document then defines database transactions and their ACID properties. It outlines backup and recovery and transaction logging as techniques for recovering databases from failures or disasters.


1. Define the locking mechanism. Explain locking types and their protocol.

=> In the context of databases, a locking mechanism is used to manage concurrent access to database
resources, such as tables, records, or data pages, to ensure data integrity and consistency.

Here are some common locking types and their protocols used in database systems:

1. Shared/Exclusive (Read/Write) Locks: Shared locks (also known as read locks) allow multiple
transactions to read a resource simultaneously, without conflicting with each other.

Exclusive locks (also known as write locks) ensure exclusive access to a resource.

Protocol:

When a transaction wants to read a resource, it requests a shared lock. If no exclusive lock is held on the
resource, the shared lock is granted, and multiple transactions can concurrently read the resource.

When a transaction wants to modify a resource, it requests an exclusive lock. The exclusive lock is granted
only if no other lock (shared or exclusive) is currently held on the resource, so a writer never overlaps
with readers or with other writers.
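
As a concrete illustration (PostgreSQL-style explicit row-locking syntax; exact syntax varies by DBMS, and
the accounts table is hypothetical), the following sketch shows how shared and exclusive row locks can be
requested inside transactions:

-- Reader: request a shared (read) lock on the selected row
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;
COMMIT;

-- Writer: request an exclusive (write) lock on the same row before modifying it
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;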

2. Intent Locks:

Intent locks are used to indicate the intention to acquire shared or exclusive locks on higher-level
resources. They help optimize lock acquisition by allowing transactions to know the locking intentions of
other transactions on related resources.

Protocol:

When a transaction wants to acquire a shared lock on a lower-level resource, it must first acquire an
intent shared lock on the higher-level resource. This informs other transactions that a shared lock is
already being held or requested on the higher-level resource.

Similarly, when a transaction wants to acquire an exclusive lock on a lower-level resource, it must first
acquire an intent exclusive lock on the higher-level resource.

3. Deadlock Detection and Resolution:

Deadlocks can occur when multiple transactions wait indefinitely for locks held by each other. Deadlock
detection and resolution mechanisms are used to identify and resolve such situations.

Various algorithms, such as the wait-for-graph algorithm or timeout-based approaches, can be employed
to detect deadlocks. Once a deadlock is detected, one or more transactions may need to be rolled back
or interrupted to resolve the deadlock.
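
For illustration only, the sketch below (a hypothetical accounts table, with statements interleaved across
two sessions shown as comments) shows how a deadlock can arise when transactions acquire locks in opposite
orders; acquiring locks in a consistent order is one simple way to avoid it:

-- T1:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- T1 now holds the lock on row 1

-- T2:
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- T2 now holds the lock on row 2

-- T1:
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- T1 blocks, waiting for T2

-- T2:
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- T2 blocks, waiting for T1 -> deadlock
-- The wait-for graph now contains the cycle T1 -> T2 -> T1; the DBMS detects it and rolls back
-- one of the two transactions so that the other can proceed.
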
2. Define Transaction and its properties. Discuss the purpose of database recovery and outline the
techniques to Recover the Database from any disaster.

=> Transaction is a sequence of operations that is treated as a single logical unit of work in a database
system. It is a fundamental concept in database management systems (DBMS) to ensure data
consistency and integrity. The properties of transactions, commonly known as ACID properties, are:

1. Atomicity: A transaction is an atomic unit of work that must either complete successfully or
be aborted, leaving the database unchanged. All of its operations must be treated as a single,
indivisible operation.

2. Consistency: A transaction should ensure that the database remains consistent before and
after it executes. It should satisfy all the integrity constraints and business rules.

3. Isolation: Transactions should be executed in isolation from each other so that each
transaction sees the database as if it is the only one accessing it.

4. Durability: Once a transaction is committed, its effects should be permanent and survive
system failures or crashes.
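
A minimal sketch of a transaction in standard SQL, assuming a hypothetical accounts table; the two updates
succeed or fail together as one atomic unit:

BEGIN;  -- start the transaction

UPDATE accounts SET balance = balance - 500 WHERE id = 1;  -- debit one account
UPDATE accounts SET balance = balance + 500 WHERE id = 2;  -- credit another account

COMMIT;      -- make both changes permanent (durability)
-- ROLLBACK; -- alternatively, undo both changes if anything went wrong (atomicity)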

Purpose of Database Recovery:

The purpose of database recovery is to restore the database to a consistent state after a failure or
disaster. Failures can occur due to software or hardware malfunctions, natural disasters, or human
errors. Recovery is a crucial component of database management, and it is essential for maintaining data
integrity and consistency.

Techniques for Database Recovery:

There are two main techniques for database recovery:

1. Backup and Recovery:

This technique involves creating regular backups of the database and using them to restore the system
to a previous state in case of failure.

When a failure occurs, the backup is used to restore the database to the last consistent state. Then,
transaction logs are replayed to bring the database up to date, and any uncommitted transactions are
rolled back.
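
As an illustration only (SQL Server-style syntax; other systems use different commands or external tools
such as pg_dump, and the database and file names below are hypothetical), a backup-and-restore sequence
might look like this:

-- Take a full backup of a database named SalesDB
BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB_full.bak';

-- After a failure: restore the full backup, then replay the transaction log to roll forward
RESTORE DATABASE SalesDB FROM DISK = 'D:\backups\SalesDB_full.bak' WITH NORECOVERY;
RESTORE LOG SalesDB FROM DISK = 'D:\backups\SalesDB_log.trn' WITH RECOVERY;
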
2. Transaction Logging:

This technique involves recording all transactions in a log file, called the transaction log. The log file
contains a record of all database modifications, including the before and after values. When a failure
occurs, the log file is used to restore the database to its last consistent state.

3. Discuss the concept of: i) Candidate Key ii) Distributed Database iii) Centralized Database iv) Data
Replication

i) Candidate Key:

A candidate key is a set of one or more attributes (columns) in a relational database table that can
uniquely identify each tuple (row) in the table. It is a candidate for being chosen as the primary key of
the table.
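
For example (a hypothetical employees table), both employee_id and the national insurance number nin can
uniquely identify a row, so both are candidate keys; one of them is chosen as the primary key:

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,              -- candidate key chosen as the primary key
    nin         VARCHAR(20) UNIQUE NOT NULL,  -- alternate candidate key
    name        VARCHAR(50),
    department  VARCHAR(30)
);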

ii) Distributed Database:

A distributed database is a database system in which data is stored across multiple interconnected
computers or nodes that are geographically distributed. Each node in the distributed database may have
its own local storage, processing power, and memory.

iii) Centralized Database:

A centralized database is a database system where all data is stored in a single location or server. It is
managed by a central authority, and all access to the database is controlled through that central server.

iv) Data Replication:

Data replication is the process of creating and maintaining multiple copies of data in a distributed
system. In this process, data from a source database is copied and stored in one or more destination
databases.

There are different approaches to data replication, including:

1. Full Replication: In this approach, the entire database is replicated to multiple locations or nodes
in a distributed system. Each replica contains a complete copy of the data.
2. Partial Replication: In this approach, only a subset of the database is replicated to multiple
locations. The selection of data to replicate can be based on criteria such as access frequency,
importance, or specific requirements.

3. Update Propagation: When data is modified or updated in one replica, those changes need to
be propagated and synchronized to other replicas to maintain consistency across the distributed
system. Various mechanisms, such as eager replication or lazy replication, can be employed to
propagate updates efficiently.
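
As one concrete, PostgreSQL-specific illustration of the replication ideas above, logical replication
publishes changes to a table from a source database and subscribes to them on a destination database
(table and connection details here are hypothetical):

-- On the source (publisher) database
CREATE PUBLICATION employees_pub FOR TABLE employees;

-- On the destination (subscriber) database
CREATE SUBSCRIPTION employees_sub
    CONNECTION 'host=primary.example.com dbname=hr user=replicator'
    PUBLICATION employees_pub;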

4. Highlight the major difference between DDL, DML, and DCL with one example for each.

DDL, DML, and DCL are three types of SQL commands used in relational databases. The major
differences between them are:

1. DDL (Data Definition Language):

DDL commands are used to define the structure of the database objects such as tables, views,
indexes, and constraints. DDL statements include CREATE, ALTER, and DROP commands. Examples of
DDL statements are:

• CREATE TABLE: creates a new table in the database. For example, CREATE TABLE employees (id
INT, name VARCHAR (50), salary INT).
• ALTER TABLE: modifies the structure of an existing table. For example, ALTER TABLE employees
ADD COLUMN age INT.
• DROP TABLE: removes an existing table from the database. For example, DROP TABLE
employees.
2. DML (Data Manipulation Language):

DML commands are used to manipulate or modify the data stored in the database. DML statements
include SELECT, INSERT, UPDATE, and DELETE commands. Examples of DML statements are:

• SELECT: retrieves data from one or more tables. For example, SELECT name, salary FROM
employees WHERE age > 30;
• INSERT: adds new data to a table. For example, INSERT INTO employees (id, name, salary)
VALUES (1, 'John Doe', 5000);
• UPDATE: modifies existing data in a table. For example, UPDATE employees SET salary = 6000
WHERE name = 'John Doe';
• DELETE: removes data from a table. For example, DELETE FROM employees WHERE age > 60;
3. DCL (Data Control Language):

DCL commands are used to manage user privileges and permissions in the database. DCL statements
include GRANT and REVOKE commands. Examples of DCL statements are:

• GRANT: gives specific privileges to a user or a group of users. For example, GRANT SELECT ON
employees TO user1.
• REVOKE: removes specific privileges from a user or a group of users. For example, REVOKE
SELECT ON employees FROM user1.
5. Define all three types of anomalies.

The three types of anomalies that can occur in a relational database are as follows:

1. Insertion Anomaly:
An insertion anomaly occurs when it is not possible to add certain data to the database without
also adding unrelated data.

2. Update Anomaly:
An update anomaly occurs when modifying data in the table results in inconsistencies or
unintended changes across the database.

3. Deletion Anomaly:
A deletion anomaly occurs when deleting data from a table unintentionally removes other
necessary data or leads to the loss of information that may be needed for other purposes.
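
These anomalies are easiest to see in an unnormalized table; the sketch below uses a hypothetical
employee_projects table that mixes employee facts and project facts in one relation:

CREATE TABLE employee_projects (
    employee_id   INT,
    employee_name VARCHAR(50),
    project_id    INT,
    project_name  VARCHAR(50)
);
-- Insertion anomaly: a new project cannot be recorded until some employee is assigned to it.
-- Update anomaly: renaming a project requires updating every row that mentions it, risking
--                 inconsistent copies of project_name.
-- Deletion anomaly: deleting the last employee on a project also deletes all knowledge of that project.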

6. Identify three types of failure in a database system?

Three types of failure in a database system are:

1. System Failure: System failure occurs when the entire database management system (DBMS) or
any of its crucial components stop functioning properly. This can result from hardware failures,
software crashes, power outages, or network disruptions.

2. Media Failure: Media failure refers to the failure of the storage media that holds the database's
permanent data, such as hard disk failures, disk controller malfunctions, or data corruption.
Media failures can lead to the loss or corruption of critical data, making it inaccessible or
unusable.

3. Transaction Failure: Transaction failure occurs when an individual database transaction cannot
be completed successfully due to various reasons. It can happen because of logical errors in the
application code, violations of integrity constraints, or concurrent access conflicts. Transaction
failures can leave the database in an inconsistent state, where changes made by a transaction
are partially applied but not fully committed or rolled back. Proper error handling and
transaction management techniques, such as the use of rollback and commit mechanisms, are
essential to handle transaction failures and maintain data integrity.
7. Discuss data replication, replicated database, and fully replicated database.

Data replication is the process of creating and maintaining multiple copies of the same data across
different locations or systems. It is commonly used in database systems to enhance data availability,
improve performance, and provide fault tolerance.

A replicated database refers to a database system that employs data replication to maintain multiple
copies of the database across different nodes or servers.

There are different replication strategies that can be employed in a replicated database, depending on
the requirements and goals of the system. One common classification is between partial replication and
full replication.

In partial replication, only a subset of the database is replicated across multiple nodes. This subset could
be based on specific tables, partitions, or data segments.

In contrast, a fully replicated database replicates the entire database across all replicas. Each replica
contains an exact copy of the entire dataset. This approach provides the highest level of data availability
and fault tolerance since any replica can serve read or write requests.

Both partial and fully replicated databases have their advantages and trade-offs, and the choice between
them depends on factors such as system requirements, performance considerations, and resource
constraints.
8. Transaction processing is a key component of a relational database system. Explain
the concept of transaction and provide a diagram to clearly depict the state transaction
for transaction execution.
In a relational database system, a transaction represents a unit of work or a sequence of
operations that need to be executed as a single, indivisible entity. A transaction can consist of
one or more database operations, such as inserting, updating, or deleting records from one or
multiple tables. The concept of a transaction ensures the ACID properties: Atomicity,
Consistency, Isolation, and Durability.

1. Atomicity: A transaction is atomic, meaning it is treated as a single, indivisible
operation. It either executes all its operations successfully or is rolled back to its initial
state if any operation fails. Atomicity ensures that the database remains in a consistent
state.

2. Consistency: A transaction brings the database from one consistent state to another. It
enforces integrity constraints and rules defined on the database, ensuring that data
remains valid and meaningful throughout the transaction's execution.

3. Isolation: Each transaction is isolated from other concurrent transactions, allowing them
to execute independently without interference. Isolation ensures that the intermediate
state of a transaction remains invisible to other transactions until it is committed.

4. Durability: Once a transaction is committed, its changes become permanent and
survive any subsequent failures, such as system crashes or power outages. The
committed data is stored in a durable manner, typically by writing it to disk, ensuring its
availability and persistence.

    +--------+       +----------------------+       +-----------+
    | Active |------>| Partially Committed  |------>| Committed |
    +--------+       +----------------------+       +-----------+
        |
        v
    +--------+       +---------+
    | Failed |------>| Aborted |
    +--------+       +---------+

• Active: The initial state of a transaction while it is executing its operations.

• Partially Committed: The transaction has executed its final operation, but its changes have
not yet been made permanent in the database.

• Committed: The transaction has completed successfully, and all its changes have been
permanently applied to the database.

• Failed: The transaction has encountered an error or an exceptional condition (either while
active or while partially committed) that prevents it from continuing. It cannot proceed
further and must be rolled back.

• Aborted (Rolled Back): The transaction has been rolled back, undoing any changes it made and
restoring the database to its state before the transaction started. An aborted transaction can
then be restarted or discarded.
9. Discuss the purpose of database concurrency and database recovery.

Database Concurrency:

The purpose of database concurrency is to allow multiple users or applications to access and manipulate
the database simultaneously without causing data inconsistency or conflicts. The main objectives of
database concurrency are:

1. Data Integrity: Concurrency control mechanisms prevent data inconsistencies that can arise
when multiple transactions access or modify the same data concurrently. By coordinating the
execution of transactions, concurrency control ensures that each transaction sees a consistent
and valid view of the database.

2. Performance and Throughput: Concurrency allows for parallel execution of transactions, which
can improve system performance and increase the throughput of database operations. By
enabling multiple transactions to execute simultaneously, the system can make better use of
available resources and reduce overall processing time.

3. User Satisfaction: Concurrency control enhances user satisfaction by allowing multiple users to
access the database concurrently, providing better responsiveness and minimizing delays caused
by contention.

Concurrency control mechanisms, such as locking, timestamps, and multi-version concurrency control
(MVCC), are employed to manage concurrent access and modifications to the database. These
mechanisms ensure that transactions are properly coordinated, serialized, and isolated from each other
to maintain data consistency and prevent conflicts.

Database Recovery:

The purpose of database recovery is to restore the database to a consistent and valid state after a failure
or an abnormal termination of the system. The main objectives of database recovery are:

1. Atomicity and Durability: Recovery mechanisms ensure that transactions exhibit atomicity,
meaning they are either fully executed and committed or fully rolled back. Changes made by
committed transactions are made durable, so they are not lost even in the event of system
failures.
2. Data Consistency: Recovery processes aim to restore the database to a consistent state by
undoing or redoing the effects of incomplete or partially executed transactions. This ensures that
the database remains in a valid and consistent state, adhering to integrity constraints and
business rules.

3. System Resilience: Recovery mechanisms improve the system's resilience and ability to handle
failures. By providing mechanisms to recover from failures, such as system crashes, power
outages, or disk failures, the system can resume normal operation with minimal data loss and
downtime.

Recovery techniques typically involve two main components: logging and checkpointing. Logging records
all changes made by transactions in a log file, allowing for the undoing or redoing of transactions during
recovery. Checkpointing involves periodically saving the state of the database in a stable manner, which
helps in identifying the most recent consistent state to start the recovery process.

Recovery algorithms, such as undo-redo recovery and deferred update recovery, use the information
from logs and checkpoints to restore the database to a consistent state and ensure the durability of
committed transactions. These mechanisms provide a safety net for the database system and help
maintain data integrity even in the face of failures or errors.
10. In the context of transaction processing discuss the concepts of
● Locking
● Commit
● Log File
● Transaction record

1. Locking: Locking is a concurrency control mechanism used to manage concurrent
access to shared resources, such as database objects (e.g., tables, records) or portions
of the database.

2. Commit: Commit is an operation that marks the successful completion of a transaction
and makes its changes permanent in the database. When a transaction commits, all the
changes it made during its execution are applied to the database, and other transactions
can see the modified data.

3. Log File: A log file, also known as a transaction log, is a record of all the changes made
by transactions in a database. The log file is used for recovery purposes and ensures
durability and atomicity.

4. Transaction Record: A transaction record contains information about a specific
transaction, including its unique identifier, the operations it performs on the database,
and its current state.
11. Discuss the three levels of the ANSI-SPARC model, namely, external level, conceptual
level, and internal level. Illustrate your answer with a diagram.

The ANSI-SPARC model, also known as the three-schema architecture, defines three levels: the external
level, conceptual level, and internal level. Here's a concise explanation of each level:

1. External Level (View Level): The external level focuses on the individual user's or application's
perspective of the database. It allows users to define their customized views of the data, tailored
to their specific requirements. Each user view includes only the necessary data and structures
relevant to that user, hiding the complexity of the underlying database. The external level
ensures data privacy, security, and customized access for different users or user groups.

2. Conceptual Level (Logical Level): The conceptual level represents the overall logical view of the
entire database system. It provides a global and integrated view of the data, independent of any
specific application or user requirements. At this level, the database schema is defined, including
entities, relationships, and constraints. The conceptual level ensures data integrity, consistency,
and a comprehensive understanding of the organization's data requirements.

3. Internal Level (Physical Level): The internal level represents the physical implementation details
of the database system. It focuses on the storage structures, access methods, and low-level
implementation details. The internal level describes how the data is stored on disk, indexed, and
organized for efficient retrieval. It deals with issues such as data storage, data compression,
indexing techniques, and data access paths.
+----------------------+
|    External Level    |
|     (User Views)     |
+----------------------+
           |
           |
+----------------------+
|   Conceptual Level   |
|    (Global Schema)   |
+----------------------+
           |
           |
+----------------------+
|    Internal Level    |
|  (Physical Storage   |
|   Implementation)    |
+----------------------+

In the diagram, the external level represents user views that provide customized access to the data. The
conceptual level represents the global schema that defines the overall logical structure of the database.
The internal level represents the physical storage implementation details. Each level encapsulates
different aspects of the database system, providing a clear separation of concerns and allowing for
flexibility, scalability, and data independence.
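
In SQL terms, the external level is typically realized with views; a minimal sketch (using the hypothetical
employees table from earlier answers) of a user view that hides salary information from one user group:

-- Conceptual/internal levels: the base table and its stored data
-- External level: a customized view for a reporting user group
CREATE VIEW employee_directory AS
SELECT name, department
FROM employees;   -- salary is deliberately excluded from this external view
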
12. Outline the main functions for the two levels of mapping associated with the
ANSI/SPARC model.

In 1971, the DBTG (Database Task Group) recognized the need for a two-level approach
(views and schema); in 1975, ANSI-SPARC recognized the need for a three-level approach
comprising an external, a conceptual, and an internal level.
The three-level architecture aims to separate each user's view of the database from the way the
database is physically represented. Two levels of mapping connect the schemas: the
external/conceptual mapping translates requests and data between each external view and the
conceptual schema (providing logical data independence), and the conceptual/internal mapping
translates between the conceptual schema and the internal schema (providing physical data
independence).
1. External level:
It is the users' view of the database: only the data relevant to a particular user is
described at this level. The external level consists of several different external views of
the database, and each external view includes only the entities, attributes, and
relationships that its user needs. The different views
may have different ways of representing the same data. For example, one user may
view the name in the form (first name, last name), while another may view it as (last
name, first name).

2. Conceptual level:
It is the community view of the database and describes what data is stored in the
database and represents the entities, their attributes, and their relationships. It
represents the semantic, security, and integrity information about the data. The
middle level or the second level in the three-level architecture is the conceptual
level. This level contains the logical structure of the entire database, it represents
the complete view of the database that the organization demands independent of any
storage consideration.

3. Internal level:
At the internal level, the database is represented physically on the computer. It
covers the physical implementation of the database, including storage space
utilization, optimal runtime performance, and data encryption techniques. It
interfaces with the operating system to place the data in storage files, allocate
storage space, retrieve the data, and so on.
13. Discuss the main difference between DDL and DML and give one example SQL
statement for each.

DDL (Data Definition Language) vs DML (Data Manipulation Language):

• DDL stands for Data Definition Language; DML stands for Data Manipulation Language.
• DDL is used to create the database schema and can define constraints as well; DML is used to
add, retrieve, or update the data.
• DDL basically defines the columns (attributes) of a table; DML adds or updates the rows of a
table (these rows are called tuples).
• DDL has no further classification; DML is further classified into procedural and
non-procedural DML.
• Basic DDL commands are CREATE, DROP, RENAME, ALTER, etc.; basic DML commands are SELECT,
INSERT, UPDATE, MERGE, etc.
• DDL does not use a WHERE clause in its statements; DML often uses a WHERE clause.
• DDL defines the structure of a database and is used to create and modify database objects
such as tables, indexes, views, and constraints; DML is used to perform operations on the data
within those objects.
• DDL statements are typically executed less frequently, usually by database administrators;
DML statements are executed frequently, typically by application developers or end users.
• DDL statements do not manipulate data directly and do not change the contents of the
database; DML statements manipulate data directly and change the contents of the database.
• Examples of DDL commands: CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE TABLE, and RENAME
TABLE. Examples of DML commands: SELECT, INSERT, UPDATE, DELETE, and MERGE.
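
Since the question asks for one example SQL statement of each type, here is a short illustrative pair
(the employees table is the hypothetical example used elsewhere in this document):

-- DDL: define the structure of a table
CREATE TABLE employees (
    id     INT PRIMARY KEY,
    name   VARCHAR(50),
    salary INT
);

-- DML: manipulate the data stored in that table
UPDATE employees
SET salary = 6000
WHERE name = 'John Doe';
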
14. List and briefly outline the three concurrency problems which could occur in a multi-
transaction database environment.
Concurrency control is an essential aspect of database management systems (DBMS) that
ensures transactions can execute concurrently without interfering with each other. However,
concurrency control can be challenging to implement, and without it several problems can
arise that affect the consistency of the database.
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner,
then it might lead to several problems. These problems are commonly referred to as
concurrency problems in a database environment.
The five concurrency problems that can occur in the database are:
• Temporary Update Problem
• Incorrect Summary Problem
• Lost Update Problem
• Unrepeatable Read Problem
• Phantom Read Problem
These are explained below.
Temporary Update Problem:
Temporary update or dirty read problem occurs when one transaction updates an item and fails.
But the updated item is used by another transaction before the item is changed or reverted back
to its last value.
Example: Suppose transaction 1 updates X and then fails. X is rolled back to its previous value,
but transaction 2 has already read the incorrect (uncommitted) value of X.
Incorrect Summary Problem:
Consider a situation, where one transaction is applying the aggregate function on some records
while another transaction is updating these records. The aggregate function may calculate some
values before the values have been updated and others after they are updated.
Example: Transaction 2 calculates the sum of a set of records while transaction 1 is updating
them. The aggregate function may therefore include some values from before the update and
others from after it.
Lost Update Problem:
In the lost update problem, an update done to a data item by a transaction is lost as it is
overwritten by the update done by another transaction.
Example: Transaction 2 changes the value of X, but that change is overwritten when transaction 1
later writes and commits its own value of X. The update made by transaction 2 is therefore lost;
in general, the last committed write overwrites all previous writes.
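
A hedged sketch of the lost-update interleaving (hypothetical accounts table), followed by one common fix,
which is to lock the row before the read-modify-write:

-- T1: SELECT balance FROM accounts WHERE id = 1;        -- reads 100
-- T2: SELECT balance FROM accounts WHERE id = 1;        -- also reads 100
-- T2: UPDATE accounts SET balance = 150 WHERE id = 1;   -- writes 100 + 50
-- T1: UPDATE accounts SET balance = 120 WHERE id = 1;   -- writes 100 + 20, T2's update is lost

-- One fix: acquire an exclusive row lock before reading (pessimistic locking)
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
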
Unrepeatable Read Problem:
The unrepeatable problem occurs when two or more read operations of the same transaction
read different values of the same variable.
Example: Transaction 2 reads the variable X; a write operation in transaction 1 then changes
its value. When transaction 2 performs another read of X, it sees the new value written by
transaction 1, so two reads within the same transaction return different values.
Phantom Read Problem:
The phantom read problem occurs when a transaction re-reads data it read earlier and finds that
rows it previously saw have appeared or disappeared; for example, a variable it read before may
no longer exist when it tries to read it again.
Example: Transaction 2 reads the variable X, after which transaction 1 deletes X without
transaction 2's knowledge. When transaction 2 tries to read X again, it is no longer able to
do so.
From slides:

Database Recovery-Week 3

1. Storage Media

In the context of database recovery, storage media refers to the physical devices or mediums
used to store database files and data. The choice of storage media is crucial as it directly impacts
the durability and reliability of the database. Here are some common types of storage media
used in database systems:

1. Hard Disk Drives (HDD): HDDs are traditional mechanical storage devices that use
rotating disks and read/write heads to store and retrieve data. They offer high capacity
and relatively lower cost, making them suitable for storing large databases. However,
they may have slower read/write speeds compared to other storage media.

2. Solid-State Drives (SSD): SSDs are non-mechanical storage devices that use flash
memory to store data. They provide faster read/write speeds and better performance
compared to HDDs. SSDs are commonly used for database systems that require high-
speed data access.

3. Network-Attached Storage (NAS): NAS is a dedicated storage device connected to a
network. It allows multiple systems or servers to access the storage over a network
connection. NAS provides centralized storage management and can be used for
database backups, replication, and data sharing among multiple systems.

4. Storage Area Network (SAN): SAN is a specialized network infrastructure that
connects storage devices to servers. It enables high-speed data transfer between servers
and storage devices, providing high availability, scalability, and performance for
databases. SANs are often used in enterprise-level database systems that require
advanced storage capabilities.

5. Cloud Storage: Cloud storage refers to storing data on remote servers accessed over
the internet. Cloud storage providers offer scalable, cost-effective, and highly available
storage options for databases. It allows organizations to store and retrieve data from
anywhere with internet connectivity, eliminating the need for on-premises storage
infrastructure.
2. Transaction
A transaction refers to a logical unit of work that consists of one or more database operations. A
transaction represents a series of database actions, such as reading or modifying data, that are
treated as a single indivisible unit. Transactions ensure data integrity and consistency by
enforcing the ACID properties (Atomicity, Consistency, Isolation, and Durability).

3. Database Recovery
Database recovery refers to the process of restoring a database to a consistent and usable state
after a failure or an error has occurred. It involves recovering lost or damaged data, restoring the
database to its last known consistent state, and resuming normal database operations.

Database recovery is crucial for ensuring data integrity and minimizing the impact of failures or
errors. It involves the following key components:

1. Backup: Regularly creating backups of the database is an essential part of database recovery.
Backups serve as a point-in-time copy of the database, allowing for data restoration in the
event of data loss or corruption. Backups can be full backups (copying the entire database)
or incremental backups (copying only the changes since the last backup).

2. Transaction Logs: Transaction logs record all the changes made to the database during
transaction execution. They capture the before and after values of the modified data.
Transaction logs play a vital role in database recovery as they allow for the reconstruction of
the database to a consistent state before the failure occurred.

3. Recovery Manager: The recovery manager is responsible for coordinating and executing the
recovery process. It analyzes the database state, transaction logs, and available backups to
determine the necessary steps for recovery. The recovery manager applies the appropriate
recovery techniques to restore the database to a consistent state.

4. Recovery Techniques: The main recovery techniques used in database recovery are:

• Rollback/Undo Recovery: Rollback recovery involves undoing or rolling back incomplete or
uncommitted transactions at the time of failure. It ensures that only the changes made by
the completed and committed transactions are applied to the restored database.

• Redo Recovery: Redo recovery involves applying the changes recorded in the transaction
logs to bring the database to a consistent state. It ensures that all the committed
transactions' changes are reapplied to the restored database.

• Point-in-Time Recovery: Point-in-time recovery allows for restoring the database to a
specific point in time before the failure occurred. It involves combining the appropriate
backups and transaction logs to reconstruct the database state at the desired time.

4. Importance of Database Recovery

Database recovery is of utmost importance in ensuring the integrity, availability, and reliability of a
database system. It refers to the process of restoring a database to a consistent state after a failure
or an error. Here are some key reasons why database recovery is important:

1. Data Integrity: Databases store critical information and maintaining data integrity is crucial.
Database recovery ensures that the data remains consistent and accurate even in the event
of failures like system crashes, hardware malfunctions, or power outages. It helps to recover
transactions and bring the database back to a consistent state, minimizing the risk of data
corruption or loss.

2. Business Continuity: For organizations, databases often serve as the backbone of their
operations. Downtime or data loss can result in significant disruptions, financial losses, and
damage to reputation. Database recovery helps in minimizing downtime by restoring the
database to a functional state as quickly as possible. It enables businesses to resume normal
operations and maintain business continuity.

3. Transactional Consistency: Databases handle multiple concurrent transactions that may
modify data simultaneously. In the event of a failure during transaction processing, recovery
mechanisms ensure that either all the changes made by a transaction are applied or none at
all. This ensures transactional consistency, preventing partial updates or incomplete
operations that could lead to data inconsistencies.

4. Compliance and Legal Requirements: Many industries have regulatory and legal
requirements related to data management and retention. Database recovery mechanisms,
such as backups and point-in-time recovery, help organizations meet these requirements.
They provide the ability to recover data from a specific point in time, which is useful in
audits, investigations, or legal disputes.

5. Disaster Recovery: Disasters like natural calamities, fires, or system compromises can result
in complete loss of data if appropriate recovery mechanisms are not in place. Database
recovery includes techniques like backups, replication, and off-site storage that help in
creating redundant copies of data. These measures enable organizations to recover data and
restore the database to a previous state, even in the face of catastrophic events.
6. Performance Optimization: Database recovery mechanisms, such as incremental backups or
log-based recovery, can also be used to improve database performance. By applying only the
necessary changes or recovering specific transactions, these mechanisms minimize the
downtime and resource requirements for recovery operations.

5. Principles of Database Recovery


The principles of database recovery are based on a set of techniques and strategies that ensure
the restoration of a database to a consistent state following a failure or error. These principles
include:

1. Atomicity: The principle of atomicity states that a transaction is treated as an indivisible unit
of work. During database recovery, this principle ensures that either all the changes made by
a transaction are applied, or none at all. If a failure occurs during the execution of a
transaction, the recovery process ensures that the partially completed transaction is rolled
back to maintain transactional consistency.

2. Consistency: The principle of consistency ensures that a database remains in a valid and
consistent state before and after a failure. During recovery, consistency is achieved by
applying necessary operations to undo the incomplete or inconsistent transactions and redo
the committed transactions. This process brings the database back to a state that satisfies all
integrity constraints and business rules.

3. Isolation: The principle of isolation guarantees that each transaction is executed in isolation
from other transactions until it is committed. During recovery, isolation is maintained by
ensuring that the changes made by uncommitted transactions are undone, while the
changes made by committed transactions are preserved and reapplied. This ensures that the
effects of uncommitted transactions do not persist in the recovered database.

4. Durability: The principle of durability ensures that once a transaction is committed, its
changes are permanent and will survive any subsequent failures. Recovery mechanisms,
such as transaction logs or backups, play a crucial role in achieving durability. By storing
transactional information in a durable manner, the recovery process can restore the
database to a consistent state by applying the logged changes.

5. Write-Ahead Logging (WAL): The write-ahead logging principle ensures that changes made
by transactions are first recorded in a transaction log before being applied to the database.
This log serves as a persistent record of all modifications and is crucial for recovery. During
the recovery process, the transaction log is used to undo or redo transactions to restore the
database to a consistent state.

6. Checkpointing: Checkpointing involves periodically saving the state of the database system
to stable storage. Checkpoints provide recovery points, allowing the recovery process to
start from a known consistent state. During recovery, the checkpoint information helps in
determining which transactions need to be undone or redone.

7. Redo and Undo Operations: Redo and undo operations are fundamental to database
recovery. Redo operations involve reapplying the changes recorded in the transaction log to
bring the database up to date after a failure. Undo operations, on the other hand, involve
rolling back or undoing the changes made by incomplete or uncommitted transactions to
restore transactional consistency.
6. Concept of Transaction

The concept of a transaction refers to an atomic unit of work that represents a series of actions
or operations performed on a database or system. In computer science and database
management, a transaction ensures the consistency, integrity, and reliability of data by providing
a mechanism to group multiple operations into a single logical unit.

Transactions typically follow the ACID properties, which stand for:

1. Atomicity: A transaction is treated as a single indivisible unit of work. It either executes all its
operations successfully, or if any operation fails, the entire transaction is rolled back, and the
system returns to its original state before the transaction started.

2. Consistency: A transaction takes the system from one consistent state to another consistent
state. It ensures that the database remains in a valid state by enforcing integrity constraints
and data validation rules.

3. Isolation: Each transaction is executed in isolation from other concurrent transactions. This
property ensures that the intermediate state of a transaction is not visible to other
transactions until it is committed. It prevents interference and maintains data integrity.

4. Durability: Once a transaction is committed, its changes become permanent and will survive
any subsequent system failures. The data modifications made by the transaction are stored
in a durable manner to ensure that they are not lost.

Transactions are essential for maintaining data integrity and providing reliable operations in
various systems, such as databases, banking systems, e-commerce platforms, and more. They
allow for complex operations to be treated as a single logical unit, ensuring that the system
remains in a consistent state even in the presence of failures or concurrent access by multiple
users.
7. Role of the Log File
The role of a log file, also known as a transaction log or transaction log file, is crucial in
maintaining the integrity and recoverability of data in a database management system. The log
file records all the modifications made to the database, including insertions, updates, and
deletions, in a sequential manner.
1. Recovery
2. Atomicity and Durability
3. Rollback and Undo Operations
4. Point-in-Time Recovery
5. Replication and High Availability

8. Three Types of Failures


1. System Failures: System failures occur when the underlying hardware or software
components of a computer system experience a malfunction or stop functioning properly.
These failures can be caused by power outages, hardware faults, operating system crashes,
software bugs, or network failures. System failures can lead to unexpected shutdowns, loss
of data in memory, or even complete system crashes.

2. Software Failures: Software failures refer to errors, bugs, or faults within an application or
software program. These failures can manifest as incorrect outputs, crashes, freezes, or
unexpected behavior. Software failures are typically caused by programming errors, design
flaws, inadequate testing, or compatibility issues. They can result in data corruption, data
loss, or system instability.

3. Media Failures: Media failures occur when there is a physical problem with the storage
media that holds the data, such as hard disk drives (HDDs) or solid-state drives (SSDs). Media
failures can include disk failures, bad sectors, data corruption, or physical damage to the
storage media. These failures can lead to the loss of data stored on the affected media.

9. Recovery Techniques for System Failure


To recover from system failures, various techniques and strategies can be employed depending
on the nature of the failure and the system's architecture. Here are some common recovery
techniques:

1. Checkpointing and Rollback: Checkpointing involves periodically saving the system's state
and data to stable storage. In the event of a system failure, the system can be restored to a
consistent state by rolling back to the latest checkpoint. This technique ensures that any
incomplete or uncommitted transactions are undone, maintaining data integrity. The
rollback process uses the transaction log or other recovery mechanisms to reverse the
effects of transactions.
2. Redundancy and Replication: Redundancy involves duplicating critical components or
systems to ensure fault tolerance. This can include redundant hardware components, such
as redundant power supplies or disk arrays, or redundant system architectures, such as
clustered or distributed systems. Replication, on the other hand, involves maintaining
multiple copies of data or entire systems in real time. In the event of a failure, the replicated
copies can be used to take over the failed system, minimizing downtime and data loss.

3. High Availability (HA) Clustering: HA clustering involves grouping multiple systems together
in a cluster and ensuring that if one system fails, another system in the cluster takes over
seamlessly. This technique provides continuous availability by automatically detecting
failures and redirecting requests to the functioning nodes in the cluster. HA clustering often
involves shared storage and mechanisms for heart beating and failover.

4. Backup and Restore: Regularly backing up critical data and system configurations is crucial
for recovery. Backups can be created at different levels, such as full backups or incremental
backups that only store changes since the last backup. In the event of a system failure, the
system can be restored by using the most recent backup and applying the transaction log or
redo logs to bring the system up to date.

5. Replication and Log Shipping: Replication and log shipping involve replicating data and log
files from a primary system to one or more standby systems. This technique ensures that in
the event of a system failure, the standby systems can take over with minimal data loss.
Replication can be asynchronous or synchronous, depending on the desired level of data
consistency and availability.

6. RAID (Redundant Array of Independent Disks): RAID is a data storage technology that
combines multiple physical disk drives into a single logical unit. Different RAID levels offer
various levels of redundancy and performance. RAID can help protect against disk failures by
providing fault tolerance and the ability to rebuild data from redundant disks.

Database security-Week 5

1. Security as a Major Concern


In the realm of database management, security is a major concern due to the sensitivity and
importance of the data stored in databases. Databases often contain confidential information,
trade secrets, personal data, financial records, and other critical data that must be protected
from unauthorized access, misuse, modification, or destruction. Security breaches can lead to
severe consequences, including financial loss, reputational damage, legal issues, and privacy
violations. As a result, implementing robust security measures is crucial to safeguarding
databases and the information they contain.
2. Types of Security measures

There are various security measures that can be implemented to protect databases. Some of
the common types include:

1. Authentication: Authentication ensures that only authorized individuals or entities can
access the database. This involves verifying the identity of users through credentials such as
usernames, passwords, biometric data, or digital certificates.

2. Authorization: Authorization controls determine the level of access and permissions granted
to authenticated users. It involves defining user roles, access privileges, and restrictions to
ensure that users can only perform authorized actions based on their assigned privileges.

3. Encryption: Encryption is the process of encoding data to prevent unauthorized access.
Database encryption can be applied at different levels, such as encrypting data at rest
(stored data) and encrypting data in transit (data being transmitted between the database
and users or between databases).

4. Auditing and Logging: Auditing involves monitoring and recording database activities, such
as user logins, data modifications, and access attempts. Audit logs provide a record of events
and can be used for security analysis, compliance, and investigation purposes.

5. Data Masking and Redaction: Data masking and redaction techniques involve obscuring or
anonymizing sensitive data to protect its confidentiality. This is especially important when
sharing data with non-privileged users or in non-production environments.

6. Backup and Disaster Recovery: Regular database backups and disaster recovery plans are
crucial for data security. Backups help restore data in case of accidental deletion, data
corruption, or system failures, while disaster recovery plans ensure business continuity and
minimize downtime in the event of a major incident.

3. Access Control

Access control is a fundamental security measure that regulates who can access a database and
what actions they can perform. It involves granting or denying permissions to users based on
their authentication and authorization credentials. Access control mechanisms include user
authentication, user roles, access privileges, and access control lists (ACLs). These mechanisms
ensure that only authorized users can access the database and that they can perform only the
actions permitted by their privileges. Access control helps prevent unauthorized access, data
breaches, and unauthorized modifications to the database, thereby safeguarding the integrity
and confidentiality of the data.
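
A short sketch of access control with standard SQL DCL statements (the role and table names here are
hypothetical, and role-management syntax varies slightly between systems):

CREATE ROLE reporting_user;                      -- define a role for a group of users
GRANT SELECT ON employees TO reporting_user;     -- authorize read-only access to one table
REVOKE SELECT ON employees FROM reporting_user;  -- later withdraw that access if required
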
Database Concurrency-Week 6

1. Solution to concurrency problems


Concurrency problems, such as race conditions, deadlocks, and starvation, can be addressed
through various solutions. Here are some common approaches:

1. Synchronization: Synchronization mechanisms, like locks, semaphores, and monitors, can be
used to control access to shared resources. By allowing only one process or thread to access
a resource at a time, synchronization prevents race conditions and ensures mutual exclusion.

2. Deadlock Avoidance: Deadlock avoidance techniques involve careful resource allocation and
scheduling to prevent the possibility of deadlocks. Resource allocation strategies can include
methods like resource ordering, where resources are requested and released in a
predetermined order, or the use of dynamic resource allocation algorithms that ensure that
a safe state is maintained. By avoiding the conditions that lead to deadlocks, the occurrence
of deadlocks can be minimized.

3. Deadlock Detection and Recovery: Deadlock detection involves periodically analyzing the
system to identify potential deadlocks. Techniques such as resource allocation graphs or
wait-for graphs can be used to detect cycles and identify the processes involved in the
deadlock. Once a deadlock is detected, recovery techniques can be applied, such as process
termination, resource preemption, or rolling back operations to a safe state. These
techniques aim to break the deadlock and restore system functionality.

4. Thread/Process Scheduling: Effective thread or process scheduling algorithms can help
alleviate concurrency problems. Schedulers determine the order and priority in which
processes or threads are executed. Techniques like preemptive scheduling, priority-based
scheduling, or round-robin scheduling can be employed to ensure fairness, prevent
starvation, and provide appropriate resource allocation among concurrent processes.

5. Data Consistency Measures: Ensuring data consistency is vital in concurrent environments.
Techniques like atomic operations, transactions, and isolation levels help maintain data
integrity. Atomic operations ensure that a series of operations are executed as a single,
indivisible unit. Transactions provide an all-or-nothing approach, where a set of operations
either complete successfully or are rolled back if an error occurs. Isolation levels define the
level of concurrent access allowed to the data, minimizing interference and conflicts.

6. Concurrency Control Mechanisms: Concurrency control mechanisms, such as optimistic
concurrency control or pessimistic concurrency control, help manage access to shared
resources. Optimistic concurrency control allows concurrent access but resolves conflicts
at commit time, while pessimistic concurrency control uses locks to ensure exclusive
access to resources. These mechanisms ensure consistency and prevent conflicts between
concurrent processes; a brief SQL sketch of both approaches follows this list.
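
A hedged sketch contrasting the two approaches (hypothetical accounts table, with a version column added
for the optimistic case):

-- Pessimistic: lock the row up front so no one else can change it while we work
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Optimistic: read without locking, then detect conflicts at write time via a version column
UPDATE accounts
SET balance = 400, version = version + 1
WHERE id = 1 AND version = 7;   -- updates 0 rows (a detected conflict) if another transaction changed the row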

Database Concurrency-(Week 7) II

1. Concurrency Situations
2. Three Types of Concurrency Problems
3. Locking: Might Result in Deadlock
4. Deadlock Handling
5. Wait-for Graph for Deadlock Detection and Recovery

Concurrency Situations:

Concurrency situations occur when multiple processes or threads are executing concurrently in a
system. These situations can lead to potential conflicts or coordination issues when accessing shared
resources, such as data or hardware devices. Managing concurrency is important to ensure correct
and efficient execution of concurrent processes and to prevent data inconsistency or conflicts.

Three Types of Concurrency Problems:

The three main types of concurrency problems are:

1. Race Conditions: Race conditions occur when the outcome of a concurrent execution
depends on the specific order or timing of events. It can result in unpredictable or incorrect
behavior when multiple processes access and modify shared data simultaneously.

2. Deadlocks: Deadlocks occur when two or more processes are unable to proceed because
each is waiting for a resource held by another process. This results in a state of deadlock
where none of the processes can continue, leading to a system freeze or deadlock situation.

3. Starvation: Starvation happens when a process is unable to access a resource indefinitely
due to resource allocation policies or priority settings. It can occur when a high-priority
process continuously acquires resources, preventing lower-priority processes from accessing
them.
Locking: Might Result in Deadlock:

Locking is a concurrency control mechanism used to coordinate access to shared resources.
However, if locking is not managed properly, it can lead to deadlock situations. Deadlock
occurs when multiple processes hold resources and are waiting for each other's resources,
resulting in a cyclic dependency where no process can proceed. Improper use of locking,
such as incorrect ordering or excessive resource holding, can increase the likelihood of
deadlock occurrence.

Deadlock Handling:

There are several techniques to handle deadlocks:

1. Deadlock Avoidance: This technique involves careful resource allocation and scheduling to
avoid the possibility of deadlocks. It requires predicting potential resource needs in advance
and allocating resources in a way that guarantees safe execution and avoids circular wait
conditions.

2. Deadlock Detection and Recovery: Deadlock detection involves periodically examining the
resource allocation state to identify potential deadlocks. Once detected, recovery techniques
can be applied, such as aborting one or more processes involved in the deadlock or rolling
back their operations to a safe state.

3. Deadlock Prevention: Deadlock prevention techniques aim to eliminate one or more
necessary conditions for deadlock occurrence, such as the hold-and-wait condition, the no-
preemption condition, or the circular-wait condition. By preventing these conditions, the
possibility of deadlock can be eliminated.

Wait-for Graph for Deadlock Detection and Recovery:

The wait-for graph is a graphical representation used for deadlock detection and recovery. It shows
the relationships among processes and resources, indicating which processes are waiting for
resources held by other processes. By analyzing the wait-for graph, it is possible to detect cycles,
which indicate the presence of deadlock. Deadlock recovery techniques, such as resource
preemption or process termination, can be applied based on the information obtained from the
wait-for graph to resolve the deadlock situation and restore system functionality.
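
As an illustrative sketch only, if the DBMS exposed its wait-for edges in a hypothetical table
wait_for(waiter, holder), a recursive SQL query could search for a cycle (a transaction that, transitively,
waits for itself):

WITH RECURSIVE reachable (start_txn, txn) AS (
    SELECT waiter, holder FROM wait_for
    UNION
    SELECT r.start_txn, w.holder
    FROM reachable r
    JOIN wait_for w ON w.waiter = r.txn
)
SELECT DISTINCT start_txn AS deadlocked_txn
FROM reachable
WHERE txn = start_txn;   -- a transaction reachable from itself is on a wait-for cycle
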
Week-8

Query Optimization

Relational database systems provide a system-managed optimization facility by making use of a wealth
of statistical information (metadata) available to the system.

The optimization process makes use of the database statistics stored in the system catalogue or data
dictionary of the database system.
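
For example, most systems let you inspect the plan the optimizer has chosen; in PostgreSQL or MySQL the
EXPLAIN keyword does this (the employees table is hypothetical):

-- Ask the optimizer how it intends to execute the query
EXPLAIN
SELECT name, salary
FROM employees
WHERE department = 'Sales' AND salary > 50000;
-- The resulting plan shows whether an index is used, the join strategy, and the optimizer's
-- row-count estimates, all derived from statistics held in the system catalogue.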

Week-9

1. Data Warehouse
2. Data Mining
3. Big Data
4. Distributed Database

1. Data Warehouse:
A data warehouse is a central repository of integrated and consolidated data from
multiple sources. It is designed to support business intelligence and reporting activities by
providing a unified view of historical and current data. Data warehouses typically store
large volumes of structured data for analytical purposes, allowing organizations to analyze
trends, make informed decisions, and gain insights into their operations.

2. Data Mining:
Data mining is the process of discovering patterns, relationships, and insights from large
datasets. It involves applying various techniques, such as statistical analysis, machine
learning, and pattern recognition, to extract meaningful and actionable information from the
data. Data mining helps identify hidden patterns, trends, and correlations that can be used
for prediction, forecasting, and decision-making in various domains, including marketing,
finance, healthcare, and fraud detection.

3. Big Data:
Big data refers to extremely large and complex datasets that are beyond the capability of
traditional data processing methods to handle effectively. It is characterized by the volume,
velocity, and variety of data generated from various sources such as social media, sensors,
and transactional systems. Big data requires specialized technologies and analytics tools to
capture, store, process, and analyze the data to extract valuable insights and support data-
driven decision-making.

4. Distributed Database:
A distributed database is a collection of interconnected databases that are geographically distributed across different locations or computer systems. The data in a distributed database is divided and replicated across multiple nodes or servers to improve performance, scalability, and availability. Distributed databases allow for parallel processing, fault tolerance, and decentralized data management, enabling efficient data access and processing in a distributed computing environment. They are commonly used in scenarios where data needs to be accessed and shared across multiple locations or when there is a need for high availability and fault tolerance.
Past Questions:

1. The three major components of the relational data model are:

Tables/Relations:
Tables or relations are the fundamental building blocks of the relational data model. They are
used to organize and store data in a structured manner. A table consists of rows (also known as
tuples) and columns (also known as attributes or fields). Each row represents a single record or
instance, while each column represents a specific attribute or characteristic of that record.
Example: Consider a table named "Employees" with columns such as "EmployeeID," "Name,"
"Department," and "Salary." Each row in the table represents a unique employee, and the
columns represent the attributes associated with each employee.

Attributes:
Attributes are the individual data elements or characteristics that define the properties of a table.
Each column in a table represents a specific attribute. Attributes have a name and a data type
associated with them, which define the kind of data that can be stored in that column.
Example: In the "Employees" table mentioned earlier, the attributes could be "EmployeeID"
(integer), "Name" (string), "Department" (string), and "Salary" (numeric).

Relationships:
Relationships define the associations or connections between tables in the relational data model.
They establish how data in one table is related to data in another table. Relationships are
typically based on common attributes or keys present in multiple tables.
Example: Suppose we have another table called "Departments" with columns like
"DepartmentID" and "DepartmentName." The "Department" attribute in the "Employees" table
can be linked to the "DepartmentID" attribute in the "Departments" table. This establishes a
relationship between the two tables, allowing us to retrieve information about the department
associated with each employee.
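
The three components can be seen together in a small SQL sketch; the data types and sizes are illustrative assumptions, not taken from the text above.

CREATE TABLE Departments (
    DepartmentID   INT PRIMARY KEY,           -- attribute acting as the key
    DepartmentName VARCHAR(50) NOT NULL
);

CREATE TABLE Employees (
    EmployeeID   INT PRIMARY KEY,
    Name         VARCHAR(50) NOT NULL,
    Department   INT,
    Salary       DECIMAL(10,2),
    FOREIGN KEY (Department) REFERENCES Departments(DepartmentID)   -- the relationship
);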
2. Describe the main differences between database Security and Integrity

1. Data Security:

Data security refers to the protection of data from unauthorized users; only authorized users are allowed to access the data. In a database, the DBA or the head of the department can access all the data. Some users are only allowed to retrieve data, whereas others are allowed to retrieve as well as to modify the data.

2. Data Integrity:

Data integrity means that the data contained in the database is both correct and consistent. For this purpose, the data stored in the database must satisfy certain rules (constraints). The DBMS provides different ways to implement such constraints, for example primary keys, unique (secondary) keys, and foreign keys. This improves data integrity in a database.
Difference between Data Security and Data Integrity:

1. Data security refers to the prevention of data corruption through the use of controlled access mechanisms, whereas data integrity refers to the quality of data, which assures that the data is complete and has a whole structure.
2. The motive of data security is the protection of data; the motive of data integrity is the validity of data.
3. Data security ensures that only the people who should have access to the data can access it; data integrity checks that the data is correct and not corrupt.
4. Data security makes sure that data is accessed only by its intended users, thus ensuring the privacy and protection of data; data integrity concerns the structure of the data and how it matches the schema of the database.
5. Popular means of data security are authentication/authorization, masking, and encryption; means of preserving integrity include backing up, error detection, designing a suitable user interface, and correcting data.
6. Data security relates to the physical protection of data against accidental or intentional loss, misuse, or destruction; data integrity relates to the logical protection (correctness, completeness, and consistency) of data.
7. Data security avoids unauthorized access to data; data integrity avoids human error when data is entered.
8. Data security can be implemented through user accounts (passwords) and authentication schemes; data integrity can be implemented through rules such as primary keys, foreign keys, and relationships.
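
As a small sketch of the integrity side of the comparison above (assuming the Employees table from the earlier example and PostgreSQL-style syntax; the constraint name is illustrative), such rules are declared as constraints that the DBMS enforces on every insert or update:

ALTER TABLE Employees
    ADD CONSTRAINT chk_salary_positive CHECK (Salary > 0);   -- rejects invalid salaries
ALTER TABLE Employees
    ALTER COLUMN Department SET NOT NULL;                    -- every employee must belong to a department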
3. Describe the term Locking and the basic idea of locking in the context of DBMS.

Locking in the context of DBMS:

In the context of a Database Management System (DBMS), locking refers to a mechanism used to control
concurrent access to data by multiple transactions. When multiple transactions are accessing or
modifying data simultaneously, there is a possibility of data inconsistencies and conflicts. Locking helps
maintain data integrity and ensures that transactions operate in a consistent and isolated manner.

The basic idea of locking in a DBMS is to prevent conflicts between transactions by allowing only one
transaction to access a particular data item (such as a row, page, or table) at a time. Locks are acquired
and released based on predefined rules and protocols, ensuring that transactions follow a consistent and
controlled process when accessing shared resources.

There are different types of locks used in a DBMS:

1. Shared Lock (Read Lock): This type of lock allows concurrent transactions to read (retrieve) data
but not modify it. Multiple transactions can hold a shared lock on the same data item
simultaneously, as long as no transaction holds an exclusive lock on that item.

2. Exclusive Lock (Write Lock): This type of lock allows a transaction to both read and modify data.
An exclusive lock on a data item restricts other transactions from acquiring either shared or
exclusive locks on the same item until the lock is released.
The locking process involves the following steps:

1. Lock Acquisition: When a transaction needs to access a data item, it requests the appropriate
lock (shared or exclusive) on that item. If the lock is available, the transaction acquires the lock
and proceeds with its operations. If the lock is already held by another transaction, the
requesting transaction may be forced to wait until the lock is released.

2. Lock Release: Once a transaction has completed its operations on a data item, it releases the
lock, allowing other transactions to acquire it.

Locking ensures data consistency by preventing conflicts such as dirty reads (reading uncommitted data),
non-repeatable reads (reading different values for the same data item within a transaction), and lost
updates (overwriting changes made by another transaction).
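
A minimal sketch of the two lock types, assuming PostgreSQL-style row locking and the Employees table used earlier in this document (lock syntax varies between systems):

-- Transaction A: shared (read) lock on one row; other readers are not blocked.
BEGIN;
SELECT Salary FROM Employees WHERE EmployeeID = 1 FOR SHARE;
COMMIT;

-- Transaction B: exclusive (write) lock on the same row; holders of FOR SHARE
-- and other writers must wait until this transaction commits or rolls back.
BEGIN;
SELECT Salary FROM Employees WHERE EmployeeID = 1 FOR UPDATE;
UPDATE Employees SET Salary = Salary * 1.05 WHERE EmployeeID = 1;
COMMIT;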
4. Describe and compare the two main approaches to database access control, namely discretionary access control and mandatory access control.

Discretionary Access Control (DAC):

Discretionary Access Control is an access control model that provides users with the discretion or control
to determine access permissions to their own resources. In DAC, the owner of an object or resource has
the authority to define and manage access rights for that object, including granting or revoking
permissions to other users or entities.

Key features of Discretionary Access Control:

1. Owner's Discretion: The owner of a resource has the freedom to set access controls and
determine who can access the resource and what actions they can perform on it.

2. Access Permissions: DAC uses access control lists (ACLs) or similar mechanisms to define and
manage access permissions. ACLs associate users or groups with specific permissions, such as
read, write, execute, or delete.

3. Flexibility: DAC offers flexibility in terms of granting and revoking permissions, allowing owners
to change access rights as needed. This flexibility is useful in scenarios where users require
varying levels of access to different resources.

4. User Responsibility: The responsibility for managing access permissions lies with the resource
owner. The owner needs to ensure appropriate permissions are assigned and updated to
maintain data security.

Mandatory Access Control (MAC):

Mandatory Access Control is an access control model that enforces access permissions based on system-level policies and rules rather than individual user discretion. In MAC, access permissions are determined by predefined security labels or levels assigned to both users and resources.
Key features of Mandatory Access Control:

1. Security Labels: MAC uses security labels or levels to categorize users, objects, and system
resources based on sensitivity, importance, or classification. Examples include security
classifications like Top Secret, Secret, or Unclassified.

2. System-Defined Policies: Access permissions in MAC are determined by system-defined policies and rules. These policies are typically based on the security labels assigned to users and resources. Users can only access resources if their security labels match or satisfy the required security level.

3. Centralized Control: MAC is centrally controlled by a trusted authority, such as a system administrator or security officer, who defines and enforces the access policies across the system. Users have limited control over access decisions.

4. Stronger Security: MAC provides a higher level of security and confidentiality as access decisions
are based on strict system-level rules and policies. It is particularly useful in environments that
handle highly sensitive or classified information.
Comparison:

• DAC provides users with more control and flexibility in managing access permissions, while MAC
enforces access permissions based on system-level policies.
• DAC relies on resource owners to manage access permissions, while MAC relies on centralized
control and predefined security policies.
• DAC is more suitable for environments where users need different levels of access to resources,
while MAC is more suitable for high-security environments where access decisions need to be
strictly enforced.
• DAC is easier to implement and maintain, while MAC requires more planning and administration
due to the predefined security policies and labels.
• DAC allows for easier collaboration and sharing of resources, while MAC can restrict access to
resources based on system policies, even if the resource owner wants to grant access.
5. (a) Briefly describe the reasons why database security is a major concern in modern database systems.

Database security is a major concern in modern database systems due to several reasons:

1. Data Breaches: Databases often contain sensitive and valuable information, such as personal
identifiable information (PII), financial records, trade secrets, and intellectual property.
Unauthorized access to this data can lead to data breaches, identity theft, financial loss, or
reputational damage to organizations.

2. Regulatory Compliance: Many industries, such as healthcare, finance, and e-commerce, are
subject to strict data protection and privacy regulations. Organizations must comply with
these regulations, such as the General Data Protection Regulation (GDPR) or the Health
Insurance Portability and Accountability Act (HIPAA), to avoid legal penalties and maintain
trust with customers.

3. Insider Threats: Database systems are vulnerable to insider threats, where authorized
individuals with privileged access can intentionally or unintentionally misuse or disclose
sensitive data. Insider threats can come from disgruntled employees, contractors, or third-party service providers.

4. External Attacks: Hackers and malicious actors continuously target databases to gain
unauthorized access and exploit vulnerabilities. They may exploit software flaws, weak
authentication mechanisms, SQL injection attacks, or employ other techniques to
compromise the database system and steal or manipulate data.

5. Data Integrity and Availability: Database security also encompasses ensuring the integrity
and availability of data. Unauthorized modifications, accidental data corruption, or denial-of-
service attacks can disrupt business operations, compromise data integrity, and cause
financial and operational harm.

6. Cloud Adoption: With the increasing adoption of cloud-based database systems, security
concerns arise around shared infrastructure, data segregation, and access control.
Organizations must trust cloud service providers to implement robust security measures and
maintain data confidentiality, privacy, and availability.
7. Complexity of Systems: Modern database systems are complex, involving multiple layers,
components, and interconnected systems. Securing such systems requires implementing
security measures at each layer, including the database engine, network, operating system,
and application layers.

8. Authentication and Access Control: Database systems need to implement strong authentication mechanisms to ensure that only authorized users can access the data. Access control mechanisms, such as role-based access control (RBAC) and fine-grained access controls, should be in place to restrict users' privileges based on their roles and responsibilities.
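
A brief sketch of the role-based access control mentioned in point 8, using PostgreSQL-style statements (the role, user, and password are illustrative assumptions):

CREATE ROLE payroll_clerk;                            -- privileges attach to the role
GRANT SELECT, UPDATE ON Employee TO payroll_clerk;
CREATE USER alice WITH PASSWORD 'change_me';          -- illustrative account
GRANT payroll_clerk TO alice;                         -- alice inherits the role's privileges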
(b) Briefly explain the purpose of the following SQL statements

i) GRANT Select, Update ON Employee TO Tom;

The purpose of the GRANT statement in SQL is to give specific privileges or permissions to a user or role
for a particular database object. In this case, the statement "GRANT Select, Update ON Employee TO
Tom" grants Tom the privileges to select and update the "Employee" table.

The statement consists of the following parts:

• GRANT: Indicates the start of the grant statement.


• Select, Update: Specifies the privileges being granted. In this case, Tom is being granted the
privilege to select (read) and update (modify) the data in the "Employee" table.
• ON Employee: Specifies the object on which the privileges are being granted. Here, the object is
the "Employee" table.
• TO Tom: Specifies the user or role being granted the privileges. In this case, Tom is the recipient of the privileges.

After executing this statement, Tom will be able to perform SELECT and UPDATE operations on the "Employee" table, allowing him to retrieve and modify the data stored in that table.

ii) REVOKE Update, Delete ON Customer FROM Jerry;

The purpose of the REVOKE statement in SQL is to revoke or remove specific privileges or permissions
from a user or role for a particular database object. In this case, the statement "REVOKE Update, Delete
ON Customer FROM Jerry" revokes the privileges of Jerry to update and delete data from the
"Customer" table.

This statement also consists of the following parts:

• REVOKE: Indicates the start of the revoke statement.


• Update, Delete: Specifies the privileges being revoked. In this case, Jerry's privileges to update and delete data are being revoked.
• ON Customer: Specifies the object from which the privileges are being revoked. Here, the object
is the "Customer" table.
• FROM Jerry: Specifies the user or role from which the privileges are being revoked. In this case, the privileges are being revoked from Jerry.

After executing this statement, Jerry will no longer have the privileges to update and delete data from the "Customer" table. The data modification operations (UPDATE and DELETE) on the "Customer" table will be restricted for Jerry, ensuring tighter control over data manipulation.
6. Checkpoints play a vital role in database recovery operations. Describe the term "checkpoint" in the context of database recovery. Your answer should outline the activities that take place during a checkpoint.

In the context of database recovery, a checkpoint is a crucial mechanism that ensures the consistency
and durability of a database. It represents a point-in-time reference that indicates the state of the
database at a specific moment. The primary purpose of a checkpoint is to provide a reliable starting
point for recovery operations after a system failure or crash.

During a checkpoint, several activities take place to ensure data integrity and facilitate recovery:

1. Flushing dirty buffers: A checkpoint involves writing all modified data pages (buffers) from the
database's cache to the disk. This process ensures that any changes made to the database are
permanently stored on the disk and not just held in memory. By doing so, it prevents the loss of
recently updated data in case of failure.

2. Updating the transaction log: The transaction log records all the changes made to the database.
During a checkpoint, the log is updated to include information about the completed transactions
and the corresponding data pages that have been flushed to disk. This step ensures that the
transaction log reflects the state of the database at the checkpoint.

3. Updating control metadata: The system's control structures, which keep track of the locations of database files on disk and the position of the latest checkpoint, are updated to reflect the changes made to the database as a result of flushing the modified buffers.

4. Marking the checkpoint in the log: Once all the necessary activities are completed, a checkpoint
record is written to the transaction log. This record serves as a marker, indicating that all the
changes made up to this point have been successfully persisted on disk. It allows the recovery
process to start from this checkpoint in case of a failure.

By performing these activities, a checkpoint establishes a consistent state of the database on disk and
provides a reliable starting point for recovery. In the event of a failure, the recovery process can begin
from the most recent checkpoint, ensuring that the database can be restored to a known and valid state.
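
Checkpoints are normally taken automatically, but several systems (for example PostgreSQL and SQL Server) also expose a manual command; a minimal sketch:

CHECKPOINT;   -- force the activities above: flush dirty buffers and record a checkpoint in the log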
7. Explain briefly the difference between a distributed database system and a decentralized database
system.

Difference between Centralized database and Distributed database:

1. Definition: A centralized database is stored, located, and maintained at a single location only. A distributed database consists of multiple databases which are connected with each other and are spread across different physical locations.

2. Access time: The data access time in the case of multiple users is more in a centralized database and less in a distributed database.

3. Management of data: The management, modification, and backup of a centralized database are easier, as the entire data is present at the same location. For a distributed database they are very difficult, as the data is spread across different physical locations.

4. View: A centralized database provides a uniform and complete view to the user. Since a distributed database is spread across different locations, it is difficult to provide a uniform view to the user.

5. Data consistency: A centralized database has more data consistency in comparison to a distributed database. A distributed database may have some data replication, so data consistency is less.

6. Failure: Users cannot access a centralized database if a database failure occurs. In a distributed database, if one database fails users still have access to the other databases.

7. Cost: A centralized database is less costly; a distributed database is very expensive.

8. Maintenance: A centralized database is easy to maintain because the whole of the data and information is available at a single location and thus easy to reach and access. A distributed database is difficult to maintain because of the distribution of data and information at varied places, so there is a need to check for data redundancy issues and to maintain data consistency.

9. Efficiency: A centralized database is less efficient, as data finding becomes quite complex because of the storing of data and information at a particular place. A distributed database is more efficient, because the splitting up of data at several places makes data finding simple and less time-consuming.

10. Response speed: The response speed of a centralized database is more in comparison to a distributed database; the response speed of a distributed database is less in comparison to a centralized database.

11. Advantages: A centralized database offers integrity of data, security, easy access to all information, and easily portable data. A distributed database offers high performance because of the division of workload, high availability because of the readiness of available nodes to do work, and independent nodes with better control over resources.

12. Disadvantages: In a centralized database, data searching takes time; in case of failure of the centralized server the whole database will be lost; and if multiple users try to access the data at the same time it may create issues. A distributed database is quite large and complex, so it is difficult to use and maintain; it is difficult to provide security; there is an issue of data integrity; storage and infrastructure requirements increase; and handling failures is a quite difficult task.

13. Examples: Centralized - a desktop or server CPU, a mainframe computer. Distributed - Apache Ignite, Apache Cassandra, Apache HBase, Amazon SimpleDB, Clusterpoint, FoundationDB.
There are a number of typical options available for distributing data in a distributed database system.
Demonstrate in detail how horizontal and vertical fragmentation could be performed by providing one
example of each, using the following EMPLOYEE relation.
Horizontal fragmentation involves dividing a relation into multiple fragments based on rows or tuples.
Each fragment contains a subset of the rows from the original relation. Here's an example of
horizontal fragmentation for the EMPLOYEE relation based on the "city" attribute:

Fragment 1: Employees in London

e-no e-name address status salary-grade city

E1355 James 20 High Street full-time G6 London

E0072 Martin 50 Wood Lane part-time G5 London

E0379 Dominic 60 Greenway part-time G5 London

E9947 Spencer 40 Green Avenue full-time G9 London

E1275 Hassan 60 Roman Road part-time G6 London

Fragment 2: Employees in Stansted

e-no e-name address status salary-grade city

E4813 Clare 90 Oak Road fractional G3 Stansted

Fragment 3: Employees in Cambridge

e-no e-name address status salary-grade city

E3358 Jose 10 Tree Road fractional G4 Cambridge


In this example, the EMPLOYEE relation is horizontally fragmented based on the "city" attribute. The
first fragment contains all employees in London, the second fragment includes the employee in
Stansted, and the third fragment consists of the employee in Cambridge. Each fragment represents a
subset of the original relation, preserving the relevant attributes and their values.
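
A sketch of how these horizontal fragments could be defined in SQL, assuming an existing EMPLOYEE table; generic CREATE TABLE ... AS syntax is used, and the fragment names are illustrative.

CREATE TABLE Employee_London    AS SELECT * FROM Employee WHERE city = 'London';
CREATE TABLE Employee_Stansted  AS SELECT * FROM Employee WHERE city = 'Stansted';
CREATE TABLE Employee_Cambridge AS SELECT * FROM Employee WHERE city = 'Cambridge';

-- The original relation is the union of the fragments:
-- SELECT * FROM Employee_London
-- UNION ALL SELECT * FROM Employee_Stansted
-- UNION ALL SELECT * FROM Employee_Cambridge;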

Vertical fragmentation involves dividing a relation into multiple fragments based on columns or
attributes. Each fragment contains a subset of the columns from the original relation. Here's an
example of vertical fragmentation for the EMPLOYEE relation based on the "status" and "salary-grade"
attributes:
Fragment 1:

e-no e-name address status

E1355 James 20 High Street full-time

E0072 Martin 50 Wood Lane part-time

E4813 Clare 90 Oak Road fractional

E0379 Dominic 60 Greenway part-time

E3358 Jose 10 Tree Road fractional

E9947 Spencer 40 Green Avenue full-time

E1275 Hassan 60 Roman Road part-time

Fragment 2:

e-no e-name address salary-grade

E1355 James 20 High Street G6

E0072 Martin 50 Wood Lane G5

E4813 Clare 90 Oak Road G3

E0379 Dominic 60 Greenway G5

E3358 Jose 10 Tree Road G4

E9947 Spencer 40 Green Avenue G9

E1275 Hassan 60 Roman Road G6

In this example, the EMPLOYEE relation is vertically fragmented based on the "status" and "salary-
grade" attributes. Fragment 1 contains the "e-no," "e-name," "address," and "status" attributes, while
Fragment 2 includes the "e-no," "e-name," "address," and "salary-grade" attributes. Each fragment
represents a subset of the original relation, focusing on specific attributes.
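
A corresponding SQL sketch for the vertical fragments, under the same assumptions; hyphenated attribute names are written with underscores because hyphens are not valid in SQL identifiers, and each fragment keeps the key e_no so the relation can be rebuilt by a join.

CREATE TABLE Employee_Status AS
    SELECT e_no, e_name, address, status FROM Employee;
CREATE TABLE Employee_Grade AS
    SELECT e_no, e_name, address, salary_grade FROM Employee;

-- Reconstruction by joining on the key:
-- SELECT s.e_no, s.e_name, s.address, s.status, g.salary_grade
-- FROM Employee_Status s JOIN Employee_Grade g ON g.e_no = s.e_no;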

Horizontal and vertical fragmentation techniques allow for the distribution of data in a distributed database system. They can enhance performance and scalability and facilitate localized access to data by dividing the relation into manageable fragments based on specific criteria.
