
13. a) Consider the relation R(A,B,C,D,E) with functional dependencies
{A→BC, CD→E, B→D, E→A}. Identify the superkeys. Find F+.
To find the superkeys and F+, we follow these steps:
1. Compute the closure of attribute sets.
2. Identify the candidate keys and superkeys.
3. Find F+ (the closure of the set of functional dependencies).
Step 1: Compute the attribute closures
- A+ = ABCDE (A→BC, then B→D, then CD→E)
- B+ = BD (B→D)
- C+ = C (no dependency applies)
- D+ = D (no dependency applies)
- E+ = ABCDE (E→A, then A→BC, then B→D)
- BC+ = ABCDE (B→D, then CD→E, then E→A)
- BD+ = BD (no further dependency applies)
- CD+ = ABCDE (CD→E, then E→A, then A→BC)
Any attribute set containing one of A, E, BC or CD therefore has closure ABCDE; the remaining subsets of {B, C, D} (B, C, D and BD) do not.
Step 2: Identify the superkeys
A superkey is an attribute set whose closure contains all attributes of R. From the closures above, the candidate keys (minimal superkeys) are A, E, BC and CD. Every attribute set that contains at least one candidate key is a superkey, for example AB, AE, ABC, BCD, CDE, ABCD, ABCDE, and so on; ABCDE is the trivial superkey.
Step 3: Find F+ (the closure of the set of functional dependencies)
F+ is the set of all functional dependencies that can be derived from the given dependencies using Armstrong's axioms. Equivalently, X→Y belongs to F+ whenever Y ⊆ X+. Besides the given dependencies and all trivial dependencies, F+ contains, for example:
- A→D and A→E (since A+ = ABCDE)
- E→B, E→C and E→D (since E+ = ABCDE)
- BC→A, BC→D and BC→E (since BC+ = ABCDE)
- CD→A and CD→B (since CD+ = ABCDE)
Because A, E, BC and CD are candidate keys, each of them determines every attribute of R. F+ therefore consists of the original dependencies, all trivial dependencies, and every dependency X→Y with Y ⊆ X+.
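The closure computations above can also be carried out mechanically. Below is a minimal C++ sketch of the standard attribute-closure algorithm applied to this F; the set-based FD representation and the printing in main are illustrative choices, not part of the question.

#include <iostream>
#include <set>
#include <utility>
#include <vector>

// A functional dependency X -> Y, with attributes stored as single characters.
using AttrSet = std::set<char>;
using FD = std::pair<AttrSet, AttrSet>;

// Returns true if every attribute of 'sub' appears in 'super'.
bool contains(const AttrSet& super, const AttrSet& sub) {
    for (char a : sub)
        if (super.count(a) == 0) return false;
    return true;
}

// Standard attribute-closure algorithm: repeatedly apply every FD whose
// left-hand side is already contained in the closure, until nothing changes.
AttrSet closure(AttrSet attrs, const std::vector<FD>& fds) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const FD& fd : fds) {
            if (contains(attrs, fd.first)) {
                for (char a : fd.second) {
                    if (attrs.insert(a).second) changed = true;
                }
            }
        }
    }
    return attrs;
}

int main() {
    // F = {A->BC, CD->E, B->D, E->A}
    std::vector<FD> fds = {
        {{'A'}, {'B', 'C'}},
        {{'C', 'D'}, {'E'}},
        {{'B'}, {'D'}},
        {{'E'}, {'A'}},
    };
    for (char a : closure({'A'}, fds)) std::cout << a;   // prints ABCDE, so A is a key
    std::cout << '\n';
    for (char a : closure({'B'}, fds)) std::cout << a;   // prints BD
    std::cout << '\n';
    return 0;
}

Running the same function on every subset of {A,B,C,D,E} is one way to enumerate the superkeys and, via the test Y ⊆ X+, the dependencies in F+.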

14. a) Distinguish recoverable and non-recoverable schedules. Why is recoverability of schedules desirable? Are there any circumstances under which it would be desirable to allow non-recoverable schedules? Justify your answer.

Recoverable vs. non-recoverable schedules:
Definition: In a recoverable schedule, a transaction commits only after all transactions it depends on (i.e., whose data it has read) have committed. In a non-recoverable schedule, a transaction may commit even though a transaction it depends on has not committed yet.
Consistency: A recoverable schedule ensures database consistency by preventing a transaction from committing on the basis of uncommitted data. A non-recoverable schedule can lead to inconsistency if a transaction commits based on data from another transaction that is later rolled back.
Rollback scenario: In a recoverable schedule, if a transaction rolls back, the transactions that read its data can also be rolled back to maintain consistency. In a non-recoverable schedule, if a transaction is rolled back after another transaction has already committed based on its data, inconsistencies result.
Commit dependency: In a recoverable schedule, a transaction waits to commit until the transactions whose data it has read have committed. In a non-recoverable schedule, a transaction may commit before the transactions whose data it has read have committed.
Safety: Recoverable schedules are safer and preferred for maintaining database integrity and consistency. Non-recoverable schedules are risky and can leave the database in an inconsistent state.
Usage: Recoverable schedules are used in most DBMS implementations to ensure reliable recovery and consistency. Non-recoverable schedules are generally avoided because of the risk of inconsistency in the event of transaction failures.

• A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
• Recoverable schedules are desirable because the failure of a transaction might otherwise bring the system into an irreversibly inconsistent state.
• Non-recoverable schedules may sometimes be needed when updates must be made visible early because of time constraints, even if they have not yet been committed; this may be required for very long-duration transactions.
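As a small illustration of a non-recoverable schedule (the data item X is arbitrary):
T1: write(X)
T2: read(X)
T2: commit
T1: abort
T2 has read a value written by T1 and committed before T1. When T1 subsequently aborts, T2's committed result is based on data that is rolled back, and a committed transaction cannot be undone. Delaying T2's commit until after T1 commits would make the schedule recoverable.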

b) What is the need for concurrency control mechanisms? Explain the working of the lock-based protocol.
→ Concurrency control is a very important concept in DBMS: it ensures that several processes or users can execute or manipulate data simultaneously without causing data inconsistency. Concurrency control provides a procedure that is able to control the concurrent execution of operations in the database.
1. Data Consistency: It prevents data from becoming inconsistent when multiple users change the same information simultaneously.
2. Isolation: It ensures that each transaction is treated as if it were the only one running, so one user's work does not interfere with another's.
3. Deadlock Prevention: It helps avoid situations where transactions get stuck waiting for each other, ensuring smooth processing.
4. Efficiency: It lets multiple transactions run at the same time without slowing things down unnecessarily.
5. Durability and Atomicity: It makes sure all changes in a transaction are either fully completed or not done at all, protecting the integrity of the data.
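A classic anomaly that concurrency control prevents is the lost update. As a small illustration (the account and amounts are made up): transactions T1 and T2 both read a balance of 100 and each adds 50. Without concurrency control both write back 150, so one of the two updates is lost; the correct serial result would be 200. A concurrency control mechanism such as locking forces one transaction to wait for the other, yielding the serial outcome.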
→ In a database management system (DBMS), lock-based concurrency control is used to regulate the access of multiple transactions to the same data item. This protocol helps maintain data consistency and integrity across multiple users. In the protocol, transactions acquire locks on data items to control access and prevent conflicts between concurrent transactions.
Lock-Based Protocols
A lock is a variable associated with a data item that describes the status of the data item with respect to the operations that can be applied to it. Locks synchronize the access of concurrent transactions to the database items. The protocol requires that data items be accessed in a mutually exclusive manner. Two common lock modes are used: a shared (S) lock, which allows a transaction to read an item and can be held by several transactions at once, and an exclusive (X) lock, which is required to write an item and can be held by only one transaction at a time.
Types of Lock-Based Protocols
1. Simplistic Lock Protocol
It is the simplest method for locking data during a transaction. Simple lock-based protocols require every transaction to obtain a lock on the data before inserting, deleting, or updating it. The data item is unlocked once the transaction is completed.
2. Pre-Claiming Lock Protocol
Pre-claiming lock protocols analyse a transaction to determine which data items it needs to lock. Before executing, the transaction asks the DBMS for locks on all of those data items. If all locks are granted, the protocol allows the transaction to start; when the transaction finishes, it releases all locks. If any lock is not granted, the transaction is rolled back and waits until all of the locks are granted.
3. Two-phase locking (2PL)
The two-phase locking protocol divides the execution phase of the transaction into
three parts.
•In the first part, when the execution of the transaction starts, it seeks permission
for the lock it requires.
•In the second part, the transaction acquires all the locks. The third phase is started
as soon as the transaction releases its first lock.
•In the third phase, the transaction cannot demand any new locks. It only releases
the acquired locks.

A transaction is said to follow the Two-Phase Locking protocol if locking and unlocking are done in two phases:
Growing Phase: New locks on data items may be acquired but none can be released.
Shrinking Phase: Existing locks may be released but no new locks can be acquired.
4. Strict Two-Phase Locking Protocol
Strict Two-Phase Locking requires that, in addition to the 2PL rules, all exclusive (X) locks held by a transaction are not released until after the transaction commits. The first phase of Strict-2PL is the same as in 2PL. The difference between 2PL and Strict-2PL is that Strict-2PL does not release an exclusive lock immediately after using it.

• Strict-2PL waits until the whole transaction commits and then releases all the locks at once.
• Strict-2PL therefore has no gradual shrinking phase of lock release.
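The following is a minimal sketch of strict two-phase locking in C++, assuming a single in-memory lock table keyed by item name; the item names, values, and the Transaction class itself are illustrative, not a DBMS API. Shared locks are taken for reads and exclusive locks for writes as the transaction proceeds (growing phase), and everything is released only at commit.

#include <iostream>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// One shared_mutex per data item acts as the lock table entry:
// lock_shared() corresponds to a shared (S) lock, lock() to an exclusive (X) lock.
std::map<std::string, std::shared_mutex> lock_table;
std::map<std::string, int> database = {{"X", 100}, {"Y", 200}};

class Transaction {
    std::vector<std::unique_lock<std::shared_mutex>> x_locks;  // exclusive locks held
    std::vector<std::shared_lock<std::shared_mutex>> s_locks;  // shared locks held
public:
    int read(const std::string& item) {              // growing phase: acquire S lock
        s_locks.emplace_back(lock_table[item]);
        return database[item];
    }
    void write(const std::string& item, int value) { // growing phase: acquire X lock
        x_locks.emplace_back(lock_table[item]);
        database[item] = value;
    }
    void commit() {                                  // strict 2PL: release all locks only now
        x_locks.clear();
        s_locks.clear();
    }
    // Note: lock upgrading (reading and then writing the same item) and deadlock
    // handling are deliberately left out of this sketch.
};

int main() {
    Transaction t1;
    int x = t1.read("X");     // S(X) held until commit
    t1.write("Y", x + 50);    // X(Y) held until commit
    t1.commit();
    std::cout << database["Y"] << '\n';  // 150
    return 0;
}

Because no lock is released before commit, no other transaction can read values written by an uncommitted transaction, so the schedules produced are not only serializable but also recoverable and cascadeless.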

15. a) Describe the features of the object-oriented data model.
Need for the Object Oriented Data Model:
To represent complex real-world problems, there was a need for a data model that is closely related to the real world. The Object Oriented Data Model represents real-world problems easily.
Object Oriented Data Model:
In the Object Oriented Data Model, data and their relationships are contained in a single structure, which is referred to as an object. Real-world problems are represented as objects with different attributes, and objects are connected to one another by relationships. Basically, it is a combination of Object Oriented Programming and the Relational Database Model:
Object Oriented Data Model
= Combination of Object Oriented Programming + Relational Database Model
Components of the Object Oriented Data Model:

Objects –
An object is an abstraction of a real-world entity, or we can say it is an instance of a class. Objects encapsulate data and code into a single unit, which provides data abstraction by hiding the implementation details from the user. For example: instances of Student, Doctor, Engineer.
Attribute –
An attribute describes a property of an object. For example: for a STUDENT object, the attributes are Roll_no and Branch in the Student class.
Methods –
A method represents the behaviour of an object; basically, it represents a real-world action. For example: finding a STUDENT's marks through a method such as Setmarks().
Class –
A class is a collection of similar objects with a shared structure (attributes) and behaviour (methods). An object is an instance of a class. For example: Person, Student, Doctor, Engineer.

class Student
{
    char name[20];
    int roll_no;
    // ...
public:
    void search();
    void update();
};
In this example, Student is the class, and S1 and S2 are objects of the class that can be created in the main function.
Inheritance –
By using inheritance, a new class can inherit the attributes and methods of an existing class, i.e. the base class. For example, the classes Student, Doctor and Engineer are inherited from the base class Person, as sketched below.
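A minimal C++ sketch of the Person/Student inheritance described above (the member names and the values used in main are illustrative):

#include <iostream>
#include <string>

class Person {                  // base class
public:
    std::string name;
    int age = 0;
    void display() const { std::cout << name << " (" << age << ")\n"; }
};

class Student : public Person { // Student inherits name, age and display()
public:
    int roll_no = 0;
    int marks = 0;
    void setmarks(int m) { marks = m; }   // behaviour specific to Student
};

int main() {
    Student s1;                 // an object (instance) of class Student
    s1.name = "John";
    s1.age = 20;
    s1.roll_no = 1;
    s1.setmarks(85);
    s1.display();               // method inherited from Person
    return 0;
}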

Advantages of the Object Oriented Data Model:
• Code can be reused due to inheritance.
• Easily understandable.
• The cost of maintenance can be reduced due to the reusability of attributes and methods through inheritance.
Disadvantages of the Object Oriented Data Model:
• It is not fully mature or standardized, so it is not easily accepted by users.

b) Explain the HBase data model with an example.
• HBase is an open-source, distributed, scalable NoSQL database modeled after Google's Bigtable that runs on top of Hadoop's HDFS (Hadoop Distributed File System).
• HBase is column-oriented, which makes it different from most other databases. One of its notable qualities is that it does not enforce data types: different rows can store different types of data in the same column.
• It contains different sets of tables that maintain the data in key-value format. HBase is best suited to sparse data sets, which are very common in big data.
• It can be used to manage structured and semi-structured data.
• The HBase data model is quite different from that of traditional relational databases.
Components of the HBase data model:
Table: An HBase table is a collection of rows, and it is similar to a table in a relational database. However, HBase tables do not have a fixed schema, meaning the number of columns and their types are not predefined.
Row Key: Each row in an HBase table is identified by a unique row key, which is a byte array. Row keys are stored in lexicographical order, so data access is fast and efficient when rows are accessed by key.
Column Families: Columns in HBase are grouped into column families, which are predefined and stored together on disk. A table must have at least one column family, and each column family can have multiple columns. Column families are defined at table creation and are stored as separate files, which makes retrieval of columns from the same family efficient.
Columns: Columns are identified by their column family and a qualifier. The full name of a column is formed as column_family:qualifier. For example, in a column family details, a column name would be represented as details:name. Columns can be dynamically added to column families.
Cell: The intersection of a row key and a column (column family + qualifier) forms a cell. Each cell stores a versioned value, where the version is identified by a timestamp (by default) or a custom version number. The latest version is returned by default when querying.
Timestamp: HBase allows storing multiple versions of data within a cell. Each version is identified by a timestamp, which can either be automatically generated by HBase or provided by the user. The latest version is always returned by default unless specified otherwise.
Example HBase Data Model:
Let's consider an example where we store user information in an HBase table:
Table Name: user
Row Key: User ID (e.g., user1, user2, …)
Column Families: personal, contact
Columns in Column Family personal: name, age
Columns in Column Family contact: email, phone

Row Key    personal:name    personal:age    contact:email        contact:phone
user1      John Doe         30              [email protected]    1234567890
user2      Jane Smith       25              [email protected]    0987654321

In this example:
•The row key uniquely identifies each user.
•The column families personal and contact group related columns.
•The personal column family contains the columns name and age.
•The contact column family contains the columns email and phone.
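Conceptually, an HBase table behaves like a sparse, sorted, multidimensional map from (row key, column family:qualifier, timestamp) to value. The following is a minimal C++ sketch of that view of the user table above; the nested-map types are purely illustrative and are not HBase's actual API.

#include <iostream>
#include <map>
#include <string>

// row key -> "family:qualifier" -> timestamp -> value.
// std::map keeps row keys in sorted (lexicographical) order, and the innermost
// map lets a cell hold multiple timestamped versions of a value.
using HTable = std::map<std::string,
                        std::map<std::string,
                                 std::map<long long, std::string>>>;

int main() {
    HTable user;
    user["user1"]["personal:name"][1] = "John Doe";
    user["user1"]["personal:age"][1]  = "30";
    user["user1"]["contact:phone"][1] = "1234567890";
    user["user2"]["personal:name"][1] = "Jane Smith";

    // HBase returns the latest version of a cell by default; here we simply
    // take the entry with the largest timestamp.
    const auto& versions = user["user1"]["personal:name"];
    std::cout << versions.rbegin()->second << '\n';   // John Doe
    return 0;
}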

16. a) Describe normalization up to 3NF and BCNF with examples. State the desirable properties of decomposition.
Normalization:
→ Normalization is the process of organizing the data in the database.
→ Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics like insertion, update, and deletion anomalies.
→ Normalization divides the larger table into smaller tables and links them using relationships.
→ The normal forms are used to reduce redundancy in the database tables.
1. First Normal Form (1NF):
• A relation is in 1NF if it contains only atomic values.
• It states that an attribute of a table cannot hold multiple values; it must hold only single values.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
Example:

EMPLOYEE table:
EMP_ID   EMP_NAME   EMP_PHONE                EMP_STATE
14       John       7272826385, 9064738238   UP
20       Harry      8574783832               Bihar
12       Sam        7390372389, 8589830302   Punjab

Decomposition of the EMPLOYEE table into 1NF:
EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE
14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab

2. Second Normal Form (2NF):
• For 2NF, the relation must be in 1NF.
• In the second normal form, all non-key attributes must be fully functionally dependent on the primary key.
Example: Let's assume, a school can store the data of teachers and the subjects
they teach. In a school, a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Maths       38
83           Computer    38

In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. This partial dependency violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Maths
83 Computer

3. Third Normal Form (3NF):
• A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, the relation is in third normal form.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Kathrine 06389 UK Norwich
666 John 462007 MP Bhopal

• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
• That is why we need to move EMP_CITY and EMP_STATE into a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Kathrine 06389
666 John 462007

EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal

4. Boyce-Codd Normal Form (BCNF):
• BCNF is an advanced version of 3NF; it is stricter than 3NF.
• A table is in BCNF if, for every functional dependency X → Y, X is a superkey of the table.
• For BCNF, the table should be in 3NF and, for every FD, the LHS must be a superkey.
Example: Let’s assume there is a company where employees work in more than
one department.
EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549
• The table is not in BCNF because, for the dependencies EMP_ID → EMP_COUNTRY and EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}, neither EMP_ID nor EMP_DEPT alone is a superkey (the candidate key is {EMP_ID, EMP_DEPT}).
• To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table:
EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table:
EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

PROPERTIES OF DECOMPOSITION:
Decomposition refers to the division of a relation into multiple smaller relations in order to remove redundancy and anomalies while preserving the original information. A good decomposition should satisfy the following properties:
Lossless: Every decomposition performed in a database management system should be lossless: no information may be lost, and joining the sub-relations must reproduce exactly the original relation, with no missing and no spurious tuples.
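A standard test for a binary decomposition of a relation R into R1 and R2 is that it is lossless if (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 holds in F+, i.e., the common attributes form a key of at least one of the two sub-relations. For instance, in the 3NF example above, EMPLOYEE(EMP_ID, EMP_NAME, EMP_ZIP) and EMPLOYEE_ZIP(EMP_ZIP, EMP_STATE, EMP_CITY) share only EMP_ZIP, and EMP_ZIP → {EMP_STATE, EMP_CITY} holds, so EMP_ZIP is a key of EMPLOYEE_ZIP and the decomposition is lossless.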
Dependency Preservation: Dependency preservation is an important property in a database management system. It ensures that the functional dependencies of the original relation can still be enforced on the decomposed relations without joining them back together, which helps maintain consistency and integrity efficiently.
Lack of Data Redundancy: Data redundancy generally means duplicate or repeated data. This property states that the decomposition should not introduce redundant data, so that we get rid of unwanted data and keep only useful information.

b) Discuss query optimization with a diagram.
• The query optimizer is a critical component of a Database Management System
(DBMS) that determines the most efficient way to execute a given query.
• It generates various query plans and chooses the one with the least cost, ensuring
optimal performance in terms of time and resource utilization.

Steps of Query Optimization:
1. SQL Query Input:
The process starts when a user or application submits an SQL query to the DBMS. This query specifies the data to be retrieved or manipulated.
2. Parsing:
Parsing is the first step where the SQL query is checked for syntax errors and then
translated into an internal representation, typically an Abstract Syntax Tree (AST).

Steps in Parsing:
• Lexical Analysis: The query is broken down into tokens such as keywords, identifiers, and operators.
• Syntactical Analysis: The sequence of tokens is checked against the SQL grammar rules.
• Semantic Analysis: The query is checked for semantic correctness, such as verifying that the tables and columns exist.
3. Logical Plan Generation:
Once the query is parsed, a Logical Query Plan is generated. This plan describes the sequence of high-level operations (e.g., select, project, join) needed to execute the query.
Steps in Logical Plan Generation:
• Relational Algebra: The query is represented using relational algebra operations like σ (selection), π (projection), ⋈ (join), etc.
• Transformation Rules: The logical plan is transformed using a set of rules to optimize the sequence of operations (e.g., pushing selections down to reduce the size of intermediate results).
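For example (the relation and attribute names are illustrative), a selection that refers only to attributes of Employee can be pushed below a join:
σ_salary>50000(Employee ⋈ Department) = σ_salary>50000(Employee) ⋈ Department
Applying the selection before the join means the join has to process far fewer tuples, which is exactly the kind of rewrite the transformation rules perform.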

4. Cost Estimation:
The Cost Estimator evaluates different possible execution strategies for the query
and assigns a cost to each. The cost usually considers factors like I/O operations,
CPU usage, and memory consumption.
Components of Cost Estimation:
•Statistical Information: The optimizer uses statistics such as table size, index
availability, data distribution, and selectivity of predicates to estimate costs.
•Cost Metrics: Costs are usually estimated in terms of disk I/O (number of page
reads and writes), CPU usage, and response time.
•Plan Alternatives: For each logical operation (e.g., join), multiple physical
methods (e.g., nested loop join, hash join) are considered and their costs are
estimated.
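As a rough illustration of such cost metrics (using the usual textbook symbols, where n_r is the number of tuples of relation r and b_r, b_s are the numbers of disk blocks of r and s): a simple tuple-at-a-time nested-loop join of r and s costs about n_r * b_s + b_r block transfers in the worst case, whereas a hash join typically costs on the order of 3(b_r + b_s) block transfers. Comparing estimates like these is how the optimizer chooses between alternative plans.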
5. Physical Plan Generation:
The logical plan is converted into a Physical Query Plan. This plan specifies the
actual algorithms and data access methods that will be used to execute the query.
Physical Operators:
•Table Access Methods: Methods like table scan, index scan, or index seek are
chosen based on the data and indexes available
•Join Algorithms: The optimizer selects between different join methods (e.g.,
nested loop join, sort-merge join, hash join) based on cost and data characteristics.
•Sorting and Aggregation: Operations like sorting, grouping, and aggregation are
optimized using efficient algorithms like quicksort, hash aggregation, etc.
6. Plan Selection:
The optimizer compares the costs of all possible physical plans and selects the one
with the lowest estimated cost.
•Search Space: The optimizer explores a large search space of possible execution
plans, considering various transformations and join orders.
•Heuristics vs. Exhaustive Search: Some optimizers use heuristics to limit the
search space (e.g., limiting the number of joins considered), while others may use
exhaustive search methods like dynamic programming.
7. Result of the Query:
The selected query execution plan is passed to the Query Executor, which actually retrieves the data from the database.
Execution Strategies:
• Pipelined Execution: Operators produce outputs that are immediately consumed by the next operator, reducing the need for intermediate storage.
• Materialized Execution: Intermediate results are stored temporarily, and subsequent operations are performed on these stored results.
