Previous Year Solved Question Paper
PART-A
1. Illustrate with an example, the difference between the conceptual data models
and the physical data models.
2. How is weak entity type different from a strong entity type? Give an example.
A strong entity type has a primary key of its own and can be uniquely identified by its own attributes. A weak entity type, in contrast, does not have a primary key of its own; it is identified through its identifying relationship with an owner (strong) entity type, and its partial key is combined with the primary key of its owner entity to form a unique identifier.
Let's consider an example of a "Bank Account" entity in a banking system.
The "Bank Account" entity has attributes like "AccountNumber," "AccountType," and
"Balance." However, a bank account cannot be uniquely identified on its own. It
requires the existence of an owner entity, such as "Customer," to establish uniqueness.
In this example, the "Bank Account" entity is a weak entity type, and the
"Customer" entity is its owner entity. The combination of the "AccountNumber"
attribute and the "CustomerID" (primary key of the "Customer" entity) forms a unique
identifier for each bank account.
In summary, the key difference between a weak entity type and a strong entity
type is that a strong entity type can be uniquely identified independently, while a
weak entity type relies on an owner entity for identification. The weak entity type's
unique identifier includes attributes from both itself and its owner entity.
3. What is an entity integrity constraint?
Entity integrity constraint is a rule or condition that ensures the uniqueness and
non-nullness of the primary key attribute in a database table. It guarantees that each
instance or row of a table has a unique and non-null value for its primary key attribute.
In other words, it ensures that there are no duplicate or missing primary key values
within a table.
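For example (a minimal sketch with an assumed EMPLOYEE table), declaring a column as the PRIMARY KEY makes the DBMS enforce entity integrity automatically:

CREATE TABLE EMPLOYEE (
    EmpId INT PRIMARY KEY,   -- must be unique and cannot be NULL
    Name  VARCHAR(50)
);
INSERT INTO EMPLOYEE VALUES (1, 'Anu');      -- accepted
INSERT INTO EMPLOYEE VALUES (1, 'Binu');     -- rejected: duplicate primary key value
INSERT INTO EMPLOYEE VALUES (NULL, 'Cini');  -- rejected: primary key cannot be NULL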
4. Using the following ER diagram, create a relation database. Give your assumptions.
PART-B
5. a) With the help of an example, compare DML and DDL.
b) What are logical data independence and physical data independence? What is the
difference between them? Which of these is harder to realize? Why?
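For part (a), the difference can be illustrated with a short SQL sketch (the STUDENT table and its columns are assumed purely for illustration): DDL statements define or change the schema, whereas DML statements operate on the data stored in that schema.

-- DDL: defines / alters the structure of the database
CREATE TABLE STUDENT (
    RollNo INT PRIMARY KEY,
    Name   VARCHAR(50)
);
ALTER TABLE STUDENT ADD Branch VARCHAR(30);

-- DML: manipulates the data held in that structure
INSERT INTO STUDENT (RollNo, Name, Branch) VALUES (1, 'Asha', 'CSE');
UPDATE STUDENT SET Branch = 'ECE' WHERE RollNo = 1;
SELECT Name FROM STUDENT WHERE Branch = 'ECE';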
6. Design an ER diagram to represent the following scenario: A company has many employees working on a project. An employee can be part of one or more projects. Each employee works on a project for a certain amount of time. Assume suitable attributes for entities and relations. Mark the primary key(s) and the cardinality ratio of the relations.
Entities:
1. Employee
2. Project
Relationships:
1. Works On (between Employee and Project)
2. Duration (recording how long an employee works on a project)
Sample Diagram (not reproduced here)
Attributes:
1. Employee:
o Employee ID (Primary Key)
o Employee Name
o Employee Role
o Other employee attributes as needed
2. Project:
o Project ID (Primary Key)
o Project Name
o Project Description
o Other project attributes as needed
3. Works On:
o Employee ID (Foreign Key referencing Employee)
o Project ID (Foreign Key referencing Project)
o Start Date
o End Date (if applicable)
4. Duration:
o Employee ID (Foreign Key referencing Employee)
o Project ID (Foreign Key referencing Project)
o Duration (in days)
Cardinality Ratios:
An employee can be part of one or more projects (Many-to-Many). The cardinality ratio between
Employee and Works On will be (0,N) on the Employee side and (0,N) on the Works On side.
A project can have one or more employees working on it (Many-to-Many). The cardinality ratio
between Project and Works On will be (0,N) on the Project side and (0,N) on the Works On side.
Each employee works on a project for a certain amount of time (One-to-One). The cardinality ratio
between Works On and Duration will be (0,1) on both sides.
7. Consider the following relations for a database that keeps track of business trips of salespersons in a sales office:
SALESPERSON(Ssn, Name, StartYear, DeptNo)
TRIP(Ssn, FromCity, ToCity, DepartureDate, ReturnDate, TripId)
EXPENSE(TripId, AccountNo, Amount)
*We assume that SALESPERSON relation's Ssn attribute is the primary key.
*We assume that TRIP relation's TripId attribute is the primary key.
*We assume that EXPENSE relation's combination of TripId and AccountNo is the primary key.
b) Write a relational algebra expression to get the details of salespersons who have travelled between Mumbai and Delhi and whose travel expense is greater than Rs. 50000.
σ FromCity = 'Mumbai' ∧ ToCity = 'Delhi' ∧ Amount > 50000 (SALESPERSON ⨝ TRIP ⨝ EXPENSE)
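The equivalent SQL query (assuming the EXPENSE relation has an Amount attribute, as used in the expression above) would be:

SELECT S.*
FROM SALESPERSON S
JOIN TRIP T    ON S.Ssn = T.Ssn
JOIN EXPENSE E ON T.TripId = E.TripId
WHERE T.FromCity = 'Mumbai'
  AND T.ToCity   = 'Delhi'
  AND E.Amount   > 50000;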
c) Write a relational algebra expression to get the details of the salesperson who incurred the greatest travel expense among all trips made.
Using the aggregate function operator ℑ to compute the maximum expense first:
MAXEXP(MaxAmount) ← ℑ MAX Amount (EXPENSE)
RESULT ← σ Amount = MaxAmount (SALESPERSON ⨝ TRIP ⨝ EXPENSE × MAXEXP)
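The same joins combined with a MAX subquery give a SQL equivalent of part (c):

SELECT S.*
FROM SALESPERSON S
JOIN TRIP T    ON S.Ssn = T.Ssn
JOIN EXPENSE E ON T.TripId = E.TripId
WHERE E.Amount = (SELECT MAX(Amount) FROM EXPENSE);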
PART-C
8. With the help of an example, illustrate the use of SQL TRIGGER.
Benefits of Triggers
*Generating some derived column values automatically
*Enforcing referential integrity
*Event logging and storing information on table access
*Auditing
*Synchronous replication of tables
*Imposing security authorizations
*Preventing invalid transactions
General syntax of a trigger:
CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF column_name]
ON table_name
[FOR EACH ROW]
WHEN (condition)
BEGIN
   ... trigger body statements ...
END;
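As a concrete illustration of the syntax above (a sketch in Oracle-style PL/SQL; the EMPLOYEE and EMP_AUDIT tables and their columns are assumed for the example), the following row-level trigger logs every salary change into an audit table:

CREATE OR REPLACE TRIGGER trg_salary_audit
AFTER UPDATE OF Salary ON EMPLOYEE
FOR EACH ROW
WHEN (NEW.Salary <> OLD.Salary)
BEGIN
    -- record which employee was changed, the old and new salary, and when
    INSERT INTO EMP_AUDIT (EmpId, OldSalary, NewSalary, ChangedOn)
    VALUES (:OLD.EmpId, :OLD.Salary, :NEW.Salary, SYSDATE);
END;

Whenever an UPDATE changes the Salary column of an EMPLOYEE row, the trigger fires once per affected row and inserts an audit record automatically.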
9. List the basic data types available for defining attributes in SQL?
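The basic SQL data types include numeric types (INTEGER, SMALLINT, NUMERIC(p, s) / DECIMAL(p, s), FLOAT, REAL), character-string types (CHAR(n), VARCHAR(n)), bit-string types, date and time types (DATE, TIME, TIMESTAMP), and BOOLEAN. A small sketch using several of them (the BOOK table is assumed for illustration; exact type support varies between DBMSs):

CREATE TABLE BOOK (
    ISBN        CHAR(13)      PRIMARY KEY,  -- fixed-length character string
    Title       VARCHAR(100),               -- variable-length character string
    Price       DECIMAL(8, 2),              -- exact numeric with 2 decimal places
    PageCount   INTEGER,
    PublishedOn DATE,
    InStock     BOOLEAN                     -- not supported by every DBMS
);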
To find the closure of attribute A, we need to determine all the attributes that
can be functionally determined by A through the given set of functional dependencies
F.
Starting with A, let's calculate the closure step by step:
1. A → BC (Given)
The closure now includes A, B, and C.
2. C → BD (Given)
Since A determines C, and C determines B and D, we can add B and D to the closure.
The closure now includes A, B, C, and D.
3. BF → E (Given)
Since A determines B, and B in combination with F determines E, we can
add E to the closure.
The closure now includes A, B, C, D, and E.
4. F → D (Given)
Since A determines F, and F determines D, we can add D to the closure.
The closure now includes A, B, C, D, E, and F.
At this point, the closure includes all attributes of the relation R={A, B, C, D, E, F}.
Therefore, the closure of A is {A, B, C, D, E, F}.
To determine if A is a candidate key, we need to check if it is a superkey and if it is
minimal.
A superkey is a set of attributes that can uniquely identify each tuple in a
relation. Since the closure of A includes all attributes of the relation R, A is a
superkey.
To check if A is minimal, we can check if removing any attribute from A would
still be able to uniquely identify each tuple. In this case, removing any attribute from A
would result in the closure not including all attributes of R, and therefore it would not
be able to uniquely identify each tuple. Therefore, A is minimal.
Therefore, A is a candidate key for the relation R.
11. What are fully functional dependencies and partial functional dependencies? Give an
example to distinguish between these?
A functional dependency X → Y is a fully functional dependency if Y depends on the whole of X, that is, removing any attribute from X destroys the dependency. It is a partial functional dependency if Y is determined by only a proper subset of X, which typically happens when a non-key attribute depends on only part of a composite primary key.
For example, let's consider a relation called Employees with attributes (EmployeeID, FirstName, LastName, Address). Here, the primary key is EmployeeID. If we have the functional dependency EmployeeID → FirstName, it means that for each unique EmployeeID there is a unique FirstName associated with it. This is a fully functional dependency because the entire determinant (EmployeeID) is needed to determine the FirstName attribute.
Continuing with the Employees example, suppose instead that the primary key were the composite (EmployeeID, Address) and that the dependency (EmployeeID, Address) → FirstName holds. Since EmployeeID alone already determines FirstName, the attribute Address is extraneous: FirstName depends on only part of the composite key. This is a partial functional dependency.
PART-D
12. a) Consider the following table MARKS. Why is the table not in 1NF? Reconstruct it so that it is in 1NF. (The MARKS table itself is not reproduced here.)
The given table "MARKS" is not in the first normal form (1NF) because it
violates the rule that each cell in a table should contain a single atomic value. The
table has repeating groups of attributes (Marks, Subject Code, and Subject Name) for
each student, leading to redundancy and difficulty in interpreting the data.
To reconstruct the table into 1NF, we need to separate the repeating groups into
separate tables and establish appropriate relationships between them. Here's the
modified table structure:
Table 1: STUDENTS
- Roll No. (Primary Key)
- Name
Table 2: SUBJECTS
- Subject Code (Primary Key)
- Subject Name
Table 3: MARKS
- Roll No. (Foreign Key referencing STUDENTS.Roll No.)
- Subject Code (Foreign Key referencing SUBJECTS.Subject Code)
- Marks
The reconstructed tables eliminate the repeating groups and follow 1NF guidelines by
ensuring that each table contains atomic values in each cell. The STUDENTS table
stores information about the students, the SUBJECTS table contains subject
information, and the MARKS table stores the marks obtained by each student in each
subject.
By separating the data into multiple tables and establishing appropriate relationships,
we achieve a normalized structure that adheres to the 1NF requirements.
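A possible SQL definition of the reconstructed schema (data types are assumed for illustration):

CREATE TABLE STUDENTS (
    RollNo INT PRIMARY KEY,
    Name   VARCHAR(50)
);

CREATE TABLE SUBJECTS (
    SubjectCode VARCHAR(10) PRIMARY KEY,
    SubjectName VARCHAR(50)
);

CREATE TABLE MARKS (
    RollNo      INT,
    SubjectCode VARCHAR(10),
    Marks       INT,
    PRIMARY KEY (RollNo, SubjectCode),
    FOREIGN KEY (RollNo)      REFERENCES STUDENTS (RollNo),
    FOREIGN KEY (SubjectCode) REFERENCES SUBJECTS (SubjectCode)
);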
b) When is a relational schema said to be in 3NF? How is BCNF different from 3NF?
A relational schema is said to be in the third normal form (3NF) when it satisfies the
following conditions:
1. It is in the second normal form (2NF).
2. There are no transitive dependencies between non-key attributes.
To understand the difference between 3NF and Boyce-Codd Normal Form (BCNF), let's first define BCNF. A relational schema is in BCNF if, for every non-trivial functional dependency X → Y that holds on it, X is a superkey. Now, the key differences between 3NF and BCNF are as follows:
1. Dependency Consideration:
- In 3NF, there should be no transitive dependencies between non-key
attributes, meaning that the attributes should not depend on each other through
other attributes.
- In BCNF, every determinant must be a superkey. That is, every non-trivial functional dependency must have a superkey on the left-hand side (LHS).
2. Preservation of Dependencies:
- 3NF relaxes the BCNF condition slightly: a dependency X → A is permitted even when X is not a superkey, provided A is a prime attribute (part of some candidate key). This relaxation makes it possible to preserve all functional dependencies in a 3NF decomposition.
- BCNF, on the other hand, eliminates all non-trivial functional dependencies whose determinants are not superkeys. This removes more redundancy, but a dependency-preserving BCNF decomposition is not always possible.
i.) COUNT() – Returns the number of rows that match a specified criterion.
Syntax: SELECT COUNT(column_name) FROM table_name WHERE condition;
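A minimal usage sketch (the EMPLOYEE table and its columns are assumed for illustration):

-- number of employees in department 10
SELECT COUNT(*) FROM EMPLOYEE WHERE DeptNo = 10;

-- COUNT(column_name) ignores NULLs in that column
SELECT COUNT(ManagerId) FROM EMPLOYEE;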
b) Given a relation R(A, B, C), find the minimal cover of the given set of functional dependencies F = {A→BC, B→C, A→B, AB→C}.
To find the minimal cover of a set of functional dependencies, we simplify the set so that every right-hand side is a single attribute, no left-hand side contains an extraneous attribute, and no dependency is redundant. Here's the step-by-step process:
1. Make every right-hand side a single attribute:
- A→BC is split into A→B and A→C.
- The set becomes F = {A→B, A→C, B→C, AB→C}.
2. Eliminate extraneous attributes from the left-hand sides:
- Only AB→C has a composite LHS, so check whether A or B is extraneous in it.
- B+ under F is {B, C}, which already contains C, so A is extraneous in AB→C; the dependency reduces to B→C, which is already in the set.
- The set becomes F = {A→B, A→C, B→C}.
3. Eliminate redundant dependencies:
- A→C is redundant, because A→B and B→C together imply it (A+ computed without A→C is {A, B, C}).
- Neither A→B nor B→C can be removed without losing information.
The minimal cover of the given set of functional dependencies is therefore:
Fmin = {A→B, B→C}.
To determine the key for relation R, we need to find a minimal set of attributes whose closure contains every attribute of R; such a set is a candidate key for R.
2NF Decomposition:
1. Identify functional dependencies that violate the 2NF requirement, which is to
remove partial dependencies.
2. A partial dependency occurs when a non-key attribute is functionally dependent on
only part of the key.
R1 (B, F)
R2 (A, B, C, D, E, G, H)
3NF Decomposition:
1. Identify functional dependencies that violate the 3NF requirement, which is to
remove transitive dependencies.
2. A transitive dependency occurs when a non-key attribute is functionally dependent
on another non-key attribute.
These relations are in 2NF and 3NF, respectively, following the decomposition
process.
PART-E
15. a) Suppose that we have an ordered file with 400,000 records stored on a disk with
block size 4,096 bytes. File records are of fixed size and are unspanned, with record
length 200 bytes. How many blocks are needed for the file? Approximately, how
many block accesses are required for a binary search in this file? On an average,
how many block accesses are required for a linear search, if the file is nonordered?
To determine the number of blocks needed, we first compute the blocking factor, i.e., how many whole records fit in one block (records are unspanned, so a record cannot cross a block boundary):
Blocking factor bfr = floor(Block size / Record length) = floor(4,096 / 200) = 20 records per block
Number of blocks b = ceiling(Number of records / bfr) = ceiling(400,000 / 20) = 20,000 blocks
For a binary search on the ordered file, each step reads one block and halves the remaining search space, so in the worst case the number of block accesses is approximately log2 of the number of blocks:
Number of block accesses for binary search ≈ ceiling(log2(20,000)) = 15 block accesses
On average, for a linear search in a non-ordered file, we need to scan through half of the blocks, assuming the desired record is equally likely to be anywhere in the file:
Number of block accesses for linear search ≈ b / 2 = 20,000 / 2 = 10,000 block accesses
Keep in mind that these calculations provide estimates based on the assumptions
mentioned. The actual performance may vary depending on factors such as disk
seek time, caching, and file organization.
Example:
Suppose we have an index on the "ID" attribute of the file. The index allows
us to quickly locate the block address where a specific ID is stored.
If we want to search for a record with ID = 500, instead of performing a linear
search through the file, we can use the index to directly find the block address
associated with ID = 500. This significantly reduces the number of block accesses
required for the search.
Let's say the index lookup for ID = 500 gives us the block address as Block 7.
With the index, we only need to access Block 7 to retrieve the desired record,
resulting in just one block access.
In this example, the use of indexing reduces the search time from potentially
thousands of block accesses in a linear search to just one block access. This illustrates
the significant improvement in search efficiency that indexing provides.
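In SQL terms, the same idea can be sketched as follows (the table and index names are illustrative; whether the index is actually used is decided by the query optimizer):

-- build an index on the ID attribute of the file
CREATE INDEX idx_emp_id ON EMPLOYEE_FILE (ID);

-- this lookup can now be answered via the index
-- instead of a full (linear) scan of the file
SELECT * FROM EMPLOYEE_FILE WHERE ID = 500;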
16. a) Explain the structure of an internal node and a leaf node in a B+-tree.
In a B+-tree, both internal nodes and leaf nodes play essential roles in
organizing and accessing data efficiently. Let's discuss the structure of each node type:
1. Internal Node:
- An internal node in a B+-tree contains an ordered list of keys and child pointers; it does not store data records itself.
- The keys act as separators for the ranges of key values covered by the child subtrees: values in the subtree to the left of a key are less than or equal to that key, and values in the subtree to its right are greater than it (the exact convention may vary).
- The child pointers point to the corresponding child nodes.
- An internal node of a B+-tree of order p holds at most p child pointers and p − 1 keys.
- Internal nodes facilitate efficient navigation through the tree structure by guiding the
search path towards the appropriate leaf node.
2. Leaf Node:
- The leaf nodes in a B+-tree store the actual data records or key-value pairs of the
indexed data.
- Each leaf node contains an ordered list of key-value pairs.
- The key-value pairs in the leaf nodes are sorted based on the keys.
- The leaf nodes are connected in a linked list structure, allowing sequential access
to the records.
- Each leaf node also has a pointer to the next leaf node in the linked list.
- The leaf nodes store the complete data records or key-value pairs and are
responsible for storing and retrieving the actual data.
The structure of a B+-tree is optimized for efficient range searches and sequential
access. The internal nodes provide a hierarchy and guide the search path, while the
leaf nodes store the actual data and support efficient range queries and ordered
traversal. This separation of internal and leaf nodes allows for a balanced tree
structure with improved performance characteristics.
b) Illustrate with an example how searching for a record with search key field value is
done using a B+-Tree.
Let's consider an example to illustrate how searching for a record with a search key
field value is done using a B+-tree.
Suppose we have a B+-tree that stores student records, where each record has a search
key field value representing the student ID. The B+-tree has the following structure:
(The B+-tree diagram is not reproduced here.) In the diagram, the root internal node holds the keys [7, 14, 23, 35], one of its children is the internal node [9, 12], and one of the leaf nodes is [9, 10]; key values are shown in square brackets and the arrows represent child pointers.
Now, let's assume we want to search for a student record with the student ID 10.
1. Starting at the root node, we compare the search key value (10) with the key
values in the internal node [7, 14, 23, 35].
2. Since 10 lies between 7 and 14, we follow the child pointer between those two keys and move to the next level.
3. Now, we compare the search key value (10) with the key values in the internal
node [9, 12].
4. Since 10 lies between 9 and 12, we follow the corresponding child pointer.
5. We reach the leaf node [9, 10], which contains the search key value (10).
6. We have found the desired record with the search key field value 10.
In this example, the B+-tree allowed us to efficiently search for the record with the
search key field value 10. We started at the root node and made comparisons to
determine the appropriate child node to follow. By narrowing down the search path
based on the key values in the internal nodes, we reached the leaf node that contained
the desired record.
The structure and organization of the B+-tree, along with its search algorithm, enable
efficient searching and retrieval of records based on their search key field values.
17. Why is concurrency control needed? What are the different types of problems we
may encounter when two transactions run concurrently? Illustrate each problem with
suitable examples.
Concurrency control is needed so that simultaneously executing transactions do not interfere with each other and leave the database in an inconsistent state. The main problems that can arise when two transactions run concurrently without control are the lost update problem, the dirty read (uncommitted dependency) problem, and the incorrect/unrepeatable read problem.
Example (Lost Update Problem):
Suppose there are two transactions, T1 and T2, both updating the same account balance concurrently:
T1: Read balance = $500
Deduct $100
Write balance = $400
T2: Read balance = $500
Deduct $50
Write balance = $450
If both transactions read the balance before either one writes, T2's write of $450 overwrites T1's write of $400, so the $100 deducted by T1 is lost; the correct final balance should have been $350.
Example (Dirty Read Problem):
Consider two transactions, T1 and T2, where T1 updates a customer's phone
number, and T2 reads the updated phone number:
T1: Update customer's phone number to 1234567890 (not yet committed)
T2: Reads the customer's phone number (dirty read)
If T1 rolls back before committing the update, T2 will have read an invalid or
incorrect phone number, leading to data inconsistency.
Example (Unrepeatable Read Problem):
Suppose there are two transactions, T1 and T2, where T1 reads a customer's account balance twice during its execution:
T1: Read account balance (first read)
Perform some calculations
Read account balance again (second read)
If T2 modifies the account balance between the two read operations of T1, T1 will see two inconsistent values of the balance, leading to incorrect calculations or decisions.
Concurrency control is therefore essential for maintaining data integrity and consistency in a multi-user database system. Various
concurrency control mechanisms, such as locking, timestamps, and serializability
techniques, are employed to prevent these problems and ensure correct and reliable
execution of concurrent transactions.
1. Atomicity:
Atomicity ensures that a transaction is treated as a single, indivisible unit of
work. It guarantees that either all the operations within a transaction are successfully
completed and permanently saved to the database, or none of them are performed at
all. If any part of the transaction fails, all changes made by the transaction are rolled back, and the database remains unchanged (see the sketch after this list of properties).
2. Consistency:
Consistency ensures that a transaction brings the database from one consistent
state to another consistent state. The database must satisfy a set of predefined integrity
constraints and rules before and after the execution of a transaction. In other words, a
transaction should preserve the consistency of the data and not violate any integrity
constraints.
3. Isolation:
Isolation ensures that concurrent execution of multiple transactions produces
the same results as if the transactions were executed sequentially, one after another.
Each transaction must operate independently of other concurrently executing
transactions. Isolation prevents interference, such as dirty reads, non-repeatable reads,
and phantom reads, between concurrent transactions.
4. Durability:
Durability guarantees that once a transaction is committed and changes are
saved to the database, they persist even in the event of system failures, such as power
outages or crashes. The changes made by a committed transaction are considered
permanent and should not be lost or undone.
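Atomicity and durability together are what make the classic funds-transfer pattern safe; a minimal SQL sketch (the ACCOUNTS table, account numbers, and amount are illustrative, and the exact transaction syntax varies by DBMS):

BEGIN;                                                          -- start the transaction
UPDATE ACCOUNTS SET Balance = Balance - 100 WHERE AccNo = 101;
UPDATE ACCOUNTS SET Balance = Balance + 100 WHERE AccNo = 202;
COMMIT;  -- both updates become permanent together; a failure or ROLLBACK before this point undoes both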
b) “If every transaction in a schedule follows the two-phase locking protocol, the
schedule is guaranteed to be serializable”, justify the statement.
The statement "If every transaction in a schedule follows the two-phase
locking protocol, the schedule is guaranteed to be serializable" is true. The
two-phase locking protocol is a concurrency control mechanism that ensures
serializability of transactions in a schedule. Let's justify this statement:
The two-phase locking (2PL) protocol consists of two phases: the growing phase and
the shrinking phase.
1. Growing Phase:
- During the growing phase, a transaction acquires locks on the resources (e.g., database records or tables) it needs before performing its reads and updates.
- In this phase the transaction may acquire new locks but may not release any lock it already holds.
- This ensures that the transaction does not release any lock until it has acquired all the locks it requires, preventing other transactions from modifying the locked resources in the meantime.
2. Shrinking Phase:
- During the shrinking phase, the transaction releases the locks it holds as it completes its updates and reads.
- Once the transaction has released any lock, it may not acquire any new lock, and a released lock cannot be reacquired.
- Releasing the locks allows other transactions to acquire them and proceed with their operations.
Now, let's see how the two-phase locking protocol guarantees serializability:
1. Conflict Serializability:
- The two-phase locking protocol prevents conflicts between transactions by ensuring
that no two transactions acquire conflicting locks simultaneously.
- Conflicts arise when two transactions try to access or modify the same resource
concurrently, leading to potential data inconsistencies.
- By following the two-phase locking protocol, every transaction has a lock point: the moment at which it has acquired all of its locks. Transactions can be ordered by their lock points, and every pair of conflicting operations in the schedule occurs in the same order as the lock points of the transactions involved.
- The schedule is therefore conflict-equivalent to the serial schedule in which the transactions execute in lock-point order, i.e., it is conflict serializable.
c) What are the different types of locks that are commonly used in concurrency control?
In concurrency control, various types of locks are used to manage concurrent access to shared resources and ensure data integrity. The commonly used types of locks are:
1. Shared (Read) Lock:
- A shared lock allows a transaction to read a resource; several transactions may hold shared locks on the same resource at the same time.
2. Exclusive (Write) Lock:
- An exclusive lock allows a transaction to both read and write a resource; while it is held, no other transaction may hold any lock on that resource.
3. Intent Lock:
- Intent locks are used to indicate the intention of a transaction to acquire locks at
different granularities.
- Intent locks are acquired at higher levels in the lock hierarchy to prevent conflicts
and coordinate lock acquisition at lower levels.
- For example, an intent lock at the table level indicates that the transaction intends
to acquire locks on individual rows or columns within the table.
5. Conversion:
- Conversion locks refer to the process of changing one type of lock into another
without releasing the lock completely.
- For example, a transaction holding a shared lock may request a conversion to an
exclusive lock if it needs to modify the resource.
These lock types are used by concurrency control mechanisms, such as the
two-phase locking (2PL) protocol, to control concurrent access to shared resources.
By acquiring and releasing locks based on specific rules and protocols, these locks
help prevent conflicts and maintain data consistency in multi-user database systems.
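Many SQL systems expose shared and exclusive locking directly; for example (PostgreSQL/Oracle-style syntax, shown as an illustration only):

-- acquire a shared (read) lock on the whole table
LOCK TABLE ACCOUNTS IN SHARE MODE;

-- acquire exclusive row-level locks on the selected rows before updating them
SELECT Balance FROM ACCOUNTS WHERE AccNo = 101 FOR UPDATE;
UPDATE ACCOUNTS SET Balance = Balance - 100 WHERE AccNo = 101;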
To optimize the given query, we can apply heuristic optimization rules to rearrange and simplify the query tree. (The initial query tree and the optimized query tree are not reproduced here.)
Explanation:
1. In the initial query tree, the tables INSTRUCTOR and COURSE are joined
first, followed by the TEACHES table. This order can be rearranged to improve
performance.
2. The INSTRUCTOR table has a filter condition on the DEPT column with the value 'MATHS'. Following the heuristic of applying selections as early as possible, this selection is pushed down the tree so that it is applied to INSTRUCTOR before INSTRUCTOR is joined with TEACHES on the ID column. This helps reduce the intermediate result set early in the query execution.
3. The JOIN between the INSTRUCTOR and COURSE tables remains the same, as
there are no further optimization opportunities within those tables.
4. The JOIN between the COURSE and TEACHES tables remains the same, as the
query condition requires matching the COURSE-ID column in both tables.
By optimizing the query tree using these heuristics, we aim to reduce the
intermediate result set early in the execution process, thereby improving the query
performance.
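The original query and its trees are not reproduced above; a plausible SQL form consistent with the explanation (table and column names are taken from the explanation itself, and the projected column is assumed) would be:

SELECT C.Course_Name
FROM INSTRUCTOR I
JOIN TEACHES T ON I.ID = T.ID
JOIN COURSE  C ON T.Course_ID = C.Course_ID
WHERE I.DEPT = 'MATHS';

-- Heuristic optimization pushes the selection DEPT = 'MATHS' below the joins,
-- so it is applied to INSTRUCTOR first and only matching instructors are joined
-- with TEACHES and COURSE.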
Big Data refers to vast and complex sets of data that are too large and diverse
to be easily managed and analyzed using traditional methods. It is characterized by
the three V's: Volume (massive amounts of data), Velocity (generated at high speed),
and Variety (diverse data formats). Big Data is valuable for organizations as it enables
them to gain insights, make data-driven decisions, and improve processes. Advanced
technologies like cloud computing and machine learning are used to handle Big Data.
However, privacy, security, and ethical concerns are also important considerations.
Overall, Big Data has the potential to drive innovation and growth across industries.
Semantic web technology enhances the World Wide Web by adding explicit
semantics to web content, allowing machines to understand and process information
more effectively. It includes standards like RDF and ontologies to represent
knowledge and relationships. Semantic web technology improves search, data
integration, knowledge representation, analytics, and decision-making. Its relevance
lies in enabling more precise search results, seamless data integration, interoperability
across diverse sources, automated reasoning, and domain-specific applications.
***********************************************************************************