Dbms Imp Notes

The document explains database architecture, detailing its three levels: physical, conceptual, and external, and discusses various types of DBMS architecture including 1-Tier, 2-Tier, and 3-Tier architectures along with their advantages and disadvantages. It also outlines Codd's 12 rules for relational database management systems to ensure data integrity and usability, and describes the roles and responsibilities of a Database Administrator (DBA) in managing databases. Additionally, it defines transactions in a database context, emphasizing the importance of ACID properties and serializability in transaction management.

1. Explain Database Architecture?

Following are the three levels of database architecture,

1. Physical Level
2. Conceptual Level
3. External Level

The three levels are connected by mappings:

• Mapping is the process of transforming requests and responses between the various levels of the architecture.
• Mapping adds processing overhead, so it is less beneficial for small databases, where it only costs extra time.
• In External/Conceptual mapping, the DBMS transforms a request on an external schema into a request against the conceptual schema.
• In Conceptual/Internal mapping, the request is transformed from the conceptual level to the internal level.
1. Physical Level
• The physical level describes the physical storage structure of data in the database.
• It is also known as the Internal Level.
• This level is very close to the physical storage of data.
• At the lowest level, data is stored in the form of bits with physical addresses on the secondary storage device.
• At the highest level, it can be viewed in the form of files.
• The internal schema defines the various stored data types. It uses a physical data model.
2. Conceptual Level
• The conceptual level describes the structure of the whole database for a group of users.
• It is also called the data model.
• The conceptual schema is a representation of the entire content of the database.
• This schema contains all the information needed to build the relevant external records.
• It hides the internal details of physical storage.
3. External Level
• The external level is related to the data that is viewed by individual end users.
• This level includes a number of user views or external schemas.
• This level is closest to the user.
• An external view describes the segment of the database that is required for a particular user group and hides the rest of the database from that user group.

DBMS Architecture: 1-Tier, 2-Tier, 3-Tier



A database stores a lot of critical information that must be accessed quickly and securely, so it is important to select the correct architecture for efficient data management. DBMS architecture determines how users' requests are carried out while they are connected to the database. We choose a database architecture depending on several factors such as the size of the database, the number of users, and the relationships between the users. There are two types of database models that we generally use: the logical model and the physical model. The types of architecture found in databases are dealt with in the next section.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage
requirements. Types of DBMS Architecture are discussed here.
• 1-Tier Architecture
• 2-Tier Architecture
• 3-Tier Architecture

1-Tier Architecture
In 1-Tier Architecture the database is directly available to the user: the user works directly on the DBMS, that is, the client, server, and database are all present on the same machine. For example, to learn SQL we set up an SQL server and the database on the local system, which enables us to interact directly with the relational database and execute operations. Industry rarely uses this architecture; it generally goes for 2-Tier and 3-Tier Architecture.

DBMS 1-Tier Architecture

Advantages of 1-Tier Architecture


Below mentioned are the advantages of 1-Tier Architecture.
• Simple Architecture: 1-Tier Architecture is the simplest architecture to set up, as only a single machine is required to maintain it.
• Cost-Effective: No additional hardware is required for implementing 1-
Tier Architecture, which makes it cost-effective.
• Easy to Implement: 1-Tier Architecture can be easily deployed, and
hence it is mostly used in small projects.

2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model. The application at
the client end directly communicates with the database on the server side. APIs
like ODBC and JDBC are used for this interaction. The server side is responsible for
providing query processing and transaction management functionalities. On the
client side, the user interfaces and application programs are run. The application
on the client side establishes a connection with the server side to communicate
with the DBMS.
An advantage of this type is that it is easier to maintain and understand and is compatible with existing systems. However, this model gives poor performance when there is a large number of users.
DBMS 2-Tier Architecture

Advantages of 2-Tier Architecture


• Easy to Access: 2-Tier Architecture gives easy access to the database, which makes retrieval fast.
• Scalable: We can scale the database easily, by adding clients or
upgrading hardware.
• Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture
and Multi-Tier Architecture.
• Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier
Architecture.
• Simple: 2-Tier Architecture is easily understandable as well as simple
because of only two components.

3-Tier Architecture

In 3-Tier Architecture, there is another layer between the client and the server.
The client does not directly communicate with the server. Instead, it interacts with
an application server which further communicates with the database system and
then the query processing and transaction management takes place. This
intermediate layer acts as a medium for the exchange of partially processed data
between the server and the client. This type of architecture is used in the case of
large web applications.
DBMS 3-Tier Architecture

Advantages of 3-Tier Architecture


• Enhanced scalability: Scalability is enhanced due to the distributed
deployment of application servers. Now, individual connections need
not be made between the client and server.
• Data Integrity: 3-Tier Architecture maintains Data Integrity. Since
there is a middle layer between the client and the server, data
corruption can be avoided/removed.
• Security: 3-Tier Architecture Improves Security. This type of model
prevents direct interaction of the client with the server thereby
reducing access to unauthorized data.
Disadvantages of 3-Tier Architecture
• More Complex: 3-Tier Architecture is more complex in comparison to
2-Tier Architecture. Communication Points are also doubled in 3-Tier
Architecture.
• Difficult to Interact: It becomes difficult for this sort of interaction to
take place due to the presence of middle layers.
Conclusion
When it comes to choosing a DBMS architecture, the decision comes down to how complex and scalable the system needs to be. The 3-Tier structure has the richest feature set and is well suited to modern, large database systems.

2. State Codd’s 12 Rules for DBMS?

Codd's rules were proposed by the computer scientist Dr. Edgar F. Codd, who also invented the relational model for database management. These rules are intended to ensure data integrity, consistency, and usability. This set of rules essentially defines the characteristics and requirements of a relational database management system (RDBMS). The individual rules are listed below.

Codd’s Rules in DBMS

Rule 1: The Information Rule


All information, whether it is user information or metadata, that is stored in a
database must be entered as a value in a cell of a table. It is said that
everything within the database is organized in a table layout.

Rule 2: The Guaranteed Access Rule


Each data element is guaranteed to be accessible logically with a
combination of the table name, primary key (row value), and attribute name
(column value).

Rule 3: Systematic Treatment of NULL Values


Every Null value in a database must be given a systematic and uniform
treatment.

Rule 4: Active Online Catalog Rule


The database catalog, which contains metadata about the database, must be
stored and accessed using the same relational database management
system.

Rule 5: The Comprehensive Data Sublanguage Rule


A crucial component of any efficient database system is its ability to offer an
easily understandable data manipulation language (DML) that facilitates
defining, querying, and modifying information within the database.

Rule 6: The View Updating Rule


All views that are theoretically updatable must also be updatable by the
system.

Rule 7: High-level Insert, Update, and Delete


A successful database system must possess the feature of facilitating high-
level insertions, updates, and deletions that can grant users the ability to
conduct these operations with ease through a single query.

Rule 8: Physical Data Independence


Application programs and activities should remain unaffected when changes
are made to the physical storage structures or methods.

Rule 9: Logical Data Independence


Application programs and activities should remain unaffected when changes
are made to the logical structure of the data, such as adding or modifying
tables.

Rule 10: Integrity Independence


Integrity constraints should be specified separately from application
programs and stored in the catalog. They should be automatically enforced
by the database system.

Rule 11: Distribution Independence


The distribution of data across multiple locations should be invisible to users,
and the database system should handle the distribution transparently.

Rule 12: Non-Subversion Rule


If the system provides a low-level (record-at-a-time) interface, that interface must not be able to subvert the system by bypassing security and integrity constraints.
3. Explain Role and Responsibility of Database
Administrator (DBA)?

Importance of Database Administrator (DBA) :

• The Database Administrator manages and controls the three levels of the database (internal, conceptual, and external) in the DBMS architecture and, in discussion with the comprehensive user community, gives a definition of the world view of the database. The DBA then provides the external views for different users and applications.
• The Database Administrator is held responsible for maintaining the integrity and security of the database and for restricting unauthorized users. The DBA grants permissions to users of the database and maintains a profile of each and every user in the database.
• Database Administrators are also held accountable for keeping the database protected and secure and for keeping any chance of data loss to a minimum.
• The Database Administrator is solely responsible for reducing the risk of data loss by backing up the data at regular intervals.

Role and Duties of Database Administrator (DBA) :

• Decides hardware –
They decide on economical hardware, based on cost, performance, and efficiency, that best suits the organization. The hardware acts as an interface between the end users and the database.
• Manages data integrity and security –
Data integrity needs to be checked and managed accurately, as it protects and restricts data from unauthorized use. The DBA keeps an eye on relationships within the data to maintain data integrity.
• Database Accessibility –
The Database Administrator is solely responsible for giving permission to access data available in the database. The DBA also decides who has the right to change the content.
• Database design –
DBA is held responsible and accountable for logical, physical
design, external model design, and integrity and security control.
• Database implementation –
DBA implements DBMS and checks database loading at the time
of its implementation.
• Query processing performance –
DBA enhances query processing by improving speed, performance,
and accuracy.
• Tuning Database Performance –
If users cannot get data quickly and accurately, the organization may lose business. By tuning SQL commands, the DBA can enhance the performance of the database.

Various responsibilities of Database Administrator (DBA) :

• Responsible for designing the overall database schema (tables & fields).
• To select and install database software and hardware.
• Responsible for deciding on access methods and data storage.
• DBA selects appropriate DBMS software like Oracle, SQL Server, or MySQL.
• Responsible for designing recovery procedures.
• DBA decides the user access level and security checks for
accessing, modifying or manipulating data.
• DBA is responsible for specifying various techniques for monitoring
the database performance.
• DBA is responsible for operations management.
• Operations management deals with the data problems that arise on a day-to-day basis, and the responsibilities include:
1. Investigating any errors found in the data.
2. Supervising restart and recovery procedures in case of any failure.
3. Supervising reorganization of the databases.
4. Controlling and handling all periodic dumps of data.

Skills Required for DBA:


1. The various programming and soft skills required of a DBA are as follows:
• Good communication skills
• Excellent knowledge of databases architecture and design and
RDBMS.
• Knowledge of Structured Query Language (SQL).
2. In addition, this aspect of database administration includes maintenance of data security, which involves maintaining security authorization tables, conducting periodic security audits, and investigating all known security breaches.
3. To carry out all these functions, it is crucial that the DBA has all the
accurate information about the company’s data readily on hand. For this
purpose, he maintains a data dictionary.
4. The data dictionary contains definitions of all data items and structures,
the various schemes, the relevant authorization and validation checks and
the different mapping definitions.
5. It should also have information about the source and destination of a data
item and the flow of a data item as it is used by a system. This type of
information is a great help to the DBA in maintaining centralized control of
data.

5. What is a transaction? Explain transaction states.

DBMS - Transaction

A transaction can be defined as a group of tasks. A single task is the minimum processing unit, which cannot be divided further.

Let's take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.

A’s Account

Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)

B’s Account

Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)

ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.

• Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in a database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.
• Consistency − The database must remain in a consistent
state after any transaction. No transaction should have any
adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a
transaction, it must remain consistent after the execution of
the transaction as well.
• Durability − The database should be durable enough to hold
all its latest updates even if the system fails or restarts. If a
transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a
transaction commits but the system fails before the data
could be written on to the disk, then that data will be
updated once the system springs back into action.
• Isolation − In a database system where more than one
transaction are being executed simultaneously and in
parallel, the property of isolation states that all the
transactions will be carried out and executed as if it is the
only transaction in the system. No transaction will affect the
existence of any other transaction.
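
To make the ACID discussion concrete, here is a minimal, hedged sketch of the bank-transfer example using Python's built-in sqlite3 module; the table name, account names, and balances are illustrative assumptions, not part of the notes.

import sqlite3

# Illustrative schema and data (assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    with conn:  # opens a transaction; commits on success, rolls back on any exception
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            # Raising here aborts the whole transfer: no partial update survives
            # (atomicity) and the no-negative-balance invariant holds (consistency).
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

transfer(conn, "A", "B", 500)
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# -> [('A', 500), ('B', 700)]

Durability and isolation are handled by the storage engine itself; in SQLite's case this is done through journaling/write-ahead logging and locking.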

Serializability
When multiple transactions are being executed by the operating system in a multiprogramming environment, there are possibilities that the instructions of one transaction are interleaved with those of some other transaction.

• Schedule − A chronological execution sequence of transactions is called a schedule. A schedule can have many transactions in it, each comprising a number of instructions/tasks.
• Serial Schedule − It is a schedule in which transactions are
aligned in such a way that one transaction is executed first.
When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after
the other. This type of schedule is called a serial schedule,
as transactions are executed in a serial manner.

In a multi-transaction environment, serial schedules are considered as a benchmark. The execution sequence of the instructions within a transaction cannot be changed, but two transactions can have their instructions executed in a random fashion. This execution does no harm if the two transactions are mutually independent and working on different segments of data; but if these two transactions are working on the same data, then the results may vary. This ever-varying result may bring the database to an inconsistent state.

To resolve this problem, we allow parallel execution of a transaction schedule if its transactions are either serializable or have some equivalence relation among them.

Equivalence Schedules
An equivalence schedule can be of the following types −

Result Equivalence
If two schedules produce the same result after execution, they
are said to be result equivalent. They may yield the same result
for some value and different results for another set of values.
That's why this equivalence is not generally considered
significant.

View Equivalence

Two schedules would be view equivalent if the transactions in both the schedules perform similar actions in a similar manner.

For example −

• If T reads the initial data in S1, then it also reads the initial
data in S2.
• If T reads the value written by J in S1, then it also reads the
value written by J in S2.
• If T performs the final write on the data value in S1, then it
also performs the final write on the data value in S2.

Conflict Equivalence

Two operations would be conflicting if they have the following properties −

• Both belong to separate transactions.
• Both access the same data item.
• At least one of them is a "write" operation.

Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and only if −

• Both the schedules contain the same set of transactions.
• The order of conflicting pairs of operations is maintained in both the schedules.

Note − View equivalent schedules are view serializable and conflict equivalent schedules are conflict serializable. All conflict serializable schedules are view serializable too.
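
To make conflict serializability concrete, here is a hedged Python sketch (the schedule encoding and function name are our own, not from the notes) that builds the precedence graph from conflicting operation pairs and reports whether a schedule is conflict serializable, i.e. whether the graph is acyclic.

# A schedule is a list of (transaction, action, data_item) tuples in execution order.
def is_conflict_serializable(schedule):
    """Build the precedence graph and test it for cycles (DFS)."""
    edges = {}  # transaction -> set of transactions it must precede
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            conflict = ti != tj and xi == xj and ("W" in (ai, aj))
            if conflict:
                edges.setdefault(ti, set()).add(tj)

    visiting, done = set(), set()
    def has_cycle(node):
        visiting.add(node)
        for nxt in edges.get(node, ()):  # follow precedence edges
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(t) for t in list(edges) if t not in done)

# T1 reads and writes X, then T2 reads and writes X: serializable as T1 -> T2.
s1 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
# Interleaved reads before writes create a cycle T1 <-> T2: not conflict serializable.
s2 = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X")]
print(is_conflict_serializable(s1), is_conflict_serializable(s2))  # True False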

States of Transactions
A transaction in a database can be in one of the following states

• Active − In this state, the transaction is being executed. This is the initial state of every transaction.
• Partially Committed − When a transaction executes its final
operation, it is said to be in a partially committed state.
• Failed − A transaction is said to be in a failed state if any of
the checks made by the database recovery system fails. A
failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has
reached a failed state, then the recovery manager rolls back
all its write operations on the database to bring the
database back to its original state where it was prior to the
execution of the transaction. Transactions in this state are
called aborted. The database recovery module can select
one of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
• Committed − If a transaction executes all its operations
successfully, it is said to be committed. All its effects are
now permanently established on the database system.

Transaction States in DBMS



These are the states through which a transaction goes during its lifetime. They tell us the current state of the transaction and how further processing of the transaction will be done, and they govern the rules that decide the fate of the transaction: whether it will commit or abort.
The system also uses a transaction log, a file maintained by the recovery management component to record all the activities of the transaction. After the commit is done, the transaction log file is removed.

These are different types of Transaction States :

1. Active State –
When the instructions of the transaction are running then the
transaction is in active state. If all the ‘read and write’ operations are
performed without any error then it goes to the “partially committed
state”; if any instruction fails, it goes to the “failed state”.

2. Partially Committed –
After completion of all the read and write operations, the changes are made in main memory or the local buffer. If the changes are made permanent on the database then the state will change to the "committed state", and in case of failure it will go to the "failed state".

3. Failed State –
When any instruction of the transaction fails, or a failure occurs while making the permanent change of data on the database, the transaction goes to the "failed state".

4. Aborted State –
After any type of failure the transaction goes from the "failed state" to the "aborted state", and since in the previous states the changes were made only to the local buffer or main memory, these changes are deleted or rolled back.
5. Committed State –
It is the state when the changes are made permanent on the Data Base
and the transaction is complete and therefore terminated in the
“terminated state”.

6. Terminated State –
If there isn’t any roll-back or the transaction comes from the
“committed state”, then the system is consistent and ready for new
transaction and the old transaction is terminated.

7. Explain timestamp-based protocol?

Timestamp based Concurrency Control


Concurrency control can be implemented in different ways. One way to implement it is by using locks. Here, let us discuss the Timestamp Ordering Protocol.
A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are usually assigned in the order in which the transactions are submitted to the system. The timestamp of a transaction T is referred to as TS(T).
Timestamp Ordering Protocol –
The main idea for this protocol is to order the transactions based on their
Timestamps. A schedule in which the transactions participate is then serializable
and the only equivalent serial schedule permitted has the transactions in the order
of their Timestamp Values. Stating simply, the schedule is equivalent to the
particular Serial Order corresponding to the order of the Transaction timestamps.
An algorithm must ensure that, for each item accessed by Conflicting
Operations in the schedule, the order in which the item is accessed does not
violate the ordering. To ensure this, use two Timestamp Values relating to each
database item X.
• W_TS(X) is the largest timestamp of any transaction that
executed write(X) successfully.
• R_TS(X) is the largest timestamp of any transaction that
executed read(X) successfully.
Basic Timestamp Ordering –
Every transaction is issued a timestamp based on when it enters the system.
Suppose, if an old transaction Ti has timestamp TS(Ti), a new transaction Tj is
assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages
concurrent execution such that the timestamps determine the serializability
order. The timestamp ordering protocol ensures that any conflicting read and
write operations are executed in timestamp order. Whenever some
Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm
compares the timestamp of T with R_TS(X) & W_TS(X) to ensure that the
Timestamp order is not violated. This describes the Basic TO protocol in the
following two cases.
1. Whenever a Transaction T issues a W_item(X) operation, check the following conditions:
• If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation. Else,
• Execute the W_item(X) operation of T and set W_TS(X) to TS(T).
2. Whenever a Transaction T issues a R_item(X) operation, check the following conditions:
• If W_TS(X) > TS(T), then abort and roll back T and reject the operation. Else,
• If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).

Whenever the Basic TO algorithm detects two conflicting operations that occur in an incorrect order, it rejects the latter of the two operations by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable. As already discussed, using timestamps also ensures that the schedule will be deadlock-free.
One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose we have a transaction T1, and T2 has used a value written by T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of cascading aborts still prevails.
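
The two cases above translate directly into code. Below is a hedged Python sketch of the Basic TO checks (the dictionaries and function names are our own simplification; a real DBMS would also restart aborted transactions and persist the timestamp table).

class Abort(Exception):
    """Raised when an operation violates timestamp ordering; T must be rolled back."""

read_ts = {}   # R_TS(X): largest timestamp that has successfully read X
write_ts = {}  # W_TS(X): largest timestamp that has successfully written X

def read_item(ts_t, x):
    # Case 2: reject if a younger transaction already wrote X.
    if write_ts.get(x, 0) > ts_t:
        raise Abort(f"read({x}) by TS={ts_t} rejected")
    read_ts[x] = max(read_ts.get(x, 0), ts_t)

def write_item(ts_t, x):
    # Case 1: reject if a younger transaction already read or wrote X.
    if read_ts.get(x, 0) > ts_t or write_ts.get(x, 0) > ts_t:
        raise Abort(f"write({x}) by TS={ts_t} rejected")
    write_ts[x] = ts_t

write_item(10, "X")      # T with TS=10 writes X
read_item(20, "X")       # T with TS=20 reads X -> allowed, R_TS(X) becomes 20
try:
    write_item(15, "X")  # older T (TS=15) arrives too late -> rejected, must roll back
except Abort as e:
    print(e)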
Let's summarize the advantages and disadvantages of the Basic TO protocol:

• The Timestamp Ordering protocol ensures serializability, since the precedence graph only contains edges from older to younger transactions and therefore has no cycles.
(Image – Precedence Graph for TS ordering)


• Timestamp protocol ensures freedom from deadlock as no transaction
ever waits.
• But the schedule may not be cascade free, and may not even be
recoverable.
Strict Timestamp Ordering –
A variation of Basic TO, called Strict TO, ensures that the schedules are both strict and conflict serializable. In this variation, a transaction T that issues a R_item(X) or W_item(X) such that TS(T) > W_TS(X) has its read or write operation delayed until the transaction T' that wrote the value of X has committed or aborted.

Advantages:

• High Concurrency: Timestamp-based concurrency control allows for a high degree of concurrency by ensuring that transactions do not interfere with each other.
• Efficient: The technique is efficient and scalable, as it does not require locking and can handle a large number of transactions.
• No Deadlocks: Since there are no locks involved, there is no possibility of deadlocks occurring.
• Improved Performance: By allowing transactions to execute concurrently, the overall performance of the database system can be improved.

Disadvantages:

• Limited Granularity: The granularity of timestamp-based concurrency control is limited to the precision of the timestamp. This can lead to situations where transactions are unnecessarily blocked, even if they do not conflict with each other.
• Timestamp Ordering: To ensure that transactions are executed in the correct order, the timestamps need to be carefully managed. If not managed properly, this can lead to inconsistencies in the database.
• Timestamp Synchronization: Timestamp-based concurrency control requires that all transactions have synchronized clocks. If the clocks are not synchronized, it can lead to incorrect ordering of transactions.
• Timestamp Allocation: Allocating unique timestamps for each transaction can be challenging, especially in distributed systems where transactions may be initiated at different locations.

8. Explain different set operations of Relational Algebra?

Introduction of Relational Algebra in DBMS


Relational Algebra is a procedural query language. Relational algebra mainly
provides a theoretical foundation for relational databases and SQL. The main
purpose of using Relational Algebra is to define operators that transform one or
more input relations into an output relation. Given that these operators accept
relations as input and produce relations as output, they can be combined and used
to express potentially complex queries that transform potentially many input
relations (whose data are stored in the database) into a single output relation (the
query results). As it is pure mathematics, there is no use of English Keywords in
Relational Algebra and operators are represented using symbols.
Fundamental Operators
These are the basic/fundamental operators used in Relational Algebra.
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Difference(-)
5. Set Intersection(∩)
6. Rename(ρ)
7. Cartesian Product(X)
1. Selection(σ): It is used to select required tuples of the relations.
Example:
A B C

1 2 4

2 2 3

3 2 3

4 3 4

For the above relation, σ(c>3)R will select the tuples which have c more than 3.
A B C

1 2 4

4 3 4

Note: The selection operator only selects the required tuples but does not display
them. For display, the data projection operator is used.
2. Projection(π): It is used to project required column data from a relation.
Example: Consider Table 1. Suppose we want columns B and C from Relation R.
π(B,C)R will show following columns.

B C

2 4

2 3

3 4

Note: By Default, projection removes duplicate data.


3. Union(U): Union operation in relational algebra is the same as union operation
in set theory.
Example:
FRENCH
Student_Name Roll_Number

Ram 01

Mohan 02

Vivek 13

Geeta 17

GERMAN
Student_Name Roll_Number

Vivek 13

Geeta 17

Shyam 21

Rohan 25

Consider the following table of Students having different optional subjects in their
course.
π(Student_Name)FRENCH U π(Student_Name)GERMAN

Student_Name

Ram

Mohan

Vivek

Geeta

Shyam

Rohan

Note: The only constraint in the union of two relations is that both relations must
have the same set of Attributes.
4. Set Difference(-): Set Difference in relational algebra is the same set difference
operation as in set theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used as
follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN

Student_Name

Ram

Mohan

Note: The only constraint in the Set Difference between two relations is that both
relations must have the same set of Attributes.
5. Set Intersection(∩): Set Intersection in relational algebra is the same set
intersection operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is
used as follows
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN

Student_Name

Vivek

Geeta

Note: The only constraint in the Set Intersection between two relations is that both relations must have the same set of Attributes.
6. Rename(ρ): Rename is a unary operation used for renaming attributes of a
relation.
ρ(a/b)R will rename the attribute 'b' of the relation by 'a'.
7. Cross Product(X): Cross-product between two relations. Let’s say A and B, so
the cross product between A X B will result in all the attributes of A followed by
each attribute of B. Each record of A will pair with every record of B.
Example:
A
Name Age Sex

Ram 14 M

Sona 15 F

Kim 20 M

B
ID Course

1 DS

2 DBMS

AXB
Name Age Sex ID Course

Ram 14 M 1 DS

Ram 14 M 2 DBMS

Sona 15 F 1 DS

Sona 15 F 2 DBMS

Kim 20 M 1 DS

Kim 20 M 2 DBMS

Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
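
To make the fundamental operators concrete, here is a hedged Python sketch that models a relation as a pair of (attribute names, set of tuples); the representation and helper names are our own choice for illustration, not standard notation.

# A relation = (attributes, set of tuples). Sets give us duplicate elimination for free.
R = (("A", "B", "C"), {(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)})

def select(rel, pred):                      # sigma: keep tuples satisfying pred
    attrs, rows = rel
    return attrs, {t for t in rows if pred(dict(zip(attrs, t)))}

def project(rel, *keep):                    # pi: keep only the named columns
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return keep, {tuple(t[i] for i in idx) for t in rows}

def union(r1, r2):                          # requires the same attribute set
    assert r1[0] == r2[0]
    return r1[0], r1[1] | r2[1]

def difference(r1, r2):
    assert r1[0] == r2[0]
    return r1[0], r1[1] - r2[1]

def cartesian(r1, r2):                      # X: every tuple of r1 paired with every tuple of r2
    return r1[0] + r2[0], {t1 + t2 for t1 in r1[1] for t2 in r2[1]}

print(select(R, lambda row: row["C"] > 3))  # sigma(C>3)(R) -> {(1, 2, 4), (4, 3, 4)}
print(project(R, "B", "C"))                 # pi(B,C)(R)    -> {(2, 4), (2, 3), (3, 4)}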
Derived Operators
These are some of the derived operators, which are derived from the fundamental
operators.
1. Natural Join(⋈)
2. Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or
more relations will result in a set of all combinations of tuples where they have an
equal common attribute.
Example:
EMP
Name ID Dept_Name

A 120 IT

B 125 HR

C 110 Sales

D 111 IT

DEPT
Dept_Name Manager

Sales Y

Production Z

IT A

Natural join between EMP and DEPT with condition :


EMP.Dept_Name = DEPT.Dept_Name
EMP ⋈ DEPT
Name ID Dept_Name Manager

A 120 IT A

C 110 Sales Y

D 111 IT A

2. Conditional Join: Conditional join works similarly to natural join. In natural join, the default condition is equality between the common attributes, while in conditional join we can specify any condition such as greater than, less than, or not equal.
Example:
R
ID Sex Marks

1 F 45

2 F 55

3 F 60

S
ID Sex Marks

10 M 20

11 M 22

12 M 59

Join between R and S with condition R.marks >= S.marks


R.ID R.Sex R.Marks S.ID S.Sex S.Marks

1 F 45 10 M 20

1 F 45 11 M 22

2 F 55 10 M 20

2 F 55 11 M 22

3 F 60 10 M 20

3 F 60 11 M 22

3 F 60 12 M 59
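
The two derived operators can be sketched the same way. The following hedged Python snippet (the helper names and dict-based representation are our own) implements a natural join and a conditional (theta) join over reduced versions of the EMP/DEPT and R/S examples above.

def natural_join(r1, r2):
    """Combine tuples that agree on all common attribute names."""
    common = set(r1[0].keys()) & set(r2[0].keys()) if r1 and r2 else set()
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

def conditional_join(r1, r2, cond):
    """Theta join: keep every pair of tuples for which cond(t1, t2) is true."""
    return [(t1, t2) for t1 in r1 for t2 in r2 if cond(t1, t2)]

EMP = [{"Name": "A", "ID": 120, "Dept_Name": "IT"},
       {"Name": "C", "ID": 110, "Dept_Name": "Sales"},
       {"Name": "D", "ID": 111, "Dept_Name": "IT"}]
DEPT = [{"Dept_Name": "Sales", "Manager": "Y"},
        {"Dept_Name": "IT", "Manager": "A"}]

print(natural_join(EMP, DEPT))        # joins on the common attribute Dept_Name
R = [{"ID": 1, "Marks": 45}, {"ID": 3, "Marks": 60}]
S = [{"ID": 10, "Marks": 20}, {"ID": 12, "Marks": 59}]
print(conditional_join(R, S, lambda a, b: a["Marks"] >= b["Marks"]))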

Relational Calculus
While Relational Algebra is a procedural query language, Relational Calculus is a non-procedural query language. It deals only with the end result: it tells us what to retrieve but never how to retrieve it.
There are two types of Relational Calculus
1. Tuple Relational Calculus(TRC)
2. Domain Relational Calculus(DRC)

8. Mapping Cardinalities?

Cardinality in DBMS
In database management, cardinality plays an important role. Here cardinality
represents the number of times an entity of an entity set participates in a
relationship set. Or we can say that the cardinality of a relationship is the number
of tuples (rows) in a relationship. Types of cardinality in between tables are:
• one-to-one
• one-to-many
• many-to-one
• many-to-many
Mapping Cardinalities
In a database, the mapping cardinality or cardinality ratio denotes the number of entities to which another entity can be linked through a given relationship set. Mapping cardinality is most useful in describing binary relationship sets, although it can also contribute to the description of relationship sets containing more than two entity sets. Here we will focus only on binary relationship sets, i.e. the relationship between entity sets A and B for a relationship set R. We can map any one of the following cardinalities:
1. One-to-one: In this type of cardinality mapping, an entity in A is connected to
at most one entity in B. Or we can say that a unit or item in B is connected to at
most one unit or item in A.

Figure 1

Example:
In a particular hospital, the surgeon department has one head of department. They
both serve one-to-one relationships.

2. One-to-many: In this type of cardinality mapping, an entity in A is associated with any number of entities in B. Or we can say that one unit or item in B can be connected to at most one unit or item in A.
Figure 2

Example:
In a particular hospital, the surgeon department has multiple doctors. They serve
one-to-many relationships.

3. Many-to-one: In this type of cardinality mapping, an entity in A is connected to at most one entity in B. Or we can say a unit or item in B can be associated with any number (zero or more) of entities or items in A.
Figure 3

Example:
In a particular hospital, multiple surgeries are done by a single surgeon. Such a
type of relationship is known as a many-to-one relationship.

4. Many-to-many: In this type of cardinality mapping, an entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.
Example:
In a particular company, multiple people work on multiple projects. They serve
many-to-many relationships.

The appropriate mapping cardinality for a particular relationship set obviously depends on the real-world situation in which the relationship set is modeled.
• If the cardinality is one-to-many or many-to-one, then the relationship can be merged into the table of the entity on the 'many' side.
• If the cardinality is many-to-many, we cannot merge the relationship into either entity table; it needs its own table.
• If we have a one-to-one relationship with total participation of one entity, then we can merge that entity's table with the relationship table; and if we have total participation of both entities, then we can make one table by merging the two entities and their relationship.
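
As a hedged illustration of the last two points, the sqlite3 sketch below (the table names are made up for this example) folds a one-to-many relationship into the 'many'-side entity table as a foreign key, while a many-to-many relationship gets its own junction table.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One-to-many: one department has many doctors, so the relationship is merged
-- into the doctor table as a foreign key; no separate relationship table is needed.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE doctor (
    doc_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id)
);

-- Many-to-many: a person works on many projects and a project has many people,
-- so the relationship cannot be merged into either entity table.
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (
    person_id  INTEGER REFERENCES person(person_id),
    project_id INTEGER REFERENCES project(project_id),
    PRIMARY KEY (person_id, project_id)
);
""")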

9. Static and Dynamic Hashing

Difference between Static and Dynamic Hashing



Hashing is a computation technique in which hashing functions take variable-length data as input and issue shortened fixed-length data as output. The output data is often called a "Hash Code", "Key", or simply "Hash". The memory locations in which the hashed records are stored are called "Data Buckets".

Characteristics of Hashing Technique


Hashing techniques come with the following characteristics −

• The first characteristic is that the hashing technique is deterministic: no matter how many times you invoke the function on the same input, it delivers the same fixed-length result.
• The second characteristic is its unidirectional action. There is no way you can use the
Key to retrieve the original data. Hashing is irreversible.
What are Hash Functions?
Hash functions are mathematical functions that are executed to generate
addresses of data records. Hash functions use memory locations that store
data, called ‘Data Buckets’.

Hash functions are used in cryptographic signatures, securing the privacy of vulnerable data, and verifying the correctness of received files and texts. In computation, hashing is used in data processing to locate a single string of data in an array, or to calculate the direct addresses of records on the disk by requesting their Hash Code or Key.

Applications of Hashing
Hashing is applicable in the following area −

• Password verification
• Associating filename with their paths in operating systems
• Data Structures, where a key-value pair is created in which the key is a unique value,
whereas the value associated with the keys can be either same or different for
different keys.
• Board games such as Chess, tic-tac-toe, etc.
• Graphics processing, where a large amount of data needs to be matched and fetched.

• Database Management Systems, where enormous numbers of records are required to be searched, queried, and matched for retrieval; for example, the DBMS used in banking or large public transport reservation software.

The rest of this section looks at the difference between two important hashing techniques − static hashing and dynamic hashing.

What is Static Hashing?


It is a hashing technique that enables users to lookup a definite data set.
Meaning, the data in the directory is not changing, it is "Static" or fixed. In this
hashing technique, the resulting number of data buckets in memory remains
constant.

Operations Provided by Static Hashing

Static hashing provides the following operations −

• Delete − Search a record address and delete a record at the same address or delete a
chunk of records from records for that address in memory.
• Insertion − While entering a new record using static hashing, the hash function (h)
calculates bucket address "h(K)" for the search key (k), where the record is going to
be stored.
• Search − A record can be obtained using a hash function by locating the address of
the bucket where the data is stored.
• Update − It supports updating a record once it is traced in the data bucket.

Advantages of Static Hashing

Static hashing is advantageous in the following ways −

• Offers unparalleled performance for small-size databases.


• Allows Primary Key value to be used as a Hash Key.

Disadvantages of Static Hashing

Static hashing comes with the following disadvantages −

• It cannot work efficiently with the databases that can be scaled.


• It is not a good option for large-size databases.
• Bucket overflow issue occurs if there is more data and less memory.

What is Dynamic Hashing?


It is a hashing technique that enables users to look up a dynamic data set. That is, data can be added to or removed from the data set on demand, hence the name 'dynamic' hashing. Thus, the resulting data bucket count keeps increasing or decreasing depending on the number of records.

In this hashing technique, the resulting number of data buckets in memory is ever-changing.

Operations Provided by Dynamic Hashing

Dynamic hashing provides the following operations −

• Delete − Locate the desired location and support deleting data (or a chunk of data) at
that location.
• Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
• Query − Perform querying to compute the bucket address.
• Update − Perform a query to update the data.

Advantages of Dynamic Hashing

Dynamic hashing is advantageous in the following ways −

• It works well with scalable data.


• It can handle addressing large amount of memory in which data size is always
changing.
• Bucket overflow issue comes rarely or very late.

Disadvantages of Dynamic Hashing

Dynamic hashing comes with the following disadvantage −

• The location of the data in memory keeps changing according to the bucket size. Hence
if there is a phenomenal increase in data, then maintaining the bucket address table
becomes a challenge.
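
The contrast can be sketched in a few lines of Python (a simplified illustration, not an actual DBMS implementation): a static hash file keeps a fixed number of buckets, while the dynamic variant grows the bucket array when a bucket overflows.

class StaticHashFile:
    """Static hashing: the number of buckets is fixed when the file is created."""
    def __init__(self, n_buckets=4, bucket_size=2):
        self.buckets = [[] for _ in range(n_buckets)]
        self.bucket_size = bucket_size

    def insert(self, key, record):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        if len(bucket) >= self.bucket_size:
            raise OverflowError("bucket overflow - would need overflow chains")
        bucket.append((key, record))

class DynamicHashFile(StaticHashFile):
    """Dynamic hashing (simplified): double the buckets and rehash on overflow.
    Assumes one doubling is enough to relieve the overflow."""
    def insert(self, key, record):
        try:
            super().insert(key, record)
        except OverflowError:
            old = [kv for b in self.buckets for kv in b]
            self.buckets = [[] for _ in range(2 * len(self.buckets))]
            for k, r in old:
                super().insert(k, r)
            super().insert(key, record)

f = DynamicHashFile()
for i in range(20):          # grows the bucket array instead of overflowing
    f.insert(i, f"record-{i}")
print(len(f.buckets))        # bucket count has increased beyond the initial 4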

Differences between Static and Dynamic Hashing


Here are some prominent differences by which Static Hashing is different than
Dynamic Hashing −
Key Factor | Static Hashing | Dynamic Hashing

Form of Data | Fixed-size, non-changing data. | Variable-size, changing data.

Result | The resulting Data Bucket is of fixed length. | The resulting Data Bucket is of variable length.

Challenge of Bucket Overflow | Bucket overflow can arise often, depending upon memory size. | Bucket overflow occurs very late or doesn't occur at all.

Complexity | Simple | Complex

Conclusion
Hashing is a computation technique that uses mathematical functions called Hash Functions to calculate the location (address) of the data in memory. We learnt that there are two different hashing techniques, namely static hashing and dynamic hashing.

Each hashing technique differs in whether it works on a fixed-length data bucket or a variable-length data bucket. Selecting a proper hashing technique requires considering the amount of data to be handled and the intended speed of the application.

10. What is MongoDB? Explain its different functionalities

What is MongoDB – Working and Features



MongoDB is an open-source document-oriented database that is designed to
store a large scale of data and also allows you to work with that data very
efficiently. It is categorized under the NoSQL (Not only SQL) database because
the storage and retrieval of data in the MongoDB are not in the form of tables.
The MongoDB database is developed and managed by MongoDB Inc. under the SSPL (Server Side Public License) and was initially released in February 2009. It also provides official driver support for all the popular languages like C, C++, C#/.NET, Go, Java, Node.js, Perl, PHP, Python, Motor, Ruby, Scala, Swift, and Mongoid, so that you can create an application using any of these languages. Nowadays there are many companies, such as Facebook, Nokia, eBay, Adobe, and Google, that use MongoDB to store their large amounts of data.

How does it work?
Now we will see how things actually happen behind the scenes. As we know, MongoDB is a database server and the data is stored in these databases. In other words, the MongoDB environment gives you a server that you can start and then create multiple databases on using MongoDB.
Because it is a NoSQL database, the data is stored in collections and documents. Hence the database, collections, and documents are related to each other as shown below:

• The MongoDB database contains collections just like the MySQL database contains tables. You are allowed to create multiple databases and multiple collections.
• Inside a collection we have documents. These documents contain the data we want to store in the MongoDB database, and a single collection can contain multiple documents. Collections are schema-less, which means it is not necessary that one document be similar to another.
• The documents are created using fields. Fields are key-value pairs in the documents, just like columns in a relational database. The value of a field can be of any BSON data type, like double, string, boolean, etc.
• The data stored in the MongoDB is in the format of BSON documents.
Here, BSON stands for Binary representation of JSON documents. Or in
other words, in the backend, the MongoDB server converts the JSON
data into a binary form that is known as BSON and this BSON is stored
and queried more efficiently.
• In MongoDB documents, you are allowed to store nested data. This
nesting of data allows you to create complex relations between data
and store them in the same document which makes the working and
fetching of data extremely efficient as compared to SQL. In SQL, you
need to write complex joins to get the data from table 1 and table 2. The
maximum size of the BSON document is 16MB.
NOTE: A MongoDB server is allowed to run multiple databases.
For example, we may have a database named GeeksforGeeks. Inside this database we have two collections, and in these collections we have two documents in which we store our data in the form of fields.
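
A hedged sketch of this structure using the official Python driver, pymongo (the database name, collection name, field names, and connection URI are illustrative; a running MongoDB server is assumed):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")   # connect to a local server
db = client["GeeksforGeeks"]                         # database
students = db["students"]                            # collection

# Documents in one collection need not share the same fields (schema-less).
students.insert_one({"name": "Ram", "roll": 1, "subjects": ["DBMS", "OS"]})
students.insert_one({"name": "Geeta", "roll": 2, "marks": {"DBMS": 78}})

# Query with a filter document; fields are key-value pairs stored as BSON.
for doc in students.find({"roll": {"$gte": 1}}):
    print(doc["name"])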
How is MongoDB different from RDBMS?
Some major differences between MongoDB and an RDBMS are as follows:

MongoDB | RDBMS

It is a non-relational and document-oriented database. | It is a relational database.

It is suitable for hierarchical data storage. | It is not suitable for hierarchical data storage.

It has a dynamic schema. | It has a predefined schema.

It centers around the CAP theorem (Consistency, Availability, and Partition tolerance). | It centers around ACID properties (Atomicity, Consistency, Isolation, and Durability).

In terms of performance, it is much faster than an RDBMS. | In terms of performance, it is slower than MongoDB.

Features of MongoDB –

• Schema-less Database: It is the great feature provided by the


MongoDB. A Schema-less database means one collection can hold
different types of documents in it. Or in other words, in the MongoDB
database, a single collection can hold multiple documents and these
documents may consist of the different numbers of fields, content, and
size. It is not necessary that the one document is similar to another
document like in the relational databases. Due to this cool feature,
MongoDB provides great flexibility to databases.
• Document Oriented: In MongoDB, all the data is stored in documents instead of tables as in an RDBMS. In these documents, the data is stored in fields (key-value pairs) instead of rows and columns, which makes the data much more flexible in comparison to an RDBMS. Each document contains its unique object id.
• Indexing: In a MongoDB database, fields in the documents can be indexed with primary and secondary indices, which makes it easier and faster to get or search data from the pool of data. If the data is not indexed, then the database must scan each document against the specified query, which takes a lot of time and is not efficient.
• Scalability: MongoDB provides horizontal scalability with the help of
sharding. Sharding means to distribute data on multiple servers, here a
large amount of data is partitioned into data chunks using the shard
key, and these data chunks are evenly distributed across shards that
reside across many physical servers. It will also add new machines to a
running database.
• Replication: MongoDB provides high availability and redundancy with
the help of replication, it creates multiple copies of the data and sends
these copies to a different server so that if one server fails, then the
data is retrieved from another server.
• Aggregation: It allows you to perform operations on grouped data and get a single or computed result. It is similar to the SQL GROUP BY clause. It provides three different ways to aggregate: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
• High Performance: The performance of MongoDB is very high and
data persistence as compared to another database due to its features
like scalability, indexing, replication, etc.
Advantages of MongoDB :

• It is a schema-less NoSQL database. You need not to design the schema


of the database when you are working with MongoDB.
• It does not support join operation.
• It provides great flexibility to the fields in the documents.
• It contains heterogeneous data.
• It provides high performance, availability, scalability.
• It supports Geospatial efficiently.
• It is a document oriented database and the data is stored in BSON
documents.
• It supports multi-document ACID transactions (starting from MongoDB 4.0).
• It is not susceptible to SQL injection.
• It is easily integrated with Big Data tools such as Hadoop.
Disadvantages of MongoDB :

• It uses high memory for data storage.


• You are not allowed to store more than 16MB data in the documents.
• The nesting of data in BSON is also limited: you are not allowed to nest data more than 100 levels deep.
MongoDB is a popular open-source NoSQL database management system that falls
under the category of document-oriented databases. It was developed by MongoDB
Inc. and is designed to store, manage, and retrieve data in a flexible, scalable, and
high-performance manner. MongoDB stores data in BSON (Binary JSON) format,
which allows for the representation of complex data structures.

Key Functionalities of MongoDB:

1. Document-Oriented:
• MongoDB stores data in flexible, JSON-like BSON documents. A
document is a set of key-value pairs, where values can include other
documents, arrays, and data types.
2. Schema-less:
• MongoDB is schema-less, meaning that documents in a collection can
have different fields and data types. This flexibility is beneficial for
evolving data models and handling diverse data structures.
3. Indexing:
• MongoDB supports the creation of indexes on any field, similar to
traditional relational databases. Indexing improves query performance
by allowing the database to locate and retrieve data more efficiently.
4. Query Language:
• MongoDB uses a rich query language that supports a wide range of
queries, including field queries, range queries, and regular expression
searches. The query language is expressive and allows for complex
filtering and projection.
5. Aggregation Framework:
• MongoDB provides a powerful aggregation framework that allows
users to perform data transformation and manipulation operations,
such as filtering, grouping, sorting, and projecting. It enables the
execution of complex analytics and reporting tasks directly within the
database.
6. Sharding:
• MongoDB supports horizontal scaling through sharding. Sharding
allows the distribution of data across multiple machines, enabling the
system to handle large datasets and high-throughput workloads.
7. Replication:
• MongoDB supports replica sets, which are clusters of MongoDB servers
that maintain the same data set. Replica sets provide high availability
and fault tolerance by automatically promoting a secondary server in
case the primary server fails.
8. Flexible Storage Engine:
• MongoDB allows users to choose from different storage engines, such
as WiredTiger and MMAPv1, based on specific use cases and
performance requirements.
9. GridFS:
• MongoDB includes GridFS, a specification for storing and retrieving
large files, such as images, videos, and audio files. GridFS enables
efficient storage and retrieval of files exceeding the BSON document
size limit.
10. Geospatial Indexing:
• MongoDB provides support for geospatial indexing, allowing for the
efficient storage and querying of geospatial data, such as coordinates
and shapes.

MongoDB's design and feature set make it suitable for a wide range of applications,
including content management systems, e-commerce platforms, real-time analytics,
and more. Its flexibility and scalability make it a popular choice for developers
working on modern, data-intensive applications.
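
Building on the query language and aggregation framework points above, here is a hedged pymongo sketch (the collection and field names are illustrative) that groups orders per customer, much like a SQL GROUP BY:

from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017/")["shop"]["orders"]
orders.insert_many([
    {"customer": "A", "amount": 250},
    {"customer": "A", "amount": 100},
    {"customer": "B", "amount": 400},
])

# Aggregation pipeline: filter, then group and sum per customer, then sort.
pipeline = [
    {"$match": {"amount": {"$gt": 50}}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]
for row in orders.aggregate(pipeline):
    print(row)   # e.g. {'_id': 'B', 'total': 400}, {'_id': 'A', 'total': 350}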

11. Explain different strengths and weaknesses of NoSQL

Advantages and Disadvantages of NoSQL Databases

What are the advantages of NoSQL?

Like every other technology, NoSQL databases also offer some benefits and
suffer from some limitations too.

In an era where relational databases are mainly used for data storage and
retrieval, modern web technologies posed a major challenge in the form of
unstructured data, high scale data, enormous concurrency etc.

Relational databases struggled especially to represent highly unstructured data and to scale, and thus NoSQL databases came into being.

Let us briefly discuss some advantages and disadvantages of NoSQL databases.

Major advantages of NoSQL databases include:

(i) Flexible Data Model:


NoSQL databases are highly flexible as they can store and combine any type of
data, both structured and unstructured, unlike relational databases that can
store data in a structured way only.

(ii) Evolving Data Model :

NoSQL databases allow you to dynamically update the schema to evolve with
changing requirements while ensuring that it would cause no interruption or
downtime to your application.

(iii) Elastic Scalability:

NoSQL databases can scale to accommodate any type of data growth while
maintaining low cost.

(iv) High Performance:

NoSQL databases are built for great performance, measured in terms of both
throughput (it is a measure of overall performance) and latency (it is the delay
between request and actual response).

(v) Open-source:

NoSQL databases don’t require expensive licensing fees and can run on
inexpensive hardware, rendering their deployment cost-effective.

Major disadvantages of NoSQL databases are:

(i) Lack of Standardization:

There is no standard that defines rules and roles of NoSQL databases. The
design and query languages of NoSQL databases vary widely between different
NoSQL products – much more widely than they do among traditional SQL
databases.
(ii) Backup of Database:

Backups are a drawback in NoSQL databases. Though some NoSQL databases like MongoDB provide some tools for backup, these tools are not mature enough to ensure a proper, complete data backup solution.

(iii) Consistency:

NoSQL puts scalability and performance first, but gives less consideration to consistency of the data, which makes it a little less safe compared to a relational database. For example, in NoSQL databases, if you enter the same set of data again, it will be accepted without any error being raised, whereas relational databases can ensure that no duplicate rows get entered.

NoSQL databases are a diverse set of database management systems that offer
alternatives to traditional relational databases. Each NoSQL database has its own
strengths and weaknesses, and the suitability of a particular NoSQL solution depends
on the specific requirements of the application. Here are some general strengths and
weaknesses associated with NoSQL databases:

Strengths of NoSQL Databases:

1. Scalability:
• Strength: NoSQL databases are often designed to scale horizontally,
allowing them to handle large amounts of data and traffic by adding
more servers to the database cluster.
• Example: MongoDB, Cassandra.
2. Flexibility and Schema-less Design:
• Strength: NoSQL databases, being schema-less, provide flexibility in
terms of data models. They can easily accommodate changes to the
data structure without requiring a predefined schema.
• Example: MongoDB, Couchbase.
3. Performance:
• Strength: NoSQL databases can offer high performance for specific
use cases, particularly those involving read and write operations with
large volumes of data.
• Example: Redis, Cassandra.
4. Handling Unstructured Data:
• Strength: NoSQL databases are well-suited for handling unstructured
or semi-structured data, making them suitable for scenarios where data
formats are evolving.
• Example: MongoDB, Couchbase.
5. Support for Large Data Sets:
• Strength: NoSQL databases are often used in scenarios with large
datasets and can efficiently handle data distributed across multiple
nodes.
• Example: HBase, Cassandra.
6. Horizontal Partitioning (Sharding):
• Strength: NoSQL databases typically support easy horizontal
partitioning, or sharding, which allows for distributing data across
multiple servers to improve performance and scalability.
• Example: MongoDB, Cassandra.
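To make strengths 2 and 4 above concrete, here is a minimal, illustrative Python sketch. No real NoSQL server is involved; a plain list simply stands in for a document collection such as one in MongoDB, and all field names are invented for the example.

import json

collection = []  # stands in for a document collection (e.g., in MongoDB)

# Two documents with different structures coexist in the same collection;
# no ALTER TABLE is needed to introduce 'phone' or the nested 'addresses' list.
collection.append({"_id": 1, "name": "Alice", "email": "alice@example.com"})
collection.append({"_id": 2, "name": "Bob", "phone": "555-0101",
                   "addresses": [{"city": "Austin"}, {"city": "Boston"}]})

for doc in collection:
    print(json.dumps(doc))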

Weaknesses of NoSQL Databases:

1. Consistency and ACID Compliance:


• Weakness: NoSQL databases may sacrifice some level of consistency
and ACID (Atomicity, Consistency, Isolation, Durability) properties in
favor of performance and scalability.
• Example: Eventually Consistent databases like Amazon DynamoDB.
2. Learning Curve:
• Weakness: Adopting NoSQL databases may require a learning curve
for developers accustomed to relational databases. Understanding the
specific query language and data model is crucial.
• Example: Couchbase, Neo4j.
3. Limited Query Capabilities:
• Weakness: NoSQL databases may have limited query capabilities
compared to traditional relational databases, especially for complex
queries involving multiple tables.
• Example: Key-value stores like Redis.
4. Community and Tooling:
• Weakness: Some NoSQL databases may have a smaller community
and fewer tools compared to mature relational databases, which could
impact support and available resources.
• Example: ArangoDB, OrientDB.
5. Lack of Standardization:
• Weakness: NoSQL databases lack a standardized query language like
SQL, making it challenging to switch between different NoSQL
databases without significant adjustments.
• Example: MongoDB, Cassandra.
6. Security and Maturity:
• Weakness: Some NoSQL databases may be less mature than their
relational counterparts, and security features may not be as well-
established.
• Example: Some less widely adopted NoSQL databases.

In summary, the choice between NoSQL and traditional relational databases depends
on the specific needs of the application. NoSQL databases excel in certain use cases,
providing scalability, flexibility, and performance, but they may have trade-offs in
terms of consistency and ease of use. Understanding the strengths and weaknesses is
crucial for making an informed decision based on the requirements of a particular
project.

12. Explain Data Encryption and Decryption

In the context of a Database Management System (DBMS), data encryption and


decryption are crucial for ensuring the security of sensitive information stored in the
database. These processes help protect data confidentiality and prevent
unauthorized access. Here's an overview of how data encryption and decryption work
in a DBMS:

Data Encryption in DBMS:

1. Sensitive Data Identification:


• Identify the data that needs to be protected. This often includes
personally identifiable information (PII), financial data, or any other
sensitive information.
2. Encryption Process:
• Use encryption algorithms and methods to convert the sensitive
plaintext data into ciphertext. This is typically performed at the column
or field level, focusing on specific data elements rather than encrypting
the entire database.
3. Encryption Keys:
• Generate or use encryption keys that are required for the encryption
process. The security of the encrypted data relies on the strength of
these keys. Key management is a critical aspect of encryption in a
DBMS.
4. Stored Encrypted Data:
• Store the encrypted data in the database. The ciphertext appears as
random, unreadable characters without the appropriate decryption key.

Data Decryption in DBMS:

1. Authorized Access:
• When an authorized user needs to access the encrypted data, they
must go through an authentication process to verify their identity.
2. Decryption Key:
• Obtain the appropriate decryption key. This key is used in conjunction
with the decryption algorithm to convert the ciphertext back into
plaintext.
3. Decryption Process:
• Apply the decryption algorithm to the encrypted data using the
decryption key. This reverses the encryption process and transforms the
ciphertext back into readable plaintext.
4. Access to Original Data:
• Once decrypted, the user has access to the original, human-readable
data. The decrypted data is temporarily available for the authorized
user to perform necessary operations.
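As a minimal sketch of the encrypt-then-store and decrypt-on-authorized-read flow described above, the snippet below uses the third-party Python cryptography package. The column value and the in-process key handling are purely illustrative; real deployments keep keys in a key-management system.

from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()        # in practice, held by a key-management system
cipher = Fernet(key)

plaintext = b"4111-1111-1111-1111"          # sensitive column value (illustrative)
ciphertext = cipher.encrypt(plaintext)      # this is what gets stored in the table

print(ciphertext)                           # unreadable without the key
print(cipher.decrypt(ciphertext).decode())  # authorized read recovers the original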

Use Cases in DBMS:

1. Data-at-Rest Encryption:
• Protects data stored on disk or other storage media. Even if someone
gains physical access to the storage, they won't be able to access the
actual data without the decryption key.
2. Data-in-Transit Encryption:
• Secures data as it is transmitted between the database server and
clients. This prevents eavesdropping or interception of sensitive
information during communication.
3. Column-Level Encryption:
• Encrypts specific columns containing sensitive information within
database tables. This allows for a granular approach to protecting only
the most critical data.
4. Compliance Requirements:
• Many industries have regulatory requirements (e.g., GDPR, HIPAA) that
mandate the use of encryption to protect sensitive data, and DBMS
encryption helps organizations comply with these regulations.
5. Multi-Tenancy Security:
• In multi-tenant environments where multiple users or entities share the
same database, encryption ensures that each entity's data remains
confidential and isolated from others.

Implementing encryption and decryption in a DBMS requires careful consideration of


key management, performance implications, and integration with authentication
mechanisms. It's essential to strike a balance between security and usability, ensuring
that authorized users can efficiently access and work with the data while protecting it
from unauthorized access.

13. Different types of database systems.

Types of Databases
There are various types of databases used for storing different varieties of data:

1) Centralized Database
It is the type of database that stores data at a single, centralized database system. It
allows users to access the stored data from different locations through several
applications. These applications include an authentication process to let users access
the data securely. An example of a centralized database is a central library that
maintains a central database of every library in a college/university.

Advantages of Centralized Database

o It decreases the risk of data management, i.e., manipulation of the data will not affect
the core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database

o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If a server failure occurs, the entire data set may be lost, which could be a huge loss.

2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among
different database systems of an organization. These database systems are connected
via communication links. Such links help the end-users to access the data
easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite, etc.

We can further divide a distributed database system into:


o Homogeneous DDB: Those database systems which execute on the same operating
system, use the same application process, and run on the same hardware devices.
o Heterogeneous DDB: Those database systems which execute on different operating
systems, under different application procedures, and on different hardware devices.

Advantages of Distributed Database

o Modular development is possible in a distributed database, i.e., the system can be


expanded by including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database
This database is based on the relational data model, which stores data in the form of
rows(tuple) and columns(attributes), and together forms a table(relation). A relational
database uses SQL for storing, manipulating, and maintaining the data. E. F. Codd
proposed the relational model in 1970. Each table in the database carries a key that makes the
data unique from others. Examples of Relational databases are MySQL, Microsoft SQL
Server, Oracle, etc.

Properties of Relational Database


There are following four commonly known properties of a relational model known as
ACID properties, where:
A means Atomicity: This ensures the data operation will complete either with success
or with failure. It follows the 'all or nothing' strategy. For example, a transaction will
either be committed or will abort.

C means Consistency: If we perform any operation over the data, its value before and
after the operation should be preserved. For example, the account balance before and
after the transaction should be correct, i.e., it should remain conserved.

I means Isolation: Multiple users can access data from the database concurrently,
so transactions must remain isolated from one another. For example, when multiple
transactions occur at the same time, the effects of one transaction should not be
visible to the other transactions in the database.

D means Durability: It ensures that once it completes the operation and commits the
data, data changes should remain permanent.
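As a minimal sketch of the atomicity ('all or nothing') property described above, the snippet below uses Python's built-in sqlite3 module; the account table, balances, and transfer amount are purely illustrative.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
            "balance INTEGER CHECK (balance >= 0))")
con.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
con.commit()

try:
    with con:  # one transaction: both updates commit, or neither does
        con.execute("UPDATE account SET balance = balance + 120 WHERE id = 2")
        con.execute("UPDATE account SET balance = balance - 120 WHERE id = 1")  # violates CHECK
except sqlite3.IntegrityError:
    pass

print(con.execute("SELECT id, balance FROM account").fetchall())
# [(1, 100), (2, 50)] -- the credit to account 2 was rolled back as well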

4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of
data sets. It is not a relational database as it stores data not only in tabular form but in
several different ways. It came into existence when the demand for building modern
applications increased. Thus, NoSQL presented a wide variety of database technologies
in response to the demands. We can further divide a NoSQL database into the
following four types:
a. Key-value storage: It is the simplest type of database storage where it stores every
single item as a key (or attribute name) holding its value, together.
b. Document-oriented Database: A type of database used to store data as JSON-like
document. It helps developers in storing data by using the same document-model
format as used in the application code.
c. Graph Databases: It is used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use the graph database.
d. Wide-column stores: It is similar to the data represented in relational databases. Here,
data is stored in large columns together, instead of storing in rows.

Advantages of NoSQL Database

o It enables good productivity in application development, since data is not required to
be stored in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.

5) Cloud Database
A type of database where data is stored in a virtual environment and executes over the
cloud computing platform. It provides users with various cloud computing services
(SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous cloud platforms,
but the best options are:

o Amazon Web Services(AWS)


o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
o Google Cloud SQL, etc.

6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data
in the database system. The data is represented and stored as objects which are similar
to the objects used in the object-oriented programming language.
7) Hierarchical Databases
It is the type of database that stores data in the form of parent-child relationships
between nodes. Here, it organizes data in a tree-like structure.

Data get stored in the form of records that are connected via links. Each child record
in the tree will contain only one parent. On the other hand, each parent record can
have multiple child records.

8) Network Databases
It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them. Unlike
the hierarchical database, it allows each record to have multiple children and parent
nodes to form a generalized graph structure.

9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This
database is basically designed for a single user.

Advantage of Personal Database

o It is simple and easy to handle.


o It occupies less storage space as it is small in size.

10) Operational Database


The type of database which creates and updates the database in real-time. It is basically
designed for executing and handling the daily data operations in several businesses.
For example, an organization uses operational databases for managing its day-to-day
transactions.

11) Enterprise Database


Large organizations or enterprises use this database for managing a massive amount
of data. It helps organizations to increase and improve their efficiency. Such a database
allows simultaneous access to users.

Advantages of Enterprise Database:

o Multiple processes are supported over the enterprise database.


o It allows executing parallel queries on the system.

There are several types of database systems, each designed to handle specific types
of data and applications. Here are some common types of database systems:

1. Relational Database Management System (RDBMS):

• Overview: Uses a tabular structure with rows and columns to store data. It
enforces the principles of the relational model, including ACID properties
(Atomicity, Consistency, Isolation, Durability).
• Examples: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.

2. NoSQL Database:

• Overview: Provides a flexible schema and is designed to handle a variety of


data models, including key-value pairs, document-oriented, wide-column
stores, and graph databases. NoSQL databases are often used for large-scale
and real-time applications.
• Examples: MongoDB, Cassandra, Couchbase, Redis, Neo4j.
3. Columnar Database:

• Overview: Organizes data by columns rather than rows, making it suitable for
analytics and data warehousing. Columnar databases can efficiently handle
queries involving aggregations and analytics.
• Examples: Apache Cassandra, Amazon Redshift, Google Bigtable.

4. Document-Oriented Database:

• Overview: Stores data in flexible, semi-structured document formats such as


JSON or BSON. Each document can have a different structure, allowing for
dynamic and evolving data models.
• Examples: MongoDB, CouchDB, RavenDB.

5. Graph Database:

• Overview: Focuses on the relationships between data entities. Graph


databases use nodes, edges, and properties to represent and store data,
making them well-suited for applications with complex relationships.
• Examples: Neo4j, ArangoDB, Amazon Neptune.

6. In-Memory Database:

• Overview: Stores data in the system's main memory (RAM) rather than on
disk, resulting in faster data access and retrieval. In-memory databases are
suitable for applications that require high-speed data processing.
• Examples: Redis, SAP HANA, Memcached.

7. Time-Series Database:

• Overview: Optimized for handling time-stamped or time-series data, such as


sensor data, financial market data, or log files. Time-series databases are
designed for efficient storage and retrieval of chronological data points.
• Examples: InfluxDB, Prometheus, OpenTSDB.

8. Spatial Database:

• Overview: Specialized in storing and querying spatial data, such as


geographic information system (GIS) data. Spatial databases support spatial
indexing and operations like distance and area calculations.
• Examples: PostGIS (extension for PostgreSQL), Oracle Spatial, MongoDB with
GeoJSON.
9. Distributed Database:

• Overview: Spreads data across multiple nodes or servers to improve


scalability, fault tolerance, and performance. Distributed databases can be
based on various models, including relational, NoSQL, or NewSQL.
• Examples: Apache Cassandra, Amazon DynamoDB, Google Spanner.

10. NewSQL Database:


• Overview: A category of databases that aims to provide the scalability of NoSQL
databases while maintaining the ACID properties typical of traditional relational
databases. NewSQL databases are designed to handle large-scale applications.
• Examples: Google Spanner, CockroachDB, NuoDB.

Each type of database system has its strengths and weaknesses, and the choice
depends on the specific requirements of the application, scalability needs, data
structure, and other factors. Organizations often use a combination of database
systems to meet different needs within their IT infrastructure.

14. What are Data models?

Data Models in DBMS



A Data Model in a Database Management System (DBMS) is the set of concepts and
tools developed to summarize the description of the database. Data models provide
us with a transparent picture of the data, which helps us in creating the actual
database. They take us from the design of the data to its proper implementation.

Types of Data Models


Data models are basically classified into 3 types:
1. Conceptual Data Model
2. Representational Data Model
3. Physical Data Model
1. Conceptual Data Model

The conceptual data model describes the database at a very high level and is useful
to understand the needs or requirements of the database. It is this model, that is
used in the requirement-gathering process i.e. before the Database Designers start
making a particular database. One such popular model is the entity/relationship
model (ER model). The E/R model specializes in entities, relationships, and even
attributes that are used by database designers. In terms of this concept, a
discussion can be made even with non-computer science(non-technical) users and
stakeholders, and their requirements can be understood.
Entity-Relationship Model( ER Model): It is a high-level data model which is
used to define the data and the relationships between them. It is basically a
conceptual design of any database which is easy to design the view of data.
Components of ER Model:
1. Entity: An entity is referred to as a real-world object. It can be a name,
place, object, class, etc. These are represented by a rectangle in an ER
Diagram.
2. Attributes: An attribute can be defined as a property describing the entity.
Attributes are represented by ellipses in an ER diagram. Examples are Age, Roll
Number, or Marks for a Student.
3. Relationship: Relationships are used to define associations among
different entities. A diamond (rhombus) is used to show a
relationship.

Characteristics of a conceptual data model

• Offers Organization-wide coverage of the business concepts.


• This type of data model is designed and developed for a business
audience.
• The conceptual model is developed independently of hardware
specifications like data storage capacity, location or software
specifications like DBMS vendor and technology. The focus is to
represent data as a user will see it in the “real world.”
Conceptual data models, also known as domain models, create a common vocabulary
for all stakeholders by establishing basic concepts and scope.

2. Representational Data Model

This type of data model is used to represent only the logical part of the database
and does not represent the physical structure of the database. The
representational data model allows us to focus primarily, on the design part of the
database. A popular representational model is a Relational model. The relational
Model consists of Relational Algebra and Relational Calculus. In the Relational
Model, we basically use tables to represent our data and the relationships between
them. It is a theoretical concept whose practical implementation is done in
Physical Data Model.
The advantage of using a Representational data model is to provide a foundation
to form the base for the Physical model

3. Physical Data Model

The physical Data Model is used to practically implement Relational Data Model.
Ultimately, all data in a database is stored physically on a secondary storage device
such as discs and tapes. This is stored in the form of files, records, and certain other
data structures. It has all the information on the format in which the files are
present and the structure of the databases, the presence of external data
structures, and their relation to each other. Here, we basically save tables in
memory so they can be accessed efficiently. In order to come up with a good
physical model, we have to work on the relational model in a better
way. Structured Query Language (SQL) is used to practically implement Relational
Algebra.
This Data Model describes HOW the system will be implemented using a specific
DBMS system. This model is typically created by DBA and developers. The
purpose is actual implementation of the database.

Characteristics of a physical data model:

• The physical data model describes the data needed for a single project or
application, though it may be integrated with other physical data models
based on project scope.
• The model contains relationships between tables, addressing the
cardinality and nullability of those relationships.
• Developed for a specific version of a DBMS, location, data storage or
technology to be used in the project.
• Columns should have exact datatypes, lengths assigned and default
values.
• Primary and Foreign keys, views, indexes, access profiles, and
authorizations, etc. are defined
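As a minimal sketch of what a physical data model spells out in practice (exact data types, default values, primary and foreign keys, and an index), the snippet below uses Python's built-in sqlite3 module; the table and column names are purely illustrative.

import sqlite3

ddl = """
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);
CREATE TABLE student (
    roll_no   INTEGER PRIMARY KEY,
    name      VARCHAR(80) NOT NULL,
    age       INTEGER DEFAULT 18,
    dept_id   INTEGER REFERENCES department(dept_id)   -- foreign key
);
CREATE INDEX idx_student_dept ON student(dept_id);
"""

con = sqlite3.connect(":memory:")
con.executescript(ddl)
print([row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'index')")])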
Some Other Data Models
1. Hierarchical Model

The hierarchical model is one of the oldest data models; it was developed by IBM
in the 1960s. In a hierarchical model, data is viewed as a collection of tables, or
segments, that form a hierarchical relation. The data is organized into a tree-like
structure where each record has one parent record and possibly many children.
Segments connected in a chain-like structure by logical associations can also form
a fan structure with multiple branches; these logical associations are called
directional associations.

2. Network Model

The Network Model was formalized by the Database Task group in the 1960s. This
model is the generalization of the hierarchical model. This model can consist of
multiple parent segments and these segments are grouped as levels but there
exists a logical association between the segments belonging to any level. Mostly,
there exists a many-to-many logical association between any of the two segments.

3. Object-Oriented Data Model

In the Object-Oriented Data Model, data and their relationships are contained in a
single structure which is referred to as an object in this data model. In this, real-
world problems are represented as objects with different attributes. All objects
have multiple relationships between them. Basically, it is a combination of Object
Oriented programming and a Relational Database Model.

4. Flat Data Model

The flat data model basically consists of a two-dimensional array in which no
element is duplicated. This data model has one drawback: it cannot store a large
amount of data, that is, the tables cannot be of large size.

5. Context Data Model

The Context data model is simply a data model which consists of more than one
data model. For example, the Context data model consists of ER Model, Object-
Oriented Data Model, etc. This model allows users to do more than one thing which
each individual data model can do.

6. Semi-Structured Data Model

Semi-Structured data models deal with the data in a flexible way. Some entities
may have extra attributes and some entities may have some missing attributes.
Basically, you can represent data here in a flexible way.

Advantages of Data Models


1. Data Models help us in representing data accurately.
2. It helps us in finding the missing data and also in minimizing Data
Redundancy.
3. Data Model provides data security in a better way.
4. The data model should be detailed enough to be used for building the
physical database.
5. The information in the data model can be used for defining the
relationship between tables, primary and foreign keys, and stored
procedures.
Disadvantages of Data Models
1. In the case of a vast database, it can become difficult to
understand the data model.
2. You must have proper knowledge of SQL to use physical models.
3. Even a small change made to the structure may require modification of the entire
application.
4. There is no single, standard data manipulation language in DBMS.
5. To develop a data model, one should know the characteristics of how the data
is physically stored.

Conclusion

• Data modeling is the process of developing data model for the data to
be stored in a Database.
• Data Models ensure consistency in naming conventions, default values,
semantics, security while ensuring quality of the data.
• Data Model structure helps to define the relational tables, primary and
foreign keys and stored procedures.
• There are three types of data models: conceptual, logical, and physical.
• The main aim of conceptual model is to establish the entities, their
attributes, and their relationships.
• Logical data model defines the structure of the data elements and set
the relationships between them.
• A Physical Data Model describes the database specific implementation
of the data model.
• The main goal of a designing data model is to make certain that data
objects offered by the functional team are represented accurately.
• The biggest drawback is that even a small change made to the structure
may require modification of the entire application.

15. Explain Lossy and Lossless decomposition

Lossless and Lossy Decomposition in DBMS



Decomposition in DBMS removes redundancy, anomalies and inconsistencies


from a database by dividing the table into multiple tables.

The following are the types −

Lossless Decomposition
Decomposition is lossless if it is feasible to reconstruct relation R from the
decomposed tables using joins. This is the preferred choice. No information is
lost from the relation when it is decomposed; the join results in the same
original relation.

Let us see an example −

<EmpInfo>

Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name

E001 Jacob 29 Alabama Dpt1 Operations

E002 Henry 32 Alabama Dpt2 HR

E003 Tom 22 Texas Dpt3 Finance


Decompose the above table into two tables:

<EmpDetails>

Emp_ID Emp_Name Emp_Age Emp_Location

E001 Jacob 29 Alabama

E002 Henry 32 Alabama

E003 Tom 22 Texas

<DeptDetails>

Dept_ID Emp_ID Dept_Name

Dpt1 E001 Operations

Dpt2 E002 HR

Dpt3 E003 Finance

Now, Natural Join is applied on the above two tables −

The result will be −

Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name

E001 Jacob 29 Alabama Dpt1 Operations

E002 Henry 32 Alabama Dpt2 HR

E003 Tom 22 Texas Dpt3 Finance

Therefore, the above decomposition is lossless, i.e., there is no loss of
information.

Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more
relational schemas, the loss of information is unavoidable when the original
relation is retrieved.

Let us see an example −

<EmpInfo>

Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name

E001 Jacob 29 Alabama Dpt1 Operations

E002 Henry 32 Alabama Dpt2 HR

E003 Tom 22 Texas Dpt3 Finance

Decompose the above table into two tables −

<EmpDetails>

Emp_ID Emp_Name Emp_Age Emp_Location

E001 Jacob 29 Alabama

E002 Henry 32 Alabama

E003 Tom 22 Texas

<DeptDetails>

Dept_ID Dept_Name

Dpt1 Operations

Dpt2 HR

Dpt3 Finance
Now, the two tables cannot be joined back meaningfully: since Emp_ID is not part of
the DeptDetails relation, the tables share no common attribute, and the original
association between employees and departments cannot be recovered.

Therefore, the above relation has lossy decomposition.
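The difference can be checked mechanically. Below is a minimal sketch using Python's built-in sqlite3 module: joining EmpDetails with the lossless DeptDetails variant (which keeps Emp_ID) recovers exactly the three original rows, while the lossy variant shares no attribute with EmpDetails, so the only possible combination is a cross product containing spurious tuples.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EmpDetails(Emp_ID TEXT, Emp_Name TEXT, Emp_Age INT, Emp_Location TEXT);
INSERT INTO EmpDetails VALUES ('E001','Jacob',29,'Alabama'),
                              ('E002','Henry',32,'Alabama'),
                              ('E003','Tom',22,'Texas');
CREATE TABLE DeptLossless(Dept_ID TEXT, Emp_ID TEXT, Dept_Name TEXT);
INSERT INTO DeptLossless VALUES ('Dpt1','E001','Operations'),
                                ('Dpt2','E002','HR'),
                                ('Dpt3','E003','Finance');
CREATE TABLE DeptLossy(Dept_ID TEXT, Dept_Name TEXT);
INSERT INTO DeptLossy VALUES ('Dpt1','Operations'),('Dpt2','HR'),('Dpt3','Finance');
""")

lossless_rows = con.execute(
    "SELECT COUNT(*) FROM EmpDetails NATURAL JOIN DeptLossless").fetchone()[0]
lossy_rows = con.execute(
    "SELECT COUNT(*) FROM EmpDetails, DeptLossy").fetchone()[0]
print(lossless_rows, lossy_rows)   # 3 (original recovered) vs 9 (spurious tuples)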

In the context of Database Management Systems (DBMS), decomposition refers to


the process of breaking down a relation (table) into smaller relations to achieve
desirable properties, particularly those related to normalization. There are two main
types of decomposition: lossy decomposition and lossless decomposition.

1. Lossy Decomposition:

Definition:

• Lossy decomposition involves breaking down a relation into smaller relations


in a way that results in the loss of information, and the original relation cannot
be reconstructed exactly from the decomposed relations.

Characteristics:

1. Information Loss:
• Lossy decomposition discards certain attributes or dependencies,
leading to a loss of information.
2. Irreversible:
• The original relation cannot be precisely reconstructed from the
decomposed relations due to the discarded information.
3. Storage Space Reduction:
• Lossy decomposition may be chosen for the purpose of reducing
storage space, but this comes at the cost of sacrificing certain details.

Use Case:

• Lossy decomposition is typically not desirable in most database design


scenarios, especially in situations where preserving data integrity and
relationships is crucial. It may be acceptable in scenarios where absolute
precision is not necessary, and storage efficiency is a higher priority.

2. Lossless Decomposition:

Definition:
• Lossless decomposition involves breaking down a relation into smaller
relations in a way that preserves all the functional dependencies and
information present in the original relation. The original relation can be
reconstructed exactly from the decomposed relations.

Characteristics:

1. Preservation of Information:
• All information and dependencies in the original relation are preserved
in the decomposed relations.
2. Reversibility:
• The original relation can be reconstructed exactly from the
decomposed relations without any loss of data.
3. Data Integrity:
• Lossless decomposition ensures that the decomposed relations
maintain the same level of data integrity as the original relation.

Use Case:

• Lossless decomposition is a fundamental requirement in database design,


especially in scenarios where data accuracy and integrity are critical. It is
commonly used in the normalization process to break down relations into
smaller, well-structured relations without losing any information.

Example:

Consider a relation R with attributes A, B, and C. A lossy decomposition might involve


creating two relations, one with attributes A and B, and another with attribute C. In
this case, information about the relationship between A, B, and C is lost. On the other
hand, a lossless decomposition would ensure that the decomposed relations
maintain the original dependencies and can be joined to reconstruct the original
relation.

In summary, lossy decomposition sacrifices information for the sake of efficiency,


while lossless decomposition ensures that all information is preserved, maintaining
data integrity and enabling the reconstruction of the original relation. In database
design, lossless decomposition is generally preferred to avoid compromising data
accuracy and relationships.

16. Explain the concept of indexing in detail.



Indexing is one of the techniques used to optimize performance of a database


by reducing the number of disk accesses that are required when a query is
processed.

A database index is a data structure that is helpful to quickly locate and access
the data in a database table.

Indexes are created using database columns.

• The first column is the Search key which contains a copy of the primary key or
candidate key of the table.
• The second column is the data reference that contains a set of pointers which hold the
address of the disk block where the key value can be found.

Structure of Index
The structure of an index in the database management system (DBMS) is given
below −

Search key Data reference

Types of indexes
The different types of index are as follows −

• Primary
• Clustering
• Secondary

Each of these types of indexes is described below.
Cluster Index
• A clustering index is built on an ordering field that is not unique, so an index entry is created only for each distinct value of that field.
• Depending on how it is built, it can serve as an example of either a dense or a sparse index.

Secondary Index
• A secondary index is created on a candidate key (a unique value), with one index entry for each record in the data file.
• A secondary index is a type of dense index and is also called a non-clustering index.
• The size of the secondary mapping stays small because two-level database indexing is used.

Primary Index
• Primary index is defined on an ordered data file. The data file is ordered on a key field.
The key field is generally the primary key of the relation.
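As a minimal sketch of how an index changes query processing, the snippet below uses Python's built-in sqlite3 module (table and index names are illustrative). Once the secondary index on dept exists, the query planner searches the index instead of scanning every row.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(i, f"emp{i}", f"d{i % 10}") for i in range(10_000)])
con.execute("CREATE INDEX idx_emp_dept ON emp(dept)")   # secondary index on dept

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'd3'").fetchall()
print(plan)   # the plan refers to idx_emp_dept rather than a full table scan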

17. Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no
transaction concurrency exists. However, if concurrent transactions with interleaving
operations are allowed in an uncontrolled manner, some unexpected, undesirable result
may occur, such as:

• The lost update problem: A second transaction writes a second value of a data-
item (datum) on top of a first value written by a first concurrent transaction, and
the first value is lost to other transactions running concurrently which need, by
their precedence, to read the first value. The transactions that have read the
wrong value end with incorrect results.
• The dirty read problem: Transactions read a value written by a transaction that
has been later aborted. This value disappears from the database upon abort, and
should not have been read by any transaction (“dirty read”). The reading
transactions end with incorrect results.
• The incorrect summary problem: While one transaction takes a summary over
the values of all the instances of a repeated data-item, a second transaction
updates some instances of that data-item. The resulting summary does not
reflect a correct result for any (usually needed for correctness) precedence
order between the two transactions (if one is executed before the other), but
rather some random result, depending on the timing of the updates, and
whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to
meet their performance requirements. Thus, without concurrency control such systems
can neither provide correct results nor maintain their databases consistently.
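To make the lost update problem above concrete, here is a minimal, hedged Python sketch that uses two threads as stand-ins for concurrent transactions on a shared balance, followed by the simplest pessimistic fix, a lock. The exact outcome of the unsafe run depends on thread scheduling.

import threading

balance = 0
lock = threading.Lock()

def deposit_unsafe(times):
    global balance
    for _ in range(times):
        current = balance          # read
        balance = current + 1      # this write may overwrite a concurrent update

def deposit_safe(times):
    global balance
    for _ in range(times):
        with lock:                 # read and write happen as one atomic unit
            balance += 1

for worker in (deposit_unsafe, deposit_safe):
    balance = 0
    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(worker.__name__, balance)   # the unsafe run often ends below 200000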

Categories
The main categories of concurrency control mechanisms are:

• Optimistic – Delay the checking of whether a transaction meets the isolation


and other integrity rules (e.g., serializability and recoverability) until its end,
without blocking any of its (read, write) operations (“…and be optimistic about
the rules being met…”), and then abort a transaction to prevent the violation, if
the desired rules are to be violated upon its commit. An aborted transaction is
immediately restarted and re-executed, which incurs an obvious overhead
(versus executing it to the end only once). If not too many transactions are
aborted, then being optimistic is usually a good strategy.
• Pessimistic – Block an operation of a transaction, if it may cause violation of
the rules, until the possibility of violation disappears. Blocking operations is
typically involved with performance reduction.
• Semi-optimistic – Block operations in some situations, if they may cause
violation of some rules, and do not block in other situations while delaying rules
checking (if needed) to transaction’s end, as done with optimistic.
Different categories provide different performance, i.e., different average transaction
completion rates (throughput), depending on transaction types mix, computing level of
parallelism, and other factors. If selection and knowledge about trade-offs are available,
then category and method should be chosen to provide the highest performance.

The mutual blocking between two transactions (where each one blocks the other) or
more results in a deadlock, where the transactions involved are stalled and cannot reach
completion. Most non-optimistic mechanisms (with blocking) are prone to deadlocks
which are resolved by an intentional abort of a stalled transaction (which releases the
other transactions in that deadlock), and its immediate restart and re-execution. The
likelihood of a deadlock is typically low.

Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-
offs between the categories.
18. What is concurrency control in DBMS?

The concept of concurrency control comes under transactions in a database
management system (DBMS). It is a procedure in the DBMS that manages
simultaneous processes so that they execute without conflicting with each
other; such conflicts occur in multi-user systems.

Concurrency simply means executing multiple transactions at a time. It is
required to increase time efficiency. If many transactions try to access the
same data, inconsistency can arise. Concurrency control is required to
maintain data consistency.

For example, if we take ATM machines and do not use concurrency, multiple
persons cannot draw money at a time in different places. This is where we
need concurrency.

Advantages
The advantages of concurrency control are as follows −

• Waiting time will be decreased.


• Response time will decrease.
• Resource utilization will increase.
• System performance & Efficiency is increased.

Control concurrency
The simultaneous execution of transactions over shared databases can create
several data integrity and consistency problems.

For example, if many people are using ATM machines at the same time, the bank
servers must serialize and synchronize the updates whenever a transaction
completes; otherwise, wrong information and wrong data end up in the
database.

Main problems in using Concurrency


The problems which arise while using concurrency are as follows −
• Updates will be lost − One transaction makes some changes and another transaction
overwrites or removes those changes, so one transaction nullifies the updates of
another transaction.
• Uncommitted dependency or dirty read problem − One transaction updates a variable
and, before it commits or rolls back, another transaction reads that value; if the first
transaction then aborts, the second transaction has worked with a value that never
really existed, which gives false or stale results. This is a major problem.
• Inconsistent retrievals − One transaction is reading or summarizing multiple variables
while another transaction is in the process of updating those variables, so the reader
sees an inconsistent mixture of old and new values.

Concurrency control techniques


The concurrency control techniques are as follows −

Locking
A lock guarantees exclusive use of a data item to the current transaction. The
transaction first acquires a lock before accessing the data item and releases
the lock after the transaction completes.

Types of Locks

The types of locks are as follows −

• Shared Lock [Transaction can read only the data item values]
• Exclusive Lock [Used for both read and write data item values]
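As a minimal sketch of a lock table that honours these two lock modes (transaction and item names are illustrative): any number of transactions may hold a shared lock on the same item, but an exclusive lock is granted only when no other transaction holds any lock on that item.

SHARED, EXCLUSIVE = "S", "X"

lock_table = {}   # item -> list of (transaction_id, mode)

def can_grant(item, mode):
    holders = lock_table.get(item, [])
    if not holders:
        return True
    if mode == SHARED:
        return all(m == SHARED for _, m in holders)   # S is compatible with S
    return False                                      # X is compatible with nothing

def acquire(txn, item, mode):
    if can_grant(item, mode):
        lock_table.setdefault(item, []).append((txn, mode))
        return True
    return False    # in a real DBMS the transaction would wait (block) here

print(acquire("T1", "A", SHARED))     # True
print(acquire("T2", "A", SHARED))     # True  -- shared locks coexist
print(acquire("T3", "A", EXCLUSIVE))  # False -- must wait until the readers finish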

Time Stamping
A timestamp is a unique identifier created by the DBMS that indicates the relative
starting time of a transaction. For every transaction, the DBMS records its starting
time, which denotes a specific point in time.

The timestamp can be generated using a system clock or a logical counter and is
assigned whenever a transaction starts. The logical counter is incremented after
each new timestamp has been assigned.

Optimistic
It is based on the assumption that conflict is rare and it is more efficient to
allow transactions to proceed without imposing delays to ensure serializability.

19. Database Schema

Database Schemas
Nowadays, data is one of the most important assets in the business world: every
business captures its customers' data to understand their behavior, and in the
world of the internet, data is growing at an enormous rate. Businesses therefore
need more advanced database solutions by which they can maintain their database
systems and, whenever they need data to solve a business problem, easily retrieve
exactly the data they want. To fulfil this need, the database schema comes into
the picture.

What is Schema?
• The skeleton of the database is formed by its attributes, and this
skeleton is called the schema.
• The schema specifies logical constraints such as tables, primary keys, etc.
• The schema does not represent the data types of the attributes.

Details of a Customer

Schema of Customer
Database Schema
• A database schema is a logical representation of data that shows how
the data in a database should be stored logically. It shows how the data
is organized and the relationship between the tables.
• Database schema contains table, field, views and relation between
different keys like primary key, foreign key.
• Data stored simply in the form of files is unstructured in nature, which
makes accessing the data difficult. To resolve this issue, the data is
organized in a structured way with the help of a database schema.
• Database schema provides the organization of data and the relationship
between the stored data.
• Database schema defines a set of guidelines that control the database
along with that it provides information about the way of accessing and
modifying the data.
Types of Database Schemas
There are 3 types of database schema:
• Physical Database Schema:
• A Physical schema defines, how the data or information is
stored physically in the storage systems in the form of files &
indices. This is the actual code or syntax needed to create the
structure of a database, we can say that when we design a
database at a physical level, it’s called physical schema.
• The Database administrator chooses where and how to store
the data in the different blocks of storage.
• Logical Database Schema:
• A logical database schema defines all the logical constraints
that need to be applied to the stored data, and also describes
tables, views, entity relationships, and integrity constraints.
• The Logical schema describes how the data is stored in the
form of tables & how the attributes of a table are connected.
• Using ER modelling the relationship between the
components of the data is maintained.
• In logical schema different integrity constraints are defined in
order to maintain the quality of insertion and update the data.
• View Database Schema:
• It is a view-level design which defines the interaction
between the end user and the database.
• The user can interact with the database through this
interface without knowing much about the mechanism by
which the data is stored in the database.
Three Layer Schema Design

Creating Database Schema


For creating a schema, the statement “CREATE SCHEMA” is used in every database.
But different databases have different meanings for this. Below we’ll be looking at
some statements for creating a database schema in different database systems:
1. MySQL: In MySQL, we use the “CREATE SCHEMA” statement for creating the
database, because, in MySQL CREATE SCHEMA and CREATE DATABASE, both
statements are similar.
2. SQL Server: In SQL Server, we use the “CREATE SCHEMA” statement for
creating a new schema.
3. Oracle Database: In Oracle Database, we use “CREATE USER” for creating a
new schema, because in the Oracle database, a schema is already created with each
database user. The statement “CREATE SCHEMA” does not create a schema,
instead, it populates the schema with tables & views and also allows one to access
those objects without needing multiple SQL statements for multiple transactions.
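As a minimal sketch of turning such a schema definition into actual database objects from Python: SQLite is used here because it ships with Python, and since it has no CREATE SCHEMA statement, the script simply creates the tables and a view that make up a small, illustrative Customer schema.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    phone   TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER NOT NULL REFERENCES customer(cust_id),
    amount   REAL
);
CREATE VIEW customer_orders AS
    SELECT c.name, o.order_id, o.amount
    FROM customer c JOIN orders o ON o.cust_id = c.cust_id;
""")
print([row[0] for row in con.execute("SELECT name FROM sqlite_master")])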
Database Schema Designs
There are many ways to structure a database, and we should use the best-suited
schema design for creating our database because ineffective schema designs are
difficult to manage & consume extra memory and resources.
Schema design mostly depends on the application’s requirements. Here we have
some effective schema designs to create our applications, let’s take a look at the
schema designs:
1. Flat Model
2. Hierarchical Model
3. Network Model
4. Relational Model
5. Star Schema
6. Snowflake Schema
Flat Model
A flat model schema is a 2-D array in which every column contains the same type
of data/information and the elements with rows are related to each other. It is just
like a table or a spreadsheet. This schema is better for small applications that do
not contain complex data.
Designing Flat Model

Hierarchical Model
The hierarchical model has a tree-like structure, this tree structure contains the
root node that links to its child nodes. Each child node & parent node have a one-
to-many relationship. In other words, we can say that hierarchical schema has a
root table that is associated with multiple tables, and every table can have multiple
child tables, but every child table can only have one parent table. This type of
schema is presented by XML or JSON files.

Designing Hierarchical Model

Network Model
The network model and the hierarchical model are quite similar with an important
difference that is related to data relationships. The network model allows many-
to-many relationships whereas hierarchical models allow one-to-many
relationships.

Designing Network Model


Relational Model
The relational model is mainly used for relational databases, where the data is
stored as relations of the table. This relational model schema is better for object-
oriented programming.

Designing Relational Model

Star Schema
Star schema is better for storing and analyzing large amounts of data. It has a fact
table at its center & multiple dimension tables connected to it just like a star, where
the fact table contains the numerical data that run business processes and the
dimension table contains data related to dimensions such as product, time, people,
etc. or we can say, this table contains the description of the fact table. The star
schema allows us to structure the data of RDBMS.
Designing Star Schema

Snowflake Schema
Just like star schema, the snowflake schema also has a fact table at its center and
multiple dimension tables connected to it, but the main difference in both models
is that in snowflake schema – dimension tables are further normalized into
multiple related tables. The snowflake schema is used for analyzing large amounts
of data.
Designing Snowflake Schema

Difference between Logical and Physical Database Schema

Physical Schema | Logical Schema

Describes how the data is physically stored on disk. | Provides the conceptual view that defines the relationships between the data entities.

Has a low level of abstraction. | Has a high level of abstraction.

The design must work with a specific database management system and hardware platform. | The design is independent of any specific database management system.

Changes in the physical schema should not affect the logical schema (physical data independence). | Changes made in the logical schema may require corresponding changes in the physical schema.

Specifies the attributes together with their exact data types and lengths. | Includes the attributes, but not their physical data types.

Examples: Data definition language (DDL), storage structures, indexes. | Examples: Entity-Relationship diagram, Unified Modeling Language, class diagram.

Advantages of Database Schema


• Providing Consistency of data: Database schema ensures the data
consistency and prevents the duplicates.
• Maintaining Scalability: Well designed database schema helps in
maintaining addition of new tables in database along with that it helps
in handling large amounts of data in growing tables.
• Performance Improvement: Database schema helps in faster data
retrieval which is able to reduce operation time on the database tables.
• Easy Maintenance: A database schema makes it possible to maintain or change
one part of the database without affecting the rest of the database.
• Security of Data: Database schema helps in storing the sensitive data
and allows only authorized access to the database.
Database Instance
The database schema is defined before the actual database is created; once the
database is operational, it is very difficult to modify the schema because the
schema represents the fundamental structure of the database. The schema itself
does not hold any of the data saved in the database. A database instance, by
contrast, represents the data and information that is currently stored in the
database at a specific point in time.

Database instance of Customer table at a specific time

Conclusion
• The Structure of the database is referred to as the Schema, and it
represents logical restrictions like Table and Key, among other things.
• Three Schema Architecture was developed to prevent the user from
direct access to the database.
• Since the information saved in the database is subject to frequent
change, an instance is a representation of the data at a specific point in time.

20. Short note on DBMS - Deadlock

In a multi-process system, deadlock is an unwanted situation that


arises in a shared resource environment, where a process
indefinitely waits for a resource that is held by another process.

For example, assume a set of transactions {T0, T1, T2, ...,Tn}.


T0 needs a resource X to complete its task. Resource X is held by
T1, and T1 is waiting for a resource Y, which is held by T2. T2 is
waiting for resource Z, which is held by T0. Thus, all the
processes wait for each other to release resources. In this
situation, none of the processes can finish their task. This
situation is known as a deadlock.

Deadlocks are not healthy for a system. In case a system is stuck


in a deadlock, the transactions involved in the deadlock are either
rolled back or restarted.

Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS
aggressively inspects all the operations, where transactions are
about to execute. The DBMS inspects the operations and analyzes
if they can create a deadlock situation. If it finds that a deadlock
situation might occur, then that transaction is never allowed to be
executed.

There are deadlock prevention schemes that use timestamp


ordering mechanism of transactions in order to predetermine a
deadlock situation.

Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data
item), which is already held with a conflicting lock by another
transaction, then one of the two possibilities may occur −

• If TS(Ti) < TS(Tj) − that is Ti, which is requesting a


conflicting lock, is older than Tj − then Ti is allowed to wait
until the data-item is available.
• If TS(Ti) > TS(Tj) − that is Ti is younger than Tj − then
Ti dies. Ti is restarted later with a random delay but with the
same timestamp.

This scheme allows the older transaction to wait but kills the
younger one.

Wound-Wait Scheme

In this scheme, if a transaction requests to lock a resource (data


item), which is already held with conflicting lock by some another
transaction, one of the two possibilities may occur −

• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is


Ti wounds Tj. Tj is restarted later with a random delay but
with the same timestamp.
• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource
is available.

This scheme, allows the younger transaction to wait; but when an


older transaction requests an item held by a younger one, the
older transaction forces the younger one to abort and release the
item.

In both the cases, the transaction that enters the system at a


later stage is aborted.
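A minimal sketch of the two decision rules above, written as small Python functions: the timestamps are illustrative, a smaller timestamp means an older transaction, Ti is the requesting transaction, and Tj is the current lock holder.

def wait_die(ts_i, ts_j):
    # older requester waits; younger requester dies (is rolled back and restarted)
    return "Ti waits" if ts_i < ts_j else "Ti dies (restart later, same timestamp)"

def wound_wait(ts_i, ts_j):
    # older requester wounds (rolls back) the younger holder; younger requester waits
    return "Ti wounds Tj (Tj is rolled back)" if ts_i < ts_j else "Ti waits"

print(wait_die(ts_i=5, ts_j=9))     # Ti is older   -> Ti waits
print(wait_die(ts_i=9, ts_j=5))     # Ti is younger -> Ti dies
print(wound_wait(ts_i=5, ts_j=9))   # Ti is older   -> Tj is wounded
print(wound_wait(ts_i=9, ts_j=5))   # Ti is younger -> Ti waits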

Deadlock Avoidance
Aborting a transaction is not always a practical approach.
Instead, deadlock avoidance mechanisms can be used to detect
any deadlock situation in advance. Methods like the "wait-for
graph" are available, but they are suitable only for systems in
which transactions are lightweight and hold few instances of
resources. In a bulky system, deadlock prevention techniques may
work better.
Wait-for Graph

This is a simple method available to track if any deadlock


situation may arise. For each transaction entering into the
system, a node is created. When a transaction Ti requests for a
lock on an item, say X, which is held by some other transaction
Tj, a directed edge is created from Ti to Tj. If Tj releases item X,
the edge between them is dropped and Ti locks the data item.

The system maintains this wait-for graph for every transaction


waiting for some data items held by others. The system keeps
checking if there's any cycle in the graph.

Here, we can use any of the two following approaches −

• First, do not allow any request for an item, which is already


locked by another transaction. This is not always feasible
and may cause starvation, where a transaction indefinitely
waits for a data item and can never acquire it.
• The second option is to roll back one of the transactions. It
is not always feasible to roll back the younger transaction,
as it may be more important than the older one. With the help of
some relative algorithm, a transaction is chosen, which is to
be aborted. This transaction is known as the victim and the
process is known as victim selection.
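A minimal sketch of deadlock detection on a wait-for graph (transaction names are illustrative): an edge Ti -> Tj means Ti is waiting for an item held by Tj, and a cycle in the graph means the transactions involved are deadlocked.

def has_cycle(graph):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(node not in visited and dfs(node) for node in graph)

wait_for = {"T0": ["T1"], "T1": ["T2"], "T2": ["T0"]}    # the example from above
print(has_cycle(wait_for))                                # True  -> deadlock
print(has_cycle({"T0": ["T1"], "T1": ["T2"], "T2": []}))  # False -> no deadlock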
21. What is Data Masking?
Data masking is a very important concept to keep data safe from any breaches.
Especially, for big organizations that contain heaps of sensitive data that can be
easily compromised. Details like credit card information, phone numbers, house
addresses are highly vulnerable information that must be protected. To
understand data masking better we first need to know what computer networks
are.

What are computer networks?


A computer network is a coordinated system of computers that share resources.
These resources are provided by a redistribution point or endpoint called a
network node. The computers use common communication protocols over digital
interconnections to communicate with each other. Computer networks are an
integral part of telecommunication systems. The connections can consist of
telecommunication network technologies that are based on physically wired,
optical, and wireless radio-frequency methods.

Computer networks and Network Security

Network security consists of many layers and an attack can happen in any one of
these layers. These networks usually consist of three controls
• Physical network security
• Technical network security
• Administrative network security
Physical network security: this is designed to keep the network safe from
unauthorized personnel breaking into physical network components, which include
OUI, fiber optic cable, etc.
Technical network security: this protects the data that is stored in the network
or transmitted through it. It ensures that no one can carry out unauthorized
activities on the data apart from the authorized user.
Administrative network security: this includes all the policies and procedures
that need to be followed by the authorized users for other personnel.
Data masking:
Data masking means creating a structurally similar but altered copy of pre-existing
data in order to keep the original data safe and secure from any security breaches.
Various data masking tools are being created so that organizations can use them to
keep their data safe. That is how important it is to emphasize data masking.

Types of data masking

There are various types of data masking. Some of them are given below
• Static data masking (SDM): Static data masking works on data at rest
by altering it, thereby permanently replacing sensitive data. It
helps an organization create a clean, masked and nearly breach-free copy of
its database. SDM is commonly used for development and data
testing.

static data masking takes place at the state of rest

• Dynamic data masking (DDM): Just as the name suggests, dynamic
data masking alters the data on the fly, while the data transfer is
taking place. With DDM you can do both full masking and partial masking;
a random mask option is also available for numeric data.

Dynamic data masking takes place at the time of data commute

• Deterministic data masking: Deterministic data masking consistently
replaces a given value in a column with the same masked value, so every
occurrence of the original value maps to one substitute. This can be done
in various formats, for example a substitution format.
• On-the-fly data masking: In this type of data masking the data is
masked while being transferred from one place to another, without ever
being written to disk in between. It is similar to dynamic data masking,
except that it is done one value at a time.

On-the-fly data masking masks data one record at a time


Techniques:
Data masking can be done using the following techniques:
• Substitution: The substitution method is considered one of the most
efficient and reliable techniques for achieving the desired result. In this
method, any sensitive information that needs to be protected is
substituted with a fake yet realistic-looking value. Only a person with
authorized access to the system will be able to see the real values behind
the mask. A small code sketch follows the example tables below.
• Pros: Makes the data look as realistic as possible.
• Cons: Not applicable when dealing with large amounts of
unrelated data.
• Before Substitution:
Participant Name Problem Type Score

Alena Hard 45.33

Rory Hard 33.21

Miguel Easy 20

Samara Medium 37.2

• After Substitution :
Participant Name Problem Type Score

Alena Hard 30.22

Rory Hard 40.9

Miguel Easy 50

Samara Medium 46.24
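
A minimal sketch of the substitution technique in Python. The rows and the range of fake scores are made-up illustration values, not part of the original notes.

import random

# Hypothetical rows mirroring the table above: (participant, problem type, score)
rows = [("Alena", "Hard", 45.33), ("Rory", "Hard", 33.21),
        ("Miguel", "Easy", 20.0), ("Samara", "Medium", 37.2)]

def substitute_scores(rows, low=20.0, high=55.0, seed=42):
    # Replace each real score with a fake but realistic-looking value.
    rng = random.Random(seed)
    return [(name, kind, round(rng.uniform(low, high), 2)) for name, kind, _ in rows]

print(substitute_scores(rows))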


• Averaging: This method can be used for numeric data.
Instead of showing individual numeric values, you can replace the value in
all cells with the collective average of all the values in the column. For
example, if you have student details and you don’t want other students
to see the total marks each student got, you can replace each mark with
the average of all the students' marks. In the table below, the average of
the substituted scores (30.22, 40.9, 50 and 46.24) is 41.84.
Participant Name Problem Type Score

Alena Hard 41.84

Rory Hard 41.84

Miguel Easy 41.84

Samara Medium 41.84
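
A small illustrative sketch of the averaging technique; the rows reuse the substituted scores shown above.

from statistics import mean

rows = [("Alena", "Hard", 30.22), ("Rory", "Hard", 40.9),
        ("Miguel", "Easy", 50.0), ("Samara", "Medium", 46.24)]

def average_scores(rows):
    # Replace every individual score with the column average.
    avg = round(mean(score for _, _, score in rows), 2)
    return [(name, kind, avg) for name, kind, _ in rows]

print(average_scores(rows))   # every row now carries 41.84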

• Shuffling: Shuffling and averaging are similar techniques, so to say, but
there is a difference that sets them apart: instead of replacing all the
values in the column, you simply shuffle the values around. With this,
nobody can tell which value belongs to which record because they end up
in different rows.
• Pros: Deals with large amounts of data efficiently while
keeping the data as realistic as possible.
• Cons: Can be undone easily if the data set is relatively small.
• Before Shuffling:
• Before Shuffling:
Participant Name Problem Type Score

Alena Hard 45.33

Rory Hard 33.21

Miguel Easy 20
Samara Medium 37.2

• After Shuffling:
Participant Name Problem Type Score

Alena Hard 50

Rory Hard 46.24

Miguel Easy 30.22

Samara Medium 40.9
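
A minimal sketch of the shuffling technique; the rows and the random seed are illustrative only.

import random

rows = [("Alena", "Hard", 30.22), ("Rory", "Hard", 40.9),
        ("Miguel", "Easy", 50.0), ("Samara", "Medium", 46.24)]

def shuffle_scores(rows, seed=7):
    # Keep the same set of scores, but move them to different rows.
    scores = [score for _, _, score in rows]
    random.Random(seed).shuffle(scores)
    return [(name, kind, s) for (name, kind, _), s in zip(rows, scores)]

print(shuffle_scores(rows))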

• Encryption: Encryption is a very common concept in cyber security
and cryptography. It is achieved by transforming the sensitive
dataset into a completely unreadable form. This ensures that nobody gets
to know what type of data, or even which data, is being represented. Only
personnel who have access to the encryption key will be able to see the
data. A small code sketch follows below.
• Pros: Masks the data effectively.
• Cons: Anyone with the encryption key can easily get access to
the data. Also, anyone who knows cryptography may decrypt
the data with enough effort.
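
A minimal sketch of encryption-based masking. It assumes the third-party cryptography package (pip install cryptography), and the card number is a made-up example value.

from cryptography.fernet import Fernet

key = Fernet.generate_key()               # only holders of this key can unmask the data
f = Fernet(key)

card_number = "4111-1111-1111-1111"       # made-up example value
masked = f.encrypt(card_number.encode())  # unreadable ciphertext stored or shared
print(masked)
print(f.decrypt(masked).decode())         # original value, visible to key holders only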

• Nulling out or deletion: Nulling out is exactly what the name suggests:
you delete the values in a column by replacing them with NULL values.
This is a very effective method to avoid showing any sensitive
information in a test environment.
• Pros: Very useful in situations where the masked data itself is not essential.
• Cons: Not applicable in test environments that need realistic-looking
values, since the nulled column carries no information.

Participant Name Problem Type Score


Alena Hard NULL

Rory Hard NULL

Miguel Easy NULL

Samara Medium NULL

• Redaction Method: In this method, you replace the sensitive
information with the same unique code or a generic value for the entirety
of the column. A small code sketch follows the table below.
• Pros: Difficult to make out what the original data could be,
therefore making the data more secure.
• Cons: This method should only be used when the values are
not needed for development or QA purposes.
Participant Name Problem Type Score

Alena Hard XXXXXXXXXX

Rory Hard XXXXXXXXXX

Miguel Easy XXXXXXXXXX

Samara Medium XXXXXXXXXX
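
A small sketch covering the last two techniques, nulling out and redaction, using the same illustrative rows (Python's None plays the role of SQL NULL).

rows = [("Alena", "Hard", 30.22), ("Rory", "Hard", 40.9),
        ("Miguel", "Easy", 50.0), ("Samara", "Medium", 46.24)]

def null_out(rows):
    # Nulling out: drop the sensitive column entirely.
    return [(name, kind, None) for name, kind, _ in rows]

def redact(rows, token="XXXXXXXXXX"):
    # Redaction: replace every value in the column with one generic token.
    return [(name, kind, token) for name, kind, _ in rows]

print(null_out(rows))
print(redact(rows))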

• Date Aging: If you have dates in your data set that you don’t want to
reveal, you can shift the dates back or forward from what is actually
given. For example, if you have a date set to 20-08-21, you can push it
back 200 days to 01-02-21. This can also be done with
any kind of numeric data. Make sure that the data in a column or row is
aged by a fixed number or a similar algorithm; a small sketch follows the
tables below.
• Pros: The algorithm is easy to remember and masks the
information effectively.
• Cons: Only appropriate for dates and other numeric data.
• Original Data Set:
Participant Name Problem Type Score

Alena Hard 30.22

Rory Hard 40.9

Miguel Easy 50

Samara Medium 46.24

• Masked data set, obtained by adding 45 to every score:


Participant Name Problem Type Score

Alena Hard 75.22

Rory Hard 85.9

Miguel Easy 95

Samara Medium 91.24
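
A minimal sketch of date aging with Python's datetime module; the 200-day shift matches the example above and is purely illustrative.

from datetime import date, timedelta

def age_dates(dates, offset_days=-200):
    # Shift every date by a fixed offset so the real dates are never exposed.
    return [d + timedelta(days=offset_days) for d in dates]

print(age_dates([date(2021, 8, 20)]))   # [datetime.date(2021, 2, 1)]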

Applications of data masking:


There is a myriad of applications of data masking, especially in information
security. Some of them are:
• Auditing: In auditing, you need to keep track of and maintain the
accuracy of all the data given by an organization or some other source.
Naturally, it is important to keep the data safe and secure which can be
achieved by data masking.
• Access Control: Making sure that only authorized personnel get to
access any sensitive data and modify it is known as access control.
Data masking plays a vital role in access control, as it can cover for any
mishaps that inadvertently happen and prevent major damage.
• Cryptography: As discussed earlier in the techniques section, there is a
technique called encryption. Encryption is a method used in
cryptography to hide sensitive data. Hence, data masking is an important
concept to know in order to pursue cryptography.
Types of data that can be masked:
Any type of data can be masked. Here are some examples:
• Personal Information: Personal information is the most sensitive
information out there. It is important for personal information to be
masked be it in a professional setting or personal setting. Vulnerable
personal information is always a threat to safety.
• Financial Data: It is important for an organization to keep its financial
data safe. Important and sensitive information like transactions, profit,
and loss statements, and other information is very dangerous to be
disclosed in a test environment.
Benefits of data masking:
Data masking provides a solution to a myriad of cyber security problems and
therefore comes with many benefits. Some of them are:
• Data masking is highly effective in preventing data breaches.
• It does not allow attackers to easily hack into your system.
• Insiders cannot use data in a malicious way if the data is masked.
• It secures any vulnerable interfaces.
• It is very cost-effective, unlike other methods of information security.
• Data can be shared with authorized personnel without any threat
to your security.
Challenges of data masking:
There are certain challenges that can be encountered while attempting data
masking. One such challenge is that you need to mask the data in a way that it
does not lose its usefulness for authorized personnel, while being masked
enough that cybercriminals cannot reconstruct the original data. In theory this
might seem rather simple, but the practical implementation is fairly tricky.
Data masking should also be able to mask the data without actually modifying the
original data or the application itself. The integrity of the data should be maintained
while masking. The masking system should follow the parameters set by the
database and not override those set parameters.
Data masking is a very important concept that needs to be implemented in every
organization. Soon enough, data masking will not only be a concept for institutions
but will also be available to the general public to keep their information safe in
cyberspace. This emphasizes the importance of learning data masking techniques
in order to apply them to your everyday data. Safety starts at home.

22. Explain different types of database Failure?


Failure Classification
To find that where the problem has occurred, we generalize a failure into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure

1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point
from which it cannot proceed any further. If a single transaction or process fails in
this way, it is called a transaction failure.

Reasons for a transaction failure could be:

1. Logical errors: If a transaction cannot complete due to a code error or an
internal error condition, a logical error occurs.
2. Syntax errors: These occur when the DBMS itself terminates an active transaction
because the database system is not able to execute it. For example, the system
aborts an active transaction in case of deadlock or resource unavailability.

2. System Crash

o A system crash can occur due to power failure or other hardware or software
failure. Example: operating system error.

Fail-stop assumption: in a system crash, the contents of non-volatile storage are
assumed not to be corrupted.

3. Disk Failure

o Disk failures were a common problem in the early days of technology evolution,
when hard-disk drives and storage drives failed frequently.
o A disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure that destroys all or part of the
disk storage.

23. Difference between primary key and foreign key ?


Difference between Primary Key and Foreign
Key
Pre-Requisite: Relational Database Model
Keys are among the most important elements in a relational database: they maintain
the relationships between tables and also help in uniquely identifying the
data in a table. The Primary Key is a key that uniquely identifies a
tuple of a relation, whereas the Foreign Key is a key used to establish the
relationship between tables: the primary key of one table acts as a foreign key
in another table. Now, let’s discuss both of them in some detail.

What is Primary Key?


A primary key is used to ensure that the data in the specific column is unique. The
column cannot have NULL values. It is either an existing table column or a column
that is specifically generated by the database according to a defined sequence.
Example: STUD_NO, as well as STUD_PHONE both, are candidate keys for relation
STUDENT but STUD_NO can be chosen as the primary key (only one out of many
candidate keys).
Table STUDENT
STUD_NO STUD_NAME STUD_PHONE STUD_STATE STUD_COUNTRY STUD_AGE

1 RAM 9865278251 Haryana India 20

2 RAM 9655470231 Punjab India 19

3 SUJIT 7514290359 Rajasthan India 18

4 SURESH 8564103258 Punjab India 21
Table STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME

1 C1 DBMS

2 C2 Computer Networks

1 C2 Computer Networks

What is Foreign Key?


A foreign key is a column or group of columns in a relational database table that
provides a link between data in two tables. It is a column (or columns) that
references a column (most often the primary key) of another table.
Example: STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in
STUDENT relation.
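
The following is a minimal sketch, using Python's built-in sqlite3 module, of how a primary key and a foreign key behave. The table and column names follow the STUDENT / STUDENT_COURSE example above; everything else is illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")     # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO   INTEGER PRIMARY KEY,       -- unique, non-NULL identifier
        STUD_NAME TEXT
    )""")
conn.execute("""
    CREATE TABLE STUDENT_COURSE (
        STUD_NO   INTEGER REFERENCES STUDENT(STUD_NO),   -- foreign key
        COURSE_NO TEXT
    )""")

conn.execute("INSERT INTO STUDENT VALUES (1, 'RAM')")
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, 'C1')")       # allowed: student 1 exists
try:
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (99, 'C2')")  # no such student
except sqlite3.IntegrityError as e:
    print("Rejected by the foreign key:", e)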
Difference between Primary Key and Foreign Key
PRIMARY KEY vs FOREIGN KEY

• A primary key is used to ensure that data in the specific column is unique, whereas
a foreign key is a column or group of columns in a relational database table that
provides a link between data in two tables.
• A primary key uniquely identifies a record in the relational database table, whereas
a foreign key refers to the field in a table which is the primary key of another table.
• Only one primary key is allowed in a table, whereas more than one foreign key is
allowed in a table.
• A primary key is a combination of UNIQUE and NOT NULL constraints, whereas a
foreign key can contain duplicate values, and a table in a relational database can
have more than one foreign key.
• A primary key does not allow NULL values, whereas a foreign key can also contain
NULL values.
• A primary key value cannot be deleted from the parent table while it is being
referenced, whereas a foreign key value can be deleted from the child table.
• The primary key constraint can be implicitly defined on temporary tables, whereas
the foreign key constraint cannot be defined on local or global temporary tables.

Conclusion
In this article, we have covered the primary key and the foreign key, and the
differences between them. Both keys play an important role in the database
management system. A Primary Key contains unique values, whereas a Foreign Key
contains values that reference Primary Keys. The main characteristic property of the
Primary Key is that its values cannot repeat; they are unique. Their functions also
differ: the Primary Key identifies a row in its table, while the Foreign Key defines the
relation between tables.

24. BCNF ?

Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300


364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now the decomposition is in BCNF, because the left-hand side of every functional
dependency is a key of its table.
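
The following is a small sketch, not part of the original notes, that checks the BCNF condition by computing attribute closures: an FD X → Y violates BCNF when the closure of X is not the whole relation, i.e. X is not a super key.

def closure(attrs, fds):
    # Attribute closure of `attrs` under the functional dependencies `fds`.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    # Return the FDs whose left-hand side is not a super key of `relation`.
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(relation)]

relation = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [({"EMP_ID"}, {"EMP_COUNTRY"}),
       ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})]
print(bcnf_violations(relation, fds))   # both FDs violate BCNF in the original EMPLOYEE table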
25. Raid in dbms?

RAID (Redundant Arrays of Independent Disks)


RAID is a technique that makes use of a combination of multiple disks instead
of using a single disk for increased performance, data redundancy, or both.
The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at
the University of California, Berkeley in 1987.
Why Data Redundancy?
Data redundancy, although taking up extra space, adds to disk reliability. This
means, that in case of disk failure, if the same data is also backed up onto
another disk, we can retrieve the data and go on with the operation. On the
other hand, if the data is spread across multiple disks without the RAID
technique, the loss of a single disk can affect the entire data.
Key Evaluation Points for a RAID System
• Reliability: How many disk faults can the system tolerate?
• Availability: What fraction of the total session time is the system in
uptime mode, i.e. how available is the system for actual use?
• Performance: How good is the response time? How high is the
throughput (rate of processing work)? Note that performance
involves many parameters, not just these two.
• Capacity: Given a set of N disks each with B blocks, how much
useful capacity is available to the user?
RAID is very transparent to the underlying system. This means, that to the
host system, it appears as a single big disk presenting itself as a linear array
of blocks. This allows older technologies to be replaced by RAID without
making too many changes to the existing code.
Different RAID Levels
1. RAID-0 (Striping)
2. RAID-1 (Mirroring)
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
5. RAID-4 (Block-Level Striping with Dedicated Parity)
6. RAID-5 (Block-Level Striping with Distributed Parity)
7. RAID-6 (Block-Level Striping with Two Parity Blocks)

RAID (Redundant Array of Independent Disks)
RAID stands for Redundant Array of Independent Disks. It is a technology which is
used to connect multiple secondary storage devices for increased performance, data
redundancy, or both. It gives you the ability to survive one or more drive failures
depending upon the RAID level used.

It consists of an array of disks in which multiple disks are connected to achieve different
goals.

RAID technology
There are 7 levels of RAID schemes, named RAID 0, RAID 1, ..., RAID 6.

These levels have the following characteristics:

o It contains a set of physical disk drives.


o In this technology, the operating system views these separate disks as a single logical
disk.
o In this technology, data is distributed across the physical drives of the array.
o Redundant disk capacity is used to store parity information.
o In case of disk failure, the parity information can be used to recover the data.

Standard RAID levels


RAID 0

o RAID level 0 provides data striping, i.e., data is split into blocks and placed across
multiple disks. There is no redundancy, which means that if one disk fails, all data
in the array is lost.
o This level doesn't provide fault tolerance but increases the system performance.
Example:

Disk 0 Disk 1 Disk 2 Disk 3

20 21 22 23

24 25 26 27

28 29 30 31

32 33 34 35

In this figure, block 0, 1, 2, 3 form a stripe.

In this level, instead of placing just one block on a disk at a time, we can place
two or more blocks on a disk before moving on to the next one.

Disk 0 Disk 1 Disk 2 Disk 3

20 22 24 26

21 23 25 27

28 30 32 34

29 31 33 35

In the above figure, there is no duplication of data. Hence, a block once lost cannot
be recovered.
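
A minimal sketch of how RAID 0 striping maps a logical block to a disk when one block is written per disk before moving on; the 4-disk layout mirrors the first figure above, and the function name is purely illustrative.

def locate_block(block, num_disks=4):
    # RAID 0 striping: logical block -> (disk index, offset on that disk), no redundancy.
    return block % num_disks, block // num_disks

for b in range(20, 28):
    print(b, "->", locate_block(b))   # blocks 20..23 form one stripe, 24..27 the next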

Pros of RAID 0:

o In this level, throughput is increased because multiple data requests are probably
not on the same disk.
o This level fully utilizes the disk space and provides high performance.
o It requires a minimum of 2 drives.

Cons of RAID 0:

o It doesn't contain any error detection mechanism.
o RAID 0 is not a true RAID because it is not fault-tolerant.
o In this level, failure of either disk results in complete data loss of the array.

RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It
provides 100% redundancy in case of a failure.

Example:

Disk 0 Disk 1 Disk 2 Disk 3

A A B B

C C D D

E E F F

G G H H

Only half of the total drive space is used to store the data. The other half is simply a
mirror of the already stored data.

Pros of RAID 1:

o The main advantage of RAID 1 is fault tolerance. In this level, if one disk fails, then the
other automatically takes over.
o In this level, the array will function even if any one of the drives fails.

Cons of RAID 1:

o In this level, one extra drive is required per drive for mirroring, so the expense is higher.

RAID 2
o RAID 2 consists of bit-level striping using Hamming-code parity. In this level, each data
bit in a word is recorded on a separate disk and the ECC code of the data words is stored
on a different set of disks.
o Due to its high cost and complex structure, this level is not used commercially. The
same performance can be achieved by RAID 3 at a lower cost.

Pros of RAID 2:

o This level uses one designated drive to store parity.


o It uses the hamming code for error detection.

Cons of RAID 2:

o It requires an additional drive for error detection.

RAID 3
o RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity
information is stored for each disk section and written to a dedicated parity drive.
o In case of drive failure, the parity drive is accessed, and data is reconstructed from the
remaining devices. Once the failed drive is replaced, the missing data can be restored
on the new drive.
o In this level, data can be transferred in bulk. Thus high-speed data transmission is
possible.

Disk 0 Disk 1 Disk 2 Disk 3

A B C P(A, B, C)

D E F P(D, E, F)

G H I P(G, H, I)

J K L P(J, K, L)

Pros of RAID 3:
o In this level, data is regenerated using the parity drive.
o It provides high data transfer rates.
o In this level, data is accessed in parallel.

Cons of RAID 3:

o It requires an additional drive for parity.
o It gives slow performance when operating on small-sized files.

RAID 4
o RAID 4 consists of block-level striping with a dedicated parity disk. Instead of duplicating
data, RAID 4 adopts a parity-based approach.
o This level allows recovery of at most 1 disk failure, due to the way parity works. In this
level, if more than one disk fails, there is no way to recover the data.
o Levels 3 and 4 both require at least three disks to implement RAID.

Disk 0 Disk 1 Disk 2 Disk 3

A B C P0

D E F P1

G H I P2

J K L P3

In this figure, we can observe one disk dedicated to parity.

In this level, parity is calculated using the XOR function. If the data bits are 0,1,0,0,
the parity bit is XOR(0,1,0,0) = 1. If the data bits are 0,0,1,1, the parity bit is
XOR(0,0,1,1) = 0. That means an even number of ones results in parity 0 and an odd
number of ones results in parity 1.

C1 C2 C3 C4 Parity
0 1 0 0 1

0 0 1 1 0

Suppose that in the above figure, C2 is lost due to some disk failure. Then using the
values of all the other columns and the parity bit, we can recompute the data bit stored
in C2. This level allows us to recover lost data.
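
A short sketch of the XOR parity idea used by RAID 3/4/5: the parity of a stripe is the XOR of its data bits, and a lost bit is recovered by XOR-ing the surviving bits with the parity. The bit values follow the C1..C4 example above.

from functools import reduce
from operator import xor

row = [0, 1, 0, 0]
parity = reduce(xor, row)                  # 1, because the row has an odd number of ones

# Suppose the disk holding C2 fails; XOR the surviving bits with the parity bit.
survivors = [row[0], row[2], row[3]]       # C1, C3, C4
recovered_c2 = reduce(xor, survivors) ^ parity
print(recovered_c2)                        # 1 -> the lost bit is recomputed correctly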

RAID 5
o RAID 5 is a slight modification of the RAID 4 system. The only difference is that in RAID
5, the parity rotates among the drives.
o It consists of block-level striping with distributed parity.
o As with RAID 4, this level allows recovery from at most 1 disk failure. If more than one
disk fails, there is no way to recover the data.

Disk 0 Disk 1 Disk 2 Disk 3 Disk 4

0 1 2 3 P0

5 6 7 P1 4

10 11 P2 8 9

15 P3 12 13 14

P4 16 17 18 19

This figure shows how the parity blocks rotate.

This level was introduced to make random write performance better.

Pros of RAID 5:

o This level is cost-effective and provides high performance.
o In this level, parity is distributed across the disks in the array.
o It is used to make random write performance better.

Cons of RAID 5:

o In this level, disk failure recovery takes a longer time, as parity has to be calculated
from all the available drives.
o This level cannot survive two concurrent drive failures.

RAID 6
o This level is an extension of RAID 5. It contains block-level striping with 2 parity blocks.
o In RAID 6, you can survive 2 concurrent disk failures. With RAID 5, when a disk fails you
need to replace it quickly, because if another disk fails at the same time you won't be
able to recover any of the data. This is where RAID 6 plays its part: you can survive two
concurrent disk failures before you run out of options.

Disk 1 Disk 2 Disk 3 Disk 4

A0 B0 Q0 P0

A1 Q1 P1 D1

Q2 P2 C2 D2

P3 B3 C3 Q3

Pros of RAID 6:

o It can tolerate two concurrent disk failures, which makes it suitable for critical data.
o Read performance is comparable to RAID 5, since reads do not touch the parity blocks.

Cons of RAID 6:

o Writes are slower than in RAID 5, because two parity blocks have to be computed and
written for every update.
o It requires a minimum of four disks, and two disks' worth of capacity is consumed by
parity.
