DBMS Imp Notes
Levels of Data Abstraction:
1. Physical Level
2. Conceptual Level
3. External Level
1-Tier Architecture
In 1-Tier Architecture the database is directly available to the user: the client,
server, and database are all present on the same machine, and the user works
directly on the DBMS. For example, to learn SQL we set up an SQL server
and the database on the local system, which enables us to directly interact with the
relational database and execute operations. Industry rarely uses this
architecture; it logically goes for 2-tier and 3-tier architecture.
2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model. The application at
the client end directly communicates with the database on the server side. APIs
like ODBC and JDBC are used for this interaction. The server side is responsible for
providing query processing and transaction management functionalities. On the
client side, the user interfaces and application programs are run. The application
on the client side establishes a connection with the server side to communicate
with the DBMS.
An advantage of this type is that maintenance and understanding are easier, and
it is compatible with existing systems. However, this model gives poor performance
when there are a large number of users.
DBMS 2-Tier Architecture
3-Tier Architecture
In 3-Tier Architecture, there is another layer between the client and the server.
The client does not directly communicate with the server. Instead, it interacts with
an application server which further communicates with the database system and
then the query processing and transaction management takes place. This
intermediate layer acts as a medium for the exchange of partially processed data
between the server and the client. This type of architecture is used in the case of
large web applications.
DBMS 3-Tier Architecture
Codd's rules were proposed by the computer scientist Dr. Edgar F. Codd,
who also invented the relational model for database management. These
rules are meant to ensure data integrity, consistency, and usability, and they
signify the characteristics and requirements of a relational
database management system (RDBMS).
Duties of a Database Administrator (DBA):
• Decides hardware –
The DBA decides on economical hardware, based on cost, performance,
and efficiency, that best suits the organization. The
hardware is the interface between the end users and the database.
• Manages data integrity and security –
Data integrity needs to be checked and managed accurately, as it
protects and restricts data from unauthorized use. The DBA keeps an eye on
relationships within the data to maintain data integrity.
• Database Accessibility –
The Database Administrator is solely responsible for giving permission
to access the data available in the database and makes sure who
has the right to change its content.
• Database design –
DBA is held responsible and accountable for logical, physical
design, external model design, and integrity and security control.
• Database implementation –
DBA implements DBMS and checks database loading at the time
of its implementation.
• Query processing performance –
DBA enhances query processing by improving speed, performance,
and accuracy.
• Tuning Database Performance –
If users cannot get data speedily and accurately, the
organization may lose business. So by tuning SQL commands, the
DBA can enhance the performance of the database.
DBMS - Transaction
A transaction is a logical unit of work. For example, transferring 500 from A's
account to B's account involves the following sequence of simple operations −
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
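As a rough illustration, the same transfer can be run as one atomic transaction with Python's built-in sqlite3 module (the accounts table and starting balances here are made-up example data):

```python
import sqlite3

# Minimal sketch: the A -> B transfer above as one atomic transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 1000), ("B", 1000)])
conn.commit()

try:
    # Both updates succeed together or not at all (atomicity).
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'B'")
    conn.commit()
except sqlite3.Error:
    conn.rollback()   # on any failure, undo the partial work

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'A': 500, 'B': 1500}
```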
ACID Properties
A transaction is a very small unit of a program and it may contain
several low-level tasks. A transaction in a database system must
maintain Atomicity, Consistency, Isolation, and Durability −
commonly known as ACID properties − in order to ensure
accuracy, completeness, and data integrity.
Serializability
When multiple transactions are being executed by the operating
system in a multiprogramming environment, there are
possibilities that the instructions of one transaction are interleaved
with those of another transaction.
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they
are said to be result equivalent. They may yield the same result
for some value and different results for another set of values.
That's why this equivalence is not generally considered
significant.
View Equivalence
Two schedules S1 and S2 are view equivalent if every transaction T in them reads
and writes the data in a similar manner, in the following sense −
• If T reads the initial data in S1, then it also reads the initial
data in S2.
• If T reads the value written by J in S1, then it also reads the
value written by J in S2.
• If T performs the final write on the data value in S1, then it
also performs the final write on the data value in S2.
Conflict Equivalence
Two schedules are conflict equivalent if their conflicting operations (pairs of
operations from different transactions on the same data item, at least one of
which is a write) appear in the same order in both schedules.
States of Transactions
A transaction in a database can be in one of the following states
−
1. Active State –
When the instructions of the transaction are running then the
transaction is in active state. If all the ‘read and write’ operations are
performed without any error then it goes to the “partially committed
state”; if any instruction fails, it goes to the “failed state”.
2. Partially Committed –
After completion of all the read and write operations, the changes are
made in main memory or local buffer. If the changes are made
permanent on the DataBase then the state will change to “committed
state” and in case of failure it will go to the “failed state”.
3. Failed State –
When any instruction of the transaction fails, it goes to the “failed state”
or if failure occurs in making a permanent change of data on Data Base.
4. Aborted State –
After having any type of failure the transaction goes from “failed state”
to “aborted state” and since in previous states, the changes are only
made to local buffer or main memory and hence these changes are
deleted or rolled-back.
5. Committed State –
It is the state when the changes are made permanent in the database
and the transaction is complete; it then moves to the
“terminated state”.
6. Terminated State –
If there isn’t any roll-back or the transaction comes from the
“committed state”, then the system is consistent and ready for new
transaction and the old transaction is terminated.
Whenever the Basic TO algorithm detects two conflicting operations that occur in
an incorrect order, it rejects the latter of the two operations by aborting the
Transaction that issued it. Schedules produced by Basic TO are guaranteed to
be conflict serializable. We have already discussed that using timestamps ensures that
our schedule will be deadlock free.
One drawback of the Basic TO protocol is that Cascading Rollback is still
possible. Suppose we have a Transaction T1 and T2 has used a value written by T1.
If T1 is aborted and resubmitted to the system then, T2 must also be aborted and
rolled back. So the problem of Cascading aborts still prevails.
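A minimal sketch of the Basic TO checks in Python (illustrative only, not a full scheduler; the data items and timestamps are made up):

```python
# Each data item keeps a read timestamp and a write timestamp; an operation
# that arrives "too late" causes its transaction to be aborted.
read_ts, write_ts = {}, {}   # per data item

def read(item, ts):
    if ts < write_ts.get(item, 0):
        return "abort"       # a younger transaction already wrote the item
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return "ok"

def write(item, ts):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return "abort"       # conflicts with a younger read or write
    write_ts[item] = ts
    return "ok"

print(write("X", ts=2))  # ok
print(read("X", ts=1))   # abort: T1 arrives after T2's conflicting write
```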
Let's sum up the advantages and disadvantages of the Basic TO protocol:
Advantages:
• Schedules are guaranteed to be conflict serializable, and since no transaction ever waits, the schedules are deadlock free.
Disadvantages:
• Cascading rollbacks are still possible, and transactions that keep getting aborted and restarted may suffer starvation.
1. Selection(σ): It is used to select required tuples of a relation.
Example: Consider the following relation R (Table 1):
A B C
1 2 4
2 2 3
3 2 3
4 3 4
For the above relation, σ(c>3)R will select the tuples which have c more than 3.
A B C
1 2 4
4 3 4
Note: The selection operator only selects the required tuples but does not display
them. For display, the data projection operator is used.
2. Projection(π): It is used to project required column data from a relation.
Example: Consider Table 1. Suppose we want columns B and C from Relation R.
π(B,C)R will show following columns.
B C
2 4
2 3
3 4
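For illustration, selection and projection can be mimicked in Python with relations modeled as sets of tuples (an informal sketch, using Table 1's data):

```python
# Relations as sets of tuples; attributes of R are A, B, C (positions 0, 1, 2).
R = {(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)}

def select(rel, pred):            # sigma: keep tuples satisfying the predicate
    return {t for t in rel if pred(t)}

def project(rel, *cols):          # pi: sets drop duplicates, as in the algebra
    return {tuple(t[c] for c in cols) for t in rel}

print(select(R, lambda t: t[2] > 3))   # {(1, 2, 4), (4, 3, 4)}  -> sigma(C>3)R
print(project(R, 1, 2))                # {(2, 4), (2, 3), (3, 4)} -> pi(B,C)R
```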
3. Union(∪): Union in relational algebra is the same as the union operation in set
theory. Consider the following tables of students having different optional
subjects in their course.
FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
The union of the two tables gives the names of students taking either subject:
π(Student_Name)FRENCH ∪ π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both relations must
have the same set of Attributes.
4. Set Difference(-): Set Difference in relational algebra is the same set difference
operation as in set theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used as
follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations is that both
relations must have the same set of Attributes.
5. Set Intersection(∩): Set Intersection in relational algebra is the same set
intersection operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is
used as follows
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name
Vivek
Geeta
Note: The only constraint in the Set Intersection of two relations is that both
relations must have the same set of Attributes.
6. Rename(ρ): Rename is a unary operation used for renaming attributes of a
relation.
ρ(a/b)R will rename the attribute 'b' of the relation to 'a'.
7. Cross Product(X): Cross-product between two relations. Let’s say A and B, so
the cross product between A X B will result in all the attributes of A followed by
each attribute of B. Each record of A will pair with every record of B.
Example:
A
Name Age Sex
Ram 14 M
Sona 15 F
Kim 20 M
B
ID Course
1 DS
2 DBMS
A × B
Name Age Sex ID Course
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
Derived Operators
These are some of the derived operators, which are derived from the fundamental
operators.
1. Natural Join(⋈)
2. Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or
more relations will result in a set of all combinations of tuples where they have an
equal common attribute.
Example:
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
EMP ⋈ DEPT
Name ID Dept_Name Manager
A 120 IT A
C 110 Sales Y
D 111 IT A
2. Conditional Join: A conditional join is performed when we want to join two or
more relations based on an arbitrary condition.
Example:
R
ID Sex Marks
1 F 45
2 F 55
3 F 60
S
ID Sex Marks
10 M 20
11 M 22
12 M 59
Joining R and S on the condition R.Marks ≥ S.Marks gives:
R.ID R.Sex R.Marks S.ID S.Sex S.Marks
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
Relational Calculus
While Relational Algebra is a procedural query language, Relational Calculus is a non-
procedural query language. It basically deals with the end result: it tells
what to do but never how to do it.
There are two types of Relational Calculus
1. Tuple Relational Calculus(TRC)
2. Domain Relational Calculus(DRC)
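As a quick illustration (assuming a hypothetical Student relation with attributes name and age), the same query can be written in both calculi:

```latex
% TRC: the tuple variable t ranges over whole tuples of Student
\{\, t \mid t \in \mathrm{Student} \land t.\mathrm{age} > 20 \,\}
% DRC: the domain variables n (name) and a (age) range over attribute values
\{\, \langle n, a \rangle \mid \langle n, a \rangle \in \mathrm{Student} \land a > 20 \,\}
```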
8. Mapping Cardinalities ?
Cardinality in DBMS
In database management, cardinality plays an important role. Here cardinality
represents the number of times an entity of an entity set participates in a
relationship set. Or we can say that the cardinality of a relationship is the number
of tuples (rows) in a relationship. Types of cardinality in between tables are:
• one-to-one
• one-to-many
• many-to-one
• many-to-many
Mapping Cardinalities
In a database, the mapping cardinality or cardinality ratio means to denote the
number of entities to which another entity can be linked through a certain relation
set. Mapping cardinality is most useful in describing binary relation sets, although
they can contribute to the description of relation sets containing more than two
entity sets. Here, we will focus only on binary relation sets means we will find the
relation between entity sets A and B for the set R. So we can map any one of
following the cardinality:
1. One-to-one: In this type of cardinality mapping, an entity in A is connected to
at most one entity in B. Or we can say that a unit or item in B is connected to at
most one unit or item in A.
Figure 1
Example:
In a particular hospital, the surgeon department has one head of department; they
serve a one-to-one relationship.
2. One-to-many: An entity in A is associated with any number of entities in B,
while an entity in B is associated with at most one entity in A.
Example:
In a particular hospital, the surgeon department has multiple doctors. They serve
a one-to-many relationship.
3. Many-to-one: An entity in A is associated with at most one entity in B, while
an entity in B can be associated with any number of entities in A.
Example:
In a particular hospital, multiple surgeries are done by a single surgeon. Such a
type of relationship is known as a many-to-one relationship.
Applications of Hashing
Hashing is applicable in the following areas −
• Password verification
• Associating filename with their paths in operating systems
• Data Structures, where a key-value pair is created in which the key is a unique value,
whereas the value associated with the keys can be either same or different for
different keys.
• Board games such as Chess, tic-tac-toe, etc.
• Graphics processing, where a large amount of data needs to be matched and fetched.
Operations in Static Hashing:
• Delete − Search a record’s address and delete the record at that address, or delete a
chunk of records stored at that address in memory.
• Insertion − While entering a new record using static hashing, the hash function (h)
calculates bucket address "h(K)" for the search key (k), where the record is going to
be stored.
• Search − A record can be obtained using a hash function by locating the address of
the bucket where the data is stored.
• Update − It supports updating a record once it is traced in the data bucket.
Operations in Dynamic Hashing:
• Delete − Locate the desired location and support deleting data (or a chunk of data) at
that location.
• Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
• Query − Perform querying to compute the bucket address.
• Update − Perform a query to update the data.
Drawback of Dynamic Hashing:
• The location of the data in memory keeps changing according to the bucket size. Hence,
if there is a phenomenal increase in data, maintaining the bucket address table
becomes a challenge.
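A minimal sketch of static hashing in Python (the bucket count and the modulo hash function are illustrative assumptions):

```python
# Fixed number of buckets, as in static hashing.
N_BUCKETS = 8
buckets = [[] for _ in range(N_BUCKETS)]

def h(key):                       # bucket address h(K)
    return hash(key) % N_BUCKETS

def insert(key, record):          # store the record in its bucket
    buckets[h(key)].append((key, record))

def search(key):                  # locate the bucket, then scan it
    return [r for k, r in buckets[h(key)] if k == key]

def delete(key):                  # search the address, then remove the record
    b = buckets[h(key)]
    b[:] = [(k, r) for k, r in b if k != key]

insert(101, "Alice"); insert(109, "Bob")   # 101 and 109 land in the same bucket
print(search(101))                          # ['Alice']
delete(101); print(search(101))             # []
```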
Conclusion
Hashing is a computation technique that uses mathematical functions called
Hash Functions to calculate the location (address) of the data in the memory.
We learnt that there are two different hashing functions namely, Static hashing
and Dynamic hashing.
How does it work?
Now we will see how things actually happen behind the scenes. As we know,
MongoDB is a database server and the data is stored in these databases. In
other words, the MongoDB environment gives you a server that you can start and
then create multiple databases on using MongoDB.
Because it is a NoSQL database, the data is stored in collections and
documents. Hence the database, collections, and documents are related to each
other as shown below:
(Figure: the MongoDB hierarchy of database, collections, and documents, alongside the RDBMS hierarchy of database, tables, and rows)
Features of MongoDB –
1. Document-Oriented:
• MongoDB stores data in flexible, JSON-like BSON documents. A
document is a set of key-value pairs, where values can include other
documents, arrays, and data types.
2. Schema-less:
• MongoDB is schema-less, meaning that documents in a collection can
have different fields and data types. This flexibility is beneficial for
evolving data models and handling diverse data structures.
3. Indexing:
• MongoDB supports the creation of indexes on any field, similar to
traditional relational databases. Indexing improves query performance
by allowing the database to locate and retrieve data more efficiently.
4. Query Language:
• MongoDB uses a rich query language that supports a wide range of
queries, including field queries, range queries, and regular expression
searches. The query language is expressive and allows for complex
filtering and projection.
5. Aggregation Framework:
• MongoDB provides a powerful aggregation framework that allows
users to perform data transformation and manipulation operations,
such as filtering, grouping, sorting, and projecting. It enables the
execution of complex analytics and reporting tasks directly within the
database.
6. Sharding:
• MongoDB supports horizontal scaling through sharding. Sharding
allows the distribution of data across multiple machines, enabling the
system to handle large datasets and high-throughput workloads.
7. Replication:
• MongoDB supports replica sets, which are clusters of MongoDB servers
that maintain the same data set. Replica sets provide high availability
and fault tolerance by automatically promoting a secondary server in
case the primary server fails.
8. Flexible Storage Engine:
• MongoDB allows users to choose from different storage engines, such
as WiredTiger and MMAPv1, based on specific use cases and
performance requirements.
9. GridFS:
• MongoDB includes GridFS, a specification for storing and retrieving
large files, such as images, videos, and audio files. GridFS enables
efficient storage and retrieval of files exceeding the BSON document
size limit.
10. Geospatial Indexing:
• MongoDB provides support for geospatial indexing, allowing for the
efficient storage and querying of geospatial data, such as coordinates
and shapes.
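A hedged sketch of these ideas using the PyMongo driver (assumes a MongoDB server on localhost and the pymongo package; the database, collection, and documents are made up):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]                      # database
products = db["products"]                # collection

# Schema-less documents: the two inserts have different fields.
products.insert_one({"name": "pen", "price": 2, "tags": ["office"]})
products.insert_one({"name": "desk", "price": 120,
                     "dimensions": {"w": 140, "d": 70}})

# A field/range query from the rich query language.
for doc in products.find({"price": {"$lt": 100}}):
    print(doc["name"])                   # -> pen
```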
MongoDB's design and feature set make it suitable for a wide range of applications,
including content management systems, e-commerce platforms, real-time analytics,
and more. Its flexibility and scalability make it a popular choice for developers
working on modern, data-intensive applications.
Like every other technology, NoSQL databases also offer some benefits and
suffer from some limitations too.
In an era where relational databases are mainly used for data storage and
retrieval, modern web technologies posed a major challenge in the form of
unstructured data, high scale data, enormous concurrency etc.
Advantages of NoSQL databases:
(i) Dynamic schema:
NoSQL databases allow you to dynamically update the schema to evolve with
changing requirements while ensuring that it would cause no interruption or
downtime to your application.
(ii) Scalability:
NoSQL databases can scale to accommodate any type of data growth while
maintaining low cost.
(iii) High performance:
NoSQL databases are built for great performance, measured in terms of both
throughput (a measure of overall performance) and latency (the delay
between a request and the actual response).
(v) Open-source:
NoSQL databases don’t require expensive licensing fees and can run on
inexpensive hardware, rendering their deployment cost-effective.
Disadvantages of NoSQL databases:
(i) Lack of standardization:
There is no standard that defines the rules and roles of NoSQL databases. The
design and query languages of NoSQL databases vary widely between different
NoSQL products – much more widely than they do among traditional SQL
databases.
(ii) Backup of Database:
Backups are a weak point of some NoSQL databases; obtaining a consistent backup
of an entire distributed database can be difficult.
(iii) Consistency:
Many NoSQL databases favor scalability and performance over strong consistency,
often offering only eventual consistency.
NoSQL databases are a diverse set of database management systems that offer
alternatives to traditional relational databases. Each NoSQL database has its own
strengths and weaknesses, and the suitability of a particular NoSQL solution depends
on the specific requirements of the application. Here are some general strengths and
weaknesses associated with NoSQL databases:
1. Scalability:
• Strength: NoSQL databases are often designed to scale horizontally,
allowing them to handle large amounts of data and traffic by adding
more servers to the database cluster.
• Example: MongoDB, Cassandra.
2. Flexibility and Schema-less Design:
• Strength: NoSQL databases, being schema-less, provide flexibility in
terms of data models. They can easily accommodate changes to the
data structure without requiring a predefined schema.
• Example: MongoDB, Couchbase.
3. Performance:
• Strength: NoSQL databases can offer high performance for specific
use cases, particularly those involving read and write operations with
large volumes of data.
• Example: Redis, Cassandra.
4. Handling Unstructured Data:
• Strength: NoSQL databases are well-suited for handling unstructured
or semi-structured data, making them suitable for scenarios where data
formats are evolving.
• Example: MongoDB, Couchbase.
5. Support for Large Data Sets:
• Strength: NoSQL databases are often used in scenarios with large
datasets and can efficiently handle data distributed across multiple
nodes.
• Example: HBase, Cassandra.
6. Horizontal Partitioning (Sharding):
• Strength: NoSQL databases typically support easy horizontal
partitioning, or sharding, which allows for distributing data across
multiple servers to improve performance and scalability.
• Example: MongoDB, Cassandra.
In summary, the choice between NoSQL and traditional relational databases depends
on the specific needs of the application. NoSQL databases excel in certain use cases,
providing scalability, flexibility, and performance, but they may have trade-offs in
terms of consistency and ease of use. Understanding the strengths and weaknesses is
crucial for making an informed decision based on the requirements of a particular
project.
Accessing encrypted data (decryption) involves the following steps:
1. Authorized Access:
• When an authorized user needs to access the encrypted data, they
must go through an authentication process to verify their identity.
2. Decryption Key:
• Obtain the appropriate decryption key. This key is used in conjunction
with the decryption algorithm to convert the ciphertext back into
plaintext.
3. Decryption Process:
• Apply the decryption algorithm to the encrypted data using the
decryption key. This reverses the encryption process and transforms the
ciphertext back into readable plaintext.
4. Access to Original Data:
• Once decrypted, the user has access to the original, human-readable
data. The decrypted data is temporarily available for the authorized
user to perform necessary operations.
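A minimal sketch of this encrypt/decrypt cycle in Python, using the third-party cryptography package as one possible symmetric cipher (the sample plaintext is made up):

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # the decryption key (keep it secret)
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"card=4111-1111-1111-1111")
print(ciphertext)                    # unreadable without the key

plaintext = cipher.decrypt(ciphertext)   # authorized access using the key
print(plaintext)                     # b'card=4111-1111-1111-1111'
```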
Common applications of database encryption:
1. Data-at-Rest Encryption:
• Protects data stored on disk or other storage media. Even if someone
gains physical access to the storage, they won't be able to access the
actual data without the decryption key.
2. Data-in-Transit Encryption:
• Secures data as it is transmitted between the database server and
clients. This prevents eavesdropping or interception of sensitive
information during communication.
3. Column-Level Encryption:
• Encrypts specific columns containing sensitive information within
database tables. This allows for a granular approach to protecting only
the most critical data.
4. Compliance Requirements:
• Many industries have regulatory requirements (e.g., GDPR, HIPAA) that
mandate the use of encryption to protect sensitive data, and DBMS
encryption helps organizations comply with these regulations.
5. Multi-Tenancy Security:
• In multi-tenant environments where multiple users or entities share the
same database, encryption ensures that each entity's data remains
confidential and isolated from others.
Types of Databases
There are various types of databases used for storing different varieties of data:
1) Centralized Database
It is the type of database that stores data at a centralized database system. It allows
users to access the stored data from different locations through several
applications. These applications contain an authentication process to let users access
data securely. An example of a Centralized database can be Central Library that carries
a central database of each library in a college/university.
Advantages:
o It has decreased the risk of data management, i.e., manipulation of data will not affect
the core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.
Disadvantages:
o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If any server failure occurs, entire data will be lost, which could be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among
different database systems of an organization. These database systems are connected
via communication links. Such links help the end-users to access the data
easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite, etc.
3) Relational Database
This database is based on the relational data model, which stores data in the form of
rows(tuple) and columns(attributes), and together forms a table(relation). A relational
database uses SQL for storing, manipulating, as well as maintaining the data. E.F. Codd
proposed the relational model in 1970. Each table in the database carries a key that makes the
data unique from others. Examples of Relational databases are MySQL, Microsoft SQL
Server, Oracle, etc.
A means Atomicity: a transaction either executes completely or does not execute at
all; a partially completed transaction is never left visible in the database.
C means Consistency: If we perform any operation over the data, its value before and
after the operation should be preserved. For example, the account balance before and
after the transaction should be correct, i.e., it should remain conserved.
I means Isolation: There can be concurrent users accessing data at the same time
from the database, so the transactions should remain isolated from one another. For
example, when multiple transactions occur at the same time, one transaction's effects
should not be visible to the other transactions in the database.
D means Durability: It ensures that once it completes the operation and commits the
data, data changes should remain permanent.
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of
data sets. It is not a relational database as it stores data not only in tabular form but in
several different ways. It came into existence when the demand for building modern
applications increased. Thus, NoSQL presented a wide variety of database technologies
in response to the demands. We can further divide a NoSQL database into the
following four types:
a. Key-value storage: It is the simplest type of database storage where it stores every
single item as a key (or attribute name) holding its value, together.
b. Document-oriented Database: A type of database used to store data as JSON-like
document. It helps developers in storing data by using the same document-model
format as used in the application code.
c. Graph Databases: It is used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use the graph database.
d. Wide-column stores: It is similar to the data represented in relational databases. Here,
data is stored in large columns together, instead of storing in rows.
5) Cloud Database
A type of database where data is stored in a virtual environment and executes over the
cloud computing platform. It provides users with various cloud computing services
(SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous cloud platforms
offering cloud databases, such as Amazon Web Services (AWS), Microsoft Azure, and
Google Cloud.
6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data
in the database system. The data is represented and stored as objects which are similar
to the objects used in the object-oriented programming language.
7) Hierarchical Databases
It is the type of database that stores data in the form of parent-children relationship
nodes. Here, it organizes data in a tree-like structure.
Data get stored in the form of records that are connected via links. Each child record
in the tree will contain only one parent. On the other hand, each parent record can
have multiple child records.
8) Network Databases
It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them. Unlike
the hierarchical database, it allows each record to have multiple children and parent
nodes to form a generalized graph structure.
9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This
database is basically designed for a single user.
There are several types of database systems, each designed to handle specific types
of data and applications. Here are some common types of database systems:
1. Relational Database:
• Overview: Uses a tabular structure with rows and columns to store data. It
enforces the principles of the relational model, including ACID properties
(Atomicity, Consistency, Isolation, Durability).
• Examples: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.
2. NoSQL Database:
• Overview: Uses non-tabular data models (key-value, document, column-family,
or graph) instead of the relational table structure. Examples: MongoDB, Redis, Neo4j.
3. Columnar Database:
• Overview: Organizes data by columns rather than rows, making it suitable for
analytics and data warehousing. Columnar databases can efficiently handle
queries involving aggregations and analytics.
• Examples: Apache Cassandra, Amazon Redshift, Google Bigtable.
4. Document-Oriented Database:
• Overview: Stores data as self-describing documents (often JSON/BSON) that can
have nested and varying structures. Examples: MongoDB, CouchDB.
5. Graph Database:
• Overview: Stores data as nodes and edges, optimized for traversing
relationships. Examples: Neo4j, Amazon Neptune.
6. In-Memory Database:
• Overview: Stores data in the system's main memory (RAM) rather than on
disk, resulting in faster data access and retrieval. In-memory databases are
suitable for applications that require high-speed data processing.
• Examples: Redis, SAP HANA, Memcached.
7. Time-Series Database:
• Overview: Optimized for storing and querying timestamped data such as metrics
and sensor readings. Examples: InfluxDB, TimescaleDB.
8. Spatial Database:
• Overview: Supports storage and querying of geometric and geographic data such
as points, lines, and polygons. Examples: PostGIS (a PostgreSQL extension).
Each type of database system has its strengths and weaknesses, and the choice
depends on the specific requirements of the application, scalability needs, data
structure, and other factors. Organizations often use a combination of database
systems to meet different needs within their IT infrastructure.
The conceptual data model describes the database at a very high level and is useful
to understand the needs or requirements of the database. It is this model, that is
used in the requirement-gathering process i.e. before the Database Designers start
making a particular database. One such popular model is the entity/relationship
model (ER model). The E/R model specializes in entities, relationships, and even
attributes that are used by database designers. In terms of this concept, a
discussion can be made even with non-computer science(non-technical) users and
stakeholders, and their requirements can be understood.
Entity-Relationship Model( ER Model): It is a high-level data model which is
used to define the data and the relationships between them. It is basically a
conceptual design of any database which is easy to design the view of data.
Components of ER Model:
1. Entity: An entity is referred to as a real-world object. It can be a name,
place, object, class, etc. These are represented by a rectangle in an ER
Diagram.
2. Attributes: An attribute can be defined as the description of the entity.
These are represented by an ellipse in an ER Diagram. Examples are Age, Roll
Number, or Marks for a Student.
3. Relationship: Relationships are used to define relations among
different entities. A diamond (rhombus) is used to show a
relationship.
This type of data model is used to represent only the logical part of the database
and does not represent the physical structure of the database. The
representational data model allows us to focus primarily, on the design part of the
database. A popular representational model is a Relational model. The relational
Model consists of Relational Algebra and Relational Calculus. In the Relational
Model, we basically use tables to represent our data and the relationships between
them. It is a theoretical concept whose practical implementation is done in
Physical Data Model.
The advantage of using a Representational data model is to provide a foundation
to form the base for the Physical model
The physical Data Model is used to practically implement Relational Data Model.
Ultimately, all data in a database is stored physically on a secondary storage device
such as discs and tapes. This is stored in the form of files, records, and certain other
data structures. It has all the information on the format in which the files are
present and the structure of the databases, the presence of external data
structures, and their relation to each other. Here, we basically save tables in
memory so they can be accessed efficiently. In order to come up with a good
physical model, we have to work on the relational model in a better
way. Structured Query Language (SQL) is used to practically implement Relational
Algebra.
This Data Model describes HOW the system will be implemented using a specific
DBMS system. This model is typically created by DBA and developers. The
purpose is actual implementation of the database.
• The physical data model describes the data needs of a single project or
application, though it may be integrated with other physical data models
based on project scope.
• The data model contains relationships between tables that address
cardinality and nullability of the relationships.
• Developed for a specific version of a DBMS, location, data storage or
technology to be used in the project.
• Columns should have exact datatypes, lengths assigned and default
values.
• Primary and Foreign keys, views, indexes, access profiles, and
authorizations, etc. are defined
Some Other Data Models
1. Hierarchical Model
The hierarchical Model is one of the oldest models in the data model which was
developed by IBM, in the 1950s. In a hierarchical model, data are viewed as a
collection of tables, or we can say segments that form a hierarchical relation. In
this, the data is organized into a tree-like structure where each record consists of
one parent record and many children. Even if the segments are connected as a
chain-like structure by logical associations, the resulting structure can be a fan
structure with multiple branches. These logical associations are called
directional associations.
2. Network Model
The Network Model was formalized by the Database Task group in the 1960s. This
model is the generalization of the hierarchical model. This model can consist of
multiple parent segments and these segments are grouped as levels but there
exists a logical association between the segments belonging to any level. Mostly,
there exists a many-to-many logical association between any of the two segments.
3. Object-Oriented Data Model
In the Object-Oriented Data Model, data and their relationships are contained in a
single structure which is referred to as an object in this data model. In this, real-
world problems are represented as objects with different attributes. All objects
have multiple relationships between them. Basically, it is a combination of Object
Oriented programming and a Relational Database Model.
4. Float Data Model
The float data model basically consists of a two-dimensional array of data models
that do not contain any duplicate elements in the array. This data model has one
drawback: it cannot store a large amount of data, that is, the tables cannot be of
large size.
5. Context Data Model
The Context data model is simply a data model which consists of more than one
data model. For example, the Context data model may consist of the ER Model, the
Object-Oriented Data Model, etc. This model allows users to do more than what
each individual data model can do.
6. Semi-Structured Data Model
Semi-Structured data models deal with the data in a flexible way. Some entities
may have extra attributes and some entities may have some missing attributes.
Basically, you can represent data here in a flexible way.
Conclusion
• Data modeling is the process of developing data model for the data to
be stored in a Database.
• Data Models ensure consistency in naming conventions, default values,
semantics, security while ensuring quality of the data.
• Data Model structure helps to define the relational tables, primary and
foreign keys and stored procedures.
• There are three types of data models: conceptual, logical, and physical.
• The main aim of conceptual model is to establish the entities, their
attributes, and their relationships.
• Logical data model defines the structure of the data elements and set
the relationships between them.
• A Physical Data Model describes the database specific implementation
of the data model.
• The main goal of a designing data model is to make certain that data
objects offered by the functional team are represented accurately.
• The biggest drawback is that even a small change made in the structure
may require modification of the entire application.
Lossless Decomposition
Decomposition is lossless if it is feasible to reconstruct relation R from
decomposed tables using Joins. This is the preferred choice. The information
will not lose from the relation when decomposed. The join would result in the
same original relation.
Example: A relation <EmpInfo> is decomposed into <EmpDetails> and
<DeptDetails> on a shared attribute; joining the two relations on that common
attribute reproduces <EmpInfo> exactly, so no information is lost.
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more
relational schemas, the loss of information is unavoidable when the original
relation is retrieved.
Example: Suppose <EmpInfo> is decomposed into <EmpDetails> (containing only
employee attributes) and <DeptDetails>:
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won’t be able to join the above tables, since Emp_ID isn’t part of
the DeptDetails relation.
1. Lossy Decomposition:
Definition:
• Lossy decomposition involves breaking down a relation into smaller relations in a
way that discards some information or dependencies, so the original relation
cannot be reconstructed exactly from the decomposed relations.
Characteristics:
1. Information Loss:
• Lossy decomposition discards certain attributes or dependencies,
leading to a loss of information.
2. Irreversible:
• The original relation cannot be precisely reconstructed from the
decomposed relations due to the discarded information.
3. Storage Space Reduction:
• Lossy decomposition may be chosen for the purpose of reducing
storage space, but this comes at the cost of sacrificing certain details.
Use Case:
• May be acceptable when reducing storage space or simplifying the schema matters
more than preserving every detail.
2. Lossless Decomposition:
Definition:
• Lossless decomposition involves breaking down a relation into smaller
relations in a way that preserves all the functional dependencies and
information present in the original relation. The original relation can be
reconstructed exactly from the decomposed relations.
Characteristics:
1. Preservation of Information:
• All information and dependencies in the original relation are preserved
in the decomposed relations.
2. Reversibility:
• The original relation can be reconstructed exactly from the
decomposed relations without any loss of data.
3. Data Integrity:
• Lossless decomposition ensures that the decomposed relations
maintain the same level of data integrity as the original relation.
Use Case:
• Preferred in normalization, where relations are decomposed to remove redundancy
without losing any information.
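An informal Python sketch of the difference (relations modeled as sets of tuples; the employee/department data is made up):

```python
emp_dept = {("E001", "Dpt1"), ("E002", "Dpt2")}       # (Emp_ID, Dept_ID)
dept     = {("Dpt1", "Operations"), ("Dpt2", "HR")}   # (Dept_ID, Dept_Name)

def natural_join(r, s):
    # join r's 2nd column with s's 1st column
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

emp_info = natural_join(emp_dept, dept)               # the original relation

# Lossless: decompose emp_info on the shared Dept_ID and join back.
r1 = {(e, d) for (e, d, n) in emp_info}               # (Emp_ID, Dept_ID)
r2 = {(d, n) for (e, d, n) in emp_info}               # (Dept_ID, Dept_Name)
print(natural_join(r1, r2) == emp_info)               # True: nothing lost

# Lossy: drop Dept_ID from the employee side; without the shared attribute
# the association Emp_ID -> Dept_Name can no longer be rebuilt by a join.
r1_lossy = {(e,) for (e, d, n) in emp_info}
```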
A database index is a data structure that is helpful to quickly locate and access
the data in a database table.
• The first column is the Search key which contains a copy of the primary key or
candidate key of the table.
• The second column is the data reference that contains a set of pointers which hold the
address of the disk block where the key value can be found.
Structure of Index
The structure of an index in the database management system (DBMS) is given
below −
Types of indexes
The different types of index are as follows −
• Primary
• Clustering
• Secondary
Secondary Index
• An index entry (a unique value) is created for each record in the data file on a candidate key.
• A secondary index is a type of dense index, also called a non-clustering index.
• The secondary mapping size is small, as two-level database indexing is used.
Primary Index
• Primary index is defined on an ordered data file. The data file is ordered on a key field.
The key field is generally the primary key of the relation.
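A toy sketch of a dense index in Python (list offsets stand in for disk-block pointers; the records are made up):

```python
# Data file ordered on the search key; each entry is (key, record).
data_file = [(11, "Ram"), (23, "Mohan"), (31, "Vivek"), (44, "Geeta")]

# First column: search key. Second column: pointer to the "disk block".
index = {key: pos for pos, (key, _) in enumerate(data_file)}

def lookup(key):
    pos = index.get(key)          # locate via the index ...
    return data_file[pos] if pos is not None else None   # ... one block access

print(lookup(31))                 # (31, 'Vivek')
```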
17. Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no
transaction concurrency exists. However, if concurrent transactions with interleaving
operations are allowed in an uncontrolled manner, some unexpected, undesirable result
may occur, such as:
• The lost update problem: A second transaction writes a second value of a data-
item (datum) on top of a first value written by a first concurrent transaction, and
the first value is lost to other transactions running concurrently which need, by
their precedence, to read the first value. The transactions that have read the
wrong value end with incorrect results.
• The dirty read problem: Transactions read a value written by a transaction that
has been later aborted. This value disappears from the database upon abort, and
should not have been read by any transaction (“dirty read”). The reading
transactions end with incorrect results.
• The incorrect summary problem: While one transaction takes a summary over
the values of all the instances of a repeated data-item, a second transaction
updates some instances of that data-item. The resulting summary does not
reflect a correct result for any (usually needed for correctness) precedence
order between the two transactions (if one is executed before the other), but
rather some random result, depending on the timing of the updates, and
whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to
meet their performance requirements. Thus, without concurrency control such systems
can neither provide correct results nor maintain their databases consistently.
Categories
The main categories of concurrency control mechanisms are optimistic,
pessimistic, and semi-optimistic:
The mutual blocking between two transactions (where each one blocks the other) or
more results in a deadlock, where the transactions involved are stalled and cannot reach
completion. Most non-optimistic mechanisms (with blocking) are prone to deadlocks
which are resolved by an intentional abort of a stalled transaction (which releases the
other transactions in that deadlock), and its immediate restart and re-execution. The
likelihood of a deadlock is typically low.
Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-
offs between the categories.
18. What is concurrency control in DBMS?
Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another.
For example, if we take ATM machines and do not use concurrency, multiple
persons cannot draw money at a time in different places. This is where we
need concurrency.
Advantages
The advantages of concurrency control are as follows −
• Waiting time and response time decrease.
• Resource utilization and system throughput increase.
• The consistency of the shared database is preserved.
Control concurrency
The simultaneous execution of transactions over shared databases can create
several data integrity and consistency problems.
For example, if too many people are logging in the ATM machines, serial
updates and synchronization in the bank servers should happen whenever the
transaction is done, if not it gives wrong information and wrong data in the
database.
Locking
A lock guarantees exclusive use of a data item to the current transaction. A
transaction first accesses the data item by acquiring a lock; after completion of the
transaction it releases the lock.
Types of Locks
• Shared Lock [Transaction can read only the data item values]
• Exclusive Lock [Used for both read and write data item values]
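A minimal sketch of shared/exclusive lock compatibility in Python (no waiting queue or deadlock handling; transaction and item names are made up):

```python
locks = {}   # item -> (mode, set of holding transactions)

def acquire(txn, item, mode):
    held = locks.get(item)
    if held is None:                       # item is free: grant the lock
        locks[item] = (mode, {txn})
        return True
    held_mode, holders = held
    if mode == "S" and held_mode == "S":   # shared locks are compatible
        holders.add(txn)
        return True
    return False                           # conflicting: caller must wait

def release(txn, item):
    mode, holders = locks[item]
    holders.discard(txn)
    if not holders:
        del locks[item]

print(acquire("T1", "X1", "S"))   # True  - T1 reads
print(acquire("T2", "X1", "S"))   # True  - shared with T1
print(acquire("T3", "X1", "X"))   # False - exclusive request must wait
```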
Time Stamping
A timestamp is a unique identifier created by the DBMS that indicates the relative
starting time of a transaction. Every transaction records the time at which it
started, which orders it with respect to the other transactions.
This can be generated using a system clock or logical counter. This can be
started whenever a transaction is started. Here, the logical counter is
incremented after a new timestamp has been assigned.
Optimistic
It is based on the assumption that conflict is rare and it is more efficient to
allow transactions to proceed without imposing delays to ensure serializability.
Database Schemas
Pre-requisite: Introduction to Database Management System
Nowadays data is one of the most important things in the business world: every
business captures its customers' data to understand their behavior, and in the
world of the internet, data is growing like crazy. Businesses therefore need more
advanced database solutions with which they can maintain their database systems,
so that whenever they need data to solve business problems they can easily get the
data they want without any problem. To fulfil this need, the database schema comes
into the picture.
What is Schema?
• The skeleton of the database is created by the attributes, and this
skeleton is named the schema.
• The schema mentions logical constraints such as tables, primary keys, etc.
• The schema does not represent the data types of the attributes.
(Figure: the details of a Customer contrasted with the schema of Customer)
Database Schema
• A database schema is a logical representation of data that shows how
the data in a database should be stored logically. It shows how the data
is organized and the relationship between the tables.
• Database schema contains table, field, views and relation between
different keys like primary key, foreign key.
• Data are stored in the form of files which is unstructured in nature which
makes accessing the data difficult. Thus to resolve the issue the data are
organized in structured way with the help of database schema.
• Database schema provides the organization of data and the relationship
between the stored data.
• Database schema defines a set of guidelines that control the database
along with that it provides information about the way of accessing and
modifying the data.
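As a small illustration, a logical schema can be written as DDL; this sketch uses Python's built-in sqlite3 module, and the customer table and its columns are made-up examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,  -- primary key constraint
        name    TEXT NOT NULL,
        city    TEXT
    )
""")
# The schema defines structure only; the rows inserted later form the instance.
print(conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'customer'").fetchone()[0])
```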
Types of Database Schemas
There are 3 types of database schema:
• Physical Database Schema:
• A Physical schema defines, how the data or information is
stored physically in the storage systems in the form of files &
indices. This is the actual code or syntax needed to create the
structure of a database, we can say that when we design a
database at a physical level, it’s called physical schema.
• The Database administrator chooses where and how to store
the data in the different blocks of storage.
• Logical Database Schema:
• A logical database schema defines all the logical constraints
that need to be applied to the stored data, and also describes
tables, views, entity relationships, and integrity constraints.
• The Logical schema describes how the data is stored in the
form of tables & how the attributes of a table are connected.
• Using ER modelling the relationship between the
components of the data is maintained.
• In logical schema different integrity constraints are defined in
order to maintain the quality of insertion and update the data.
• View Database Schema:
• It is a view level design which is able to define the interaction
between end-user and database.
• User is able to interact with the database with the help of the
interface without knowing much about the stored mechanism
of data in database.
Three Layer Schema Design
Hierarchical Model
The hierarchical model has a tree-like structure, this tree structure contains the
root node that links to its child nodes. Each child node & parent node have a one-
to-many relationship. In other words, we can say that hierarchical schema has a
root table that is associated with multiple tables, and every table can have multiple
child tables, but every child table can only have one parent table. This type of
schema is presented by XML or JSON files.
Network Model
The network model and the hierarchical model are quite similar with an important
difference that is related to data relationships. The network model allows many-
to-many relationships whereas hierarchical models allow one-to-many
relationships.
Star Schema
Star schema is better for storing and analyzing large amounts of data. It has a fact
table at its center & multiple dimension tables connected to it just like a star, where
the fact table contains the numerical data that run business processes and the
dimension table contains data related to dimensions such as product, time, people,
etc. or we can say, this table contains the description of the fact table. The star
schema allows us to structure the data of RDBMS.
Designing Star Schema
Snowflake Schema
Just like star schema, the snowflake schema also has a fact table at its center and
multiple dimension tables connected to it, but the main difference in both models
is that in snowflake schema – dimension tables are further normalized into
multiple related tables. The snowflake schema is used for analyzing large amounts
of data.
Designing Snowflake Schema
Physical Schema vs. Logical Schema:
• The design of a physical schema must work with a specific database management
system or hardware platform, whereas the design of a logical schema is independent
of any particular database management system.
• Changes in the physical schema have minimal effect on the logical schema, whereas
changes made in the logical schema generally require corresponding changes at the
physical level.
• The physical schema contains the attributes and their data types, while the logical
schema does not specify the attributes' data types.
Conclusion
• The Structure of the database is referred to as the Schema, and it
represents logical restrictions like Table and Key, among other things.
• Three Schema Architecture was developed to prevent the user from
direct access to the database.
• Since the information that is saved in the database is subject to frequent
change, an instance is a representation of the data at a specific point in time.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS
aggressively inspects all the operations, where transactions are
about to execute. The DBMS inspects the operations and analyzes
if they can create a deadlock situation. If it finds that a deadlock
situation might occur, then that transaction is never allowed to be
executed.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data
item) which is already held with a conflicting lock by another
transaction, one of two possibilities occurs −
• If TS(Ti) < TS(Tj), that is, Ti requesting the lock is older than Tj, then Ti is
allowed to wait until the data item is available.
• If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies: it is aborted and
restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the
younger one.
Wound-Wait Scheme
In this scheme, if an older transaction requests a data item held by a younger one,
the older transaction wounds the younger one: the younger transaction is aborted
and restarted later with the same timestamp. If a younger transaction requests an
item held by an older one, the younger transaction is allowed to wait.
Deadlock Avoidance
Aborting a transaction is not always a practical approach.
Instead, deadlock avoidance mechanisms can be used to detect
any deadlock situation in advance. Methods like "wait-for graph"
are available but they are suitable for only those systems where
transactions are lightweight having fewer instances of resource.
In a bulky system, deadlock prevention techniques may work
well.
Wait-for Graph
A wait-for graph keeps a node for each active transaction and a directed edge
Ti → Tj whenever Ti is waiting for a data item currently locked by Tj. The system
maintains this graph and periodically checks it for cycles; a cycle means the
transactions on it are deadlocked.
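A minimal sketch of deadlock detection on a wait-for graph in Python (transaction names are made up):

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a wait-for graph."""
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True          # back edge found: a cycle exists
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph if n not in done)

# T1 waits for T2 and T2 waits for T1 -> deadlock.
print(has_cycle({"T1": {"T2"}, "T2": {"T1"}}))   # True
print(has_cycle({"T1": {"T2"}, "T2": set()}))    # False
```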
Data masking is a very important concept to keep data safe from any breaches.
Especially, for big organizations that contain heaps of sensitive data that can be
easily compromised. Details like credit card information, phone numbers, house
addresses are highly vulnerable information that must be protected. To
understand data masking better we first need to know what computer networks
are.
Network security consists of many layers and an attack can happen in any one of
these layers. These networks usually consist of three controls
• Physical network security
• Technical network security
• Administrative network security
Physical network security: this is designed to keep the system network safe from
unauthorized personnel from breaking into the network components that include
OUI, Fiber optic cable, etc.,
Technical network security: this protects the data that is stored in the network
or which is transmitted through it. It ensures that no one gets away with any
unauthorized activities apart from the user themselves.
Administrative network security: this includes all the policies and procedures
that need to be followed by the authorized users for other personnel.
Data masking:
Data masking means creating an exact replica of pre-existing data in order to keep
the original data safe and secure from any safety breaches. Various data masking
software is being created so organizations can use them to keep their data safe.
That is how important it is to emphasize data masking.
There are various types of data masking. Some of them are given below
• Static data masking (SDM): Static data masking works on data at rest,
altering it and thereby permanently replacing sensitive values. It
helps an organization to create a clean and nearly breach-free copy of
its database. SDM is commonly used for development and data
testing.
• Substitution: sensitive values are replaced with realistic but inauthentic
substitute values.
• After Substitution:
Participant Name Problem Type Score
Miguel Easy 20
Samara Medium 37.2
• Shuffling: values within a column are randomly reordered across rows, so real
values remain present but are detached from their original records.
• After Shuffling:
Participant Name Problem Type Score
Alena Hard 50
• Nulling out or deletion: Nulling out is exactly what the name suggests
you delete the values in a column by replacing them with NULL values.
This is a very effective method to eliminate showing any sensitive
information in a test environment.
• Pros: Very useful in situations where the data is not essential.
• Cons: Not applicable where realistic-looking data is required for
development or QA work.
• Date Aging: If you have dates in your data set that you don’t want to
reveal then you can set the dates a little back or forth than what actually
is given. For example, if you have a date set to 20-8-21 then you can set
the date 200 days back, that is, to 01-02-21. This can also be done with
any kind of numeric data. Make sure that the data in a column or row is
aged by a fixed amount or a similar algorithm.
• Pros: Easy to remember the algorithm and effective masking
of information
• Cons: Only appropriate for dates and other numeric data.
• Original Data Set:
Participant Name Problem Type Score
Miguel Easy 50
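An informal Python sketch of substitution, shuffling, and nulling out (the rows and substitute names are made up):

```python
import random

rows = [{"name": "Miguel", "type": "Easy", "score": 50},
        {"name": "Samara", "type": "Medium", "score": 37.2}]

# Substitution: replace real names with realistic stand-ins.
fake_names = ["Alena", "Ravi"]
masked = [dict(r, name=fake_names[i]) for i, r in enumerate(rows)]

# Shuffling: keep the real scores but randomly reassign them across rows.
scores = [r["score"] for r in masked]
random.shuffle(scores)
for r, s in zip(masked, scores):
    r["score"] = s

# Nulling out: delete the values of a sensitive column entirely.
for r in masked:
    r["type"] = None

print(masked)
```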
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point
from which it can't go any further. If a transaction or process fails partway through,
this is called a transaction failure.
2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.
3. Disk Failure
o It occurs where hard-disk drives or storage drives used to fail frequently. It was
a common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, disk head crash, and
unreachability to the disk or any other failure, which destroy all or part of disk
storage.
Pre-Requisite: Relational Database Model
Keys are one of the most important elements in a relational database to maintain
the relationship between the tables and it also helps in uniquely identifying the
data from a table. The primary Key is a key that helps in uniquely identifying the
tuple of the database whereas the Foreign Key is a key that is used to identify the
relationship between the tables through the primary key of one table that is the
primary key one table acts as a foreign key to another table. Now, let’s discuss both
of them in some detail.
Table STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
PRIMARY KEY vs. FOREIGN KEY:
• A primary key uniquely identifies a record in the relational database table; a
foreign key refers to the field in a table which is the primary key of another table.
• Only one primary key is allowed in a table, whereas more than one foreign key is
allowed in a table.
• A primary key does not allow NULL values; a foreign key can contain NULL values.
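A small sketch of the primary-key/foreign-key relationship using Python's built-in sqlite3 module (the table and column names loosely mirror the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only if enabled
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE student_course (
                    stud_no   INTEGER REFERENCES student(stud_no),
                    course_no TEXT)""")
conn.execute("INSERT INTO student VALUES (1, 'Ram')")
conn.execute("INSERT INTO student_course VALUES (1, 'C1')")      # ok
try:
    conn.execute("INSERT INTO student_course VALUES (9, 'C2')")  # no student 9
except sqlite3.IntegrityError as e:
    print("rejected:", e)                  # FOREIGN KEY constraint failed
```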
Conclusion
In this article, we have basically mentioned the primary key and foreign key, and
the differences between them. Both the keys, whether the primary key or the
foreign key, play an important role in the Database management system. Primary
Key contains unique values, whereas Foreign Key contains values taking reference
from Primary Keys. The main characteristic property of the Primary key is that it
can’t be repeated, it is unique. There are some differences between their functions,
as Primary Keys determines a row in the table and Foreign Key determines the
relation between tables.
24. BCNF?
Boyce–Codd Normal Form (BCNF) is a stricter version of 3NF: a table is in BCNF
if, for every functional dependency X → Y, X is a super key of the table.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the EMP_COUNTRY table: EMP_ID; for the EMP_DEPT table: EMP_DEPT; for the
EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}.
Now, this is in BCNF because the left side of each functional dependency is a
key for its table.
25. Raid in dbms?
RAID is a technique that makes use of a combination of multiple disks instead
of using a single disk for increased performance, data redundancy, or both.
The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at
the University of California, Berkeley in 1987.
Why Data Redundancy?
Data redundancy, although taking up extra space, adds to disk reliability. This
means, that in case of disk failure, if the same data is also backed up onto
another disk, we can retrieve the data and go on with the operation. On the
other hand, if the data is spread across multiple disks without the RAID
technique, the loss of a single disk can affect the entire data.
Key Evaluation Points for a RAID System
• Reliability: How many disk faults can the system tolerate?
• Availability: What fraction of the total session time is a system in
uptime mode, i.e. how available is the system for actual use?
• Performance: How good is the response time? How high is the
throughput (rate of processing work)? Note that performance
contains a lot of parameters and not just the two.
• Capacity: Given a set of N disks each with B blocks, how much
useful capacity is available to the user?
RAID is very transparent to the underlying system. This means, that to the
host system, it appears as a single big disk presenting itself as a linear array
of blocks. This allows older technologies to be replaced by RAID without
making too many changes to the existing code.
Different RAID Levels
1. RAID-0 (Striping)
2. RAID-1 (Mirroring)
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
5. RAID-4 (Block-Level Striping with Dedicated Parity)
6. RAID-5 (Block-Level Striping with Distributed Parity)
7. RAID-6 (Block-Level Striping with Two Parity Blocks)
It consists of an array of disks in which multiple disks are connected to achieve different
goals.
RAID technology
There are 7 levels of RAID schemes. These schemes are RAID 0, RAID 1, ..., RAID 6.
RAID 0
o RAID level 0 provides data striping, i.e., data can be placed across multiple disks. It is
based on striping, which means that if one disk fails, all data in the array is lost.
o This level doesn't provide fault tolerance but increases the system performance.
Example:
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
In this level, instead of placing just one block on a disk at a time, we can place
two or more blocks on a disk before moving on to the next one.
20 22 24 26
21 23 25 27
28 30 32 34
29 31 33 35
In this above figure, there is no duplication of data. Hence, a block once lost cannot
be recovered.
Pros of RAID 0:
o In this level, throughput is increased because multiple data requests are probably not on
the same disk.
o This level fully utilizes the disk space and provides high performance.
o It requires a minimum of 2 drives.
Cons of RAID 0:
o There is no data redundancy, so the failure of a single drive results in the loss of all the
data in the array.
RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It
provides 100% redundancy in case of a failure.
Example:
A A B B
C C D D
E E F F
G G H H
Only half space of the drive is used to store the data. The other half of drive is just a
mirror to the already stored data.
Pros of RAID 1:
o The main advantage of RAID 1 is fault tolerance. In this level, if one disk fails, then the
other automatically takes over.
o In this level, the array will function even if any one of the drives fails.
Cons of RAID 1:
o In this level, one extra drive is required per drive for mirroring, so the expense is higher.
RAID 2
o RAID 2 consists of bit-level striping using hamming code parity. In this level, each data
bit in a word is recorded on a separate disk and ECC code of data words is stored on
different set disks.
o Due to its high cost and complex structure, this level is not commercially used. This
same performance can be achieved by RAID 3 at a lower cost.
Pros of RAID 2:
o This level provides on-the-fly error detection and correction using Hamming-code parity.
Cons of RAID 2:
o It requires additional drives for the error-correcting code, which makes it costly and
complex; it is not used commercially.
RAID 3
o RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity
information is stored for each disk section and written to a dedicated parity drive.
o In case of drive failure, the parity drive is accessed, and data is reconstructed from the
remaining devices. Once the failed drive is replaced, the missing data can be restored
on the new drive.
o In this level, data can be transferred in bulk. Thus high-speed data transmission is
possible.
A B C P(A, B, C)
D E F P(D, E, F)
G H I P(G, H, I)
J K L P(J, K, L)
Pros of RAID 3:
o In this level, data is regenerated using parity drive.
o It contains high data transfer rates.
o In this level, data is accessed in parallel.
Cons of RAID 3:
o The dedicated parity drive can become a bottleneck, and an extra drive is required for
parity.
RAID 4
o RAID 4 consists of block-level stripping with a parity disk. Instead of duplicating data,
the RAID 4 adopts a parity-based approach.
o This level allows recovery of at most 1 disk failure due to the way parity works. In this
level, if more than one disk fails, then there is no way to recover the data.
o Levels 3 and 4 both require at least three disks to implement RAID.
A B C P0
D E F P1
G H I P2
J K L P3
In this level, parity can be calculated using an XOR function. If the data bits are 0,0,0,1,
then the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,0,1,1, then the parity bit
is XOR(0,0,1,1) = 0. That is, an even number of ones results in parity 0, and an odd
number of ones results in parity 1.
C1 C2 C3 C4 Parity
0 1 0 0 1
0 0 1 1 0
Suppose that in the above figure, C2 is lost due to some disk failure. Then using the
values of all the other columns and the parity bit, we can recompute the data bit stored
in C2. This level allows us to recover lost data.
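A minimal sketch of this XOR-parity recovery in Python (the block values are made up):

```python
from functools import reduce

# Parity block = XOR of all data blocks; any one lost block can be rebuilt
# by XOR-ing the surviving blocks with the parity block.
blocks = [0b1010, 0b0111, 0b0011]            # data blocks on three drives
parity = reduce(lambda a, b: a ^ b, blocks)  # stored on the parity drive

lost = blocks[1]                             # suppose drive 1 fails
rebuilt = reduce(lambda a, b: a ^ b, [blocks[0], blocks[2], parity])
print(rebuilt == lost)                       # True: the block is recovered
```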
RAID 5
o RAID 5 is a slight modification of the RAID 4 system. The only difference is that in RAID
5, the parity rotates among the drives.
o It consists of block-level striping with DISTRIBUTED parity.
o Same as RAID 4, this level allows recovery of at most 1 disk failure. If more than one
disk fails, then there is no way for data recovery.
0 1 2 3 P0
5 6 7 P1 4
10 11 P2 8 9
15 P3 12 13 14
P4 16 17 18 19
This level was introduced to make the random write performance better.
Pros of RAID 5:
o Parity is distributed across the drives, so there is no single parity-drive bottleneck, and
the array can survive the failure of any one disk.
Cons of RAID 5:
o In this level, disk failure recovery takes a longer time, as parity has to be calculated from
all available drives.
o This level cannot survive concurrent drive failures.
RAID 6
o This level is an extension of RAID 5. It contains block-level stripping with 2 parity bits.
o In RAID 6, you can survive 2 concurrent disk failures. Suppose you are using RAID 5:
when a disk fails, you need to replace it quickly, because if
another disk fails simultaneously you won't be able to recover any of the data. This is
where RAID 6 plays its part: you can survive two concurrent disk failures
before you run out of options.
A0 B0 Q0 P0
A1 Q1 P1 D1
Q2 P2 C2 D2
P3 B3 C3 Q3
Pros of RAID 6:
o It can survive two concurrent disk failures, thanks to its two independent parity blocks.
Cons of RAID 6:
o The double parity computation makes writes slower than in RAID 5.
o It requires a minimum of four drives.
(Striping data first and then mirroring it, with the drive count a multiple of 2, describes
the nested level RAID 10 rather than RAID 6.)