ET UNIT-2
UNIT-2
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture
is used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
o The client/server architecture consists of many PCs and workstations which are connected via the network.
o DBMS architecture depends upon how users are connected to the database to get their
request done.
Database architecture can be seen as single-tier or multi-tier. Logically, multi-tier database architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can sit directly at the DBMS and use it.
o Any changes done here will directly be done on the database itself. It doesn't provide
a handy tool for end users.
o The 1-Tier architecture is used for the development of local applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture
o The user interfaces and application programs are run on the client-side.
o The server side is responsible for providing functionalities such as query processing and transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with the server side.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this
architecture, client can't directly communicate with the server.
o The application on the client-end interacts with an application server which further
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application server. The database also has no idea about any other user beyond the application server.
o The 3-Tier architecture is used in the case of large web applications.
Architectures for DBMSs have generally followed trends seen in architectures for larger
computer systems.
The primary processing for all system functions, including user application programs, user
interface programs, and all DBMS capabilities, was handled by mainframe computers in
earlier systems.
The primary cause of this was that the majority of users accessed such systems using
computer terminals with limited processing power and merely display capabilities.
Only display data and controls were delivered from the computer system to the display
terminals, which were connected to the central node by a variety of communications
networks, while all processing was done remotely on the computer system.
The majority of users switched from terminals to PCs and workstations as hardware prices
decreased.
Initially, database systems used these computers in much the same way as they had used display terminals. As a result, the DBMS itself continued to operate as a centralized DBMS, where all DBMS functionality, application program execution, and user interface processing were done on a single computer.
Client/server DBMS architectures emerged as DBMS systems gradually began to take advantage of the computing power available on the user side.
In order to handle computing settings with a high number of PCs, workstations, file servers,
printers, database servers, etc., the client/server architecture was designed.
A network connects various pieces of software and hardware, including email and web server
software.
The aim is to define specialized servers with specific functionality. For instance, a number of PCs or small workstations can be linked as clients to a file server that manages the client machines' files.
Here, the term "two-tier" refers to our architecture's two layers: the Client layer and the Data layer.
There are a number of client computers in the client layer that can contact the database server. The
API on the client computer will use JDBC or some other method to link the computer to the database
server. This is due to the possibility of various physical locations for clients and database servers.
The Business Logic Layer is an additional layer that serves as a link between the Client layer and the Data layer in this instance. The business logic layer is where the application programs are processed; unlike a two-tier architecture, where queries are processed in the database server, here the application programs are processed in the application server itself.
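As a rough illustration of the difference, the sketch below uses Python's built-in sqlite3 module as a stand-in for the database server; the company.db file, the employee table, and the application-server function are assumptions made only for this example, not part of any particular system.

import sqlite3

# Two-tier style: the client program opens its own connection to the database server.
def two_tier_client(db_path, emp_id):
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT name, salary FROM employee WHERE id = ?", (emp_id,)
    ).fetchone()
    conn.close()
    return row

# Three-tier style: the client only calls the application (business logic) layer,
# which in turn talks to the data layer; the end user never sees the database.
def app_server_get_employee(emp_id):
    if emp_id <= 0:                      # business-logic check before touching the data
        raise ValueError("invalid employee id")
    return two_tier_client("company.db", emp_id)

def three_tier_client(emp_id):
    return app_server_get_employee(emp_id)

if __name__ == "__main__":
    conn = sqlite3.connect("company.db")
    conn.execute("CREATE TABLE IF NOT EXISTS employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
    conn.execute("INSERT OR REPLACE INTO employee VALUES (1, 'Asha', 52000)")
    conn.commit()
    conn.close()
    print(three_tier_client(1))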
Server Architecture
Server architecture is the foundational layout or model of a server, based on which a server is
created and/or deployed.
It defines how a server is designed, different components the server is created from, and the
services that it provides.
Server architecture primarily helps in designing and evaluating the server and its associated operations and services as a whole before it is actually deployed.
Server architecture includes, but is not limited to:
1. Parallel Database :
A parallel DBMS is a DBMS that runs across multiple processors and is designed to
execute operations in parallel, whenever possible.
The parallel DBMS links a number of smaller machines to achieve the same throughput as expected from a single large machine.
Features :
1. CPUs work in parallel
2. It improves performance
3. It divides large tasks into various smaller tasks
4. Completes work very quickly
2. Distributed Database :
A distributed database is defined as a logically related collection of shared data that is physically distributed over a computer network across different sites.
The distributed DBMS is defined as the software that allows for the management of the distributed database and makes the distributed data available to the users.
Features :
1. It is a group of logically related shared data
2. The data gets split into various fragments
3. There may be a replication of fragments
4. The sites are linked by a communication network
The main difference between the parallel and distributed databases is that the former is tightly coupled and the latter is loosely coupled.
o In parallel databases, processes are tightly coupled and constitute a single database system, i.e., the parallel database is a centralized database and data reside in a single location. In distributed databases, the sites are loosely coupled and share no physical components, i.e., the distributed database is geographically separated and data are distributed at several locations.
o In parallel databases, query processing and … In distributed databases, query processing and …
o In parallel databases, the data is partitioned among various disks so that it can be retrieved faster. In distributed databases, each site preserves a local database system for faster processing due to the slow interconnection between sites.
o Skew is the major issue with the increasing degree of parallelism in parallel databases. Blocking due to site failure and transparency are the major issues in distributed databases.
A parallel DBMS can exploit the following types of parallelism:
1. I/O parallelism
2. Intra-query parallelism
3. Inter-query parallelism
4. Intra-operation parallelism
5. Inter-operation parallelism
1. I/O parallelism :
It is a form of parallelism in which the relations are partitioned on multiple disks with the motive of reducing the retrieval time of relations from disk.
Here, the input data is partitioned and processing is done in parallel on each partition.
The results are merged after processing all the partitioned data. It is also known as data partitioning.
Hash partitioning has the advantage that it provides an even distribution of data across the disks, and it is also best suited for point queries that are based on the partitioning attribute.
It is to be noted that partitioning is useful for sequential scans of an entire table placed on 'n' disks: the time taken to scan the relation is approximately 1/n of the time required to scan the table on a single-disk system.
We have four types of partitioning in I/O parallelism:
Hash partitioning –
As we already know, a hash function is a fast mathematical function. Each row of the original relation is hashed on the partitioning attributes. For example, let's assume that there are 4 disks disk1, disk2, disk3, and disk4 through which the data is to be partitioned. Now, if the function returns 3, then the row is placed on disk3.
Range partitioning –
In range partitioning, it issues continuous attribute value ranges to each disk. For example, suppose we have 3 disks numbered 0, 1, and 2 in range partitioning; we may assign tuples with a value less than 5 to disk0, values between 5 and 40 to disk1, and values greater than 40 to disk2. Its advantage is that tuples whose attribute values fall within a certain range are placed on the same disk.
Round-robin partitioning –
In round-robin partitioning, the relation is scanned in any order and the ith tuple is sent to disk number (i mod n). So, disks take turns receiving new rows of data. This technique ensures an even distribution of tuples across disks and is ideally suited for applications that wish to read the entire relation sequentially for each query.
Schema partitioning –
In schema partitioning, different tables within a database are placed on different disks.
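A small Python sketch of the three row-level partitioning strategies described above (schema partitioning assigns whole tables rather than rows, so it appears only as a comment). The disk count, the attribute names, and the range boundaries are assumptions taken from the examples in the text.

from itertools import count

N_DISKS = 4  # assumed number of disks, numbered 0..N_DISKS-1

def hash_partition(row, key):
    # the tuple goes to the disk chosen by hashing the partitioning attribute
    return hash(row[key]) % N_DISKS

def range_partition(row, key):
    # ranges from the text: values < 5 -> disk 0, 5..40 -> disk 1, > 40 -> disk 2
    value = row[key]
    if value < 5:
        return 0
    elif value <= 40:
        return 1
    return 2

_counter = count()
def round_robin_partition(row):
    # the i-th tuple goes to disk (i mod n), regardless of its attribute values
    return next(_counter) % N_DISKS

# Schema partitioning would instead map whole tables (not rows) to disks,
# e.g. {"Employee": 0, "Vehicles": 1}.

rows = [{"id": i, "amount": a} for i, a in enumerate([3, 7, 55, 40, 12])]
for r in rows:
    print(r, hash_partition(r, "id"), range_partition(r, "amount"), round_robin_partition(r))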
2. Intra-query parallelism :
Intra-query parallelism refers to the execution of a single query in parallel on multiple CPUs using a shared-nothing parallel architecture technique.
This uses two types of approaches:
First approach –
In this approach, each CPU can execute the duplicate task against some data portion.
Second approach –
In this approach, the task can be divided into different sectors with each CPU executing a
distinct subtask.
3. Inter-query parallelism :
In Inter-query parallelism, there is an execution of multiple transactions by each CPU.
It is called parallel transaction processing.
The DBMS uses transaction dispatching to carry out inter-query parallelism.
We can also use some different methods, like efficient lock management. In this
method, each query is run sequentially, which leads to slowing down the running of
long queries.
In such cases, DBMS must understand the locks held by different transactions running
on different processes.
Inter-query parallelism on shared-disk architecture performs best when transactions that execute in parallel do not access the same data.
Also, it is the easiest form of parallelism in DBMS, and there is an increased
transaction throughput.
4. Intra-operation parallelism :
Intra-operation parallelism is a sort of parallelism in which we parallelize the
execution of each individual operation of a task like sorting, joins, projections, and so
on.
The level of parallelism is very high in intra-operation parallelism. This type of
parallelism is natural in database systems.
Let’s take an SQL query example:
SELECT * FROM Vehicles ORDER BY Model_Number;
In the above query, the relational operation is sorting and since a relation can have a large
number of records in it, the operation can be performed on different subsets of the relation in
multiple processors, which reduces the time required to sort the data.
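A minimal sketch of this idea: the relation is split into partitions, each partition is sorted by a separate worker process, and the locally sorted runs are merged. The relation, its size, and the use of multiprocessing.Pool are assumptions for illustration, not the algorithm any specific DBMS uses.

import heapq
import random
from multiprocessing import Pool

def sort_partition(partition):
    # each worker sorts only its own subset of the relation
    return sorted(partition)

if __name__ == "__main__":
    vehicles = [random.randint(1000, 9999) for _ in range(20)]   # made-up Model_Number values
    partitions = [vehicles[i::4] for i in range(4)]               # split the relation 4 ways
    with Pool(4) as pool:
        sorted_runs = pool.map(sort_partition, partitions)        # sort the partitions in parallel
    result = list(heapq.merge(*sorted_runs))                      # merge the locally sorted runs
    print(result)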
5. Inter-operation parallelism :
When different operations in a query expression are executed in parallel, then it is called
inter-operation parallelism. They are of two types –
Pipelined parallelism –
In pipeline parallelism, the output row of one operation is consumed by the second
operation even before the first operation has produced the entire set of rows in its output.
Also, it is possible to run these two operations simultaneously on different CPUs, so that
one operation consumes tuples in parallel with another operation, reducing them. It is
useful for the small number of CPUs and avoids writing of intermediate results to disk.
Independent parallelism –
In this parallelism, the operations in a query expression that are not dependent on each other can be executed in parallel. This parallelism is very useful in the case of a lower degree of parallelism.
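The generator-based sketch below illustrates the pipelined case: the selection starts consuming tuples as soon as the scan produces them, and the projection consumes the selection's output in the same way, so no intermediate result is materialised. The relation and the 10000 threshold are assumptions for illustration.

def scan_employees():
    # producer operator: emits one tuple at a time
    for name, salary in [("Asha", 12000), ("Ravi", 8000), ("Meena", 15000)]:
        yield {"name": name, "salary": salary}

def select_high_salary(rows, threshold=10000):
    # first consumer: filters each tuple as soon as the scan yields it
    for row in rows:
        if row["salary"] >= threshold:
            yield row

def project_names(rows):
    # second consumer in the pipeline: projection
    for row in rows:
        yield row["name"]

# the three operations run interleaved; nothing is written to disk in between
for name in project_names(select_high_salary(scan_employees())):
    print(name)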
A parallel DBMS is a DBMS that runs across multiple processors or CPUs and is mainly designed to execute query operations in parallel, wherever possible. The parallel DBMS links a number of smaller machines to achieve the same throughput as expected from a single large machine.
In Parallel Databases, mainly there are three architectural designs for parallel DBMS. They
are as follows:
1. Shared Memory Architecture
2. Shared Disk Architecture
3. Shared Nothing Architecture
Let’s discuss them one by one:
1. Shared Memory Architecture-
In Shared Memory Architecture, there are multiple CPUs that are attached to an
interconnection network.
They are able to share a single or global main memory and common disk arrays. It is to be noted that, in this architecture, a single copy of a multi-threaded operating system and a multithreaded DBMS can support these multiple CPUs.
Also, shared memory is a tightly coupled architecture in which multiple CPUs share their memory.
It is also known as Symmetric multiprocessing (SMP).
This architecture covers a very wide range, starting from personal workstations that support a few microprocessors in parallel via RISC.
Advantages :
1. It has high-speed data access for a limited number of processors.
2. The communication is efficient.
Disadvantages :
1. It cannot scale beyond 80 or 100 CPUs in parallel.
2. The bus or the interconnection network gets blocked as the number of CPUs increases.
2. Shared Disk Architectures :
In Shared Disk Architecture, various CPUs are attached to an interconnection
network.
In this, each CPU has its own memory and all of them have access to the same disk.
Also, note that here the memory is not shared among CPUs; therefore, each node has its own copy of the operating system and DBMS.
Shared disk architecture is a loosely coupled architecture optimized for applications that are inherently centralized. Such systems are also known as clusters.
Advantages :
1. The interconnection network is no longer a bottleneck since each CPU has its own memory.
2. Load-balancing is easier in shared disk architecture.
3. There is better fault tolerance.
Disadvantages :
1. If the number of CPUs increases, the problems of interference and memory contentions
also increase.
2. There also exists a scalability problem.
3. Shared Nothing Architecture :
Shared Nothing Architecture is a multiple-processor architecture in which each processor has its own memory and disk storage.
In this, multiple CPUs are attached to an interconnection network through a node.
Also, note that no two CPUs can access the same disk area.
In this architecture, no sharing of memory or disk resources is done.
It is also known as Massively parallel processing (MPP).
Advantages :
1. It has better scalability as no sharing of resources is done
2. Multiple CPUs can be added
Disadvantages:
1. The cost of communication is higher as it involves sending data and software interaction at both ends.
2. The cost of non-local disk access is higher than in shared disk architectures.
Architectural Models
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites, close to where they are used most. This is suitable when the percentage of queries that need to join information in tables placed at different sites is low. If an appropriate distribution strategy is adopted, then this design alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since each site has its own copy of the entire database, queries are very fast, requiring negligible communication cost. On the contrary, the massive redundancy in data incurs a huge cost during update operations. Hence, this is suitable for systems where a large number of queries have to be handled while the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is done in accordance with the frequency of access. This takes into consideration the fact that the frequency of accessing the tables varies considerably from site to site. The number of copies of the tables (or portions) depends on how frequently the access queries execute and the sites which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replication. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.
In this section, we will talk about data stored at different sites in a distributed database management system.
There are two ways in which data can be stored at different sites. These are,
1. Replication.
2. Fragmentation.
Replication
As the name suggests, the system stores copies of data at different sites. If an entire
database is available on multiple sites, it is a fully redundant database.
The advantage of data replication is that it increases the availability of data at different sites. As the data is available at different sites, queries can be processed in parallel.
Fragmentation
In Fragmentation, the relations are fragmented, which means they are split into
smaller parts. Each of the fragments is stored on a different site, where it is required.
In this, the data is not replicated, and no copies are created; the consistency of data therefore benefits greatly from fragmentation.
The prerequisite for fragmentation is to make sure that the fragments can later be
reconstructed into the original relation without losing any data.
Consistency is not a problem here as each site has a different piece of information.
There are two types of fragmentation,
o Horizontal Fragmentation – Splitting by rows.
o Vertical fragmentation – Splitting by columns.
Horizontal Fragmentation
The relation schema is fragmented into groups of rows, and each group is then assigned to one fragment.
Vertical Fragmentation
The relation schema is fragmented into group of columns, called smaller schemas.
These smaller schemas are then assigned to each fragment.
Each fragment must contain a common candidate key to guarantee a lossless join.
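A small sketch of both kinds of fragmentation on an in-memory table, including the lossless reconstruction of the vertical fragments through the common key. The table, its columns, and the choice of fragments are assumptions for illustration.

employees = [
    {"id": 1, "name": "Asha",  "dept": "Sales", "salary": 52000},
    {"id": 2, "name": "Ravi",  "dept": "HR",    "salary": 40000},
    {"id": 3, "name": "Meena", "dept": "Sales", "salary": 61000},
]

# Horizontal fragmentation: split by rows (e.g. one fragment per site)
frag_sales = [r for r in employees if r["dept"] == "Sales"]
frag_hr    = [r for r in employees if r["dept"] == "HR"]
horizontal_reconstruction = frag_sales + frag_hr      # reconstruction is the union of fragments

# Vertical fragmentation: split by columns; every fragment keeps the key "id"
frag_personal = [{"id": r["id"], "name": r["name"]} for r in employees]
frag_payroll  = [{"id": r["id"], "dept": r["dept"], "salary": r["salary"]} for r in employees]

# Lossless reconstruction: join the vertical fragments back on the common key
joined = {r["id"]: dict(r) for r in frag_personal}
for r in frag_payroll:
    joined[r["id"]].update(r)
print(sorted(joined.values(), key=lambda r: r["id"]) == employees)   # True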
Transactions
Transaction Operations
Transaction States
A transaction may go through a subset of five states: active, partially committed, committed, failed and aborted.
Active − The initial state where the transaction enters is the active state. The
transaction remains in this state while it is executing read, write or other operations.
Partially Committed − The transaction enters this state after the last statement of the
transaction has been executed.
Committed − The transaction enters this state after successful completion of the
transaction and system checks have issued commit signal.
Failed − The transaction goes from partially committed state or active state to failed
state when it is discovered that normal execution can no longer proceed or system
checks fail.
Aborted − This is the state after the transaction has been rolled back after failure and the database has been restored to the state it was in before the transaction began.
The following state transition diagram depicts the states of a transaction and the low-level transaction operations that cause changes in states.
Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation,
and Durability.
Atomicity − This property states that a transaction is an atomic unit of processing, that
is, either it is performed in its entirety or not performed at all. No partial update should
exist.
Consistency − A transaction should take the database from one consistent state to
another consistent state. It should not adversely affect any data item in the database.
Isolation − A transaction should be executed as if it is the only one in the system.
There should not be any interference from the other concurrent transactions that are
simultaneously running.
Durability − If a committed transaction brings about a change, that change should be
durable in the database and not lost in case of any failure.
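A tiny sketch of atomicity using Python's built-in sqlite3: a transfer between two assumed accounts either commits as a whole or is rolled back as a whole when a failure occurs midway.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    # the transfer must happen in its entirety or not at all (atomicity)
    conn.execute("UPDATE account SET balance = balance - 70 WHERE name = 'A'")
    raise RuntimeError("simulated failure before the second update")
    conn.execute("UPDATE account SET balance = balance + 70 WHERE name = 'B'")
    conn.commit()        # a committed change would have to survive (durability)
except RuntimeError:
    conn.rollback()      # failed -> database restored to the state before the transaction

print(conn.execute("SELECT * FROM account ORDER BY name").fetchall())
# [('A', 100), ('B', 50)] -- the partial update was undone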
The transaction manager in a local database system only needs to inform the recovery manager of its decision to commit a transaction. However, in a distributed system, the transaction manager should consistently enforce the decision to commit and communicate it to all the servers at the various sites where the transaction is being executed. The processing at each site ends when it reaches the partially committed transaction state, where it waits for all the other sites to reach their partially committed states. Once it receives the signal that all the sites are prepared, it begins to commit. Either every site commits in a distributed system, or none of them does.
To guarantee atomicity, the final outcome of the execution must be accepted by every site where transaction T was executed: T must either commit at every site or abort at every site. To guarantee this property, the transaction coordinator of T must carry out a commit protocol. The commonly used commit protocols are:
o One-phase Commit,
o Two-phase Commit, and
o Three-phase Commit.
1. One-Phase Commit
The distributed one-phase commit is the most straightforward commit protocol. Consider the
scenario where the transaction is being carried out at a controlling site and several slave sites.
These are the steps followed in the one-phase distributed commit protocol:
o Each slave sends a "DONE" notification to the controlling site once it has
successfully finished its transaction locally.
o The slaves await the controlling site's "Commit" or "Abort" message. This period of waiting is known as the window of vulnerability.
o The controlling site decides whether to commit or abort after receiving the "DONE"
message from each slave. The commit point is where this happens. It then broadcasts
this message to every slave.
o An acknowledgement message is sent to the controlling site by the slave once it
commits or aborts in response to this message.
2. Two-Phase Commit
The two-phase commit protocol (2PC), which is explained in this Section, is one of the most
straightforward and popular commit methods. The vulnerability of one-phase commit
methods is decreased by distributed two-phase commit. The following actions are taken in the
two phases:
When T has completed, that is, when all the sites where T has executed notify Ci that T has completed, Ci initiates the 2PC protocol.
o Ci inserts the record <prepare T> into the log and forces it onto stable storage. After that, it notifies every site where T was executed to prepare T.
o When such a message is received, the transaction manager at that site decides whether it is willing to commit its share of T.
o If the response is negative, it adds a record <no T> to the log and sends an abort T message to Ci. If the response is affirmative, it adds a record <ready T> to the log and forces the log (along with all log records of T) onto stable storage.
o A ready T message is then returned to Ci by the transaction manager.
o When all of the sites have responded to the "prepare T" message, or after a predetermined amount of time has passed since the "prepare T" message was delivered, Ci can decide whether the transaction T can be committed or aborted.
o If Ci got a ready T message from every site involved in the transaction, transaction T can be committed. If not, then transaction T must be aborted. Depending on the outcome, a <commit T> or <abort T> record is added to the log, and the log is forced onto stable storage.
o The transaction's outcome has already been decided at this time. The coordinator then
sends one of two messages to all involved sites: a "commit T" message or an "abort
T" message. The message is entered into the log locally when a site receives it.
3. Three-Phase Commit
The two-phase commit protocol can be extended to overcome the blocking issue using the
three-phase commit (3PC) protocol, under particular assumptions.
It is specifically anticipated that there will be no network partitions and that there won't be
any more than k sites that fail, where k is a preset number. Under these presumptions, the
protocol prevents blocking by adding a third phase that involves several sites in the commit
decision.
Before recording the decision to commit in its persistent storage, the coordinator first makes certain that at least k other sites know that it intended to commit the transaction. If the coordinator fails, the surviving sites first choose a replacement.
The new coordinator checks the protocol's status with the remaining sites. If the previous coordinator had made the decision to commit, at least one of the other k sites it had notified would be up and would make sure the commit decision was upheld. If some site knew that the previous coordinator intended to commit the transaction, the new coordinator starts over with the third phase of the protocol. Otherwise, the transaction is aborted by the new coordinator.
The 3PC protocol has the advantage of not blocking unless more than k sites fail, but it also has the disadvantage that a network partitioning might be mistaken for more than k sites failing, which would result in blocking. In addition, the protocol must be carefully designed to prevent inconsistent results, such as transactions being committed in one partition but aborted in another, in the event of network partitioning (or more than k sites failing). The 3PC protocol is not frequently used because of its overhead.
The three phases of the distributed three-phase commit protocol are as follows:
Phase 2 of 2PC is divided into Phase Two and Phase Three in 3PC.
Phase Two -
Phase 2 involves the coordinator making a decision similar to that of 2PC (known as the pre-commit decision) and recording it at multiple (at least k) other sites.
Phase Three -
Phase 3 involves the coordinator notifying all participating sites whether to commit or abort.
Under 3PC, even if the coordinator fails, the decision can still be carried through using knowledge of the pre-commit decision.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing the
concurrent execution of database operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. With a shared lock, the data item can only be read by the transaction.
o It can be shared between the transactions because when the transaction holds a lock,
then it can't update the data on the data item.
2. Exclusive lock:
o With an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive, and with this lock, multiple transactions cannot modify the same data simultaneously.
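A toy lock table sketch showing the compatibility rule implied above: shared (S) locks can coexist, while an exclusive (X) lock conflicts with everything. This is an illustration only, not how any particular DBMS implements its lock manager (there is no waiting or deadlock handling here).

class LockTable:
    def __init__(self):
        self.locks = {}   # data item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)              # shared locks are compatible with each other
            return True
        return False                      # any other combination conflicts: the transaction waits

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))   # True  - read lock granted
print(lt.acquire("T2", "A", "S"))   # True  - shared with T1
print(lt.acquire("T3", "A", "X"))   # False - exclusive lock conflicts with the readers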
Timestamp Ordering Protocol
o The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The order of the transactions is nothing but the ascending order of transaction creation.
o The priority of the older transaction is higher; that's why it executes first. To determine the timestamp of the transaction, this protocol uses system time or a logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs among
transactions at the execution time. But Timestamp based protocols start working as
soon as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on a data item.
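A sketch of the basic timestamp-ordering checks, keeping the last read and write timestamps per data item as the text describes; the field names and the example timestamps are assumptions.

class DataItem:
    def __init__(self):
        self.r_ts = 0   # timestamp of the youngest transaction that read the item
        self.w_ts = 0   # timestamp of the youngest transaction that wrote the item

def read(item, ts):
    if ts < item.w_ts:                       # the value it needs was already overwritten
        return "rollback"
    item.r_ts = max(item.r_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.r_ts or ts < item.w_ts:     # a younger transaction already used the item
        return "rollback"
    item.w_ts = ts
    return "ok"

x = DataItem()
print(write(x, 9))    # ok       - the transaction with timestamp 9 writes x
print(read(x, 7))     # rollback - an older transaction (timestamp 7) arrives too late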
Validation-Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In this protocol, the transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It is used to read
the value of various data items and stores them in temporary local variables. It can
perform all the write operations on temporary variables without an update to the
actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual data to see if they violate serializability.
3. Write phase: If the transaction passes validation, then the temporary results are written to the database or system; otherwise, the transaction is rolled back.
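A much-simplified sketch of the three phases: work on local copies, validate the read set against the write sets of transactions that have already committed, and only then write. The conflict test used here (read set overlapping a committed write set) is a simplification of real validation rules, and the data item names are assumptions.

committed_write_sets = [{"B"}]     # items written by already-committed transactions (assumed)

def run_optimistic(read_set, writes):
    # 1. Read phase: read into temporary local variables only
    local = {item: None for item in read_set}
    local.update(writes)                       # writes also go to the local copies first

    # 2. Validation phase: conflict if something we read was written by a committed txn
    for ws in committed_write_sets:
        if ws & read_set:
            return "rollback"

    # 3. Write phase: the temporary results are applied to the database
    committed_write_sets.append(set(writes))
    return "commit"

print(run_optimistic({"A"}, {"A": 42}))   # commit   - no overlap with committed writes
print(run_optimistic({"B"}, {"C": 7}))    # rollback - B was written by a committed transaction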
Query Processing
Query processing is the activity performed in extracting data from the database. In query processing, various steps are taken to fetch the data from the database. The steps involved are:
1. Parsing and translation
2. Optimization
3. Evaluation
Initially, the given user query is expressed in a high-level database language such as SQL. It is then translated into expressions that can be used at the physical level of the file system. After this, the actual evaluation of the query and a variety of query-optimizing transformations take place.
Thus before processing a query, a computer system needs to translate the query into a
human-readable and understandable language.
Consequently, SQL, or Structured Query Language, is the most suitable choice for humans.
But, it is not perfectly suitable for the internal representation of the query to the
system.
Relational algebra is well suited for the internal representation of a query. The
translation process in query processing is similar to the parser of a query.
When a user executes any query, for generating the internal form of the query, the
parser in the system checks the syntax of the query, verifies the name of the relation
in the database, the tuple, and finally the required attribute value. The parser creates a
tree of the query, known as 'parse-tree.'
Further, the parse tree is translated into the form of relational algebra. With this, all uses of views in the query are also replaced.
Thus, we can understand the working of a query processing in the below-described diagram:
Suppose a user executes a query. As we have learned, there are various methods of extracting the data from the database. Suppose, in SQL, a user wants to fetch the records of the employees whose salary is greater than or equal to 10000. For doing this, a query of the following form can be used:
SELECT * FROM Employee WHERE salary >= 10000;
Thus, to make the system understand the user query, it needs to be translated into the form of relational algebra. We can bring this query into the relational algebra form as:
σ salary >= 10000 (Employee)
After translating the given query, we can execute each relational algebra operation using different algorithms. So, in this way, query processing begins its work.
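As a sketch of what executing each relational algebra operation can look like, the selection and projection below are implemented as simple full-scan algorithms over an in-memory relation; the Employee relation and its attributes are assumptions matching the example query above.

Employee = [
    {"emp_name": "Asha",  "salary": 12000},
    {"emp_name": "Ravi",  "salary": 8000},
    {"emp_name": "Meena", "salary": 15000},
]

def select(relation, predicate):
    # sigma: keep only the tuples that satisfy the predicate (here, a full scan)
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    # pi: keep only the requested attributes of each tuple
    return [{a: t[a] for a in attributes} for t in relation]

# sigma_{salary >= 10000}(Employee), then pi_{emp_name}
result = project(select(Employee, lambda t: t["salary"] >= 10000), ["emp_name"])
print(result)   # [{'emp_name': 'Asha'}, {'emp_name': 'Meena'}]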
Evaluation
For this, in addition to the relational algebra translation, it is required to annotate the translated relational algebra expression with the instructions used for specifying and evaluating each operation. Thus, after translating the user query, the system executes a query evaluation plan.
o In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization
o The cost of the query evaluation can vary for different types of queries. Although the system is responsible for constructing the evaluation plan, the user need not write their query in the most efficient form.