
Database Management Systems
(CAMI13)
Assessment - 4

BTECH - IV Semester

Submitted by Satyam – 112118047

Contents
_____________________________________

1. Database Design in Relational Databases
2. Aspects Influencing Physical Database Design
3. Physical Database Design Decisions
4. Creating an Index
5. Denormalization as a Design Decision for Expediting Queries
6. An Overview of Database Tuning in Relational Systems
   a. Tuning Indexes
   b. Tuning the Database Design
   c. Tuning Queries
   d. More Guidelines for Query Tuning
7. Conclusion
8. References

Physical Database Design and Tuning

Database Design in Relational Databases
_____________________________________
The physical design of a database optimizes performance and efficiency while ensuring that data integrity is maintained by removing unnecessary data redundancies. Building the physical design requires several factors to be considered, which necessitates periodic refinement.

Physical database design is the process of converting a data model into the physical data structures of a database management system (DBMS). Physical design is a multi-step process in which a business model is expanded into a fully attributed model (FAM) and then transformed into a physical design model.

Aspects Influencing Physical Database Design:
_____________________________________
1. Analysis of database queries and transactions – For each query, several pieces of information should be identified: the files that will be accessed, the attributes on which selection conditions are specified, the attributes whose values are to be retrieved, and the attributes on which any join conditions are specified to link multiple tables or objects.
Similarly, for each update transaction or operation, a variety of information is needed: the files to be updated, the type of operation (insertion, deletion, or update) on each file, the attributes on which selection conditions for a delete or update are specified, and the attributes whose values are changed by the operation.

2. Analysis of the expected frequency of invoking queries and transactions – Using the expected frequency information and the attribute information collected for each query and transaction, a cumulative list of the expected frequency of use for all queries and transactions can be compiled. It is expressed as the expected frequency of using each attribute in each file as a selection attribute or a join attribute, over all the queries and transactions.
The 80–20 rule applies to large volumes of processing: approximately 80 percent of the processing is accounted for by only 20 percent of the queries and transactions. Hence, it is usually sufficient to consider the most important 20 percent of the queries and transactions in this analysis.

3. Analysis of time constraints of queries and transactions – Attributes used in access paths receive higher priority when queries and transactions have stringent performance constraints. Hence, selection attributes used by queries and transactions with time constraints become higher-priority candidates for primary access structures.

4. Analysis of the expected frequencies of update operations – A file undergoing frequent updates should have a minimum number of access paths specified on it.

5. Analysis of the uniqueness constraints on attributes – Access paths must be specified for all candidate key attributes (or sets of attributes) that are either the primary key or constrained to be unique.

Figure 1: Physical and Logical Access Paths

Physical Database Design Decisions
_____________________________________
Design Decisions about Indexing: Indexing and hashing schemes enhance query performance because they speed up selection and join operations. However, indexes take up additional disk space and slow down insert, update, and delete operations, because the indexes must also be updated after each such operation.

The following questions need to be answered to decide the physical design for indexing:

1. Whether to index an attribute? An attribute is a candidate for indexing if it is unique (a key) or if some query uses it to specify a selection or join condition. Indexing provides the advantage of carrying out some operations without needing to access the actual data file.

2. What attribute(s) to index on? An index may be built on a single attribute or on several attributes together (a composite index). A composite index is warranted when multiple attributes from one relation are involved together in several queries.

3. Whether to set up a clustered index? A primary index is created on a key attribute and a clustering index on a non-key attribute. Because a file can be physically ordered on only one attribute, at most one primary or clustering index (specified by the keyword CLUSTER) can exist per table. Clustering is more efficient when the records themselves need to be retrieved, not when the index is searched only. A clustering index may also be composite, for example when range retrieval over several attributes helps in creating a report.

Figure 2: Pictorial representation of Non-Clustering and Clustering indexes

4. Whether to use a hash index over a tree index? B+-tree indexes are prevalent in RDBMSs because they support both equality and range queries on the search key attribute. Hash indexes, supported by some systems, perform better for equality conditions, especially when finding matching records during joins.

Figure 3: A Diagram of B+ tree

5. Whether to use dynamic hashing for the file? Dynamic hashing is most
suitable for highly volatile/changing files that frequently grow or shrink
in size.

Creating an Index:
_____________________________________
The general syntax common to several RDBMSs is:

CREATE [UNIQUE] INDEX <index name>

ON <table name> (<column name> [ <order>], <column name> [<order>])

[CLUSTER];

The keywords UNIQUE and CLUSTER, enclosed within square brackets [ ], are optional. The <order> of an indexed attribute can be either ASC (ascending) or DESC (descending). The keyword CLUSTER is used when the index being created should also physically order the records of the data file on the indexing attribute, thereby acting as the primary or clustering index for that table.
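As an illustration, here is a minimal sketch of index-creation statements on a hypothetical EMPLOYEE table (the table, column, and index names are assumptions made for this example, and the CLUSTER clause follows the generic syntax above; real DBMSs use product-specific keywords for clustering):

CREATE UNIQUE INDEX emp_ssn_idx
ON EMPLOYEE (Ssn);                      -- unique index on the key attribute

CREATE INDEX emp_dno_salary_idx
ON EMPLOYEE (Dno ASC, Salary DESC);     -- composite index on two attributes

CREATE INDEX emp_dno_idx
ON EMPLOYEE (Dno)
CLUSTER;                                -- orders the data file on Dno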

Denormalization as a Design Decision for
Expediting Queries:
_____________________________________

Denormalization is a database optimization technique that is applied after normalization. In a traditional normalized database, data is stored in separate logical tables so as to minimize redundancy, such that ideally only one copy of each piece of data exists in the entire database.

The drawback is that for large tables, an unnecessarily long time may be spent performing joins on those tables. Therefore, denormalization accepts a trade-off: some redundancy, and a little extra effort to keep the redundant attributes consistent when the database is updated, in exchange for the efficiency advantage of needing as few costly joins as possible in a relational database.

Figure 4: A possible denormalization situation: a many-to-many relationship with non-key attributes

In this way, denormalization enables faster execution of frequently issued queries and transactions by converting the logical database design (which may be in Boyce-Codd Normal Form or 4NF) into a weaker normal form such as 2NF or 1NF.
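As a minimal sketch of this idea, consider hypothetical CUSTOMER and ORDERS tables (all names here are illustrative assumptions). A frequently needed attribute such as the customer name can be copied into ORDERS so that a common listing query no longer needs a join:

-- Normalized design: the customer name is stored only in CUSTOMER
CREATE TABLE CUSTOMER (
    Cust_id   INT PRIMARY KEY,
    Cust_name VARCHAR(100)
);

CREATE TABLE ORDERS (
    Order_id  INT PRIMARY KEY,
    Cust_id   INT REFERENCES CUSTOMER(Cust_id),
    Amount    DECIMAL(10,2)
);

-- Denormalized variant: Cust_name is stored redundantly in ORDERS, so
-- listing orders with customer names avoids the join (at the cost of
-- keeping the copies consistent whenever a customer name changes)
ALTER TABLE ORDERS ADD Cust_name VARCHAR(100);

SELECT Order_id, Cust_name, Amount
FROM ORDERS;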

Figure 5: Various Normal Forms and their Distinctions

Another form of denormalization involves storing extra tables to maintain original functional dependencies that are lost during Boyce-Codd Normal Form decomposition. An alternative strategy is to keep the two decomposed tables as updatable base tables and, instead of repeatedly joining them in every query, to define a view (virtual table) over the base tables that is used for queries only. The join operation is not avoided this way; only the need to specify the join in each query is removed. However, the join can be avoided entirely once the view is materialized.
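A brief sketch of this strategy, reusing the hypothetical CUSTOMER and ORDERS tables from above:

-- The view hides the join from application queries, but the join still
-- executes whenever the view is queried
CREATE VIEW ORDER_DETAILS AS
SELECT o.Order_id, o.Amount, c.Cust_id, c.Cust_name
FROM   ORDERS o
JOIN   CUSTOMER c ON c.Cust_id = o.Cust_id;

-- Queries are written against the view as if it were a single table
SELECT Order_id, Cust_name
FROM   ORDER_DETAILS
WHERE  Amount > 1000;

If the DBMS supports materialized views, the join result can additionally be stored and refreshed, at which point the join is avoided at query time as described above.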

An Overview of Database Tuning in Relational Systems
_____________________________________
Tuning the performance of a database system involves adjusting various parameters and design choices to improve the overall performance of a specific application. Once the database is implemented in an application, the processing of queries and transactions, or the set-up of views, may run into problems that were not anticipated during the initial physical design and evaluation stage. Only a thin demarcation exists between the physical design stage and the tuning stage. Database tuning is governed by the same design parameters discussed earlier; it is essentially a continuous optimization and refinement of the overall physical design.

Figure 6: Knowledge required about the various levels for efficient Tuning

The aspects of physical design are revisited, and actual statistics about usage patterns (the frequency of invoking particular queries and transactions, the frequency of update operations, and time constraints) are gathered while the application using the database is in operation. Hence, database tuning is the process of continuously revising and adjusting the physical database design by monitoring resource utilization and internal DBMS processing in order to locate bottlenecks such as contention for the same data or devices.

Tuning measures applied to a database system are typically characterized by the following:

 Great benefits with little effort (if applied correctly).
 A storage cost in the form of additional disk space.
 Strong support within all available DBMSs (index structures, index usage controlled by the optimizer).
 Costs in the form of locking overhead, lock conflicts, and index updates.

Figure 7: Database Tuning as a Process

The tuning process aims at:

 Faster execution of the application.
 Reduced response times of queries and transactions.
 Improved and smoother overall flow of data to and from the server or database.

The inputs to the tuning process rely heavily on statistics gathered from various sources, such as:

(a) Statistics collected internally by the DBMS:

1. Sizes of individual tables.
2. Number of distinct (unique) values in a column.
3. The frequency with which a particular query or transaction is submitted or executed.
4. The average time required for different phases of query and transaction processing.

(b) Statistics obtained from monitoring:

1. Storage statistics – Data about storage allocation in tablespaces, index spaces, and buffer pools.
2. I/O and device performance statistics – Total read/write activity on disk extents and disk hot spots.
3. Query/transaction processing statistics – Execution and optimization times of queries and transactions.
4. Locking/logging related statistics – Rates at which different types of locks are issued, transaction throughput rates, and log record activity.
5. Index statistics – Number of levels in an index, number of non-contiguous leaf pages, and so on.

Figure 8: The Four Levels of Database Tuning

Below are a few suggestions that must be considered while tuning a database:

1. To prevent excessive lock contention (a situation that occurs when a process must wait for a lock held by another process) and thereby increase concurrency among transactions, by:
(a) Avoiding transactions that include user interaction.
(b) Making locks more fine-grained, so that it becomes less likely that one process requests a lock held by another.
(c) Avoiding situations in which many processes try to perform updates or inserts on the same data page.
(d) Keeping the fill factor low when indexing small, frequently accessed tables, to help reduce the chance that concurrent updates require the same page.

2. To minimize the overhead of logging and unnecessary dumping of
data.
3. To optimize the buffer size and scheduling of processes.
4. To allocate resources such as disks, RAM, and processes for most
efficient utilization.

Such suggestions can be implemented by setting appropriate physical DBMS parameters, changing device configurations, or changing operating system parameters.

Tuning Indexes
_____________________________________
After analysing the statistics that convey insight into the usage pattern of the application and its database, one may come across certain bottlenecks that slow down the data throughput of queries, transactions, or views. Some of these bottlenecks can be resolved simply by revisiting the choice of indexes.

Some of the most common roadblocks that can be removed by tuning indexes are:

1. Some queries may take too long to run because of the lack of an index.
2. Some indexes may not be getting utilized at all.
3. Some indexes may undergo excessive updating because of a frequently changing underlying attribute.

Figure 9: Architecture of Index Selection Tool

Such problems can be traced using a built-in command or facility of the DBMS in use. These systems maintain a log that records the sequence of queries executed, the operations performed, and the indexes used. The cause of a bottleneck can be diagnosed by going through these records, and an appropriate index can then be created, modified, or dropped.

The ultimate aim of tuning is to dynamically evaluate the requirements, which may change seasonally or during different times of the month or week, and to reorganize the indexes and file organizations to achieve the best possible performance.

The following are a few of the options that can be applied to tune indexes:

1. Dropping indexes and/or building new ones.
2. Changing a non-clustered index to a clustered index (and vice versa).
3. Rebuilding an existing index.
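As a sketch, the first and third options correspond to ordinary DDL statements. The statements below use common forms, but the exact syntax (especially for rebuilding and for clustered indexes) varies between DBMSs, and the table and index names are hypothetical:

-- Drop an index that the usage log shows is never chosen by the optimizer
DROP INDEX emp_bdate_idx;

-- Build a new index for a selection condition that currently scans the table
CREATE INDEX emp_dno_idx ON EMPLOYEE (Dno);

-- Rebuild a fragmented index (product-specific; one common form shown)
ALTER INDEX emp_dno_idx REBUILD;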

Tuning the Database Design
_____________________________________
When processing requirements change dynamically, changes may need to be made even to the conceptual schema, because the design of a database depends as much on its processing requirements as on its data requirements. Such changes must then be reflected in the logical schema and the physical design.

Here are the possible changes that can be made to the Database Design:

1. A table may be split into two tables, with the key attribute duplicated in each, to reduce the total number of attributes per table when two different groups of attributes are queried with high frequency. In that case, separating the tables may result in increased efficiency. Mathematically, a relation of the form R(K, A, B, C, D, ...) that is in BCNF can be stored as multiple tables, for example R1(K, A, B), R2(K, C, D) and R3(K, ...), which are also in BCNF, by replicating the key K in each table. This process is referred to as vertical partitioning because it splits a table vertically into multiple tables (a sketch appears after this list).

2. Horizontal partitioning may be used as well when the data comprises a very large number of records. In contrast to vertical partitioning, horizontal partitioning splits the table into horizontal slices stored as separate tables with the same set of columns (attributes) but containing distinct sets of tuples. If a query or transaction needs to access all the records, it has to be run against all the partitions and the results combined (a sketch appears after this list).

Figure 10: Pictorial representation of Vertical and Horizontal Partitioning

3. Using denormalization, attribute(s) from one table may be repeated in another table, even though this introduces data redundancy and potential anomalies. As before, this is a trade-off between the redundancy and the extra effort needed to keep the redundant data consistent on updates, versus the efficiency advantage of performing as few costly joins as possible.

4. The normalization level may be reduced from Boyce-Codd Normal Form to 3NF, 2NF, or 1NF by joining (denormalizing) existing tables, because certain attributes from two or more tables are frequently accessed together.

5. For a given set of tables, the normalized design may be changed according to the requirements and the preferred balance between data consistency and concurrency; several alternative design choices, such as 3NF or BCNF decompositions, exist.
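A minimal sketch of vertical and horizontal partitioning on a hypothetical EMPLOYEE relation (all table and column names are illustrative assumptions):

-- Vertical partitioning: R(K, A, B, C, D) stored as R1(K, A, B) and
-- R2(K, C, D), with the key Ssn replicated in both tables
CREATE TABLE EMP_PERSONAL (
    Ssn    CHAR(9) PRIMARY KEY,
    Name   VARCHAR(60),
    Bdate  DATE
);

CREATE TABLE EMP_JOB (
    Ssn    CHAR(9) PRIMARY KEY,
    Dno    INT,
    Salary DECIMAL(10,2)
);

-- Horizontal partitioning: the same columns, disjoint sets of tuples
CREATE TABLE EMPLOYEE_DEPT1 (Ssn CHAR(9), Name VARCHAR(60), Dno INT);
CREATE TABLE EMPLOYEE_DEPT2 (Ssn CHAR(9), Name VARCHAR(60), Dno INT);

-- A query over all employees must combine the partitions
SELECT Ssn, Name FROM EMPLOYEE_DEPT1
UNION ALL
SELECT Ssn, Name FROM EMPLOYEE_DEPT2;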

Tuning Queries
_____________________________________
The need for tuning queries may arise in a number of situations, for example when too many disk accesses occur, or when the query plan (which the DBMS can display or record) reveals that relevant or intended indexes are not being used while processing the data.

On analysing the record of query executions, it may be observed that some query runs slower than expected. The relevant indexes should then be checked to see whether they require restructuring, and the statistics on which estimates such as invocation frequency depend may be too old to be reliable. In any case, re-checking the plan actually being used is a must. Accordingly, the choice of indexes may be updated, or the query or view rewritten.

Therefore, expressed as a set of instructions, the following steps may be followed:

1. Pick out the queries whose poor performance creates a bottleneck in the data throughput of transactions and queries (either the critical ones or those that do not satisfy the users).
2. Scrutinise the physical query plans by observing:
- the physical operators chosen for each table (for example, a full scan versus an index access),
- the sorting of intermediate results,
- how the logical operators of the query are mapped onto physical operators in the plan.
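Most DBMSs expose the chosen plan through a statement such as EXPLAIN; the exact keyword and output format vary by product, and the table names below are hypothetical:

-- Display the plan the optimizer would use for this query
EXPLAIN
SELECT E.Name
FROM   EMPLOYEE E
WHERE  E.Dno = 4;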

Sometimes, the DBMS may not be executing the plan originally thought of.
Common areas of weakness include:

o Selections involving null values.
o Selections involving arithmetic or string expressions.
o Selections involving OR conditions.
o Lack of evaluation features such as index-only strategies or certain join
methods, and poor size estimation.

Listed below are some typical situations in which query tuning proves to be more effective than other methods:

1. In situations involving the use of correlated queries, temporary tables may be essential.
2. If multiple options for a join condition are possible, it is advisable to choose one that can use a clustering index and to avoid expressions containing string comparisons.
3. The order of tables in the FROM clause may affect the join processing, so the order may need to be adjusted.
4. Some query optimizers perform worse on nested queries than on their equivalent un-nested counterparts.
5. Many applications are based on views that define the data of interest. Over time these views may become overkill and may need to be modified or simplified according to the actual needs.

More Guidelines for Query Tuning
_____________________________________
 A query with several selection conditions connected by OR may not prompt the query optimizer to use any index. Such a query may be split up and expressed as a union of queries, each with a condition on an attribute that causes an index to be used. A SELECT with OR conditions on the same attribute can instead be rewritten with a single IN predicate, or with a UNION of SELECTs, as sketched below.
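A brief sketch on a hypothetical EMPLOYEE table (the names are assumptions):

-- Original form: the OR may prevent the optimizer from using an index
SELECT Ssn FROM EMPLOYEE WHERE Dno = 3 OR Dno = 5;

-- Rewritten with IN when the conditions are on the same attribute
SELECT Ssn FROM EMPLOYEE WHERE Dno IN (3, 5);

-- Rewritten as a UNION of index-friendly queries when the conditions
-- are on different attributes
SELECT Ssn FROM EMPLOYEE WHERE Dno = 3
UNION
SELECT Ssn FROM EMPLOYEE WHERE Salary > 50000;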

 A SELECT with an AND of range predicates on the same attribute may be rewritten as a condition using BETWEEN (see Figure 11 and the sketch below).

Figure 11: Use of BETWEEN operator is recommended. It allows the optimizer
to recognize both parts of the range selection.
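A minimal illustration of the rewrite (hypothetical names):

-- AND of two comparisons on the same attribute
SELECT Ssn FROM EMPLOYEE WHERE Salary >= 30000 AND Salary <= 40000;

-- Equivalent BETWEEN form, which the optimizer readily recognizes as a
-- single range selection
SELECT Ssn FROM EMPLOYEE WHERE Salary BETWEEN 30000 AND 40000;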

 The use of DISTINCT and ORDER BY in SELECT statements and subselects must be minimized as much as possible. If duplicates are acceptable, or if the answer already contains a key, the DISTINCT and ORDER BY keywords may be done away with.
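For instance (hypothetical names), when a key of the table is part of the result, DISTINCT only adds a duplicate-elimination step without changing the answer:

-- DISTINCT is redundant here because Ssn is a key of EMPLOYEE
SELECT DISTINCT Ssn, Name FROM EMPLOYEE WHERE Dno = 4;

-- Equivalent, cheaper form
SELECT Ssn, Name FROM EMPLOYEE WHERE Dno = 4;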

 The definition of temporary views with grouping and aggregation should be avoided. Similarly, aggregation functions inside subqueries should be avoided where possible.

 A NOT condition may be transformed into an equivalent positive expression.
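For example (hypothetical names), a negated comparison can often be replaced by its positive counterpart, which is more likely to make use of an index:

-- Negative form
SELECT Ssn FROM EMPLOYEE WHERE NOT (Salary < 30000);

-- Positive, index-friendly form
SELECT Ssn FROM EMPLOYEE WHERE Salary >= 30000;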

 If an equality join is set up between two tables, a range predicate on the joining attribute specified for one table may be repeated for the other table, giving the optimizer more options.
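A sketch of the idea with hypothetical EMPLOYEE and PAYROLL tables joined on Ssn:

-- Range predicate stated only for E
SELECT E.Ssn
FROM   EMPLOYEE E, PAYROLL P
WHERE  E.Ssn = P.Ssn AND E.Ssn > '555000000';

-- Repeating the predicate for P gives the optimizer the option of
-- restricting either table before the join
SELECT E.Ssn
FROM   EMPLOYEE E, PAYROLL P
WHERE  E.Ssn = P.Ssn AND E.Ssn > '555000000' AND P.Ssn > '555000000';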

 WHERE conditions may be rewritten to utilize the indexes on multiple columns.
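For instance, assuming a hypothetical composite index on (Dno, Salary), a condition written as an expression over an indexed column can be rephrased so that the column appears unmodified and the composite index applies:

-- The arithmetic expression on Salary defeats the index
SELECT Ssn FROM EMPLOYEE WHERE Dno = 4 AND Salary * 12 > 600000;

-- Rewritten so both indexed columns are compared directly
SELECT Ssn FROM EMPLOYEE WHERE Dno = 4 AND Salary > 50000;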

 Subqueries may be rewritten as joins. Similarly, embedded SELECT blocks may be replaced by joins.
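A brief sketch (hypothetical names); the two forms are equivalent here because Dnumber is assumed to be the key of DEPARTMENT, so the join introduces no duplicates:

-- Nested form
SELECT Name
FROM   EMPLOYEE
WHERE  Dno IN (SELECT Dnumber FROM DEPARTMENT WHERE Dlocation = 'Houston');

-- Un-nested join form, which many optimizers handle better
SELECT E.Name
FROM   EMPLOYEE E, DEPARTMENT D
WHERE  E.Dno = D.Dnumber AND D.Dlocation = 'Houston';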

 The use of the HAVING and GROUP BY clauses should be minimized (see Figure 12 and the sketch below).

Figure 12: Optimized query block eliminating HAVING and GROUP BY
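One common rewrite of this kind moves a condition on the grouping column from HAVING into WHERE, so that rows are filtered before the groups are formed (hypothetical names):

-- HAVING filters the groups after aggregation
SELECT Dno, AVG(Salary)
FROM   EMPLOYEE
GROUP BY Dno
HAVING Dno = 4;

-- Equivalent form: rows are filtered before grouping
SELECT Dno, AVG(Salary)
FROM   EMPLOYEE
WHERE  Dno = 4
GROUP BY Dno;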

 The DBMS's use of indexes must be considered when writing arithmetic expressions. For example, E.age = 2*D.age can benefit from an index on E.age but might not benefit from an index on D.age, whereas the equivalent E.age/2 = D.age favours an index on D.age instead.
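A sketch of the two phrasings, assuming hypothetical EMPLOYEE and DEPENDENT tables that each carry an age column (and that the arithmetic is exact for the stored values):

-- Can use an index on E.age, since that column appears unmodified
SELECT E.Ssn
FROM   EMPLOYEE E, DEPENDENT D
WHERE  E.age = 2 * D.age;

-- The same condition phrased to favour an index on D.age instead
SELECT E.Ssn
FROM   EMPLOYEE E, DEPENDENT D
WHERE  E.age / 2 = D.age;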

Conclusion
_____________________________________
The process of database design involves several tasks: requirements analysis, conceptual design, schema refinement, physical design, and tuning. Generally, one needs to go back and forth between these tasks to refine a database design, and decisions made in one task can influence the choices in another. Developing a good design depends heavily on understanding the nature of the application's workload and its performance goals, for which clearly defining the important queries and updates and the attributes and relations involved is essential.

The conceptual schema should be refined by considering performance criteria and the workload: 3NF or a lower normal form may be chosen over BCNF among alternative decompositions, based on the workload. Denormalization, horizontal decomposition of a relation, or further decomposition of a BCNF relation may also be carried out.

Over a period of time, indexes must be fine-tuned (dropped, created, or rebuilt) for better performance. The plan used by the system must be examined in order to adjust the choice of indexes appropriately. Even after all this tuning and adjustment, the system will not necessarily have a perfect design and infrastructure.

In query expressions, null values, arithmetic conditions, string expressions, the use of OR, and similar constructs may confuse an optimizer. This may require the query or view to be rewritten, while making sure that components such as nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY are avoided where they are not needed.

References
_____________________________________
Books:
 Fundamentals of Database Systems, 5th Edition, 2007, Ramez Elmasri and Shamkant B. Navathe, Chapter 16: Practical Database Design and Tuning
 Database System Concepts, 6th Edition, Abraham Silberschatz (Yale
University), Henry F. Korth (Lehigh University), S. Sudarshan (IIT,
Bombay), Part Two: Database Design
 An Introduction to Database Systems, 8th Edition, C. J. Date, Part III:
Database Design.

Websites/Hyperlinks:
 Database Management Systems by R. Ramakrishnan, Database Tuning,
Module 5, lectures 6 and 7
http://pages.cs.wisc.edu/~dbbook/openAccess/firstEdition/slides/pdfslide
s/mod5l6-7.pdf
 https://www.slideshare.net/hhhchamp/physical-database-design-
performance
 https://www.studocu.com/en-au/document/charles-sturt-
university/database-management-systems/lecture-notes/chapter-20-
physical-database-design-and-tuning/4448391/view
 http://pages.di.unipi.it/ghelli/bd2/11.physicaldesign.pdf
 http://www.csbio.unc.edu/mcmillan/Media/Comp521F10Lecture20.pdf
