Author(s): Markus Winand
ISBN(s): 9783950307825, 3950307826
Edition: Paperback
Year: 2012
Language: English
SQL PERFORMANCE EXPLAINED
ENGLISH EDITION
COVERS ALL MAJOR SQL DATABASES
MARKUS WINAND
License Agreement
This ebook is licensed for your personal enjoyment only. This ebook may
not be re-sold or given away to other people. If you would like to share
this book with another person, please purchase an additional copy for each
person. If you’re reading this book and did not purchase it, or it was not
purchased for your use only, then please return to
http://SQL-Performance-Explained.com/
and purchase your own copy. Thank you for respecting the hard work of
the author.
Publisher:
Markus Winand
Maderspergerstasse 1-3/9/11
1160 Wien
AUSTRIA
<[email protected]>
While every precaution has been taken in the preparation of this book, the
publisher and author assume no responsibility for errors and omissions, or
for damages resulting from the use of the information contained herein.
The book solely reflects the author’s views. The database vendors mentioned
have neither supported the work financially nor verified the content.
Cover design:
tomasio.design — Mag. Thomas Weninger — Wien — Austria
Cover photo:
Brian Arnold — Turriff — UK
Copy editor:
Nathan Ingvalson — Graz — Austria
2014-08-26
SQL Performance Explained
Markus Winand
Vienna, Austria
Preface
SELECT date_of_birth
FROM employees
WHERE last_name = 'WINAND'
The SQL query reads like an English sentence that explains the requested
data. Writing SQL statements generally does not require any knowledge
about inner workings of the database or the storage system (such as disks,
files, etc.). There is no need to tell the database which files to open or how
to find the requested rows. Many developers have years of SQL experience
yet they know very little about the processing that happens in the database.
It turns out that the only thing developers need to learn is how to index.
Database indexing is, in fact, a development task. That is because the
most important information for proper indexing is not the storage system
configuration or the hardware setup. The most important information for
indexing is how the application queries the data. This knowledge —about
Preface: Developers Need to Index
This book covers everything developers need to know about indexes — and
nothing more. To be more precise, the book covers the most important
index type only: the B-tree index.
The B-tree index works almost identically in many databases. The book only
uses the terminology of the Oracle® database, but the principles apply to
other databases as well. Side notes provide relevant information for MySQL,
PostgreSQL and SQL Server®.
This chapter makes up the main body of the book. Once you learn to
use these techniques, you will write much faster SQL.
Chapter 1
Anatomy of an Index
“An index makes the query fast” is the most basic explanation of an index I
have ever seen. Although it describes the most important aspect of an index
very well, it is —unfortunately—not sufficient for this book. This chapter
describes the index structure in a less superficial way but doesn’t dive too
deeply into details. It provides just enough insight for one to understand
the SQL performance aspects discussed throughout the book.
Clustered Indexes
SQL Server and MySQL (using InnoDB) take a broader view of what
“index” means. They refer to tables that consist of the index structure
only as clustered indexes. These tables are called Index-Organized
Tables (IOT) in the Oracle database.
Chapter 5, “Clustering Data”, describes them in more detail and
explains their advantages and disadvantages.
The database combines two data structures to meet the challenge: a doubly
linked list and a search tree. These two structures explain most of the
database’s performance characteristics.
The logical order is established via a doubly linked list. Every node has links
to two neighboring entries, very much like a chain. New nodes are inserted
between two existing nodes by updating their links to refer to the new
node. The physical location of the new node doesn’t matter because the
doubly linked list maintains the logical order.
The data structure is called a doubly linked list because each node refers
to the preceding and the following node. It enables the database to read
the index forwards or backwards as needed. It is thus possible to insert
new entries without moving large amounts of data—it just needs to change
some pointers.
Doubly linked lists are also used for collections (containers) in many
programming languages.
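The pointer updates described above can be sketched in a few lines of Python. This is only an illustration of the general data structure, not the storage format of any database; the `Node` class and the helper names are hypothetical.

```python
class Node:
    """One entry of a doubly linked list (hypothetical, for illustration only)."""

    def __init__(self, key):
        self.key = key
        self.prev = None   # link to the preceding node
        self.next = None   # link to the following node


def insert_between(left, right, new):
    """Link `new` between two neighbours; no existing node is moved."""
    new.prev, new.next = left, right
    left.next = new
    right.prev = new


def keys_forward(head):
    """Follow the `next` links and collect the keys."""
    keys, node = [], head
    while node is not None:
        keys.append(node.key)
        node = node.next
    return keys


def keys_backward(tail):
    """Follow the `prev` links and collect the keys."""
    keys, node = [], tail
    while node is not None:
        keys.append(node.key)
        node = node.prev
    return keys


# Build the chain 11 <-> 18, then insert 13 between the two nodes.
first, last = Node(11), Node(18)
first.next, last.prev = last, first
insert_between(first, last, Node(13))

forward = keys_forward(first)    # logical order, read forwards
backward = keys_backward(last)   # the same chain, read backwards
print(forward, backward)
```

The new node can live anywhere in memory; only the links of its two neighbours change.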
The Index Leaf Nodes
Databases use doubly linked lists to connect the so-called index leaf nodes.
Each leaf node is stored in a database block or page; that is, the database’s
smallest storage unit. All index blocks are of the same size —typically a few
kilobytes. The database uses the space in each block to the extent possible
and stores as many index entries as possible in each block. That means
that the index order is maintained on two different levels: the index entries
within each leaf node, and the leaf nodes among each other using a doubly
linked list.
[Figure 1.1: Index Leaf Nodes and Corresponding Table Data]
Figure 1.1 illustrates the index leaf nodes and their connection to the table
data. Each index entry consists of the indexed columns (the key, column 2)
and refers to the corresponding table row (via ROWID or RID). Unlike the
index, the table data is stored in a heap structure and is not sorted at all.
There is neither a relationship between the rows stored in the same table
block nor is there any connection between the blocks.
[Figure 1.2: example B-tree index showing the root node, the branch nodes and the leaf nodes]
The Search Tree (B-Tree)
Figure 1.2 shows an example index with 30 entries. The doubly linked list
establishes the logical order between the leaf nodes. The root and branch
nodes support quick searching among the leaf nodes.
The figure highlights a branch node and the leaf nodes it refers to. Each
branch node entry corresponds to the biggest value in the respective leaf
node. The biggest value in the first leaf node is 46, so the first branch
node entry is also 46. The same is true for the other leaf nodes so that in
the end the branch node has the values 46, 53, 57 and 83. According to this
scheme, a branch layer is built up until all the leaf nodes are covered by a
branch node.
The next layer is built similarly, but on top of the first branch node level.
The procedure repeats until all keys fit into a single node, the root node.
The structure is a balanced search tree because the tree depth is equal at
every position; the distance between root node and leaf nodes is the same
everywhere.
Note
A B-tree is a balanced tree—not a binary tree.
[Figure 1.3: B-tree traversal for the key 57, from the root node (39, 83, 98) via the branch node (46, 53, 57, 83) to the leaf node with the keys 55, 57 and 57]
Figure 1.3 shows an index fragment to illustrate a search for the key “57”.
The tree traversal starts at the root node on the left-hand side. Each entry
is processed in ascending order until a value is greater than or equal to (>=)
the search term (57). In the figure it is the entry 83. The database follows
the reference to the corresponding branch node and repeats the procedure
until the tree traversal reaches a leaf node.
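The traversal rule can be sketched in Python. The keys and ROWID-like strings are the ones visible in the figures; the `Branch` class, the `find_leaf` helper and the empty placeholder subtrees are assumptions made for this illustration, and a real database stores nodes in blocks rather than Python lists.

```python
class Branch:
    """Root or branch node: (biggest key in child, child) pairs in ascending order."""

    def __init__(self, entries):
        self.entries = entries


# Leaf nodes are plain lists of (key, ROWID) pairs.
leaf_46 = [(40, "4A 1B"), (43, "9F 71"), (46, "A2 D2")]
leaf_53 = [(46, "8B 1C"), (53, "A0 A1"), (53, "0D 79")]
leaf_57 = [(55, "9C F6"), (57, "B1 C1"), (57, "50 29")]
leaf_83 = [(67, "C4 6B"), (83, "FF 9D"), (83, "AF E9")]

branch = Branch([(46, leaf_46), (53, leaf_53), (57, leaf_57), (83, leaf_83)])
# Only one branch node is modelled; the other subtrees are left empty here.
root = Branch([(39, []), (83, branch), (98, [])])


def find_leaf(node, search_key):
    """Descend from the root; in each node follow the first entry >= search_key."""
    path = []
    while isinstance(node, Branch):
        for biggest_key, child in node.entries:
            if biggest_key >= search_key:
                path.append(biggest_key)
                node = child
                break
        else:
            return None, path   # search key is bigger than every entry
    return node, path


leaf, path = find_leaf(root, 57)
matches = [rowid for key, rowid in leaf if key == 57]
print(path, matches)
```

The sketch stops at the first candidate leaf node; following the leaf node chain to neighbouring leaves is not modelled.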
Important
The B-tree enables the database to find a leaf node quickly.
The first ingredient for a slow index lookup is the leaf node chain. Consider
the search for “57” in Figure 1.3 again. There are obviously two matching
entries in the index. At least two entries are the same, to be more precise:
the next leaf node could have further entries for “57”. The database must
read the next leaf node to see if there are any more matching entries. That
means that an index lookup not only needs to perform the tree traversal,
it also needs to follow the leaf node chain.
The second ingredient for a slow index lookup is accessing the table.
Even a single leaf node might contain many hits — often hundreds. The
corresponding table data is usually scattered across many table blocks (see
Figure 1.1, “Index Leaf Nodes and Corresponding Table Data”). That means
that there is an additional table access for each hit.
An index lookup requires three steps: (1) the tree traversal; (2) following the
leaf node chain; (3) fetching the table data. The tree traversal is the only
step that has an upper bound for the number of accessed blocks—the index
depth. The other two steps might need to access many blocks—they cause
a slow index lookup.
Slow Indexes, Part I
Logarithmic Scalability
In mathematics, the logarithm of a number to a given base is the
power or exponent to which the base must be raised in order to
produce the number [Wikipedia, note 1].
In a search tree the base corresponds to the number of entries per
branch node and the exponent to the tree depth. The example index
in Figure 1.2 holds up to four entries per node and has a tree depth
of three. That means that the index can hold up to 64 (4^3) entries. If
it grows by one level, it can already hold 256 entries (4^4). Each time
a level is added, the maximum number of index entries quadruples.
The logarithm reverses this function. The tree depth is therefore
log_4(number-of-index-entries).
The logarithmic growth enables the example index to search a
million records with ten tree levels, but a real world index is
even more efficient. The main factor that affects the tree depth,
and therefore the lookup performance, is the number of entries
in each tree node. This number corresponds to, mathematically
speaking, the basis of the logarithm. The higher the basis, the
shallower the tree, the faster the traversal.

Tree Depth   Index Entries
         3              64
         4             256
         5           1,024
         6           4,096
         7          16,384
         8          65,536
         9         262,144
        10       1,048,576
Databases exploit this concept to a maximum extent and put as many
entries as possible into each node— often hundreds. That means that
every new index level supports a hundred times more entries.
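The arithmetic behind the table can be checked with a short Python sketch. The function names are hypothetical, and the value of 100 entries per node is only an assumption standing in for "often hundreds".

```python
def max_entries(entries_per_node, tree_depth):
    """Upper bound of index entries for a given fan-out and tree depth."""
    return entries_per_node ** tree_depth


def depth_needed(entries_per_node, index_entries):
    """Smallest tree depth whose capacity covers the given number of entries."""
    depth = 1
    while max_entries(entries_per_node, depth) < index_entries:
        depth += 1
    return depth


# Reproduce the table for the example index with four entries per node.
capacity_by_depth = {depth: max_entries(4, depth) for depth in range(3, 11)}

depth_base_4 = depth_needed(4, 1_000_000)      # example index from the text
depth_base_100 = depth_needed(100, 1_000_000)  # assumed fan-out of 100
print(capacity_by_depth, depth_base_4, depth_base_100)
```

Integer arithmetic is used on purpose so that no floating-point rounding of the logarithm can shift the result by one level.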
1 http://en.wikipedia.org/wiki/Logarithm
The origin of the “slow indexes” myth is the misbelief that an index lookup
just traverses the tree, hence the idea that a slow index must be caused by a
“broken” or “unbalanced” tree. The truth is that you can actually ask most
databases how they use an index. The Oracle database is rather verbose in
this respect and has three distinct operations that describe a basic index
lookup: INDEX UNIQUE SCAN, INDEX RANGE SCAN and TABLE ACCESS BY INDEX ROWID.
The important point is that an INDEX RANGE SCAN can potentially read a large
part of an index. If there is one more table access for each row, the query
can become slow even when using an index.
Chapter 2
The where clause defines the search condition of an SQL statement, and it
thus falls into the core functional domain of an index: finding data quickly.
Although the where clause has a huge impact on performance, it is often
phrased carelessly so that the database has to scan a large part of the index.
The result: a poorly written where clause is the first ingredient of a slow
query.
This chapter explains how different operators affect index usage and how
to make sure that an index is usable for as many queries as possible. The
last section shows common anti-patterns and presents alternatives that
deliver better performance.
This section shows how to verify index usage and explains how
concatenated indexes can optimize combined conditions. To aid
understanding, we will analyze a slow query to see the real world impact
of the causes explained in Chapter 1.
Chapter 2: The Where Clause
Primary Keys
We start with the simplest yet most common where clause: the primary key
lookup. For the examples throughout this chapter we use the EMPLOYEES
table defined as follows:
The database automatically creates an index for the primary key. That
means there is an index on the EMPLOYEE_ID column, even though there is
no create index statement.
The following query uses the primary key to retrieve an employee’s name:
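The table definition and the query are not reproduced in this text. As a rough stand-in, the sketch below uses SQLite through Python's sqlite3 module rather than Oracle. The column list is an assumption based on the column names mentioned in the text, the row is made-up sample data, and SQLite's plan output looks different from the Oracle plan shown afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed table layout. The type INT (not INTEGER) makes SQLite build a
# separate index for the primary key instead of reusing its internal rowid.
conn.execute("""
    CREATE TABLE employees (
        employee_id   INT  NOT NULL,
        first_name    TEXT NOT NULL,
        last_name     TEXT NOT NULL,
        date_of_birth TEXT NOT NULL,
        CONSTRAINT employees_pk PRIMARY KEY (employee_id)
    )
""")
conn.execute("INSERT INTO employees VALUES (123, 'JANE', 'DOE', '1980-01-01')")

# The primary key index exists although no CREATE INDEX statement was issued.
pk_indexes = [row[1] for row in conn.execute("PRAGMA index_list('employees')")]

query = "SELECT first_name, last_name FROM employees WHERE employee_id = 123"
row = conn.execute(query).fetchone()

# Column 3 of EXPLAIN QUERY PLAN holds the human-readable plan step.
plan = " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query))
print(pk_indexes, row, plan)
```

SQLite names the automatically created index itself, so the plan refers to that generated name rather than to EMPLOYEES_PK.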
The where clause cannot match multiple rows because the primary key
constraint ensures uniqueness of the EMPLOYEE_ID values. The database does
not need to follow the index leaf nodes —it is enough to traverse the index
tree. We can use the so-called execution plan for verification:
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 2 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 2 |
|*2 | INDEX UNIQUE SCAN | EMPLOYEES_PK | 1 | 1 |
---------------------------------------------------------------
The Oracle execution plan shows an INDEX UNIQUE SCAN — the operation that
only traverses the index tree. It fully utilizes the logarithmic scalability of
the index to find the entry very quickly —almost independent of the table
size.
Tip
The execution plan (sometimes explain plan or query plan) shows the
steps the database takes to execute an SQL statement. Appendix A on
page 165 explains how to retrieve and read execution plans with
other databases.
After accessing the index, the database must do one more step to
fetch the queried data (FIRST_NAME, LAST_NAME) from the table storage:
the TABLE ACCESS BY INDEX ROWID operation. This operation can become a
performance bottleneck —as explained in “Slow Indexes, Part I”— but there
is no such risk in connection with an INDEX UNIQUE SCAN. This operation
cannot deliver more than one entry so it cannot trigger more than one table
access. That means that the ingredients of a slow query are not present
with an INDEX UNIQUE SCAN.
Concatenated Indexes
Even though the database creates the index for the primary key
automatically, there is still room for manual refinements if the key consists
of multiple columns. In that case the database creates an index on all
primary key columns: a so-called concatenated index (also known as
multi-column, composite or combined index). Note that the column order of a
concatenated index has great impact on its usability so it must be chosen
carefully.
The index for the new primary key is therefore defined in the following way:
A query for a particular employee has to take the full primary key into
account— that is, the SUBSIDIARY_ID column also has to be used:
Whenever a query uses the complete primary key, the database can use
an INDEX UNIQUE SCAN — no matter how many columns the index has. But
what happens when using only one of the key columns, for example, when
searching all employees of a subsidiary?
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 478 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 106 | 478 |
----------------------------------------------------
The execution plan reveals that the database does not use the index. Instead
it performs a FULL TABLE SCAN. As a result the database reads the entire table
and evaluates every row against the where clause. The execution time grows
with the table size: if the table grows tenfold, the FULL TABLE SCAN takes ten
times as long. The danger of this operation is that it is often fast enough
in a small development environment, but it causes serious performance
problems in production.
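The same effect can be reproduced in miniature with SQLite (via Python's sqlite3 module) as a stand-in for Oracle. The table layout is an assumption, the table is left empty because only the plans matter here, and SQLite words its plan steps differently (SEARCH for an index lookup, SCAN for reading the whole table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout with a two-column primary key, EMPLOYEE_ID first.
conn.execute("""
    CREATE TABLE employees (
        employee_id   INT  NOT NULL,
        subsidiary_id INT  NOT NULL,
        first_name    TEXT NOT NULL,
        last_name     TEXT NOT NULL,
        CONSTRAINT employees_pk PRIMARY KEY (employee_id, subsidiary_id)
    )
""")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


pk_index = conn.execute("PRAGMA index_list('employees')").fetchone()[1]

# Complete primary key: the index is usable.
full_key = plan(
    "SELECT first_name, last_name FROM employees "
    "WHERE employee_id = 123 AND subsidiary_id = 30"
)

# Second index column only: the index is not used, the whole table is read.
subsidiary_only = plan(
    "SELECT first_name, last_name FROM employees WHERE subsidiary_id = 20"
)
print(full_key)
print(subsidiary_only)
```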
The database does not use the index because it cannot use single columns
from a concatenated index arbitrarily. A closer look at the index structure
makes this clear.
A concatenated index is just a B-tree index like any other that keeps the
indexed data in a sorted list. The database considers each column according
to its position in the index definition to sort the index entries. The first
column is the primary sort criterion and the second column determines the
order only if two entries have the same value in the first column and so on.
Important
A concatenated index is one index across multiple columns.
[Figure 2.1: concatenated index on (EMPLOYEE_ID, SUBSIDIARY_ID); the excerpt includes the entries (123, 20, ROWID) and (123, 21, ROWID)]
The index excerpt in Figure 2.1 shows that the entries for subsidiary 20 are
not stored next to each other. It is also apparent that there are no entries
with SUBSIDIARY_ID = 20 in the tree, although they exist in the leaf nodes.
The tree is therefore useless for this query.
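Sorting a handful of key pairs in Python shows the same thing. Only the pairs (123, 20) and (123, 21) come from the figure; the remaining pairs and the helper names are made up for this illustration.

```python
# (EMPLOYEE_ID, SUBSIDIARY_ID) pairs; mostly hypothetical sample data.
entries = [(126, 20), (123, 21), (124, 10), (123, 20), (125, 30), (124, 20)]

# Index order for (EMPLOYEE_ID, SUBSIDIARY_ID): tuples sort column by column.
by_employee = sorted(entries)

# Index order for the reversed definition (SUBSIDIARY_ID, EMPLOYEE_ID).
by_subsidiary = sorted((subsidiary, employee) for employee, subsidiary in entries)


def positions(index, column, value):
    """Positions of all index entries whose given column equals value."""
    return [i for i, entry in enumerate(index) if entry[column] == value]


def is_consecutive(pos):
    """True if the positions form one uninterrupted range."""
    return pos == list(range(pos[0], pos[-1] + 1))


scattered = positions(by_employee, 1, 20)      # subsidiary is the 2nd column
clustered = positions(by_subsidiary, 0, 20)    # subsidiary is the 1st column
print(by_employee, scattered)
print(by_subsidiary, clustered)
```

With EMPLOYEE_ID first, the entries for subsidiary 20 are spread over the whole order, so there is no single place a tree could lead to. With SUBSIDIARY_ID first they form one contiguous range.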
Tip
Visualizing an index helps in understanding what queries the index
supports. You can query the database to retrieve the entries in index
order (SQL:2008 syntax, see page 144 for proprietary solutions
using LIMIT, TOP or ROWNUM):
If you put the index definition and table name into the query, you
will get a sample from the index. Ask yourself if the requested rows
are clustered in a central place. If not, the index tree cannot help find
that place.
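The query mentioned in the tip is not reproduced in this text. A minimal sketch of the idea, using SQLite via Python's sqlite3 module and therefore the proprietary LIMIT clause instead of the SQL:2008 syntax, might look as follows. The `index_sample` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT NOT NULL,"
    " subsidiary_id INT NOT NULL,"
    " PRIMARY KEY (employee_id, subsidiary_id))"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(124, 20), (123, 21), (126, 20), (123, 20), (125, 30)],
)


def index_sample(table, index_columns, limit=100):
    """Fetch the first rows in index order: select and sort by the index columns.

    Table and column names are pasted into the statement, so they must come
    from trusted code, never from user input.
    """
    column_list = ", ".join(index_columns)
    sql = (
        f"SELECT {column_list} FROM {table} "
        f"ORDER BY {column_list} LIMIT {int(limit)}"
    )
    return conn.execute(sql).fetchall()


sample = index_sample("employees", ["employee_id", "subsidiary_id"])
print(sample)
```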
We can take advantage of the fact that the first index column is always
usable for searching. Again, it is like a telephone directory: you don’t need
to know the first name to search by last name. The trick is to reverse the
index column order so that the SUBSIDIARY_ID is in the first position:
Both columns together are still unique so queries with the full primary
key can still use an INDEX UNIQUE SCAN but the sequence of index entries is
entirely different. The SUBSIDIARY_ID has become the primary sort criterion.
That means that all entries for a subsidiary are in the index consecutively
so the database can use the B-tree to find their location.
Important
The most important consideration when defining a concatenated
index is how to choose the column order so it can be used as often
as possible.
The execution plan confirms that the database uses the “reversed” index.
The SUBSIDIARY_ID alone is not unique anymore so the database must
follow the leaf nodes in order to find all matching entries: it is therefore
using the INDEX RANGE SCAN operation.
--------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 |SELECT STATEMENT | | 106 | 75 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 106 | 75 |
|*2 | INDEX RANGE SCAN | EMPLOYEE_PK | 106 | 2 |
--------------------------------------------------------------
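A comparable experiment in SQLite (via Python's sqlite3 module, as a stand-in for Oracle) is sketched below. The table layout is an assumption, and the unique index is created explicitly instead of through a primary key constraint so that it carries a predictable name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT NOT NULL,"
    " subsidiary_id INT NOT NULL,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)
# Reversed column order: SUBSIDIARY_ID is now the primary sort criterion.
conn.execute(
    "CREATE UNIQUE INDEX employees_pk ON employees (subsidiary_id, employee_id)"
)


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


full_key = plan(
    "SELECT first_name, last_name FROM employees "
    "WHERE employee_id = 123 AND subsidiary_id = 30"
)
subsidiary_only = plan(
    "SELECT first_name, last_name FROM employees WHERE subsidiary_id = 20"
)
print(full_key)
print(subsidiary_only)
```

Both statements can now use the same index: the first with both columns, the second with the leading column only.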
Even though the two-index solution delivers very good select performance
as well, the single-index solution is preferable. It not only saves storage
space, but also the maintenance overhead for the second index. The fewer
indexes a table has, the better the insert, delete and update performance.
To define an optimal index you must understand more than just how
indexes work — you must also know how the application queries the data.
This means you have to know the column combinations that appear in the
where clause.
The only place where the technical database knowledge meets the
functional knowledge of the business domain is the development
department. Developers have a feeling for the data and know the access
path. They can properly index to get the best benefit for the overall
application without much effort.
The adopted EMPLOYEE_PK index improves the performance of all queries that
search by subsidiary only. It is however usable for all queries that search
by SUBSIDIARY_ID — regardless of whether there are any additional search
criteria. That means the index becomes usable for queries that used to use
another index with another part of the where clause. In that case, if there
are multiple access paths available it is the optimizer’s job to choose the
best one.
Slow Indexes, Part II
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 30 |
|*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 30 |
|*2 | INDEX RANGE SCAN | EMPLOYEES_PK | 40 | 2 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LAST_NAME"='WINAND')
2 - access("SUBSIDIARY_ID"=30)
The execution plan uses an index and has an overall cost value of 30.
So far, so good. It is however suspicious that it uses the index we just
changed— that is enough reason to suspect that our index change caused
the performance problem, especially when bearing the old index definition
in mind— it started with the EMPLOYEE_ID column which is not part of the
where clause at all. The query could not use that index before.
For further analysis, it would be nice to compare the execution plan before
and after the change. To get the original execution plan, we could just
deploy the old index definition again, however most databases offer a
simpler method to prevent using an index for a specific query. The following
example uses an Oracle optimizer hint for that purpose.
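The Oracle listing is not reproduced in this text. As a loosely comparable illustration in another database, SQLite offers the NOT INDEXED clause, which keeps the planner from using any index on the named table. The sketch below (Python's sqlite3 module, assumed table layout) compares the plans with and without it; it is an analogue of the technique, not the Oracle hint syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT NOT NULL,"
    " subsidiary_id INT NOT NULL,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE UNIQUE INDEX employees_pk ON employees (subsidiary_id, employee_id)"
)


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


where = "WHERE subsidiary_id = 30 AND last_name = 'WINAND'"

with_index = plan("SELECT first_name, last_name FROM employees " + where)

# NOT INDEXED forbids index use for this one statement only.
without_index = plan(
    "SELECT first_name, last_name FROM employees NOT INDEXED " + where
)
print(with_index)
print(without_index)
```

The index definition itself stays untouched, so other statements are not affected.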
The execution plan that was presumably used before the index change did
not use an index at all:
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 477 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 1 | 477 |
----------------------------------------------------
Even though the TABLE ACCESS FULL must read and process the entire table,
it seems to be faster than using the index in this case. That is particularly
unusual because the query matches one row only. Using an index to find
a single row should be much faster than a full table scan, but in this case
it is not. The index seems to be slow.
Tip
Appendix A, “Execution Plans”, explains how to find the “Predicate
Information” for other databases.
The INDEX RANGE SCAN with operation ID 2 (Example 2.1 on page 19)
applies only the SUBSIDIARY_ID=30 filter. That means that it traverses the
index tree to find the first entry for SUBSIDIARY_ID 30. Next it follows the
leaf node chain to find all other entries for that subsidiary. The result of the
INDEX RANGE SCAN is a list of ROWIDs that fulfill the SUBSIDIARY_ID condition:
depending on the subsidiary size, there might be just a few or there
could be many hundreds.
The next step is the TABLE ACCESS BY INDEX ROWID operation. It uses the
ROWIDs from the previous step to fetch the rows —all columns— from the
table. Once the LAST_NAME column is available, the database can evaluate
the remaining part of the where clause. That means the database has to
fetch all rows for SUBSIDIARY_ID=30 before it can apply the LAST_NAME filter.
The statement’s response time does not depend on the result set size
but on the number of employees in the particular subsidiary. If the
subsidiary has just a few members, the INDEX RANGE SCAN provides better
performance. Nonetheless a TABLE ACCESS FULL can be faster for a huge
subsidiary because it can read large parts from the table in one shot (see
“Full Table Scan” on page 13).
The query is slow because the index lookup returns many ROWIDs — one for
each employee of the original company— and the database must fetch them
individually. It is the perfect combination of the two ingredients that make
an index slow: the database reads a wide index range and has to fetch many
rows individually.
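A small simulation makes the amount of work visible. Everything in it is hypothetical: the table is a Python dictionary keyed by ROWID, the index is a sorted list, and the head counts are invented except for the 1000 employees of subsidiary 30 mentioned in the text.

```python
from bisect import bisect_left

table = {}   # heap table: ROWID -> row
index = []   # index entries: (SUBSIDIARY_ID, EMPLOYEE_ID, ROWID)
rowid = 0
for subsidiary_id, headcount in [(10, 50), (20, 100), (30, 1000)]:
    for n in range(headcount):
        rowid += 1
        is_target = subsidiary_id == 30 and n == 500
        last_name = "WINAND" if is_target else f"NAME{rowid}"
        table[rowid] = {"subsidiary_id": subsidiary_id, "last_name": last_name}
        index.append((subsidiary_id, rowid, rowid))   # EMPLOYEE_ID = ROWID here
index.sort()

# Access predicate: locate the first entry for subsidiary 30, then follow
# the sorted entries for as long as they still match.
pos = bisect_left(index, (30,))
rowids = []
while pos < len(index) and index[pos][0] == 30:
    rowids.append(index[pos][2])
    pos += 1

# Filter predicate: LAST_NAME is not in the index, so every candidate row
# has to be fetched from the table before it can be checked.
table_accesses = 0
result = []
for rid in rowids:
    row = table[rid]
    table_accesses += 1
    if row["last_name"] == "WINAND":
        result.append(rid)

print(len(rowids), table_accesses, len(result))
```

One thousand table accesses are needed to return a single row, which is the combination of a wide index range and many individual row fetches described above.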
Choosing the best execution plan depends on the table’s data distribution
as well so the optimizer uses statistics about the contents of the database.
In our example, a histogram containing the distribution of employees over
subsidiaries is used. This allows the optimizer to estimate the number
of rows returned from the index lookup —the result is used for the cost
calculation.
Statistics
A cost-based optimizer uses statistics about tables, columns, and
indexes. Most statistics are collected on the column level: the number
of distinct values, the smallest and largest values (data range),
the number of NULL occurrences and the column histogram (data
distribution). The most important statistical value for a table is its
size (in rows and blocks).
The most important index statistics are the tree depth, the number
of leaf nodes, the number of distinct keys and the clustering factor
(see Chapter 5, “Clustering Data”).
The optimizer uses these values to estimate the selectivity of the
where clause predicates.
If there are no statistics available— for example because they were deleted—
the optimizer uses default values. The default statistics of the Oracle
database suggest a small index with medium selectivity. They lead to the
estimate that the INDEX RANGE SCAN will return 40 rows. The execution plan
shows this estimation in the Rows column (again, see Example 2.1 on page
19). Obviously this is a gross underestimate, as there are 1000 employees
working for this subsidiary.
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 680 |
|*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 680 |
|*2 | INDEX RANGE SCAN | EMPLOYEES_PK | 1000 | 4 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LAST_NAME"='WINAND')
2 - access("SUBSIDIARY_ID"=30)
The cost value of 680 is even higher than the cost value for the execution
plan using the FULL TABLE SCAN (477, see page 20). The optimizer will
therefore automatically prefer the FULL TABLE SCAN.
This example of a slow index should not hide the fact that proper indexing
is the best solution. Of course searching on last name is best supported by
an index on LAST_NAME:
--------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 |
|* 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 3 |
|* 2 | INDEX RANGE SCAN | EMP_NAME | 1 | 1 |
--------------------------------------------------------------
The two execution plans from Example 2.1 (page 19) and Example 2.2
are almost identical. The database performs the same operations and
the optimizer calculated similar cost values, nevertheless the second plan
performs much better. The efficiency of an INDEX RANGE SCAN may vary
over a wide range —especially when followed by a table access. Using an
index does not automatically mean a statement is executed in the best way
possible.
Functions
The index on LAST_NAME has improved the performance considerably, but
it requires you to search using the same case (upper/lower) as is stored in
the database. This section explains how to lift this restriction without a
decrease in performance.
Note
MySQL 5.6 does not support function-based indexing as described
below. As an alternative, virtual columns were planned for MySQL
6.0 but were introduced in MariaDB 5.2 only.
Regardless of the capitalization used for the search term or the LAST_NAME
column, the UPPER function makes them match as desired.
Note
Another way for case-insensitive matching is to use a different
“collation”. The default collations used by SQL Server and MySQL do
not distinguish between upper and lower case letters; they are
case-insensitive by default.
Case-Insensitive Search Using UPPER or LOWER
The logic of this query is perfectly reasonable but the execution plan is not:
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 477 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 10 | 477 |
----------------------------------------------------
It is a return of our old friend the full table scan. Although there is an index
on LAST_NAME, it is unusable —because the search is not on LAST_NAME but
on UPPER(LAST_NAME). From the database’s perspective, that’s something
entirely different.
This is a trap we all might fall into. We recognize the relation between
LAST_NAME and UPPER(LAST_NAME) instantly and expect the database to “see”
it as well. In reality the optimizer’s view is more like this:
The UPPER function is just a black box. The parameters to the function
are not relevant because there is no general relationship between the
function’s parameters and the result.
Tip
Replace the function name with BLACKBOX to understand the
optimizer’s point of view.
To support that query, we need an index that covers the actual search term.
That means we do not need an index on LAST_NAME but on UPPER(LAST_NAME):
The database can use a function-based index if the exact expression of the
index definition appears in an SQL statement —like in the example above.
The execution plan confirms this:
--------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 |SELECT STATEMENT | | 100 | 41 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 100 | 41 |
|*2 | INDEX RANGE SCAN | EMP_UP_NAME | 40 | 1 |
--------------------------------------------------------------
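SQLite also supports indexes on expressions, so the before and after situation can be sketched with Python's sqlite3 module as a stand-in for Oracle. The table layout and the sample row are assumptions; the index names follow the ones visible in the plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT NOT NULL,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO employees VALUES (1, 'Jane', 'Winand')")
conn.execute("CREATE INDEX emp_name ON employees (last_name)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = (
    "SELECT first_name, last_name FROM employees "
    "WHERE UPPER(last_name) = UPPER('winand')"
)

# The plain index on LAST_NAME does not help with UPPER(LAST_NAME).
before = plan(query)

# Index the expression itself; the query must use the same expression.
conn.execute("CREATE INDEX emp_up_name ON employees (UPPER(last_name))")
after = plan(query)

rows = conn.execute(query).fetchall()
print(before)
print(after)
print(rows)
```

The query text is identical in both runs; only the available index changes the plan.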
Warning
Sometimes ORM tools use UPPER and LOWER without the developer’s
knowledge. Hibernate, for example, injects an implicit LOWER for
case-insensitive searches.
The execution plan is not yet the same as it was in the previous section
without UPPER; the row count estimate is too high. It is particularly strange
that the optimizer expects to fetch more rows from the table than the
INDEX RANGE SCAN delivers in the first place. How can it fetch 100 rows from
the table if the preceding index scan returned only 40 rows? The answer is
that it cannot. Contradicting estimates like this often indicate problems
with the statistics. In this particular case it is because the Oracle database
does not update the table statistics when creating a new index (see also
“Oracle Statistics for Function-Based Indexes” on page 28). After updating
the statistics, the optimizer calculates more accurate estimates:
--------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 3 |
|*2 | INDEX RANGE SCAN | EMP_UP_NAME | 1 | 1 |
--------------------------------------------------------------
Note
The so-called “extended statistics” on expressions and column groups
were introduced with Oracle release 11g.
Tip
Appendix A, “Execution Plans”, describes the row count estimates in
SQL Server and PostgreSQL execution plans.
SQL Server does not support function-based indexes as described but it does
offer computed columns that can be used instead. To make use of this,
you have to first add a computed column to the table that can be indexed
afterwards:
SQL Server is able to use this index whenever the indexed expression
appears in the statement. You do not need to rewrite your query to use the
computed column.
User-Defined Functions
There is one important exception. It is, for example, not possible to refer
to the current time in an index definition, neither directly nor indirectly,
as in the following example.
The function GET_AGE uses the current date (SYSDATE) to calculate the age
based on the supplied date of birth. You can use this function in all parts
of an SQL query, for example in the select and where clauses:
The reason behind this limitation is simple. When inserting a new row, the
database calls the function and stores the result in the index and there it
stays, unchanged. There is no periodic process that updates the index. The
database updates the indexed age only when the date of birth is changed
by an update statement. After the next birthday, the age that is stored in
the index will be wrong.
Caution
PostgreSQL and the Oracle database trust the DETERMINISTIC or
IMMUTABLE declarations; that is, they trust the developer.
You can declare the GET_AGE function to be deterministic and use it in
an index definition. Regardless of the declaration, it will not work as
intended because the age stored in the index will not increase as the
years pass; the employees will not get older, at least not in the index.
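The stale index can be demonstrated with a small sketch. It uses Python's built-in sqlite3 module, which also trusts a deterministic declaration (an assumption: SQLite stands in for PostgreSQL and Oracle here, and the deterministic flag needs Python 3.8 or later). The adjustable current_date variable, the index name emp_age and the sample row are hypothetical; they exist only to simulate the passing of a year.

```python
import sqlite3
from datetime import date

# Hypothetical "clock" so the example can simulate time passing.
current_date = date(2012, 1, 1)


def get_age(date_of_birth: str) -> int:
    """Age in whole years as of current_date; depends on the clock."""
    born = date.fromisoformat(date_of_birth)
    had_birthday = (current_date.month, current_date.day) >= (born.month, born.day)
    return current_date.year - born.year - (0 if had_birthday else 1)


conn = sqlite3.connect(":memory:")
# deterministic=True is a promise that the database does not verify.
conn.create_function("get_age", 1, get_age, deterministic=True)
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, date_of_birth TEXT)"
)
conn.execute("CREATE INDEX emp_age ON employees (get_age(date_of_birth))")
conn.execute("INSERT INTO employees VALUES (1, '1970-06-15')")  # index stores age 41

# One year later: the true age is now 42, but the index still holds 41.
current_date = date(2013, 1, 1)

via_index = conn.execute(
    "SELECT employee_id FROM employees INDEXED BY emp_age "
    "WHERE get_age(date_of_birth) = 42"
).fetchall()
via_scan = conn.execute(
    "SELECT employee_id FROM employees NOT INDEXED "
    "WHERE get_age(date_of_birth) = 42"
).fetchall()

print(via_index)  # [] because the index entry was never refreshed
print(via_scan)   # [(1,)] because the function is evaluated for each row
```

The same query returns different results depending on the access path, which is exactly why such functions must not be indexed.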
Other examples of functions that cannot be “indexed” are random number
generators and functions that depend on environment variables.
Think about it
How can you still use an index to optimize a query for all
42-year-old employees?
Over-Indexing
A single index cannot support both methods of ignoring the case. We could,
of course, create a second index on LOWER(last_name) for this query, but
that would mean the database has to maintain two indexes for each insert,
update, and delete statement (see also Chapter 8, “Modifying Data”). To
make one index suffice, you should consistently use the same function
throughout your application.
Tip
Unify the access path so that one index can be used by several
queries.
Tip
Always aim to index the original data as that is often the most useful
information you can put into an index.
Parameterized Queries
There is nothing bad about writing values directly into ad-hoc statements;
there are, however, two good reasons to use bind parameters in programs:
Security
Bind variables are the best way to prevent SQL injection [1].
Performance
Databases with an execution plan cache like SQL Server and the
Oracle database can reuse an execution plan when executing the same
statement multiple times. It saves effort in rebuilding the execution
plan but works only if the SQL statement is exactly the same. If you put
different values into the SQL statement, the database handles it like a
different statement and recreates the execution plan.
When using bind parameters you do not write the actual values but
instead insert placeholders into the SQL statement. That way the
statements do not change when executing them with different values.
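The caching argument can be illustrated with a toy model. The sketch below is not how any particular database implements its plan cache; PlanCache is a hypothetical class that only captures the one property described above, namely that the cache is keyed by the exact statement text.

```python
class PlanCache:
    """Toy model of an execution plan cache keyed by the exact SQL text."""

    def __init__(self):
        self.plans = {}
        self.plans_built = 0

    def get_plan(self, sql):
        if sql not in self.plans:
            self.plans_built += 1  # stands in for the optimizer's work
            self.plans[sql] = "plan #%d" % self.plans_built
        return self.plans[sql]


subsidiary_ids = [20, 30, 20, 42]

# Literal values: every distinct value produces a distinct statement text.
literal_cache = PlanCache()
for sid in subsidiary_ids:
    literal_cache.get_plan(
        "SELECT first_name, last_name FROM employees "
        "WHERE subsidiary_id = %d" % sid
    )

# Bind parameter: the statement text is the same for every value.
bind_cache = PlanCache()
for sid in subsidiary_ids:
    bind_cache.get_plan(
        "SELECT first_name, last_name FROM employees WHERE subsidiary_id = ?"
    )

print(literal_cache.plans_built)  # 3
print(bind_cache.plans_built)     # 1
```

With literals, the model builds one plan per distinct value; with a placeholder it builds a single plan and reuses it.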
[1] http://en.wikipedia.org/wiki/SQL_injection
Naturally there are exceptions, for example if the affected data volume
depends on the actual values:
SELECT first_name, last_name
FROM employees
WHERE subsidiary_id = 20;
99 rows selected.
---------------------------------------------------------------
|Id | Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 70 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 99 | 70 |
|*2 | INDEX RANGE SCAN | EMPLOYEE_PK | 99 | 2 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SUBSIDIARY_ID"=20)
An index lookup delivers the best performance for small subsidiaries, but a
TABLE ACCESS FULL can outperform the index for large subsidiaries:
The subsequent cost calculation will therefore result in two different cost
values. When the optimizer finally selects an execution plan it takes the
plan with the lowest cost value. For the smaller subsidiary, it is the one
using the index.
The cost of the TABLE ACCESS BY INDEX ROWID operation is highly sensitive
to the row count estimate. Selecting ten times as many rows will elevate
the cost value by that factor. The overall cost using the index is then even
higher than a full table scan. The optimizer will therefore select the other
execution plan for the bigger subsidiary.
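The reasoning can be followed with simple arithmetic. The sketch below is a deliberately crude linear model, not the optimizer's real cost formula. It takes the index plan shown earlier (cost 70 for 99 rows) and assumes that the full table scan cost of 477, seen earlier for the EMPLOYEES table, applies here as well.

```python
FULL_SCAN_COST = 477         # TABLE ACCESS FULL on EMPLOYEES (assumed to apply here)
INDEX_COST_FOR_99_ROWS = 70  # index plan for subsidiary 20, which has 99 rows


def index_plan_cost(estimated_rows):
    """Toy model: the index plan's cost grows in proportion to the row estimate."""
    return INDEX_COST_FOR_99_ROWS * estimated_rows / 99


def choose_plan(estimated_rows):
    """Pick the plan with the lower cost value, as the optimizer does."""
    if index_plan_cost(estimated_rows) < FULL_SCAN_COST:
        return "INDEX RANGE SCAN"
    return "TABLE ACCESS FULL"


print(index_plan_cost(990))  # 700.0
print(choose_plan(99))       # INDEX RANGE SCAN
print(choose_plan(990))      # TABLE ACCESS FULL
```

Ten times as many rows raise the index plan's cost from 70 to 700, which exceeds 477, so the full table scan wins for the bigger subsidiary.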
Tip
Column histograms are most useful if the values are not uniformly
distributed.
For columns with uniform distribution, it is often sufficient to divide
the number of rows in the table by the number of distinct values.
This method also works when using bind parameters.
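A short calculation shows why the distribution matters. Under a uniform distribution, the expected number of rows for one value is the row count of the table divided by the number of distinct values. The data below is hypothetical and deliberately skewed; a Counter plays the role of an exact histogram.

```python
from collections import Counter

# Hypothetical column: 90 rows for subsidiary 30, only 10 for subsidiary 20.
subsidiary_ids = [30] * 90 + [20] * 10

histogram = Counter(subsidiary_ids)  # exact number of rows per value
uniform_estimate = len(subsidiary_ids) / len(histogram)  # rows / distinct values

print(uniform_estimate)  # 50.0
print(histogram[20])     # 10
print(histogram[30])     # 90
```

The uniform estimate of 50 rows is wrong for both values. Only the histogram tells the optimizer that one subsidiary is small and the other is large.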
From this perspective, it is a little bit paradoxical that bind parameters can
improve performance if not using bind parameters enables the optimizer
to always opt for the best execution plan. But the question is at what price?
Generating and evaluating all execution plan variants is a huge effort that
does not pay off if you get the same result in the end anyway.
Tip
Not using bind parameters is like recompiling a program every time.
As the developer, you can use bind parameters deliberately to help resolve
this dilemma. That is, you should always use bind parameters except for
values that should influence the execution plan.
Unevenly distributed status codes like “todo” and “done” are a good
example. The number of “done” entries often exceeds the “todo” records by
an order of magnitude. Using an index only makes sense when searching
for “todo” entries in that case. Partitioning is another example, that is, if
you split tables and indexes across several storage areas. The actual values
can then influence which partitions have to be scanned. The performance
of LIKE queries can suffer from bind parameters as well, as we will see in
the next section.
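In code, this rule could look as follows. The sketch uses Python's built-in sqlite3 module and a hypothetical tasks table; it only shows the shape of the statements, not the plan selection of any specific database. The status code is a fixed literal written by the programmer, so each status gets its own statement text, while the varying owner is passed as a bind parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, owner_id INTEGER, status TEXT)"
)
# Hypothetical data: many "done" rows, few "todo" rows.
conn.executemany(
    "INSERT INTO tasks (owner_id, status) VALUES (?, ?)",
    [(1, "done")] * 20 + [(1, "todo"), (2, "todo")],
)

# Literal status (may influence the plan), bind parameter for the owner.
TODO_FOR_OWNER = "SELECT task_id FROM tasks WHERE status = 'todo' AND owner_id = ?"
DONE_FOR_OWNER = "SELECT task_id FROM tasks WHERE status = 'done' AND owner_id = ?"

todo = conn.execute(TODO_FOR_OWNER, (1,)).fetchall()
done = conn.execute(DONE_FOR_OWNER, (1,)).fetchall()

print(len(todo))  # 1
print(len(done))  # 20
```

The literals are constants in the program and never come from user input, so they do not reopen the door to SQL injection.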
Tip
In reality, there are only a few cases in which the actual values
affect the execution plan. You should therefore use bind parameters
if in doubt, if only to prevent SQL injection.
The following code snippets show how to use bind parameters in various
programming languages.
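As one illustration, here is a minimal sketch in Python using the built-in sqlite3 module, which marks bind parameters with a question mark. The sample rows and the helper function employees_of are hypothetical; other drivers and databases use different placeholder styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " first_name TEXT, last_name TEXT, subsidiary_id INTEGER)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO employees (first_name, last_name, subsidiary_id) VALUES (?, ?, ?)",
    [("Markus", "Winand", 20), ("Jane", "Doe", 30), ("John", "Roe", 30)],
)

# The statement text never changes; only the bound value does.
SQL = "SELECT first_name, last_name FROM employees WHERE subsidiary_id = ?"


def employees_of(subsidiary_id):
    """Run the same statement with a different bound value each time."""
    return conn.execute(SQL, (subsidiary_id,)).fetchall()


rows_20 = employees_of(20)
rows_30 = employees_of(30)

# An injection attempt is treated as a plain value, not as SQL text.
hostile = employees_of("20 OR 1=1")

print(rows_20)       # [('Markus', 'Winand')]
print(len(rows_30))  # 2
print(hostile)       # []
```

Because the hostile string is bound as a value, it is compared with subsidiary_id as a whole and matches nothing.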