Statistics for the Oracle Optimizer
Optimizer statistics are a collection of data that describe the database and the objects in the
database. These statistics are used by the optimizer to choose the best execution plan for each SQL
statement. Statistics are stored in the data dictionary and can be accessed using data dictionary
views such as DBA_TAB_STATISTICS.
The optimizer uses statistics to get an estimate of the number of rows (and number of bytes)
retrieved from a table, partition, or index. The optimizer estimates the cost for the access,
determines the cost for possible plans, and then picks the execution plan with the lowest cost.
• Table statistics
    • Number of rows
    • Number of blocks
    • Average row length
• Column statistics
    • Number of distinct values (NDV) in a column
    • Number of nulls in a column
    • Data distribution (histogram)
    • Extended statistics
• Index statistics
    • Number of leaf blocks
    • Number of levels
    • Index clustering factor
• System statistics
    • I/O performance and utilization
    • CPU performance and utilization
As shown in Figure, the database stores optimizer statistics for tables, columns, indexes, and the
system in the data dictionary. You can access these statistics using data dictionary views.
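As a minimal sketch of reading these statistics from the dictionary, the query below lists the index statistics named above (leaf blocks, number of levels, clustering factor) from the USER_IND_STATISTICS view. The table name EMPLOYEE is an assumption taken from the example used later in this document, stored in uppercase as Oracle does for unquoted names.

```sql
-- Index statistics for the indexes on one table.
-- 'EMPLOYEE' is an assumed table name; replace it with your own.
SELECT index_name,
       blevel,             -- depth of the B-tree from root block to leaf blocks
       leaf_blocks,        -- number of leaf blocks
       distinct_keys,      -- number of distinct index key values
       clustering_factor,  -- how well table row order follows index order
       last_analyzed       -- when these statistics were last gathered
FROM   user_ind_statistics
WHERE  table_name = 'EMPLOYEE';
```

For a partitioned index, the view also returns one row per partition.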
Most types of optimizer statistics need to be gathered or refreshed periodically to ensure that they
accurately reflect the nature of the data that’s stored in the database. If, for example, the data in the
database is highly volatile (perhaps there are tables that are rapidly and continuously populated),
then it will be necessary to gather statistics more frequently than if the data is relatively static.
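One way to see which tables may need fresh statistics is the STALE_STATS column of USER_TAB_STATISTICS. The sketch below assumes a SQL*Plus or SQLcl session and a user with enough privilege to call DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO; if that call is not permitted, the query still runs but the flag may lag behind recent changes.

```sql
-- Write the in-memory DML monitoring data to the dictionary so that
-- the STALE_STATS flag reflects recent inserts, updates, and deletes.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

-- Tables whose statistics are flagged stale, or that have never been analyzed.
SELECT table_name,
       num_rows,
       last_analyzed,
       stale_stats
FROM   user_tab_statistics
WHERE  stale_stats = 'YES'
   OR  last_analyzed IS NULL
ORDER  BY table_name;
```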
Selectivity and cardinality are the two most important measures that influence the optimizer’s
choice of a query execution plan in Oracle.
Selectivity is the fraction of rows from a row set that pass an operation’s filter (for example, a
WHERE clause predicate), so you can think of it as a measure of uniqueness. Its value is always
between 0 and 1.
If a query returns all the rows in the table, its selectivity is 1, or 100%. This is known as bad
selectivity: it is bad because the filter eliminates no rows, so every record is returned.
If a query returns only a fraction of the rows after applying a filter, say 10 out of 100, its
selectivity is 0.1 (10/100), or 10%.
A column is highly selective if a predicate on it returns only a small fraction of the rows, which
is the case when the column holds few duplicate values. This is known as good selectivity.
A column is least selective if a predicate on it returns all or a large number of the rows. This is
known as bad selectivity.
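To make the definition concrete, the actual selectivity of a predicate can be measured by counting rows. This is only an illustrative sketch: the table EMPLOYEE, the column DEPARTMENT_ID, and the value 10 are assumptions, not names taken from a real schema.

```sql
-- Measured selectivity of the predicate department_id = 10.
-- A result near 0 means highly selective; a result near 1 means least selective.
SELECT COUNT(*)                                        AS total_rows,
       COUNT(CASE WHEN department_id = 10 THEN 1 END)  AS matching_rows,
       COUNT(CASE WHEN department_id = 10 THEN 1 END)
         / NULLIF(COUNT(*), 0)                         AS selectivity  -- NULL if the table is empty
FROM   employee;
```

The optimizer does not run a count like this at parse time; it estimates the same ratio from the stored statistics.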
What is cardinality?
Cardinality is the number of rows returned by an operation. For the optimizer, it is the expected
number of rows returned by each step of a query.
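The relationship between selectivity and cardinality is:
cardinality = selectivity × number of input rows
For example, a filter with a selectivity of 0.1 applied to 100 input rows has a cardinality of 10
(0.1 × 100).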
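The optimizer's cardinality estimate for each operation appears in the Rows column of an execution plan. The following sketch uses EXPLAIN PLAN and DBMS_XPLAN.DISPLAY; the table EMPLOYEE and the column DEPARTMENT_ID are assumed names used only for illustration.

```sql
-- Ask the optimizer for a plan without executing the statement.
EXPLAIN PLAN FOR
  SELECT *
  FROM   employee
  WHERE  department_id = 10;

-- Display the plan; the "Rows" column is the estimated cardinality
-- (selectivity x input rows) of each operation.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```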
Table statistics include information on the number of rows in the table, the number of data blocks
used for the table, as well as the average row length in the table. The Optimizer uses this
information, with other statistics, to compute the cost of various operations in an execution plan,
and to estimate the number of rows the operation will produce. For example, the cost of a full
table scan is calculated using the number of data blocks combined with the value of the parameter
DB_FILE_MULTIBLOCK_READ_COUNT. You can view table statistics in the dictionary view
USER_TAB_STATISTICS.
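A minimal query for the table statistics described above might look like the following. The table name EMPLOYEE is an assumption, and SHOW PARAMETER is a SQL*Plus/SQLcl command that needs permission to read initialization parameters.

```sql
-- Table-level statistics: row count, block count, and average row length.
SELECT table_name,
       num_rows,
       blocks,
       avg_row_len,
       sample_size,
       last_analyzed
FROM   user_tab_statistics
WHERE  table_name = 'EMPLOYEE';

-- Current value of the multiblock read parameter used in costing.
SHOW PARAMETER db_file_multiblock_read_count
```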
Column statistics include information on the number of distinct values in a column (NDV) as well as
the minimum and maximum value found in the column. You can view column statistics in the
dictionary view USER_TAB_COL_STATISTICS. The Optimizer uses the column statistics information in
conjunction with the table statistics (number of rows) to estimate the number of rows that will be
returned by a SQL operation.
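The column statistics mentioned above can be inspected with a query along these lines (the table name EMPLOYEE is an assumption). Note that LOW_VALUE and HIGH_VALUE are stored in an internal RAW format, so they are not directly readable for every datatype.

```sql
-- Column-level statistics for one table.
SELECT column_name,
       num_distinct,   -- NDV
       num_nulls,
       density,
       low_value,      -- minimum value, in internal RAW format
       high_value      -- maximum value, in internal RAW format
FROM   user_tab_col_statistics
WHERE  table_name = 'EMPLOYEE'
ORDER  BY column_name;
```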
Basic table and column statistics tell the optimizer a great deal but they don’t provide a mechanism
to tell the Optimizer about the nature of the data in the table or column. For example, these
statistics can’t tell the Optimizer if there is a data skew in a column, or if there is a correlation
between columns in a table. Information on the nature of the data can be provided to the Optimizer
by using extensions to basic statistics, such as histograms.
Histograms
Histograms are a special type of column statistic created to provide more detailed information on
the data distribution in a table column.
Histograms tell the Optimizer about the distribution of data within a column. By default (without a
histogram), the Optimizer assumes a uniform distribution of rows across the distinct values in a
column. The Optimizer calculates the cardinality for an equality predicate (in a WHERE clause) by
dividing the total number of rows in the table by the number of distinct values in the column used in
the equality predicate. If the data distribution in that column is not uniform (that is, the data is
skewed), then the cardinality estimate will be incorrect. To accurately reflect a non-uniform data distribution, a
histogram is required on the column. The presence of a histogram changes the formula used by
the Optimizer to estimate the cardinality and allows it to generate a more accurate execution
plan.
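The effect of skew can be checked by comparing the uniform estimate (rows divided by distinct values) with the real row count for each value. The sketch below assumes a non-partitioned table EMPLOYEE with a column DEPARTMENT_ID and current statistics; both names are illustrative, and the estimate ignores nulls for simplicity.

```sql
-- Uniform-distribution estimate versus actual rows per value.
WITH stats AS (
  SELECT t.num_rows,
         c.num_distinct
  FROM   user_tab_statistics     t
  JOIN   user_tab_col_statistics c
         ON c.table_name = t.table_name
  WHERE  t.table_name  = 'EMPLOYEE'
  AND    t.object_type = 'TABLE'          -- table-level row only
  AND    c.column_name = 'DEPARTMENT_ID'
)
SELECT e.department_id,
       COUNT(*)                                     AS actual_rows,
       ROUND(MAX(s.num_rows) / MAX(s.num_distinct)) AS uniform_estimate
FROM   employee e
CROSS  JOIN stats s
GROUP  BY e.department_id
ORDER  BY actual_rows DESC;
```

If actual_rows differs widely from uniform_estimate for some values, the column is skewed and is a candidate for a histogram.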
A column is a candidate for a histogram if it is used in equality or range
predicates such as WHERE col1 = 'X' or WHERE col1 BETWEEN 'A' AND 'B', and if it has a skew in the
distribution of column values. The optimizer knows which columns are used in query predicates
because this information is tracked and stored in the dictionary table SYS.COL_USAGE$.
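Rather than querying SYS.COL_USAGE$ directly, the recorded column usage can be read through DBMS_STATS.REPORT_COL_USAGE, which is available in Oracle Database 11g Release 2 (11.2.0.2) and later. This sketch assumes a SQL*Plus or SQLcl session and a table named EMPLOYEE in the current schema.

```sql
-- The function returns a CLOB; raise LONG so the report is not truncated.
SET LONG 100000

-- Report which columns of the table have been used in equality, range,
-- LIKE, and join predicates since usage tracking began.
SELECT DBMS_STATS.REPORT_COL_USAGE(USER, 'EMPLOYEE') AS col_usage
FROM   dual;
```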
Oracle Database 11g has two types of histograms, frequency and height-balanced (Oracle Database
12c adds top-frequency and hybrid histograms). Oracle determines the type of histogram to create
based on the number of distinct values in the column.
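To see which histogram type Oracle chose for each column, the HISTOGRAM and NUM_BUCKETS columns of USER_TAB_COL_STATISTICS can be queried. The table name EMPLOYEE is an assumption.

```sql
-- Columns that currently have a histogram, with its type and bucket count.
SELECT column_name,
       num_distinct,
       num_buckets,
       histogram       -- for example FREQUENCY or HEIGHT BALANCED; NONE if absent
FROM   user_tab_col_statistics
WHERE  table_name = 'EMPLOYEE'
AND    histogram <> 'NONE'
ORDER  BY column_name;
```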
Gathering Statistics
For database objects that are constantly changing, statistics must be gathered regularly so that they
accurately describe the database object. The PL/SQL package DBMS_STATS is Oracle’s preferred
method for gathering statistics. It contains over 50 different procedures for gathering and managing
statistics, but the most important of these are the GATHER_*_STATS procedures, which can be used
to gather table, column, and index statistics.
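As a brief sketch of two other members of the GATHER_*_STATS family, the block below gathers statistics for a whole schema and for a single index. The schema name Amit_Schema follows the example in this document; the index name EMPLOYEE_PK is hypothetical.

```sql
BEGIN
  -- Gather statistics for every table and index in one schema,
  -- leaving all other parameters at their defaults.
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'Amit_Schema');

  -- Gather statistics for one index only (hypothetical index name).
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname => 'Amit_Schema',
    indname => 'EMPLOYEE_PK'
  );
END;
/
```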
GATHER_TABLE_STATS
The DBMS_STATS.GATHER_TABLE_STATS procedure allows you to gather table, partition, index, and
column statistics. Although it takes 15 different parameters, only the first two or three need to
be specified to run the procedure; any parameter that is not specified takes its default value.
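Example:
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'Amit_Schema',                -- schema that owns the table
    tabname          => 'Employee',                   -- table to gather statistics for
    estimate_percent => dbms_stats.auto_sample_size,  -- let Oracle choose the sample size
    method_opt       => 'for all columns size auto',  -- let Oracle decide on histograms
    cascade          => true,                         -- also gather index statistics
    degree           => 5                             -- degree of parallelism
  );
END;
/
PL/SQL procedure successfully completed.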
Parameter         Description
ownname           The schema that owns the table.
tabname           The name of the table for which to gather statistics.
estimate_percent  The percentage of rows to sample (NULL means compute). Use the constant
                  DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the appropriate
                  sample size for good statistics. This is the default.
method_opt        Controls the creation of histograms during statistics collection.
                  FOR ALL COLUMNS SIZE AUTO is the default value: Oracle automatically
                  determines which columns require histograms and the number of buckets
                  that will be used, based on the column usage information.
cascade           Gathers statistics on the indexes for this table.
degree            The degree of parallelism. The default for degree is NULL, which means
                  the degree of parallelism defined on the table itself is used.
granularity       Dictates the levels at which statistics are gathered on a partitioned
                  table. The possible levels are table (global), partition, or
                  sub-partition. By default, Oracle determines which levels are
                  necessary based on the table’s partitioning strategy.
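As an illustration of the GRANULARITY parameter, the call below gathers statistics for a single partition only. The partitioned table SALES and its partition SALES_Q1 are hypothetical names; the schema name follows the earlier example.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'Amit_Schema',
    tabname     => 'SALES',        -- hypothetical partitioned table
    partname    => 'SALES_Q1',     -- hypothetical partition to analyze
    granularity => 'PARTITION'     -- partition-level statistics only
  );
END;
/
```

Leaving GRANULARITY unspecified lets Oracle choose the levels, which is usually the simpler option.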
Managing Statistics
In addition to letting you collect appropriate statistics, Oracle offers a number of methods to manage
them, including the ability to restore statistics to a previous version, the option to transfer
statistics from one system to another, and even the option to set the statistics values manually. These
options are extremely useful in specific cases, but they are not recommended as a replacement for
standard statistics gathering with the DBMS_STATS package.
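The three management options above map onto DBMS_STATS procedures roughly as sketched below. The blocks are independent examples, not a sequence to run in order; the schema and table names follow the earlier example, while the staging table name STATS_TRANSFER and the numeric values are made up for illustration.

```sql
-- 1. Restore the table's statistics as they were one day ago
--    (only possible within the statistics history retention period).
BEGIN
  DBMS_STATS.RESTORE_TABLE_STATS(
    ownname         => 'Amit_Schema',
    tabname         => 'Employee',
    as_of_timestamp => SYSTIMESTAMP - INTERVAL '1' DAY
  );
END;
/

-- 2. Export statistics into a staging table so they can be moved
--    to another system (STATS_TRANSFER is a hypothetical name).
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(
    ownname => 'Amit_Schema',
    stattab => 'STATS_TRANSFER'
  );
  DBMS_STATS.EXPORT_TABLE_STATS(
    ownname => 'Amit_Schema',
    tabname => 'Employee',
    stattab => 'STATS_TRANSFER'
  );
END;
/

-- 3. Set table statistics by hand (illustrative values only).
BEGIN
  DBMS_STATS.SET_TABLE_STATS(
    ownname => 'Amit_Schema',
    tabname => 'Employee',
    numrows => 100000,
    numblks => 2000,
    avgrlen => 120
  );
END;
/
```

On the target system, the staging table would be copied across (for example with Data Pump) and loaded with DBMS_STATS.IMPORT_TABLE_STATS.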