How SQL Server Indexes Work: Sharon F. Dooley
[Figure: index structure with intermediate nodes, leaf nodes, and data pages]
What Is a Node?
A page that contains key and pointer pairs
[Figure: a node page holding eight keys, each paired with a pointer]
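A node as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `IndexNode`, the integer page ids, and the rule that each key is the lowest key reachable through its pointer are all hypothetical and are not taken from the slides.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class IndexNode:
    """One index page: sorted keys, each paired with a pointer."""
    keys: list      # assumed: lowest key reachable through the matching pointer
    pointers: list  # hypothetical page ids of the pages one level down

    def child_for(self, search_key):
        """Return the pointer whose key range should contain search_key."""
        # Count the keys that are <= search_key; the last of them owns the range.
        position = bisect_right(self.keys, search_key) - 1
        # Keys smaller than every entry are sent to the first pointer.
        return self.pointers[max(position, 0)]


root = IndexNode(keys=["Abby", "Bob", "Carol", "Dave"], pointers=[10, 11, 12, 13])
print(root.child_for("Alan"))   # falls between Abby and Bob
print(root.child_for("Carol"))  # exact match on a key
```

Repeating `child_for` at each level, from the root down to a leaf, gives the usual top-down index search.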
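[Figure: a three-level index. Labels in the figure: Root (Level 0), Node (Level 1), Leaf (Level 2), and the data pages (DB). Index keys shown: Abby, Ada, Andy, Ann and Ada, Alan, Amanda, Amy. Data rows shown: Bob, Alan, Amanda, Carol, Amy, Dave, Ada.]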
Adding Alice
[Figure: the data pages (DB) holding Bob, Alan, Amanda, Carol, Amy, Dave, Ada, with the new row Alice added]
Step 2: Split the next level up
[Figure: the index after the split. Upper node keys: Abby, Ada, Amanda, Andy, Ann. Next level keys: Ada, Alan, Alice, Amanda, Amy. Leaf and data pages (DB): Bob, Alan, Amanda, Carol, Amy, Dave, Ada, Alice.]
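[Figure: the index after the splits reach the root. Root (Level 0) keys shown: Abby, Andy, Bob, Carol, Dave. Node keys shown: Abby, Ada, Amanda, Andy, Ann and Ada, Alan, Alice, Amanda. Leaf (Level 3) keys shown: Ada, Alice. Data pages (DB): Bob, Alan, Amanda, Carol, Amy, Dave.]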
Index Fragmentation
Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full
Consider an Employee table with a nonclustered index on Social Security Number
[Figure: index pages allocated across several extents]
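The effect of adding a key to a full index page can be sketched as a toy simulation. Everything here is an assumption made for illustration: pages are plain Python lists, the list order stands in for physical order on disk, a full page is split in half, and the new page is allocated at the end. SQL Server's real allocation logic is more involved.

```python
from bisect import insort


def insert_key(pages, key, capacity):
    """Insert key into the right page; split the page if it overflows."""
    # Pick the page with the greatest first key that is <= key
    # (or the page with the lowest first key if none qualifies).
    by_first_key = sorted(range(len(pages)), key=lambda i: pages[i][0])
    target = by_first_key[0]
    for i in by_first_key:
        if pages[i][0] <= key:
            target = i
    page = pages[target]
    insort(page, key)
    if len(page) > capacity:
        middle = len(page) // 2
        new_page = page[middle:]   # upper half moves to a new page
        del page[middle:]
        pages.append(new_page)     # assumed: new page lands at the physical end
    return pages


def is_fragmented(pages):
    """True when physical page order no longer matches key order."""
    first_keys = [page[0] for page in pages]
    return first_keys != sorted(first_keys)


# Hypothetical key values standing in for Social Security Numbers.
pages = [[100, 200, 300, 400], [500, 600, 700, 800]]
print(is_fragmented(pages))
insert_key(pages, 250, capacity=4)
print(pages)
print(is_fragmented(pages))
```

After the split, reading the keys in order means jumping from the first page to the last page and back, which is the kind of out-of-order access that fragmentation causes.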
Results
Employees  LastName_IDx  0.685
Employees  PK_Employees  3.0303
Employees  City_IDX      0
Employees  Region_IDX    3.922
Repairing Fragmentation
Repair index fragmentation by rebuilding the index
  Rebuilding the clustered index repairs table fragmentation
DBCC DBREINDEX
  DBCC DBREINDEX (tablename [, indexname [, fillfactor]])
  Can reorganize indexes that implement primary key and unique constraints
CREATE INDEX ... WITH DROP_EXISTING
  Causes SQL Server to drop and re-create the index in a single step
  Faster than dropping with the DROP INDEX command and then re-creating
ALTER TABLE ADD CONSTRAINT PRIMARY KEY or UNIQUE
One clustered index per table
  Choose wisely
  Should always have a clustered index
  Allows reorganization of the data pages
Up to 249 nonclustered indexes per table
Clustered Index
[Figure: clustered index. Root: Abby, Bob, Carol, Dave. Next level: Abby, Ada, Andy, Ann. Level below: Ada, Alan, Amanda, Amy.]
Nonclustered Index
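[Figure: nonclustered index. Root: Abby, Bob, Carol, Dave. Next level: Abby, Ada, Andy, Ann. Level below: Ada, Alan, Amanda, Amy. Leaf node entries: Amy, Ada, Amanda, Alan, pointing to rows in the database.]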
Clustered indexes are always unique
  If you don't specify unique when creating them, SQL Server may add a uniqueifier to the index key
  Only used when there actually is a duplicate
  Adds 4 bytes to the key
The clustering key is used in nonclustered indexes
  This allows SQL Server to go directly to the record from the nonclustered index
  If there is no clustered index, a record identifier will be used instead
Leaf node of a nonclustered index on LastName
[Figure: leaf node entries pairing each last name (Adams, Douglas, Jones, Smith) with a clustering key value (1 to 4) that locates the row in the data pages]
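The way a nonclustered index reaches a row through the clustering key can be sketched as two lookups. This is a toy model, not SQL Server's storage format: the dictionary `clustered_rows`, the list `nonclustered_leaf`, and the sample values are illustrative assumptions loosely based on the figure.

```python
# Clustered index on EmployeeID (assumed): the leaf level is the row itself.
clustered_rows = {
    1: {"EmployeeID": 1, "LastName": "Jones"},
    2: {"EmployeeID": 2, "LastName": "Smith"},
    3: {"EmployeeID": 3, "LastName": "Adams"},
    4: {"EmployeeID": 4, "LastName": "Douglas"},
}

# Leaf level of a nonclustered index on LastName:
# each entry is (index key, clustering key), kept in index-key order.
nonclustered_leaf = [("Adams", 3), ("Douglas", 4), ("Jones", 1), ("Smith", 2)]


def find_by_last_name(last_name):
    """Look up rows by LastName via the nonclustered leaf entries."""
    matches = []
    for key, clustering_key in nonclustered_leaf:
        if key == last_name:
            # Second lookup: the clustering key leads to the actual row.
            matches.append(clustered_rows[clustering_key])
    return matches


print(find_by_last_name("Jones"))
```

On a heap, the second element of each leaf entry would be a record identifier instead of a clustering key.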
SQL Server can follow the pointers to move from page to page
When there is no clustered index, the table is called a heap
Data is located
  Through nonclustered indexes
  By scanning all the pages in the table
Deleting from a nonleaf node
  No ghost records
  Page is not compressed
When rows are deleted, both nonclustered and clustered indexes must be maintained
When the last row is deleted from a page (index or data), the page is deallocated and returned to the free space pool
  Unless it is the only page in the table
  A table always has at least one page, even if it is empty
Selectivity
The statistics allow the optimizer to determine the selectivity of an index
  A unique, single-column index always has a selectivity of 1
  One index entry points to exactly one row
A related term is density
  Density is the inverse of selectivity
  Density values range from 0 to 1
  A selective index has a density of 0.10 or less
  A unique, single-column index has a density of 1 / (number of rows), which approaches 0.0 as the table grows
When the index is composite, it becomes a little more complicated
  SQL Server maintains detailed statistics only on the leftmost column
  It does compute density for each leading combination of columns
  Assume there is an index on (col1, col2, col3)
  Density is computed for
    Col1
    Col1 + Col2
    Col1 + Col2 + Col3
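The prefix densities for a composite index can be illustrated with a short calculation. This sketch assumes density is computed as 1 divided by the number of distinct values of the column combination; the function name `prefix_densities` and the sample rows are hypothetical.

```python
def prefix_densities(rows, index_columns):
    """Return the density for each leading column combination of an index."""
    if not rows:
        raise ValueError("density is undefined for an empty table")
    densities = {}
    for width in range(1, len(index_columns) + 1):
        prefix = tuple(index_columns[:width])
        # Distinct values of this column combination across all rows.
        distinct = {tuple(row[column] for column in prefix) for row in rows}
        densities[prefix] = 1.0 / len(distinct)   # assumed definition of density
    return densities


rows = [
    {"LastName": "Adams", "FirstName": "Mark", "EmployeeID": 3},
    {"LastName": "Adams", "FirstName": "Susan", "EmployeeID": 4},
    {"LastName": "Jones", "FirstName": "John", "EmployeeID": 1},
    {"LastName": "Smith", "FirstName": "Mary", "EmployeeID": 2},
]
for prefix, value in prefix_densities(rows, ["LastName", "FirstName", "EmployeeID"]).items():
    print(prefix, value)
```

Adding columns to the prefix can only keep the density the same or lower it, because each extra column can only increase the number of distinct combinations.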
Exploring Statistics
To see the index statistics, use
  DBCC SHOW_STATISTICS ('tablename', {'indexname' | 'statisticsname'})
  DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx')
Interpreting the step output
RANGE_HI_KEY         Upper-bound value of a histogram step
RANGE_ROWS           Number of rows from the sample that fall within a histogram step, excluding the upper bound
EQ_ROWS              Number of rows from the sample that are equal in value to the upper bound of the histogram step
DISTINCT_RANGE_ROWS  Number of distinct values within a histogram step, excluding the upper bound
AVG_RANGE_ROWS       Average number of duplicate values within a histogram step, excluding the upper bound
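The step columns defined above can be reproduced for a single histogram step with a small calculation. This is a sketch under assumptions: the function `describe_step`, the sample values, and the choice to report 0.0 for AVG_RANGE_ROWS when the range is empty are illustrative and do not claim to match SQL Server's exact output.

```python
def describe_step(sample, previous_hi_key, range_hi_key):
    """Summarize one histogram step bounded by two consecutive upper bounds."""
    # Rows strictly between the previous upper bound and this step's upper bound.
    in_range = [value for value in sample if previous_hi_key < value < range_hi_key]
    distinct = len(set(in_range))
    return {
        "RANGE_HI_KEY": range_hi_key,
        "RANGE_ROWS": len(in_range),
        "EQ_ROWS": sum(1 for value in sample if value == range_hi_key),
        "DISTINCT_RANGE_ROWS": distinct,
        # Simplification: 0.0 when no rows fall inside the range.
        "AVG_RANGE_ROWS": len(in_range) / distinct if distinct else 0.0,
    }


# Hypothetical sampled LastName values.
sample = ["Aaby", "Abbot", "Abbot", "Abel", "Abrahamson", "Abrahamson"]
print(describe_step(sample, "Aaby", "Abrahamson"))
```

Here three sampled rows fall inside the step, two of them share a value, and two rows equal the upper bound.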
Updated             Rows   Rows Sampled  Steps  Density       Avg key length
------------------  -----  ------------  -----  ------------  --------------
Jan 27 2002 7:00PM  10009  10009         200    1.3958309E-4  28.475372

All density   Average Length  Columns
------------  --------------  -------------------------------
1.4271443E-4  13.252672       LastName
9.9940036E-5  24.475372       LastName, FirstName
9.9910081E-5  28.475372       LastName, FirstName, EmployeeID

Distribution steps
RANGE_HI_KEY  RANGE_ROWS  DISTINCT_RANGE_ROWS
------------  ----------  -------------------
Aaby          0.0         0
Abrahamson    59.0        43
. . .
Zuran         13.0        10
Zvonek        0.0         0

The report for SQL Server 2005 and later is slightly different at the beginning
Statistics Maintenance
By default, SQL Server will automatically maintain the statistics
Index statistics are computed (or recomputed) when the index is created or rebuilt
SQL Server keeps track of the updates to a table
  Each INSERT, UPDATE, or DELETE statement updates a counter in sysindexes named rowmodctr
  Note that TRUNCATE TABLE does not modify this counter
  Whenever the statistics are recomputed, the counter is set back to zero
When you issue a query, the optimizer checks rowmodctr to see whether the statistics are up to date
  If they are not, the statistics will be updated
Statistics Maintenance (continued)
Note that this may not always happen at the best time in a production system
  Can turn off automatic update
  Can manually update
  In SQL Server 2005 and later, can set the AUTO_UPDATE_STATISTICS_ASYNC database option
Table type  Empty condition  Threshold when empty      Threshold when not empty
Permanent   < 500 rows       Number of changes >= 500  Number of changes >= 500 + (Number of rows * 20%)
Temporary   < 6 rows         Number of changes >= 6    Number of changes >= 500

Example: Assume that a table has 1,000 rows
  The threshold would be 500 + (.20 * 1000)
  You would expect to see the statistics automatically updated after about 700 modifications
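The arithmetic in the example can be checked with a short function. This is a minimal sketch: the function name `update_threshold` is hypothetical, it models only the "500 / 500 + 20%" row of the table, and the 500-row cutoff for treating a table as empty is an assumption made here.

```python
def update_threshold(row_count, empty_cutoff=500):
    """Number of changes after which statistics would be refreshed."""
    if row_count < empty_cutoff:
        # "Threshold when empty": a flat 500 changes (cutoff is an assumption).
        return 500
    # "Threshold when not empty": 500 plus 20% of the rows (integer arithmetic).
    return 500 + (row_count * 20) // 100


def statistics_are_stale(rowmodctr, row_count):
    """Compare a modification counter against the threshold."""
    return rowmodctr >= update_threshold(row_count)


print(update_threshold(1000))           # the 1,000-row example
print(statistics_are_stale(650, 1000))  # below the threshold
print(statistics_are_stale(700, 1000))  # threshold reached
```

Because the threshold grows with the table, a large table can absorb many modifications before its statistics are refreshed automatically.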
Estimating Page Accesses for a Nonclustered Index
Number of levels + number of qualifying leaf pages + number of rows
or
Number of levels + number of qualifying leaf pages + (number of rows * number of clustered index levels)
  Number of qualifying leaf pages = number of qualifying rows / rows per page
  Assumes every row is on a different page
[Figure: nonclustered index. Root: Abby, Bob, Carol, Dave. Lower levels: Abby, Ada, Andy, Ann and Ada, Alan, Amanda, Amy. Leaf node entries: Amy, Ada, Amanda, Alan, pointing to rows in the database.]
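The two estimates can be worked through with a small helper. This is a sketch under assumptions: the function and parameter names are hypothetical, the leaf page count is rounded up to a whole page, and the choice of which formula applies (the first when rows are reached directly, the second when each row is reached through a clustered index) is an interpretation of the slide rather than something it states.

```python
import math


def estimate_page_accesses(index_levels, qualifying_rows, rows_per_leaf_page,
                           clustered_index_levels=None):
    """Estimate page reads for a query that uses a nonclustered index."""
    # Qualifying leaf pages = qualifying rows / rows per page (rounded up here).
    leaf_pages = math.ceil(qualifying_rows / rows_per_leaf_page)
    if clustered_index_levels is None:
        # Assumed case: one page access per row.
        row_accesses = qualifying_rows
    else:
        # Assumed case: each row costs a walk down the clustered index.
        row_accesses = qualifying_rows * clustered_index_levels
    # Worst case: every qualifying row is on a different page.
    return index_levels + leaf_pages + row_accesses


print(estimate_page_accesses(3, 100, 50))
print(estimate_page_accesses(3, 100, 50, clustered_index_levels=3))
```

With 100 qualifying rows the row accesses dominate both estimates, which is why a nonclustered index pays off mainly for selective queries.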
Covering Indexes
When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index
With a covering index, there is no need to access the actual data pages
  Only the leaf nodes of the nonclustered index are accessed
Leaf node of a nonclustered index on LastName, FirstName, Birthdate
LastName  FirstName  Birthdate   EmployeeID
Adams     Mark       1/14/1956   3
Douglas   Susan      12/12/1947  4
Jones     John       4/15/1967   1
Smith     Mary       7/14/1970   2
The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.
Because the leaf node of a clustered index is the data itself, a clustered index covers all queries
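Whether an index covers a query comes down to a set comparison, which can be sketched as follows. The function name `is_covering` and the column lists are illustrative assumptions; a real optimizer considers far more than column names.

```python
def is_covering(index_columns, clustering_key, select_columns, where_columns):
    """True when every column the query needs is stored in the index leaf."""
    # The clustering key is carried in every nonclustered index entry.
    available = set(index_columns) | set(clustering_key)
    needed = set(select_columns) | set(where_columns)
    return needed <= available


index_columns = ["LastName", "FirstName", "Birthdate"]
clustering_key = ["EmployeeID"]

# Every requested column is in the index or the clustering key.
print(is_covering(index_columns, clustering_key,
                  ["FirstName", "Birthdate", "EmployeeID"], ["LastName"]))
# A hypothetical Salary column is not in the index, so the data pages are needed.
print(is_covering(index_columns, clustering_key,
                  ["FirstName", "Salary"], ["LastName"]))
```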