Fragmentation of Database Term Paper
TERM PAPER
on
“FRAGMENTATION OF DATABASES”
Introduction to fragmentation:
Fragmentation is defined as any condition that causes more than the optimal amount
of disk I/O to be performed in accessing a table, or causes the I/Os that are performed to
take longer than they optimally would. Optimal performance of SELECT queries occurs
when data pages are as contiguous as possible within the database, and the data pages are
packed as fully as possible.
In short, we can say it is a database server feature that controls where data is stored at the
table level. It enables you to define groups of rows or index keys within a table according
to some algorithm or scheme.
Due to this, we can store each group in a separate database space associated with a
specific physical disk. The scheme that is used to group rows or index keys into fragments
is called the distribution scheme. The distribution scheme and the set of database spaces
in which you locate the fragments together make up the fragmentation strategy.
When fragmented tables and indexes are created, the database server stores the location of
each table and index fragment, along with other related information, in the system catalog
table named sysfragments.
The database server has information on which fragments contain which data, so it can route
client requests for data to the appropriate fragment without accessing irrelevant fragments.
The database server cannot route client requests for data to the appropriate fragment for
round-robin and some expression-based distribution schemes. For more information, see
Distribution Schemes for Table Fragmentation.
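As a hedged sketch, this catalog metadata can be queried directly (the column names are assumed from the Informix system catalog; verify them against your server version):
-- List each fragment's type, target dbspace and, for expression-based
-- schemes, the fragmentation expression:
SELECT tabid, fragtype, dbspace, exprtext
FROM sysfragments;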
Types of fragmentation:
• Internal- Internal fragmentation occurs when the index pages within a particular SQL
data file are not used to their maximum capacity, that is, when they contain unused free
space. Some free space can be an advantage for an application with heavy data inserts,
which is why setting a fill factor deliberately leaves space on index pages. Severe internal
fragmentation, however, can increase index size and cause additional reads to be
performed to return needed data. These extra reads can lead to degradation in query
performance. Internal fragmentation can be corrected by reindexing the database,
something most of us have running automatically.
• External- External fragmentation occurs when an index leaf page is not in logical
order. When an index is created, the index keys are placed in a logical order on a
set of index pages. As new data is inserted into the index, it is possible for the new
keys to be inserted in between existing keys. This may cause new index pages to be
created to accommodate any existing keys that were moved so that the new keys
can be inserted in correct order. These new index pages usually will not be
physically adjacent to the pages the moved keys were originally stored in. It is the
process of creating new pages that causes the index pages to be out of logical order.
Suppose, for example, that a full index page holds the keys 4, 6, 7, and 8. An INSERT
statement adds new data to the index; in this case we will add a 5. The INSERT will cause
a new page to be created, and the 7 and 8 to be moved to the new page in order to make
room for the 5 on the original page. This creation will cause the index pages to be out of
logical order.
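A conceptual sketch of this example in T-SQL (the dbo.Demo table and index are hypothetical, and an actual page split would only occur once a page is genuinely full):
CREATE TABLE dbo.Demo (KeyVal int NOT NULL);
CREATE CLUSTERED INDEX IX_Demo_KeyVal ON dbo.Demo (KeyVal);
INSERT INTO dbo.Demo (KeyVal) VALUES (4);
INSERT INTO dbo.Demo (KeyVal) VALUES (6);
INSERT INTO dbo.Demo (KeyVal) VALUES (7);
INSERT INTO dbo.Demo (KeyVal) VALUES (8);
-- On a full page, this insert between existing keys is what forces 7 and 8
-- onto a newly allocated page that is usually not physically adjacent:
INSERT INTO dbo.Demo (KeyVal) VALUES (5);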
For queries that perform specific searches or that return unordered result sets, index
pages being out of order do not pose a problem. For queries that return ordered result sets,
extra processing is needed to scan the index pages that are not in order. An example of an
ordered result set would be a query returning everything from 4 to 10. This query
would have to complete an extra page switch in order to return the 7 and 8. While one extra
page switch is nothing in the long run, imagine this condition on a very large table with
hundreds of pages out of order.
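For instance, a range query of this kind (against the hypothetical dbo.Demo table above) must return its keys in order and therefore pays for every out-of-order page:
-- Returning everything from 4 to 10 in order requires the extra page
-- switch described above to fetch the 7 and 8:
SELECT KeyVal
FROM dbo.Demo
WHERE KeyVal BETWEEN 4 AND 10
ORDER BY KeyVal;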
In short, we can say external fragmentation occurs when the SQL data file itself becomes
scattered in multiple sections over the physical disk. It is not so easy to fix: the disk must
be defragmented with SQL Server offline so that the data file sections can be moved
around and strung back together.
Rules to Avoid External Fragmentation:
1) Never use auto-expand for data files. Note that I do not say disable auto-expand,
just never actually use it. Pre-expand your databases in large increments to handle
several months’ worth of growth at a time (see the sketch after this list). Leave
auto-growth as a “safety valve” to keep unexpected allocations from killing your
server, but don’t rely on it to manage data file space allocations.
2) Never shrink data files. Shrinking inevitably leads to later growth. Allocate once
and leave it. Shrink also causes massive internal fragmentation in a SQL database:
the shrink algorithm moves data to new locations but does not attempt to keep data
in contiguous segments. Auto-Shrink does this damage for you on a regularly
scheduled basis, which is anything but helpful.
3) Keep SQL disks dedicated to SQL data only. Creating and then destroying
many data files causes physical fragmentation on the disks. By analogy with
allocating rooms in a building, most file systems allocate new from new: instead of
reusing old rooms freed up by deleted files, the file system simply grabs the next
unused room (or rooms) after the last used one and keeps moving. This works well
until the first pass through the building is complete and files are scattered all over
the place.
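A hedged sketch of rule 1 (the database name MyDB and logical file name MyDB_Data are hypothetical):
-- Pre-expand the data file in one large increment instead of relying
-- on auto-growth:
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_Data, SIZE = 20GB);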
Distribution schemes for table fragmentation:
A distribution scheme is the method that the database server uses to distribute rows or
index entries to fragments. Informix database servers support the following distribution
schemes (see the sketch after this list):
• Expression-based. This distribution scheme puts rows that contain specified values
in the same fragment. You specify a fragmentation expression that defines criteria
for assigning a set of rows to each fragment, either as a range rule or some arbitrary
rule. You can specify a remainder fragment that holds all rows that do not match
the criteria for any other fragment, although a remainder fragment reduces the
efficiency of the expression-based distribution scheme.
• Round-robin. This distribution scheme places rows one after another in fragments,
rotating through the series of fragments to distribute the rows evenly. The database
server defines the rule internally. For INSERT statements, the database server uses
a hash function on a random number to determine the fragment in which to place
the row. For INSERT cursors, the database server places the first row in a random
fragment, the second in the next sequential fragment, and so on. If one of the
fragments is full, it is skipped.
• Range distribution. This distribution scheme ensures that rows are fragmented
evenly across dbspaces. In range distribution, the database server determines the
distribution of rows among fragments based on minimum and maximum integer
values that the user specifies. Use a range distribution scheme when the data
distribution is both dense and uniform.
• System-defined hash. This distribution scheme uses an internal, system-defined
rule that distributes rows with the objective of keeping the same number of rows in
each fragment.
• Hybrid. This distribution scheme combines two distribution schemes. The primary
distribution scheme chooses the dbslice. The secondary distribution scheme puts
rows in specific dbspaces within the dbslice. The dbspaces usually reside on
different coservers.
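A hedged sketch of the first two schemes in Informix SQL (the table, column, and dbspace names are hypothetical):
-- Expression-based: rows are routed by the value in the region column.
CREATE TABLE orders (
    order_id INTEGER,
    region   CHAR(2)
)
FRAGMENT BY EXPRESSION
    region = 'US' IN dbspace1,
    region = 'EU' IN dbspace2,
    REMAINDER IN dbspace3;

-- Round-robin: rows are spread evenly across the listed dbspaces.
CREATE TABLE audit_log (
    event_id INTEGER,
    message  CHAR(200)
)
FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2;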
Reorganization usually requires the database to be down. The high cost of downtime
creates pressure both to perform and to delay preventive maintenance, a familiar quandary
for DBAs. Third-party tools are available that automate the manual process of reorganizing
tables, indexes, and entire table spaces, eliminating the need for time- and resource-
consuming database rebuilds. In addition to automation, these types of tools typically can
analyze whether a reorganization is needed at all.
With fragmentation, more data pages need to be traversed to fulfill a query request. But
fragmentation is a manageable problem that is resolved by rebuilding the indexes to
reduce the fragmentation at the index or table level.
Identifying the tables that have high levels of fragmentation was a potentially time-consuming
process in the SQL Server 2000 environment because DBCC SHOWCONTIG had to be issued
against each table or index.
Solution
SQL Server 2000 - DBCC SHOWCONTIG
In SQL Server 2005, DBCC SHOWCONTIG remains a viable short-term option to
determine database fragmentation.
Below is the DBCC SHOWCONTIG code to determine table-level fragmentation for
a sample table and database:
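A minimal sketch, assuming the pubs sample database and its authors table:
USE pubs
GO
-- Report fragmentation statistics for a single table:
DBCC SHOWCONTIG ('authors')
GO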
This output indicates that the table is free of fragmentation based on the logical scan
fragmentation (0.00%), extent scan fragmentation (0.00%) and scan density statistics
(100%). This table should be considered healthy as it pertains to fragmentation because
the indexes have been recently rebuilt.
The key statistics in the DBCC SHOWCONTIG output are:
• Pages Scanned - The number of 8KB pages to support the table.
• Extent Switches - The number of extents that were traversed.
• Avg. Pages per Extent - The average number of 8KB pages per extent supporting
the table.
• Scan Density [Best Count:Actual Count] - The percentage of compactness for the
extents, with an ultimate goal of 100; the closer, the better.
• Logical Scan Fragmentation - The percentage of logical fragmentation, with an
ultimate goal of 0; the closer, the better.
• Extent Scan Fragmentation - The percentage of extent fragmentation, with an
ultimate goal of 0; the closer, the better.
• Avg. Bytes Free per Page - The average bytes free per page.
• Avg. Page Density - The average compactness of the B-tree supporting the table,
with an ultimate goal of 100; the closer, the better.
The reorganize process physically reorders the leaf pages of the index to match the
logical order. This matching of physical to logical order improves the performance of
index scans.
-- Syntax: ALTER INDEX { index_name | ALL } ON <object>
--         REORGANIZE [ PARTITION = partition_number ]
--         [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
-- <object>         -> Name of the object on which the index(es) exist(s)
-- partition_number -> Can be specified only if the index is partitioned,
--                     and names the partition that you need to reorganize
-- LOB_COMPACTION   -> Specifies that all pages that contain large object (LOB)
--                     data are compacted. The default is ON.
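For example, a minimal usage sketch against the AdventureWorks sample table used later in this paper:
USE AdventureWorks
GO
-- Reorganize all indexes on the table and compact LOB pages:
ALTER INDEX ALL ON HumanResources.Employee
REORGANIZE WITH (LOB_COMPACTION = ON)
GO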
• CREATE INDEX command - One way is to simply drop the index using a DROP
INDEX statement followed by a CREATE INDEX statement. You can, however,
combine these two commands by using the DROP_EXISTING clause of the
CREATE INDEX command, as sketched after this list. You can use the
DROP_EXISTING clause to rebuild the index, add or drop columns, modify
options, modify column sort order, or change the partition scheme or filegroup.
DROP_EXISTING enhances performance when you re-create a clustered index,
with either the same or a different set of keys, on a table that also has non-clustered
indexes. DROP_EXISTING replaces the execution of a DROP INDEX statement
on the old clustered index followed by the execution of a CREATE INDEX
statement for the new clustered index. The benefit is that the non-clustered
indexes are rebuilt only once, and even then only if the index definition has
changed. With this command you can also rebuild the index online.
• ALTER INDEX command - This statement replaces the DBCC DBREINDEX
statement. The ALTER INDEX statement allows for the rebuilding of the clustered
and non-clustered indexes on a table. The drawback is that you cannot change the
index definition the way you can with the CREATE INDEX command, though
with this command too you can rebuild the index online. ALTER INDEX cannot
be used to repartition an index or move it to a different filegroup, and it cannot be
used to modify the index definition at all, such as adding or deleting columns or
changing the column order. Use CREATE INDEX with the DROP_EXISTING
clause to perform these operations, as stated above (see the sketch after this list).
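A hedged sketch of both options, assuming a hypothetical dbo.Orders table with an existing clustered index named IX_Orders_OrderDate (ONLINE = ON requires an edition that supports online index operations):
-- Re-create the clustered index in one step instead of DROP + CREATE:
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH (DROP_EXISTING = ON, ONLINE = ON);

-- Rebuild every index on the table without changing any definition:
ALTER INDEX ALL ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80, ONLINE = ON);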
DBCC SHOWCONTIG
Database console command that displays fragmentation information for the data and
indexes of the specified table.
Permissions default to members of the sysadmin server role, the db_owner and
db_ddladmin database roles, and the table owner, and are not transferable.
If tables are frequently changed via UPDATE and INSERT operations, having only a small
amount of free space on the index or data pages (that is, a small amount of internal
fragmentation) will cause a new page to be added (a page split) in order to accommodate
the new data. This ultimately leads to external fragmentation, since the newly added data
page will probably not be adjacent to the original page. Internal fragmentation, therefore,
can be desirable at low levels in order to avoid frequent page splits (and by 'low levels' I
really do mean low levels), while external fragmentation should always be avoided. The
amount of free space reserved on an index can be controlled using the fill factor.
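A hedged sketch, reusing the hypothetical dbo.Orders table from above:
-- Reserve roughly 20% free space on each leaf page at build time:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH (FILLFACTOR = 80, PAD_INDEX = ON);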
SQL Server 2005 introduces a new DMV (Dynamic Management View) to check index
fragmentation levels: sys.dm_db_index_physical_stats. Although SQL Server 2005 still
supports the SQL Server 2000 DBCC SHOWCONTIG command, this feature will be
removed in a future version of SQL Server. Here you can check the differences between
the two instructions when checking for fragmentation on the HumanResources.Employee
table in the sample database AdventureWorks:
USE AdventureWorks;
GO
DBCC SHOWCONTIG ('HumanResources.Employee')
GO
USE AdventureWorks
GO
SELECT object_id, index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'),
OBJECT_ID('HumanResources.Employee'), NULL, NULL, NULL);
In this last example I have selected only the relevant information to show from the DMV;
you will see that the DMV can provide many more details about the index structure. In
case you want to show fragmentation details for all the objects in the AdventureWorks
database, the command would be as follows:
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'), NULL, NULL,
NULL , NULL);
Please refer to SQL Server 2005 Books Online for more information on
sys.dm_db_index_physical_stats syntax.
If you are looking for an easy way to automate these processes, the SQL Server Books
Online reference for sys.dm_db_index_physical_stats contains a sample script you can
implement within minutes. This script will take care of reorganizing any index where
avg_fragmentation_in_percent is below 30% and rebuilding any index where this value is
over 30% (you can change these parameters for your specific needs). Add a new SQL Server
Execute T-SQL Statement task to your weekly or daily maintenance plan containing this
script so you can keep your database fragmentation at an optimum level.
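The decision logic of that script boils down to a threshold test along these lines (a simplified sketch, not the Books Online script itself):
-- Recommend an action per index based on measured fragmentation:
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       CASE WHEN ps.avg_fragmentation_in_percent > 30
            THEN 'REBUILD' ELSE 'REORGANIZE' END AS recommended_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 5
  AND i.name IS NOT NULL;  -- skip heaps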
When you perform any data modification operations (INSERT, UPDATE, or DELETE
statements), table fragmentation can occur. When changes are made to the data that
affect the index, index fragmentation can occur and the information in the index can
get scattered in the database. Fragmented data can cause SQL Server to perform
unnecessary data reads, so a query's performance against a heavily fragmented table
can be very poor. If you want to determine the level of fragmentation, you can use the
DBCC SHOWCONTIG statement, which displays fragmentation information for the
data and indexes of the specified table or view.
USE pubs
-- Walk every user table and report its fragmentation with DBCC SHOWCONTIG.
DECLARE @TableName sysname
DECLARE cur_showfragmentation CURSOR FOR
  SELECT table_name FROM information_schema.tables
  WHERE table_type = 'BASE TABLE'
OPEN cur_showfragmentation
FETCH NEXT FROM cur_showfragmentation INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
  SELECT 'Show fragmentation for the ' + @TableName + ' table'
  DBCC SHOWCONTIG (@TableName)
  FETCH NEXT FROM cur_showfragmentation INTO @TableName
END
CLOSE cur_showfragmentation
DEALLOCATE cur_showfragmentation
When you need to perform the same actions for all the tables in a database (for
example, to show the fragmentation of all the tables, as in the script above), you can
create a cursor for this purpose, or you can use the sp_MSforeachtable undocumented
system stored procedure to accomplish the same goal with less work. The following
script shows fragmentation of all the tables in the pubs database:
USE pubs
GO
EXEC sp_MSforeachtable @command1="print '?' DBCC SHOWCONTIG('?')"
GO
Keep in mind that undocumented stored procedures may not be supported in future
SQL Server versions. So, you use the sp_MSforeachtable undocumented system stored
procedure at your own risk.
You can reduce fragmentation and improve read-ahead performance by re-creating the
clustered index, rebuilding indexes with DBCC DBREINDEX, or defragmenting them
with DBCC INDEXDEFRAG. Keep the locking behavior in mind:
• While a clustered index is rebuilt, an exclusive table lock is put on the table,
preventing any table access by your users.
• While a nonclustered index is rebuilt, a shared table lock is put on the table,
preventing all but SELECT operations from being performed on it.
When you create a clustered index, the table will be copied, the data in the table will
be sorted, and then the original table will be deleted. So, you should have enough
empty space to hold a copy of the data.
Rebuilding an Index
You can rebuild all the indexes on all the tables in your database periodically (for
example, once a week on Sunday) to reduce fragmentation. The DBCC
DBREINDEX statement cannot automatically rebuild all of the indexes on all the
tables in a database; it can only work on one table at a time. You can write your own
script to rebuild all the indexes on all the tables in a database, or you can use the
script below, which rebuilds all indexes with a fill factor of 80 in every table in the
pubs database:
USE pubs
GO
EXEC sp_MSforeachtable @command1="print '?' DBCC DBREINDEX ('?', ' ', 80)"
GO
Because rebuilding a clustered index puts an exclusive table lock on the table,
preventing any table access by your users, and rebuilding a nonclustered index puts a
shared table lock on the table, preventing all but SELECT operations from being
performed on it, you should schedule the DBCC DBREINDEX statement during CPU
idle time and slow production periods.
Defragmenting an Index
Unlike DBCC DBREINDEX, DBCC INDEXDEFRAG is an online operation, so it does
not hold locks for long and does not block running queries or updates. Because the time
to defragment is related to the amount of fragmentation, you can use the DBCC
INDEXDEFRAG statement to reduce fragmentation if the index is not very fragmented.
For a very fragmented index, rebuilding (using the DBCC DBREINDEX statement) can
take less time.
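A minimal single-index sketch (the arguments are the database name, table name, and index ID; index 1 is the clustered index):
-- Defragment the clustered index of the authors table in pubs:
DBCC INDEXDEFRAG (pubs, authors, 1)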
You can defragment all the indexes on all the tables in your database periodically (for
example, once a week on Sunday) to reduce fragmentation. The DBCC
INDEXDEFRAG statement cannot automatically defragment all of the indexes on all
the tables in a database; it can only work on one table and one index at a time. You
can use the script below to defragment all indexes in every table in the pubs
database:
USE pubs
-- Outer cursor: walk every user table.
DECLARE @TableName sysname
DECLARE @indid int
DECLARE cur_tblfetch CURSOR FOR
  SELECT table_name FROM information_schema.tables
  WHERE table_type = 'BASE TABLE'
OPEN cur_tblfetch
FETCH NEXT FROM cur_tblfetch INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
  -- Inner cursor: walk every index on the current table.
  DECLARE cur_indfetch CURSOR FOR
    SELECT indid FROM sysindexes
    WHERE id = OBJECT_ID(@TableName) AND keycnt > 0
  OPEN cur_indfetch
  FETCH NEXT FROM cur_indfetch INTO @indid
  WHILE @@FETCH_STATUS = 0
  BEGIN
    SELECT 'Defragmenting index_id = ' + convert(char(3), @indid)
      + ' of the ' + rtrim(@TableName) + ' table'
    -- indid 255 marks text/image data, which cannot be defragmented.
    IF @indid <> 255 DBCC INDEXDEFRAG (pubs, @TableName, @indid)
    FETCH NEXT FROM cur_indfetch INTO @indid
  END
  CLOSE cur_indfetch
  DEALLOCATE cur_indfetch
  FETCH NEXT FROM cur_tblfetch INTO @TableName
END
CLOSE cur_tblfetch
DEALLOCATE cur_tblfetch
Advantages of fragmentation in a distributed database:
• Improved performance — data is located near the site of greatest demand, and the
database systems themselves are parallelized, allowing load on the databases to be
balanced among servers. (A high load on one module of the database won't affect
other modules of the database in a distributed database.)
• Economics — it costs less to create a network of smaller computers with the power
of a single large computer.
• Modularity — systems can be modified, added and removed from the distributed
database without affecting other modules (systems).
• Reliable transactions - due to replication of the database.
• Independence from hardware, operating system, network, fragmentation, DBMS,
replication and location.
• Continuous operation.
• Single-site failure does not affect the performance of the system.
• All transactions follow the A.C.I.D. properties: atomicity, the transaction takes
place as a whole or not at all; consistency, maps one consistent DB state to another;
isolation, each transaction sees a consistent DB; durability, the results of a
transaction must survive system failures.
• The merge replication method is used to consolidate the data between databases.
Disadvantages of fragmentation in a distributed database:
• Complexity — extra work must be done by the DBAs to ensure that the distributed
nature of the system is transparent. Extra work must also be done to maintain
multiple disparate systems, instead of one big one. Extra database design work
must also be done to account for the disconnected nature of the database; for
example, joins become prohibitively expensive when performed across multiple
systems.
• Economics — increased complexity and a more extensive infrastructure means
extra labour costs.
• Security — remote database fragments must be secured, and they are not
centralized so the remote sites must be secured as well. The infrastructure must also
be secured (e.g., by encrypting the network links between remote sites).
• Difficult to maintain integrity — in a distributed database, enforcing integrity over
a network may require too much of the network's resources to be feasible.
• Inexperience — distributed databases are difficult to work with, and as a young
field there is not much readily available experience on proper practice.
• Lack of standards – there are no tools or methodologies yet to help users convert a
centralized DBMS into a distributed DBMS.
• Database design is more complex – besides the normal difficulties, the design of a
distributed database has to consider fragmentation of data, allocation of fragments
to specific sites and data replication.
• Additional software is required.
• Operating System should support distributed environment.
• Concurrency control is a major issue; it is solved by locking and timestamping.