2. SQL Server Storage Architecture


Part I

SQL Server Storage Architecture


Objectives
 Database Physical Structure
 Filegroups
 Pages and Extents Architecture
 Managing Extent Allocations
 Managing space used by objects
 Transaction Log Logical Architecture
 Log Truncation and Log Chain
 Write-Ahead Transaction Log
 Checkpoint Operation

Pearson Education © 2015


Database Physical Structure
A database has two types of operating system files:
 Data files contain data and objects such as tables,
indexes, stored procedures, and views.
o Primary: contains the startup information for the database and points to the other files in the database (.MDF).
o Secondary: optional user-defined data files (.NDF).
 Log files contain the information that is required to
recover all transactions in the database.
Data File Pages
 Each file in a database has a unique file ID number.
 The file header page is the first page of the file and contains information about the attributes of the file.
 Other pages at the start of the file also
contain system information, such as
allocation maps.
 One of the system pages stored in both the
primary data file and the first log file is a
database boot page that contains
information about the attributes of the
database.
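Because each file has its own ID, a page is identified by the pair (file ID, page number). Pages in a data file are numbered sequentially from 0, so with 8 KB pages a page starts at byte page_number × 8,192 of its file. The Python sketch below illustrates this. The helper names are hypothetical. The table of fixed system pages (file header at page 0, PFS at 1, GAM at 2, SGAM at 3, DCM at 6, BCM at 7, boot page at page 9 of the primary data file) reflects SQL Server's usual layout and is not taken from this slide.

```python
PAGE_SIZE = 8192  # every SQL Server page is 8 KB

# Fixed system pages near the start of a data file (usual layout).
SYSTEM_PAGES = {0: "file header", 1: "PFS", 2: "GAM", 3: "SGAM", 6: "DCM", 7: "BCM"}
BOOT_PAGE = 9  # the database boot page, present in the primary data file


def byte_offset(page_number: int) -> int:
    """Return where a page starts inside its file (pages are numbered from 0)."""
    if page_number < 0:
        raise ValueError("page numbers start at 0")
    return page_number * PAGE_SIZE


def describe(file_id: int, page_number: int, primary: bool = False) -> str:
    """Describe the page identified by the pair (file ID, page number)."""
    if primary and page_number == BOOT_PAGE:
        role = "boot page"
    else:
        role = SYSTEM_PAGES.get(page_number, "other page")
    return f"{file_id}:{page_number} starts at byte {byte_offset(page_number)} ({role})"


print(describe(1, 9, primary=True))  # 1:9 starts at byte 73728 (boot page)
print(describe(3, 2))                # 3:2 starts at byte 16384 (GAM)
```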
Filegroups
 The primary filegroup contains the primary data file and any secondary files that are not put into other filegroups.
 User-defined filegroups can be created to group data files together for administrative, data allocation, and placement purposes.
o Tables and indexes are stored on a filegroup.
o If the data files of a filegroup are placed on three separate disks, queries for data from a table on that filegroup are spread across the three disks, which can improve performance.
o Files and filegroups let you easily add new files on new disks.
Filegroups
Filegroup Types
Filegroups

 The PRIMARY filegroup is the default filegroup unless the default is changed by using the ALTER DATABASE statement.
 Allocation for system objects and tables remains within the PRIMARY filegroup, not the new default filegroup.
 Filegroups use a proportional fill strategy
across all the files within each filegroup.
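Proportional fill means each file receives new allocations in proportion to its free space. For example, if file f1 has 100 MB free and f2 has 200 MB free, about one extent is allocated from f1 for every two from f2. The sketch below is a simplified model, not SQL Server's actual allocator: free space is sampled once and used as fixed weights, and the function name is hypothetical.

```python
from collections import Counter


def proportional_fill(free_mb: dict[str, int], extents: int) -> Counter:
    """Spread `extents` allocations across files in proportion to free space.

    Simplified model: free space is sampled once and used as fixed weights.
    """
    candidates = [name for name, mb in free_mb.items() if mb > 0]
    if not candidates:
        raise ValueError("no free space in any file of the filegroup")
    allocated = Counter({name: 0 for name in candidates})
    for _ in range(extents):
        # Choose the file that is furthest behind its proportional share.
        target = min(candidates, key=lambda f: (allocated[f] + 1) / free_mb[f])
        allocated[target] += 1
    return allocated


print(proportional_fill({"f1": 100, "f2": 200}, 3))  # Counter({'f2': 2, 'f1': 1})
```

Choosing the file with the smallest (allocated + 1) / free ratio keeps every file close to its share after each allocation, rather than only at the end.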
Filegroups
The following rules pertain to files and filegroups:

 A file or filegroup cannot be used by more than one database.
 A file can be a member of only one filegroup.
 Transaction log files are never part of any filegroup.
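These rules can be expressed as checks over a catalog of files. The sketch below assumes a hypothetical DbFile record (database name, file path, data or log, filegroup) and reports every violation it finds. It illustrates the rules and is not a SQL Server API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbFile:
    """Hypothetical catalog entry for one operating system file."""
    database: str
    path: str
    kind: str                 # "data" or "log"
    filegroup: Optional[str]  # None for log files


def check_rules(files: list[DbFile]) -> list[str]:
    """Return a description of every file/filegroup rule that is violated."""
    problems: list[str] = []
    uses: dict[str, set] = {}  # path -> {(database, filegroup), ...}
    for f in files:
        uses.setdefault(f.path, set()).add((f.database, f.filegroup))
        if f.kind == "log" and f.filegroup is not None:
            problems.append(f"{f.path}: log files are never part of a filegroup")
        if f.kind == "data" and f.filegroup is None:
            problems.append(f"{f.path}: data files must belong to a filegroup")
    for path, owners in uses.items():
        if len({db for db, _ in owners}) > 1:
            problems.append(f"{path}: used by more than one database")
        if len({fg for _, fg in owners}) > 1:
            problems.append(f"{path}: member of more than one filegroup")
    return problems


ok = [DbFile("Sales", "sales.mdf", "data", "PRIMARY"), DbFile("Sales", "sales.ldf", "log", None)]
print(check_rules(ok))  # []
```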
Filestream Filegroups
Filestream filegroups are required for FILESTREAM data. Instead of containing files, these filegroups point to folder locations in the operating system, and FILESTREAM data is cached in the Windows system cache rather than in the SQL Server buffer pool.

 FILESTREAM must be enabled on the instance before a FILESTREAM filegroup can be created.
 FILESTREAM is a technology that stores binary data in an unstructured manner, typically as files in the operating system file system.
 It overcomes SQL Server’s 2 GB size limit for a single large object value; FILESTREAM data is limited only by the volume size of the file system.
 It can improve performance for large binary objects compared with storing them inside the database.
Memory-Optimized Filegroup

The memory-optimized filegroup is based on the FILESTREAM filegroup.

Only one memory-optimized filegroup can be created per database.

FILESTREAM does not need to be enabled to create a memory-optimized filegroup.
Strategies for Structured Filegroup

 Performance Strategies

 Backup and Restore Strategies

 Storage-Tiering Strategies

 Strategies for Memory-Optimized Filegroups
Pages and Extents Architecture

 The page is the fundamental unit of data storage.
 An extent is a collection of eight physically contiguous pages.

Understanding the architecture of pages and extents is important for designing and developing databases that perform efficiently.
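These sizes make storage arithmetic straightforward. One MB holds 128 pages, or 16 extents. Because extents are aligned on 8-page boundaries, page n belongs to extent n // 8. A minimal Python sketch follows; the helper names are hypothetical.

```python
PAGE_KB = 8                              # size of one page
PAGES_PER_EXTENT = 8                     # an extent is eight contiguous pages
EXTENT_KB = PAGE_KB * PAGES_PER_EXTENT   # 64 KB


def pages_and_extents(size_mb: int) -> tuple[int, int]:
    """How many whole pages and extents fit in size_mb megabytes."""
    size_kb = size_mb * 1024
    return size_kb // PAGE_KB, size_kb // EXTENT_KB


def extent_of(page_number: int) -> tuple[int, int]:
    """Extent number holding a page, and that extent's first page."""
    extent = page_number // PAGES_PER_EXTENT
    return extent, extent * PAGES_PER_EXTENT


print(pages_and_extents(1))  # (128, 16)
print(extent_of(21))         # (2, 16): page 21 is in the extent that starts at page 16
```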
Pages
Data pages contain the actual rows of data.
Text/image pages contain large object data (text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data).
Index pages contain index rows.
System pages store a variety of metadata about the organization of the data.

All pages are the same size: 8 KB.


Log files do not contain pages; they contain a series
of log records.
Data rows
 The row offset table, which starts at the end of each page, helps SQL Server locate rows on a page very quickly.
 The maximum size of a single row on a page is 8,060 bytes, not including data stored in Text/Image pages.
 The record-size limit for tables that use sparse columns is 8,018 bytes.
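The row offset table can be pictured with a toy page model. The sketch assumes the usual layout: a 96-byte page header, rows stored one after another starting right after the header, and 2-byte row offset entries that grow backward from the end of the page, so slot 0 occupies the last two bytes. It is a simplification: real records encode their own length, whereas this model keeps row lengths in a separate Python list. The class name is hypothetical.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 96   # each page starts with a 96-byte header
SLOT_SIZE = 2      # each row offset entry is 2 bytes


class Page:
    """Toy data page: rows grow forward from the header, while the row
    offset table grows backward from the end of the page."""

    def __init__(self) -> None:
        self.buf = bytearray(PAGE_SIZE)
        self.free_offset = HEADER_SIZE   # where the next row will be written
        self.lengths: list[int] = []     # simplification: real rows store their own length

    def free_space(self) -> int:
        table_start = PAGE_SIZE - SLOT_SIZE * len(self.lengths)
        return table_start - self.free_offset

    def insert(self, row: bytes) -> int:
        """Store a row and return its slot number."""
        if len(row) + SLOT_SIZE > self.free_space():
            raise ValueError("row does not fit on this page")
        offset = self.free_offset
        self.buf[offset:offset + len(row)] = row
        self.free_offset += len(row)
        self.lengths.append(len(row))
        # Slot 0 is the last 2 bytes of the page, slot 1 the 2 bytes before it, ...
        pos = PAGE_SIZE - SLOT_SIZE * len(self.lengths)
        self.buf[pos:pos + SLOT_SIZE] = offset.to_bytes(SLOT_SIZE, "little")
        return len(self.lengths) - 1

    def read(self, slot: int) -> bytes:
        """Jump straight to a row through its offset entry; no page scan needed."""
        pos = PAGE_SIZE - SLOT_SIZE * (slot + 1)
        offset = int.from_bytes(self.buf[pos:pos + SLOT_SIZE], "little")
        return bytes(self.buf[offset:offset + self.lengths[slot]])


page = Page()
page.insert(b"alpha")
page.insert(b"beta")
print(page.read(1))  # b'beta'
```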



Data rows
Rows cannot span pages.

When a row exceeds the 8,060-byte limit, SQL Server dynamically moves one or more variable-length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column that has the largest width. A 24-byte pointer to the moved data is maintained on the original page in the IN_ROW_DATA allocation unit.

If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page.
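The overflow rule can be sketched as a loop. It keeps pushing the widest remaining variable-length column off the row until the in-row size fits within 8,060 bytes, leaving a 24-byte pointer behind each time. This is a simplified model with a hypothetical function name. It ignores the row header, null bitmap, and variable-length offset array that real records carry.

```python
ROW_LIMIT = 8060     # maximum in-row size in bytes
POINTER_SIZE = 24    # pointer left in IN_ROW_DATA for each moved column


def place_columns(fixed_bytes: int, var_cols: dict[str, int]) -> tuple[int, list[str]]:
    """Return (in-row size, columns moved to ROW_OVERFLOW_DATA).

    Simplified: ignores the row header, null bitmap, and offset array.
    """
    in_row = dict(var_cols)
    moved: list[str] = []
    size = fixed_bytes + sum(in_row.values())
    while size > ROW_LIMIT:
        # Only columns wider than the pointer shrink the row when moved.
        candidates = [c for c in in_row if c not in moved and in_row[c] > POINTER_SIZE]
        if not candidates:
            raise ValueError("row cannot be made to fit in 8,060 bytes")
        widest = max(candidates, key=lambda c: in_row[c])
        size -= in_row[widest] - POINTER_SIZE
        in_row[widest] = POINTER_SIZE
        moved.append(widest)
    return size, moved


print(place_columns(100, {"notes": 5000, "title": 4000, "code": 50}))  # (4174, ['notes'])
```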
Data rows
Row-Overflow Considerations
 Querying and performing other select operations, such as sorts or joins, on large records that contain row-overflow data slows processing time.
 The index key of a clustered index cannot contain varchar columns that have existing data in the ROW_OVERFLOW_DATA allocation unit.

Consider normalizing the table so that some columns are moved to another table.
Extents
 Extents are the basic unit in which space is
managed.
 An extent is eight physically contiguous
pages, or 64 KB.
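 Two types of extents:
o Uniform extents are owned by a single object; all eight pages can be used only by that object.
o Mixed extents are shared by up to eight objects; each of the eight pages can be owned by a different object.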
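How an object draws on mixed and uniform extents can be modeled as a simple policy. In the classic behavior, a new table or index takes its first eight pages from mixed extents and then switches to uniform extents. Since SQL Server 2016, user databases default to uniform extents only, controlled by the MIXED_PAGE_ALLOCATION database option. The sketch below models both cases with a flag; the function names are hypothetical.

```python
PAGES_PER_EXTENT = 8


def allocation_plan(total_pages: int, mixed_first: bool = True) -> list[str]:
    """Label where each page of a growing object comes from.

    mixed_first=True: first eight pages from mixed extents, the rest from
    uniform extents owned by the object (classic behavior).
    mixed_first=False: every page comes from uniform extents.
    """
    return [
        "mixed" if mixed_first and page < PAGES_PER_EXTENT else "uniform"
        for page in range(total_pages)
    ]


def uniform_extents_needed(total_pages: int, mixed_first: bool = True) -> int:
    """Uniform extents are allocated whole, so round up to full extents."""
    uniform_pages = allocation_plan(total_pages, mixed_first).count("uniform")
    return -(-uniform_pages // PAGES_PER_EXTENT)  # ceiling division


print(uniform_extents_needed(20))                     # 2
print(uniform_extents_needed(20, mixed_first=False))  # 3
```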
Managing Extent Allocations
SQL Server uses two types of allocation maps to record the allocation of extents:
 Global Allocation Map (GAM)
o GAM pages record which extents have been allocated.
o Each GAM covers 64,000 extents (almost 4 GB of data).
o It has 1 bit for each extent: if the bit is 1, the extent is free; if the bit is 0, the extent is allocated.
 Shared Global Allocation Map (SGAM)
o SGAM pages record which extents are currently being used as mixed extents and also have at least one unused page.
o Each SGAM covers 64,000 extents (almost 4 GB of data).
o It has 1 bit for each extent: if the bit is 1, the extent is being used as a mixed extent and has a free page; if the bit is 0, the extent is not used as a mixed extent, or it is a mixed extent whose pages are all in use.
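Taken together, the two bits give three valid states per extent, and allocation becomes a bitmap search. To allocate a uniform extent, the engine looks for a 1 bit in the GAM and sets it to 0. To find a mixed extent with free pages, it looks for a 1 bit in the SGAM. The sketch below models each map as a Python list of bits; it illustrates the encoding, not the engine's implementation, and the function names are hypothetical.

```python
# (GAM bit, SGAM bit) -> meaning, for one extent
EXTENT_STATES = {
    (1, 0): "free, not in use",
    (0, 0): "uniform extent, or full mixed extent",
    (0, 1): "mixed extent with free pages",
}


def extent_state(gam_bit: int, sgam_bit: int) -> str:
    try:
        return EXTENT_STATES[(gam_bit, sgam_bit)]
    except KeyError:
        raise ValueError("GAM=1 with SGAM=1 is not a valid combination") from None


def allocate_uniform_extent(gam: list[int]) -> int:
    """Find a 1 bit (free extent) in the GAM and set it to 0 (allocated)."""
    for extent, bit in enumerate(gam):
        if bit == 1:
            gam[extent] = 0
            return extent
    raise RuntimeError("no free extent in this GAM interval")


def find_mixed_extent_with_space(sgam: list[int]) -> int:
    """Find a 1 bit in the SGAM: a mixed extent with at least one free page."""
    for extent, bit in enumerate(sgam):
        if bit == 1:
            return extent
    return -1


gam = [0, 0, 1, 1]
sgam = [1, 0, 0, 0]
print(allocate_uniform_extent(gam), gam)  # 2 [0, 0, 0, 1]
print(extent_state(gam[0], sgam[0]))      # mixed extent with free pages
```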
Tracking free space
After an extent has been allocated to an object:
 SQL Server uses Page Free Space (PFS) pages to record the allocation status of each page.
 A PFS page has 1 byte for each page, recording whether the page is allocated and, if so, how full it is: empty, 1 to 50 percent, 51 to 80 percent, 81 to 95 percent, or 96 to 100 percent full. The amount of free space in a page is maintained only for heap and Text/Image pages.
 Additional PFS pages appear at subsequent 8,088-page intervals.
 Additional GAM and SGAM pages appear at subsequent 64,000-extent intervals.
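The PFS interval determines which PFS page tracks a given page. The first PFS page is page 1 of the file, and later ones sit at pages 8,088, 16,176, and so on. The sketch below computes that location and maps a page's fullness onto the PFS free-space categories. The helper names are hypothetical, and the percentage boundaries are treated as inclusive upper bounds.

```python
PFS_INTERVAL = 8088  # pages tracked by one PFS page


def pfs_page_for(page_number: int) -> int:
    """Page number of the PFS page that tracks page_number.

    The first PFS page is page 1 of the file; later ones sit at
    8,088-page intervals (8088, 16176, ...).
    """
    interval = page_number // PFS_INTERVAL
    return 1 if interval == 0 else interval * PFS_INTERVAL


def pfs_category(percent_full: float) -> str:
    """Map a page's fullness to the PFS free-space categories."""
    if percent_full <= 0:
        return "empty"
    if percent_full <= 50:
        return "1 to 50 percent full"
    if percent_full <= 80:
        return "51 to 80 percent full"
    if percent_full <= 95:
        return "81 to 95 percent full"
    return "96 to 100 percent full"


print(pfs_page_for(20000))  # 16176
print(pfs_category(60))     # 51 to 80 percent full
```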
Managing space used by objects
An allocation unit is a logical unit of storage that consists of one or more pages in SQL Server. There are three types:
 IN_ROW_DATA: holds a partition of a heap or index.
 LOB_DATA: holds large object (LOB) data types, such as XML, VARBINARY(max), and VARCHAR(max).
 ROW_OVERFLOW_DATA: holds variable-length data stored in VARCHAR, NVARCHAR, VARBINARY, or SQL_VARIANT columns that exceed the 8,060-byte row size limit.
An Index Allocation Map (IAM) page maps the extents in a 4-GB part of a database file used by an allocation unit.
Managing space used by objects
An IAM page has a large bitmap in which every bit represents an extent: if the bit is 0, the extent is not allocated to the allocation unit; if the bit is 1, the extent is allocated to it.

The SQL Server Database Engine uses the IAM pages to find the extents allocated to the allocation unit.

For each extent, the SQL Server Database Engine searches the PFS pages to see if there is a page that can be used.
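This IAM-then-PFS search can be sketched as two nested loops. The outer loop walks the IAM bitmap for extents owned by the allocation unit. The inner loop checks each page of those extents for enough free space. In this simplified model the PFS information is an exact byte count per page, whereas real PFS bytes store only fullness categories. The function name is hypothetical.

```python
PAGES_PER_EXTENT = 8


def find_page_with_space(iam_bits: list[int], free_bytes: dict[int, int], row_size: int) -> int:
    """Return a page with room for a row of row_size bytes, or -1.

    iam_bits[e] == 1 means extent e belongs to the allocation unit.
    free_bytes maps page number -> free bytes; this is a simplification,
    since real PFS bytes only store fullness categories.
    """
    for extent, owned in enumerate(iam_bits):
        if owned != 1:
            continue  # extent belongs to some other allocation unit
        first = extent * PAGES_PER_EXTENT
        for page in range(first, first + PAGES_PER_EXTENT):
            if free_bytes.get(page, 0) >= row_size:
                return page
    return -1


iam = [0, 1, 0, 1]                          # owns extents 1 and 3
pfs = {3: 8000, 8: 100, 9: 500, 24: 8000}
print(find_page_with_space(iam, pfs, 400))   # 9
print(find_page_with_space(iam, pfs, 1000))  # 24 (page 3 is skipped: extent 0 is not owned)
```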
Managing space used by objects
Tracking Modified Extents
SQL Server uses two internal data structures to track
extents modified by bulk copy operations and extents
modified since the last full backup.
 Differential Changed Map (DCM)
o Tracks the extents that have changed since the
last BACKUP DATABASE statement.
o Differential backups read just the DCM pages to
determine which extents have been modified.

 Bulk Changed Map (BCM)
o Tracks the extents that have been modified by bulk-logged operations since the last BACKUP LOG statement (relevant only under the bulk-logged recovery model).
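The DCM idea can be sketched as a bitmap with one bit per extent. Any change to a page sets its extent's bit. A differential backup copies only the extents whose bits are set, and a full database backup resets the map. The class below is a toy model with hypothetical names; it does not read real DCM pages.

```python
PAGES_PER_EXTENT = 8


class DifferentialMap:
    """Toy DCM: one bit per extent, set when any page in the extent changes."""

    def __init__(self, extents: int) -> None:
        self.bits = [0] * extents

    def record_change(self, page_number: int) -> None:
        self.bits[page_number // PAGES_PER_EXTENT] = 1

    def extents_for_differential(self) -> list[int]:
        # A differential backup only needs to read the map to know what to copy.
        return [extent for extent, bit in enumerate(self.bits) if bit == 1]

    def full_backup_taken(self) -> None:
        # After BACKUP DATABASE, nothing has changed since the last full backup.
        self.bits = [0] * len(self.bits)


dcm = DifferentialMap(extents=4)
for page in (3, 5, 17):
    dcm.record_change(page)
print(dcm.extents_for_differential())  # [0, 2]
dcm.full_backup_taken()
print(dcm.extents_for_differential())  # []
```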
Transaction Log
 The transaction log is a string of log records.
 Each log record is identified by a log sequence
number (LSN).
 Each new log record is written to the logical end of
the log.
 Each log record contains the ID of the transaction.
 All log records associated with the transaction are
linked in a chain using backward pointers that speed
the rollback of the transaction.
 Log records for data modifications record either the logical operation performed or the before and after images of the modified data.
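The LSN ordering and the per-transaction backward chain can be sketched as follows. Each record stores the LSN of the previous record from the same transaction. Rolling back a transaction therefore means following that chain from its newest record and applying each before image. The model is simplified: LSNs are small integers, data is a dict, and it does not write the compensation records that a real rollback logs. The class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    txn_id: int
    prev_lsn: Optional[int]  # previous record of the same transaction (backward pointer)
    key: str
    before: Optional[str]    # None means the row did not exist
    after: Optional[str]


class TransactionLog:
    def __init__(self) -> None:
        self.records: dict[int, LogRecord] = {}
        self.next_lsn = 1
        self.last_lsn: dict[int, int] = {}  # txn id -> LSN of its newest record

    def append(self, txn_id: int, key: str, before: Optional[str], after: Optional[str]) -> int:
        """Write a record at the logical end of the log and link it to the txn chain."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.records[lsn] = LogRecord(lsn, txn_id, self.last_lsn.get(txn_id), key, before, after)
        self.last_lsn[txn_id] = lsn
        return lsn

    def rollback(self, txn_id: int, data: dict[str, str]) -> None:
        """Walk the backward chain, newest first, restoring before images."""
        lsn = self.last_lsn.pop(txn_id, None)
        while lsn is not None:
            rec = self.records[lsn]
            if rec.before is None:
                data.pop(rec.key, None)
            else:
                data[rec.key] = rec.before
            lsn = rec.prev_lsn


log = TransactionLog()
data = {"a": "1"}
log.append(1, "a", "1", "2")
data["a"] = "2"
log.append(2, "b", None, "x")
data["b"] = "x"
log.append(1, "c", None, "9")
data["c"] = "9"
log.rollback(1, data)
print(data)  # {'a': '1', 'b': 'x'}
```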
Transaction Log
The steps to recover an operation depend on the type of log record:
Logical operation logged
To roll the logical operation forward, the operation is
performed again.
To roll the logical operation back, the reverse logical
operation is performed.
Before and after image logged
To roll the operation forward, the after image is applied.
To roll the operation back, the before image is applied.
The section of the log file from the first log record that must be present for a successful database-wide rollback to the last-written log record is called the active part of the log, the active log, or the tail of the log.
Transaction Log Logical Architecture
 Transaction log in a database maps over one or
more physical files.
 The log file is a string of log records; each database has at least one log file.
 The SQL Server Database Engine divides each physical log file internally into a number of virtual log files (VLFs).
 Virtual log files have no fixed size, and there is
no fixed number of virtual log files for a
physical log file.
Transaction Log Logical Architecture
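The transaction log is a wrap-around file. New log records are added at the end of the logical log and expand toward the end of the physical log file. When the end of the logical log reaches the end of the physical file, new log records wrap around to the start of the physical file, reusing virtual log files that log truncation has freed.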
Transaction Log Logical Architecture
If the end of the logical log reaches the start of the logical log, one of two things occurs:
 If the FILEGROWTH setting is enabled for the log and space is available on the disk, the file is extended by the amount specified in the growth_increment parameter, and the new log records are added to the extension.
 If the FILEGROWTH setting is not enabled, or the disk that holds the log file has less free space than the amount specified in growth_increment, error 9002 is generated.
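The wrap-around behavior and the two outcomes above can be modeled with a list of VLFs marked active or inactive. When the end of the logical log needs a new VLF, it reuses the next one if truncation has freed it. Otherwise it grows the file if growth is allowed, and fails with error 9002 if it is not. This is a toy model under stated assumptions: growth is counted in whole VLFs, disk space is assumed to be available whenever growth is enabled, and the class and method names are hypothetical.

```python
class CircularLog:
    """Toy physical log file made of VLFs; True marks a VLF in the logical log."""

    def __init__(self, vlfs: int, can_grow: bool, growth_vlfs: int = 2) -> None:
        self.active = [False] * vlfs
        self.active[0] = True   # the logical log starts in the first VLF
        self.current = 0        # VLF holding the end of the logical log
        self.can_grow = can_grow
        self.growth_vlfs = growth_vlfs

    def advance(self) -> str:
        """Move the end of the logical log into another VLF."""
        nxt = (self.current + 1) % len(self.active)  # wrap around at the end of the file
        if not self.active[nxt]:
            self.current = nxt
            self.active[nxt] = True
            return f"using VLF {nxt}"
        # The end of the logical log has reached its start.
        if not self.can_grow:
            raise RuntimeError("error 9002: the transaction log is full")
        first_new = len(self.active)
        self.active.extend([False] * self.growth_vlfs)  # extend the file
        self.current = first_new                         # new records go to the extension
        self.active[first_new] = True
        return f"grew log, using VLF {first_new}"

    def truncate_vlf(self, vlf: int) -> None:
        """Log truncation: an inactive VLF becomes reusable."""
        if vlf == self.current:
            raise ValueError("cannot truncate the VLF at the end of the logical log")
        self.active[vlf] = False


log = CircularLog(vlfs=3, can_grow=False)
print(log.advance())  # using VLF 1
print(log.advance())  # using VLF 2
log.truncate_vlf(0)
print(log.advance())  # using VLF 0 (wrapped around)
```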
Log Truncation
 Log truncation deletes inactive virtual log files from
the logical transaction log, freeing space in the logical
log for reuse by the physical transaction log.
 Before the log can be truncated, a checkpoint
operation must occur.
 A checkpoint writes the current in-memory modified pages (known as dirty pages) and transaction log information from memory to disk. When the checkpoint is performed, the inactive portion of the transaction log is marked as reusable.
Log Truncation
[Figure: the transaction log before and after log truncation]
Log Truncation
Log truncation occurs automatically after the following events:
 Under the simple recovery model, after a
checkpoint.
 Under the full recovery model or bulk-logged
recovery model, after a log backup, if a
checkpoint has occurred since the previous
backup.
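The truncation triggers above reduce to a small decision function. This sketch covers only the rules stated on this slide. In practice, other conditions, such as a long-running active transaction, can still delay truncation. The function name and string arguments are hypothetical.

```python
def truncation_happens(recovery_model: str, event: str,
                       checkpoint_since_previous_backup: bool = True) -> bool:
    """Does automatic log truncation follow this event? (rules on this slide only)"""
    model = recovery_model.lower()
    if model == "simple":
        return event == "checkpoint"
    if model in ("full", "bulk_logged"):
        return event == "log_backup" and checkpoint_since_previous_backup
    raise ValueError(f"unknown recovery model: {recovery_model}")


print(truncation_happens("SIMPLE", "checkpoint"))  # True
print(truncation_happens("FULL", "checkpoint"))    # False
```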
Write-Ahead Transaction Log
 SQL Server reads data pages into the buffer cache when data must be retrieved.
 When a page is modified in the buffer cache, it is not
immediately written back to disk; instead, the page is marked
as dirty.
 A data page can have more than one logical write made before
it is physically written to disk.
 For each logical write, a transaction log record is inserted in the
log cache that records the modification.
 The log records must be written to disk before the associated
dirty page is removed from the buffer cache and written to disk.
 Writing a modified data page from the buffer cache to disk is called flushing the page.
 Log records are written to disk when the log buffers are flushed. This happens whenever a transaction commits or the log buffers become full.
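The write-ahead rule can be sketched with a toy buffer pool. Each change appends a log record to an in-memory log cache and remembers the LSN of the page's latest change. Flushing a dirty page first forces the log to disk up to at least that LSN. The class and method names are hypothetical, and flushing the whole log cache at once stands in for SQL Server's actual log buffer management.

```python
class BufferPool:
    """Toy write-ahead logging: a dirty page is written only after the log
    records describing its changes are on disk."""

    def __init__(self) -> None:
        self.next_lsn = 1
        self.log_cache: list[int] = []     # LSNs of records not yet on disk
        self.flushed_lsn = 0               # highest LSN written to the log file
        self.dirty: dict[str, int] = {}    # page id -> LSN of its latest change
        self.pages_on_disk: set[str] = set()

    def modify(self, page: str) -> int:
        """Log a change in the log cache and mark the page dirty (no disk write)."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_cache.append(lsn)
        self.dirty[page] = lsn
        return lsn

    def flush_log(self) -> None:
        """Write the log cache to disk (on commit, or when the log buffers fill)."""
        if self.log_cache:
            self.flushed_lsn = self.log_cache[-1]
            self.log_cache.clear()

    def flush_page(self, page: str) -> None:
        """Write a dirty page, honoring the write-ahead rule."""
        if self.dirty[page] > self.flushed_lsn:
            self.flush_log()  # log records must reach disk first
        self.pages_on_disk.add(page)
        del self.dirty[page]


bp = BufferPool()
bp.modify("1:100")
bp.modify("1:100")      # two logical writes before any physical write
bp.flush_page("1:100")
print(bp.flushed_lsn)   # 2
```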
The Log Chain
 A continuous sequence of log backups is called a log
chain. A log chain starts with a full backup of the
database.
 A new log chain is started only when the database is backed up for the first time, or after the recovery model is switched from simple recovery to full or bulk-logged recovery.
 To restore a database up to the point of failure, the log
chain must be intact. That is, an unbroken sequence of
transaction log backups must extend up to the point of
failure.
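An intact chain means there are no gaps between consecutive log backups. The sketch assumes that each log backup records the LSN range it covers as first_lsn (inclusive) and last_lsn (exclusive). Under that assumption, each backup must start exactly where the previous one ended, and the first log backup to be restored must cover the LSN at which the full backup ends. The record type, function name, and integer LSNs are simplifications.

```python
from dataclasses import dataclass


@dataclass
class LogBackup:
    first_lsn: int  # first LSN in the backup (inclusive)
    last_lsn: int   # LSN just past the backup (exclusive)


def chain_is_intact(full_backup_lsn: int, log_backups: list[LogBackup]) -> bool:
    """Can the full backup plus these log backups be restored without gaps?"""
    if not log_backups:
        return True
    ordered = sorted(log_backups, key=lambda b: b.first_lsn)
    # The first log backup applied must cover the point where the full backup ends.
    if not (ordered[0].first_lsn <= full_backup_lsn < ordered[0].last_lsn):
        return False
    # Each later log backup must start exactly where the previous one ended.
    return all(cur.first_lsn == prev.last_lsn for prev, cur in zip(ordered, ordered[1:]))


print(chain_is_intact(150, [LogBackup(100, 200), LogBackup(200, 300)]))  # True
print(chain_is_intact(150, [LogBackup(100, 200), LogBackup(250, 300)]))  # False: gap
```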
Checkpoints and the Active Portion of
the Log
 Checkpoints flush dirty data pages from the buffer
cache of the current database to disk. This minimizes
the active portion of the log that must be processed
during a full recovery of a database.
 During a full recovery, the following types of actions
are performed:
o The log records of modifications not flushed to disk before
the system stopped are rolled forward.
o All modifications associated with incomplete transactions,
such as transactions for which there is no COMMIT or
ROLLBACK log record, are rolled back.
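The two recovery actions can be sketched over before/after-image log records. First, roll every logged change forward in LSN order. Then roll back, newest first, the changes of transactions that have no COMMIT record. Real SQL Server recovery runs analysis, redo, and undo phases and uses page LSNs to skip changes already on disk. This toy version simply reapplies every after image, which is harmless because applying an image is idempotent. As a further simplification, it does not model ROLLBACK records. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    lsn: int
    txn: str
    kind: str                     # "update" or "commit"
    key: str = ""
    before: Optional[str] = None  # None means the row did not exist
    after: Optional[str] = None


def _apply(data: dict[str, str], key: str, value: Optional[str]) -> None:
    if value is None:
        data.pop(key, None)
    else:
        data[key] = value


def recover(disk: dict[str, str], log: list[Rec]) -> dict[str, str]:
    data = dict(disk)
    committed = {r.txn for r in log if r.kind == "commit"}
    ordered = sorted(log, key=lambda r: r.lsn)
    # Roll forward: reapply logged modifications that may not have reached disk.
    for r in ordered:
        if r.kind == "update":
            _apply(data, r.key, r.after)
    # Roll back: undo incomplete transactions, newest change first.
    for r in reversed(ordered):
        if r.kind == "update" and r.txn not in committed:
            _apply(data, r.key, r.before)
    return data


log = [
    Rec(1, "T1", "update", "a", "1", "2"),
    Rec(2, "T2", "update", "b", None, "x"),
    Rec(3, "T1", "commit"),
]
print(recover({"a": "1"}, log))            # {'a': '2'}: T1 rolled forward, T2 rolled back
print(recover({"a": "1", "b": "x"}, log))  # {'a': '2'}
```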
Checkpoint Operation
A checkpoint performs the following processes in
the database:
 Writes a record to the log file, marking the start of the
checkpoint.
 Stores information recorded for the checkpoint in a chain of checkpoint log records, including the Minimum Recovery LSN (MinLSN).
 The checkpoint records also contain a list of all the active
transactions that have modified the database.
 If the database uses the simple recovery model, marks for reuse
the space that precedes the MinLSN.
 Writes all dirty log and data pages to disk.
 Writes a record marking the end of the checkpoint to the log
file.
 Writes the LSN of the start of this chain to the database boot
page.
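The MinLSN recorded by a checkpoint is the smallest of three LSNs: the start of the checkpoint, the start of the oldest active transaction, and the start of the oldest replication transaction not yet delivered to the distribution database. Under the simple recovery model, VLFs whose records all precede the MinLSN can be marked for reuse. A minimal sketch with integer LSNs and hypothetical names:

```python
from typing import Optional


def min_lsn(checkpoint_start: int,
            oldest_active_txn_start: Optional[int] = None,
            oldest_undelivered_repl_start: Optional[int] = None) -> int:
    """MinLSN: smallest of the checkpoint start and any older work still needed."""
    candidates = [checkpoint_start]
    for lsn in (oldest_active_txn_start, oldest_undelivered_repl_start):
        if lsn is not None:
            candidates.append(lsn)
    return min(candidates)


def reusable_vlfs(vlf_ranges: list[tuple[int, int]], minlsn: int) -> list[int]:
    """Indexes of VLFs whose records all precede MinLSN.

    Each range is (first LSN, LSN just past the VLF's last record).
    """
    return [i for i, (_, end) in enumerate(vlf_ranges) if end <= minlsn]


m = min_lsn(checkpoint_start=500, oldest_active_txn_start=320)
print(m)                                                       # 320
print(reusable_vlfs([(100, 200), (200, 300), (300, 400)], m))  # [0, 1]
```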
Activities that cause a Checkpoint
 A CHECKPOINT statement is explicitly executed.
 A minimally logged operation is performed in the database.
 Database files have been added or removed by using ALTER
DATABASE.
 An instance of SQL Server is stopped by a SHUTDOWN
statement or by stopping the SQL Server (MSSQLSERVER)
service.
 An instance of SQL Server periodically generates automatic
checkpoints in each database to reduce the time that the
instance would take to recover the database.
 A database backup is taken.
 An activity requiring a database shutdown is performed.
