Performance Tuning Guidelines for Windows Server 2008
Abstract
This guide describes important tuning parameters and settings that can result in
improved performance for the Windows Server® 2008 operating system. Each setting
and its potential effect are described to help you make an informed judgment about
its relevance to your system, workload, and performance goals.
This information applies to the Windows Server 2008 operating system.
The current version of this guide is maintained on the Web at:
http://www.microsoft.com/whdc/system/sysperf/Perf_tun_srv.mspx
Feedback: Please tell us if this paper was useful to you. Submit comments at:
http://go.microsoft.com/fwlink/?LinkId=102585
References and resources discussed here are listed at the end of this guide.
Disclaimer: This is a preliminary document and may be changed substantially prior to final commercial
release of the software described herein.
The information contained in this document represents the current view of Microsoft Corporation on the
issues discussed as of the date of publication. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot
guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under
copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or
transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or
for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights
covering subject matter in this document. Except as expressly provided in any written license agreement
from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses,
logos, people, places and events depicted herein are fictitious, and no association with any real company,
organization, product, domain name, email address, logo, person, place or event is intended or should be
inferred.
Microsoft, Active Directory, MS-DOS, MSDN, SQL Server, Win32, Windows, and Windows Server are
either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other
countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective
owners.
Document History
May 27, 2008: Added "Power Guidelines" under the Server Hardware section and added the "Performance Tuning for Virtualization Servers" section.
October 16, 2007: Added the "Performance Tuning for Terminal Server" and "Performance Tuning for Terminal Server Gateway" sections.
August 31, 2007: First publication.
Contents
Introduction
In This Guide
Performance Tuning for Server Hardware
  Power Guidelines
  Interrupt Affinity
Performance Tuning for Networking Subsystem
  Choosing a Network Adapter
  Offload Capabilities
  Receive-Side Scaling (RSS)
  Message-Signaled Interrupts (MSI/MSI-X)
  Network Adapter Resources
  Interrupt Moderation
  Tuning the Network Adapter
  Enabling Offload Features
  Increasing Network Adapter Resources
  Enabling Interrupt Moderation
  Binding Each Adapter to a CPU
  TCP Receive Window Auto-Tuning
  TCP Parameters
  Network-Related Performance Counters
Performance Tuning for Storage Subsystem
  Choosing Storage
  Estimating the Amount of Data to Be Stored
  Choosing a Storage Array
  Hardware RAID Levels
  Choosing the RAID Level
  Selecting a Stripe Unit Size
  Determining the Volume Layout
  Storage-Related Parameters
  NumberOfRequests
  I/O Priorities
  Storage-Related Performance Counters
  Logical Disk and Physical Disk
  Processor
  Power Protection and Advanced Performance Option
  Block Alignment (DISKPART)
  Solid-State and Hybrid Drives
  Response Times
  Queue Lengths
Performance Tuning for Web Servers
  Selecting the Proper Hardware for Performance
  Operating System Practices
  Tuning IIS 7.0
  Kernel-Mode Tunings
  Cache Management Settings
  Request and Connection Management Settings
  User-Mode Settings
  User-Mode Cache Behavior Settings
  Compression Behavior Settings
  Tuning the Default Document List
  Central Binary Logging
  Application and Site Tunings
  Managing IIS 7.0 Modules
  Classic ASP Settings
  ASP.NET Concurrency Setting
  Worker Process and Recycling Options
  Secure Sockets Layer Tuning Parameters
  ISAPI
  Managed Code Tuning Guidelines
  Other Issues that Affect IIS Performance
  NTFS File System Setting
  Networking Subsystem Performance Settings for IIS
Performance Tuning for File Servers
  Selecting the Proper Hardware for Performance
  Server Message Block Model
  Configuration Considerations
  General Tuning Parameters for Servers
  General Tuning Parameters for Client Computers
Performance Tuning for Active Directory Servers
  Considerations for Read-Heavy Scenarios
  Considerations for Write-Heavy Scenarios
  Using Indexing to Increase Query Performance
  Optimizing Trust Paths
  Active Directory Performance Counters
Performance Tuning for Terminal Server
  Selecting the Proper Hardware for Performance
  CPU Configuration
  Processor Architecture
  Memory Configuration
  Disk
  Network
  Tuning Applications for Terminal Server
  Terminal Server Tuning Parameters
  Pagefile
  Antivirus and Antispyware
  Task Scheduler
  Desktop Notification Icons
  Client Experience Settings
  Desktop Size
  Windows System Resource Manager
Performance Tuning for Terminal Server Gateway
  Monitoring and Data Collection
Performance Tuning for Virtualization Servers
  Terminology
  Hyper-V Architecture
Introduction
Windows Server® 2008 should perform very well out of the box for most customer
workloads. Optimal out-of-the-box performance was a major goal for this release and
influenced how Microsoft designed a new, dynamically tuned networking subsystem
that incorporates both IPv4 and IPv6 protocols and improved file sharing through
Server Message Block (SMB) 2.0. However, you can further tune the server settings
and obtain incremental performance gains, especially when the nature of the
workload varies little over time.
The most effective tuning changes consider the hardware, the workload, and the
performance goals. This guide describes important tuning considerations and settings
that can result in improved performance. Each setting and its potential effect are
described to help you make an informed judgment about its relevance to your
system, workload, and performance goals.
Note: Registry settings and tuning parameters have changed significantly from
Windows Server 2003 to Windows Server 2008. Remember this as you tune your
server—using earlier or out-of-date tuning guidelines might produce unexpected
results.
As always, be careful when you directly manipulate the registry. If you must edit the
registry, back it up first.
In This Guide
This guide contains key performance recommendations for the following
components:
• Server Hardware
• Networking Subsystem
• Storage Subsystem
This guide also contains performance tuning considerations for the following server
roles:
• Web Servers
• File Servers
• Active Directory Servers
• Terminal Servers
• Terminal Server Gateway
• Virtualization Server (Hyper-V)
This guide also contains performance tuning considerations for the following workloads:
• File Server Workload
• Networking Workload
• Terminal Server Knowledge Worker Workload
• SAP Sales and Distribution Two-Tier Workload
Performance Tuning for Server Hardware

Component: Disks
Recommendation: Higher rotational speeds reduce random request service times (~2 ms on average when you compare 7,200- and 15,000-RPM drives) and increase sequential request bandwidth. The latest generation of 2.5-inch enterprise-class disks can service a significantly larger number of random requests per second compared to 3.5-inch drives.
Store "hot" data near the "beginning" of a disk because this corresponds to the outermost (fastest) tracks.
Consolidating small drives into fewer high-capacity drives can easily reduce overall storage performance. Fewer spindles mean reduced request service concurrency and therefore potentially lower throughput and longer response times (depending on the workload intensity).
Table 2 lists the recommended settings for choosing networking and storage adapters
in a high-performing server environment. These settings can help keep your
networking or storage hardware from being the bottleneck when they are under
heavy load.
Table 2. Networking and Storage Adapter Recommendations
Recommendation: WHQL certified
Description: The adapter has passed the Windows® Hardware Quality Labs (WHQL) certification test suite.

Recommendation: 64-bit capability
Description: Adapters that are 64-bit capable can perform direct memory access (DMA) operations to and from high physical memory locations (greater than 4 GB). If the driver does not support DMA greater than 4 GB, the system double-buffers the I/O to a physical address space of less than 4 GB.

Recommendation: Copper and fiber (glass) adapters
Description: Copper adapters generally have the same performance as their fiber counterparts, and both copper and fiber are available on some Fibre Channel adapters. Certain environments are better suited to copper adapters, whereas other environments are better suited to fiber adapters.

Recommendation: Dual- or quad-port adapters
Description: Multiport adapters are useful for servers that have limited PCI slots.
To address SCSI limitations on the number of disks that can be connected to a SCSI bus, some adapters provide two or four SCSI buses on a single adapter card. Fibre Channel disks generally have no limits to the number of disks that are connected to an adapter unless they are hidden behind a SCSI interface. Serial Attached SCSI (SAS) and Serial ATA (SATA) adapters also have a limited number of connections because of the serial nature of the protocols, but more attached disks are possible by using switches.
Network adapters have this feature for load-balancing or failover scenarios. Using two single-port network adapters usually yields better performance than using a single dual-port network adapter for the same workload.
PCI bus limitations can be a major factor in limiting performance for multiport adapters. Therefore, it is important to consider placing them in a high-performing PCI slot that provides enough bandwidth. Generally, PCI-E adapters provide more bandwidth than PCI-X adapters.
Recommendation: Interrupt moderation
Description: Some adapters can moderate how frequently they interrupt the host processors to indicate activity (or its completion). Moderating interrupts can often result in reduced CPU load on the host but, unless interrupt moderation is performed intelligently, the CPU savings might increase latency.

Recommendation: Offload capability and other advanced features such as message-signaled interrupt (MSI)-X
Description: Offload-capable adapters offer CPU savings that translate into improved performance. For more information, see "Choosing a Network Adapter" later in this guide.

Recommendation: Dynamic interrupt and deferred procedure call (DPC) redirection
Description: Windows Server 2008 has new functionality that enables PCI-E storage adapters to dynamically redirect interrupts and DPCs. This capability, originally called "NUMA I/O," can help any multiprocessor system by improving workload partitioning, cache hit rates, and on-board hardware interconnect usage for I/O-intensive workloads. At Windows Server 2008 RTM, no adapters on the market had this capability, but several manufacturers were developing adapters to take advantage of this performance feature.
Power Guidelines
Although this guide focuses on how to obtain the best performance from Windows
Server 2008, the increasing importance of power efficiency must also be recognized
in enterprise and data center environments. High performance and low power usage
are often conflicting goals, but by carefully selecting server components you can
determine the correct balance between them. Table 3 contains guidelines for power
characteristics and capabilities of server hardware components.
Table 3. Server Hardware Power Savings Recommendations
Component: Processors
Recommendation: Higher frequencies in a specific processor family cause increased power consumption when the processors are under heavy load. Also, processor families usually include low-power versions. Newer generations of processors expose more power states for the Windows power management algorithms, which enables better power management at all levels of performance.

Component: Memory (RAM)
Recommendation: Memory consumes an increasing part of system power. Many factors affect the power consumption of a memory "stick," such as memory technology, error correction code (ECC), frequency, capacity, density, and number of ranks. Therefore, it is best to compare expected power consumption ratings before purchasing large quantities of memory. Low-power ("green") memory is now available, but a performance or monetary trade-off must be considered. If paging is required, the power cost of the paging disks should also be considered.

Component: Disks
Recommendation: Higher RPM means increased power consumption. Also, new 2.5-inch drives consume less than half the power of older 3.5-inch drives. More information about the power cost for different RAID configurations is found in "Performance Tuning for Storage Subsystem" later in this guide.

Component: Network and storage adapters
Recommendation: Some adapters decrease power consumption during idle periods. This becomes a more important consideration for 10-Gb networking and high-bandwidth storage links.
The default power plan for Windows Server 2008 is Balanced, which tries to keep
performance high while it saves power whenever possible. The other predefined
plans are Power Saver and High Performance, each of which is heavily weighted toward
its respective goal. However, server BIOS settings can prevent Windows from accomplishing
any of these goals, so make sure that you check whether power management by the
operating system or by the hardware is a BIOS option. Windows Server performance
lab tests show that Windows power management works very well when it is
compared to hardware-managed power management on enterprise servers, so the
operating system–managed setting is preferred. However, the most important
guideline is to make sure that the BIOS settings on a specific server are well
understood so that the administrator knows if the Windows power setting controls
(including the High Performance plan) are actually usable.
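You can view and switch plans from an elevated command prompt by using Powercfg. The following is a minimal sketch; the GUID shown is the standard identifier for the built-in High Performance plan on a default installation and might differ if custom plans were created:

rem List all power plans; the active plan is marked with an asterisk.
powercfg -list

rem Make the High Performance plan active.
powercfg -setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c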
Interrupt Affinity
Interrupt affinity refers to the binding of interrupts from a specific device to one or
more specific processors in a multiprocessor server. The binding forces interrupt
processing to run on the specified processor or processors, unless the device specifies
otherwise. For some scenarios, such as a file server, the network connections and file
server sessions remain on the same network adapter. In those scenarios, binding
interrupts from the network adapter to a processor allows for processing incoming
packets (SMB requests and data) on a specific set of processors, which improves
locality and scalability.
The Interrupt-Affinity Filter tool (IntFiltr) lets you change the CPU affinity of the
interrupt service routine (ISR). The tool runs on most servers that run Windows
Server 2008, regardless of what processor or interrupt controller is used. However,
on some systems with more than eight logical processors or for devices that use MSI
or MSI-X, the tool is limited by the Advanced Programmable Interrupt Controller
(APIC) protocol. The Interrupt-Affinity Policy tool does not encounter this issue
because it sets the CPU affinity through the affinity policy of a device.
You can use this tool to direct any device's ISR to a specific processor or set of
processors (instead of sending interrupts to any of the CPUs in the system). Note that
different devices can have different interrupt affinity settings. For IntFiltr to work on
some systems, you must set the MAXPROCSPERCLUSTER=0 boot parameter. On some
systems, directing the ISR to a processor on a different nonuniform memory access
(NUMA) node can cause performance issues.
Performance Tuning for Networking Subsystem

Figure 1 shows the layered network stack: user-mode applications (such as WMS, DNS, and IIS), system drivers (such as Afd.sys and Http.sys), the protocol stack (TCP/IP, UDP/IP, VPN), NDIS, and the NIC driver at the network interface layer.
The network architecture is layered, and the layers can be broadly divided into the
following sections:
• The network driver and Network Driver Interface Specification (NDIS).
These are the lowest layers. NDIS exposes interfaces for the driver below it and
for the layers above it such as TCP/IP.
• The protocol stack.
This implements protocols such as TCP/IP and UDP/IP. These layers expose the
transport layer interface for layers above them.
• System drivers.
These are typically transport data interface extension (TDX) or Winsock Kernel
(WSK) clients and expose interfaces to user-mode applications. The WSK interface
is a new feature for Windows Server 2008 and Windows Vista® that is exposed by
Afd.sys. The interface improves performance by eliminating the switching
between user mode and kernel modes.
• User-mode applications.
These are typically Microsoft solutions or custom applications.
Tuning for network-intensive workloads can involve each layer. The following
sections describe some tuning changes.
Offload Capabilities
Offloading tasks can help reduce CPU usage on the server, which improves overall
system performance. The Microsoft networking stack can offload one or more tasks
to a network adapter that has the appropriate task-offload capabilities. Table 4
provides more details about each offload.
Table 4. Offload Capabilities for Network Adapters
Offload type Description
Checksum The networking stack can offload the calculation and validation of both
calculation Transmission Control Protocol (TCP) and User Datagram Protocol (UDP)
checksums on sends and receives. It can also offload the calculation
and validation of both IPv4 and IPv6 checksums on sends and receives.
IP security The TCP/IP transport can offload the calculation and validation of
authentication and encrypted checksums for authentication headers and Encapsulating
encryption Security Payloads (ESPs). The TCP/IP transport can also offload the
encryption and decryption of ESPs.
Segmentation of The TCP/IP transport supports Giant Send Offload (GSO). With GSO,
large TCP packets also known as LSOv2, the TCP/IP transport can offload the
segmentation of large TCP packets.
TCP stack The TCP offload engine (TOE) enables a network adapter that has the
appropriate capabilities to offload the entire network stack.
Interrupt Moderation
To control interrupt moderation, some network adapters expose either different
interrupt moderation levels, buffer coalescing parameters (sometimes separately for
send and receive buffers), or both. You should consider buffer coalescing or batching
when the network adapter does not perform interrupt moderation.
Table 5 provides a guideline of which high-performance features improve
performance in terms of throughput, latency, or scalability for some server roles.
Table 5. Benefits from Network Adapter Features for Different Server Roles
Server role Checksum Segmentation TCP offload Receive-side
offload offload engine (TOE) scaling (RSS)
File server X X X
Web server X X X X
Mail server (short- X X
lived connections)
Database server X X X X
FTP server X X X
Media server X X X
Disclaimer: The recommendations in Table 5 are intended to serve as guidance only
for choosing the most suitable technology for specific server roles under a
deterministic traffic pattern. User experience can be different, depending on
workload characteristics and the hardware that is used.
If your hardware supports TOE, then you must enable that option in the operating
system to benefit from the hardware’s capability. You can enable TOE by running the
following:
netsh int tcp set global chimney=enabled
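To confirm the setting, you can display the global TCP parameters; the output includes the current Chimney Offload State:

netsh int tcp show global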
be a limitation, you should enable offload capabilities even for such network
adapters. Note that some network adapters require offload features to be
independently enabled for send and receive paths.
The remote file copy is a common network usage scenario that is likely to increase
demand on the infrastructure because of this change. Many improvements have
been made to the underlying operating system support for remote file copy that now
let large file copies perform at disk I/O speeds. If many concurrent remote large file
copies are typical within your network environment, your network infrastructure
might be taxed by the significant increase in network usage by each file copy
operation.
TCP Parameters
The following registry keywords, which were added for Windows Server 2003, are no longer supported and are therefore ignored in Windows Server 2008:
• TcpWindowSize
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
• NumTcbTablePartitions
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
• MaxHashTableSize
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
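If an older tuning script created these values, they remain in the registry but are ignored. A quick way to check whether they are present (each query simply reports that the value was not found if it does not exist):

reg query HKLM\System\CurrentControlSet\Services\Tcpip\Parameters /v TcpWindowSize
reg query HKLM\System\CurrentControlSet\Services\Tcpip\Parameters /v NumTcbTablePartitions
reg query HKLM\System\CurrentControlSet\Services\Tcpip\Parameters /v MaxHashTableSize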
Processor
• Percent of processor time.
• Interrupts per second.
• DPCs queued per second.
This counter is an average rate at which DPCs were added to the processor's
DPC queue. Each processor has its own DPC queue. This counter measures
the rate that DPCs are added to the queue, not the number of DPCs in the
queue. It displays the difference between the values that were observed in
the last two samples, divided by the duration of the sample interval.
TCPv4
• Connection failures.
• Segments sent per second.
• Segments received per second.
• Segments retransmitted per second.
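The counters above can be collected from the command line with Typeperf. The following is a sketch that samples every 5 seconds for one minute; the counter paths are the standard English names:

typeperf "\Processor(_Total)\% Processor Time" "\Processor(_Total)\Interrupts/sec" "\Processor(_Total)\DPCs Queued/sec" "\TCPv4\Segments Sent/sec" "\TCPv4\Segments Received/sec" "\TCPv4\Segments Retransmitted/sec" "\TCPv4\Connection Failures" -si 5 -sc 12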
Performance Tuning for Storage Subsystem
Figure 2 shows the storage architecture, which covers many components in the driver
stack. The layered driver model in Windows sacrifices some performance for
maintainability and ease of use (in terms of incorporating drivers of varying types into
the stack). The following sections discuss tuning guidelines for storage workloads.
Figure 2. Storage Driver Stack. The stack layers, from top to bottom: file system drivers (NTFS, FASTFAT); volume snapshot and management drivers (VolSnap, VolMgr, VolMgrX); partition and class drivers (PartMgr, ClassPNP, Disk); port drivers (SCSIport, Storport, ATAport); and the miniport driver at the adapter interface.
Choosing Storage
The most important considerations in choosing storage systems include the following:
• Understanding the characteristics for current and future storage workloads.
• Understanding that application behavior is essential for both storage subsystem
planning and performance analysis.
• Providing necessary storage space, bandwidth, and latency characteristics for
current and future needs.
• Selecting a data layout scheme (such as striping), redundancy architecture (such
as mirroring), and backup strategy.
• Using a procedure that provides the required performance and data recovery
capabilities.
• Using power guidelines, that is, calculating the expected power consumption in
total and per-unit volume (such as watts per rack).
When they are compared to 3.5-inch disks, 2.5-inch disks have greatly reduced
power consumption but they also are packed more tightly into racks or servers.
Note that spinning up disk drives increases power usage, so power-sensitive
environments should use arrays that spin up their drives in a staged manner.
The better you understand the workloads on the system, the more accurately you
can plan. The following are some important workload characteristics:
• Read:write ratio.
• Sequential/random (temporal and spatial locality).
• Request sizes.
• Interarrival rates, burstiness, and concurrency (patterns of request arrival rates).
Option: Spanning
Description: This is also not a RAID level, but instead is the simple concatenation of multiple physical disks into a single logical disk. Each disk contains a set of sequential logical blocks. Spanning has the same performance and reliability characteristics as JBOD.

Option: RAID 0 (striping)
Description: RAID 0 is a data layout scheme in which sequential logical blocks of a prechosen size (the stripe unit) are laid out in a round-robin manner across multiple disks. It presents a logical disk that stripes disk accesses over a set of physical disks.
For most workloads, a striped data layout provides better performance than JBOD if the stripe unit is appropriately selected based on server workload and storage hardware characteristics. The overall storage load is balanced across all physical drives.
This is the least expensive RAID configuration because all the disk capacity is available for storing the single copy of data.
Because no capacity is allocated for redundant data, RAID 0 does not provide data recovery mechanisms such as those in RAID 1 and RAID 5. Also, the loss of any disk results in data loss on a larger scale than JBOD because the entire file system spread across n physical disks is disrupted; every nth block of data in the file system is missing.

Option: RAID 1 (mirroring)
Description: RAID 1 is a data layout scheme in which each logical block exists on at least two physical disks. It presents a logical disk that consists of a disk mirror pair.
RAID 1 often has worse bandwidth and latency for write operations compared to RAID 0 (or JBOD). This is because data must be written to two or more physical disks. Request latency is based on the slowest of the two (or more) write operations that are necessary to update all copies of the affected data blocks.
RAID 1 can provide faster read operations than RAID 0 because it can read from the least busy physical disk of the mirrored pair.
RAID 1 is the most expensive RAID scheme in terms of physical disks because half (or more) of the disk capacity stores redundant data copies. RAID 1 can survive the loss of any single physical disk. In larger configurations it can survive multiple disk failures, if the failures do not involve all the disks of a specific mirrored disk set.
RAID 1 is the fastest ordinary RAID level for recovery time after a physical disk failure. Only a single disk (the other part of the broken mirror pair) is needed to bring up the replacement drive. Note that the second disk is typically still available to service data requests throughout the rebuilding process.

Option: RAID 0+1 (striped mirrors)
Description: The combination of striping and mirroring provides the performance benefits of RAID 0 and the redundancy benefits of RAID 1. This option is also known as RAID 1+0 and RAID 10.
Option: RAID 5 (rotated parity)
Description: RAID 5 presents a logical disk composed of multiple physical disks that have data striped across the disks in sequential blocks (stripe units). However, the underlying physical disks have parity information scattered throughout the disk array, as Figure 3 shows.
For read requests, RAID 5 has characteristics that resemble those of RAID 0. However, small RAID 5 writes are much slower than those of JBOD or RAID 0 because each parity block that corresponds to the modified data block requires three additional disk requests. Because four physical disk requests are generated for every logical write, bandwidth is reduced by approximately 75%.
RAID 5 provides data recovery capabilities because data can be reconstructed from the parity. RAID 5 can survive the loss of any one physical disk, as opposed to RAID 1, which can survive the loss of multiple disks as long as an entire mirrored set is not lost.
RAID 5 requires additional time to recover from a lost physical disk compared to RAID 1 because the data and parity from the failed disk can be re-created only by reading all the other disks in their entirety. Performance during the rebuilding period is severely reduced not only because of the rebuilding traffic but also because the reads and writes that target the data that was stored on the failed disk must read all disks (an entire "stripe") to re-create the missing data.
RAID 5 is less expensive than RAID 1 because it requires only a single additional disk per array, instead of double the total number of disks in an array.
Power guidelines: RAID 5 has a significant power advantage over mirroring, simply because it uses fewer drives.

Option: RAID 6 (double-rotated redundancy)
Description: RAID 6 is basically RAID 5 with additional redundancy built in. Instead of a single block of parity per stripe of data, two blocks of redundancy are included. The second block uses a different redundancy code (instead of parity), which enables data to be reconstructed after the loss of any two disks. Or, disks can be arranged in a two-dimensional matrix, and both vertical and horizontal parity can be maintained.
Power guidelines: RAID 6 has a significant power advantage over mirroring, simply because it uses fewer drives.
Rotated redundancy schemes (such as RAID 5 and 6) are the most difficult to
understand and plan for. Figure 3 shows RAID 5.
Cons:
• Writes must update all mirrors.

RAID 0+1 (striped mirrors)
Pros:
• Two data sources for every read request (up to 100% performance improvement).
• Balanced load.
• Potential for better response times, throughput, and concurrency.
• Single losses, and often multiple losses in large configurations, are survivable and do not prevent access.
Cons:
• Writes must update the mirrors.
• Difficult stripe unit size choice.
• Twice the cost of RAID 0 or JBOD.
• Four-disk minimum.
• Maximum power.
Storage-Related Parameters
On Windows Server 2008, you can adjust the following registry parameter for high-
throughput scenarios.
NumberOfRequests
This driver/device-specific parameter is passed to a miniport when it is initialized. A
higher value might improve performance and enable Windows to give more disk
requests to a logical disk, which is most useful for hardware RAID adapters that have
concurrency capabilities. This value is typically set by the driver when it is installed,
but you can set it manually through the following registry entry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services
\MINIPORT_ADAPTER\Parameters\DeviceN\NumberOfRequests (REG_DWORD)
Replace MINIPORT_ADAPTER with the specific adapter name. Make an entry for each
device, and in each entry replace DeviceN with Device1, Device2, and so on,
depending on the number of devices that you are adding. For this setting to take
effect, a reboot is sometimes required. But for Storport miniports, only the adapters
must be “rebooted” (that is, disabled and re-enabled). For example, for two Emulex
miniport adapters whose miniport driver name is lp6nds35, you would create the
following registry entries set to 96:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\lp6nds35\Parameters
\Device0\NumberOfRequests
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\lp6nds35\Parameters
\Device1\NumberOfRequests
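The same two entries can be created from an elevated command prompt. This is a sketch of the Emulex example above; substitute your own miniport driver name, device numbers, and request count:

reg add HKLM\System\CurrentControlSet\Services\lp6nds35\Parameters\Device0 /v NumberOfRequests /t REG_DWORD /d 96 /f
reg add HKLM\System\CurrentControlSet\Services\lp6nds35\Parameters\Device1 /v NumberOfRequests /t REG_DWORD /d 96 /f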
The following parameters do not apply to Windows Server 2008:
• CountOperations
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\I/O System\
• DontVerifyRandomDrivers
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management\
I/O Priorities
Windows Server 2008 can specify an internal priority level on individual
I/Os. Windows primarily uses this ability to de-prioritize background I/O activity and
to give precedence to response-sensitive I/Os (such as multimedia). However,
extensions to file system APIs let applications specify I/O priorities per handle. The
storage stack code to sort out and manage I/O priorities has overhead, so if some
disks will be targeted only by a single priority of I/Os (such as a SQL database disk),
you can improve performance by disabling the I/O priority management for those
disks by setting the following registry entry to zero:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\DeviceClasses
\{Device GUID}\DeviceParameters\Classpnp\IdlePrioritySupported
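As a sketch, the entry can be created as follows. {Device GUID} is a placeholder; replace it with the device interface GUID of the disk whose I/O priority management you want to disable:

reg add "HKLM\System\CurrentControlSet\Control\DeviceClasses\{Device GUID}\DeviceParameters\Classpnp" /v IdlePrioritySupported /t REG_DWORD /d 0 /f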
• Average Disk Queue Length, Average Disk { Read | Write } Queue Length
These counters collect concurrency data, including burstiness and peak loads.
Guidelines for queue lengths are given later in this guide. These counters
represent the number of requests in flight below the driver that takes the
statistics. This means that the requests are not necessarily queued but could
actually be in service or completed and on the way back up the path. Possible in-
flight locations include the following:
• Waiting in an ATAport, SCSIPort, or Storport queue.
• Waiting in a queue in a miniport driver.
• Waiting in a disk controller queue.
• Waiting in an array controller queue.
• Waiting in a hard disk queue (that is, on board a physical disk).
• Actively receiving service from a physical disk.
• Completed, but not yet back up the stack to where the statistics are
collected.
Processor
• % DPC Time, % Interrupt Time, % Privileged Time
If interrupt time and DPC time are a large part of privileged time, the kernel is
spending a long time processing I/Os. Sometimes, it is best to keep interrupts and
DPCs affinitized to only a few CPUs on a multiprocessor system, to improve cache
locality. Other times, it is best to distribute the interrupts and DPCs among many
CPUs to prevent the interrupt and DPC activity from becoming a bottleneck.
• DPCs Queued / second
This counter is another measurement of how DPCs are using CPU time and kernel
resources.
• Interrupts / second
This counter is another measurement of how interrupts are using CPU time and
kernel resources. Modern disk controllers often combine or coalesce interrupts
so that a single interrupt causes the processing of multiple I/O completions. Of
course, it is a trade-off between delaying interrupts (and therefore completions)
and amortizing CPU processing time.
the magnetic media. Because the amount of flash memory is quite small when it is
compared to the amount of data that can be stored on the platters, the cost is
acceptable. This is especially true when one considers the other benefits of flash
memory: improved power and greater tolerance of shock, vibration, and heat.
As the cost of flash memory continues to decrease, it becomes more possible to
improve storage subsystem response time on servers. The typical vehicle for
incorporating nonvolatile memory in a server is the solid-state disk (SSD). The most
cost-effective way is to place only the “hottest” data of a workload onto nonvolatile
memory. In Windows Server 2008, partitioning can be performed only by applications
that store data on the SSD. Windows Server 2008 does not try to dynamically
determine what data should optimally be stored on SSDs.
Response Times
You can use tools such as Perfmon to obtain data on disk request response times.
Write requests that enter a writeback hardware cache often have very low response
times (less than 1 ms) because completion depends on dynamic RAM (DRAM) instead
of disk speeds. The data is written back to disk media in the background. As the
workload begins to saturate the cache, response times increase until the write
cache’s only benefit is potentially a better ordering of requests to reduce positioning
delays.
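For example, a Logman counter collection such as the following captures per-volume response times and queue lengths so that the guidelines later in this section can be applied. This is a sketch; the collector name and output path are arbitrary choices:

logman create counter DiskLatency -c "\LogicalDisk(*)\Avg. Disk sec/Read" "\LogicalDisk(*)\Avg. Disk sec/Write" "\LogicalDisk(*)\Avg. Disk Queue Length" -si 00:00:05 -f csv -o C:\PerfLogs\DiskLatency
logman start DiskLatency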
For JBOD arrays, reads and writes have approximately the same performance
characteristics. With modern hard disks, positioning delays for random requests are
5 to 15 ms. Smaller 2.5-inch drives have shorter positioning distances and lighter
actuators, so they generally provide faster seek times than comparable larger 3.5-
inch drives. Positioning delays for sequential requests should be insignificant except
for write-through streams, where each positioning delay should approximate the
required time for a complete disk rotation.
Transfer times are usually less significant when they are compared to positioning
delays, except for sequential requests and large requests (larger than 256 KB) that are
instead dominated by disk media access speeds as the requests become larger or
more sequential. Modern hard disks access their media at 25 to 125 MB per second
depending on rotation speed and sectors per track, which varies across a range of
blocks on a specific disk model. Outermost tracks can have up to twice the sequential
throughput of innermost tracks.
If the stripe unit size of a striped array is well chosen, each request is serviced by a
single disk—except for a low-concurrency workload. So, the same general positioning
and transfer times still apply.
For mirrored arrays, a write completion might be required to wait for both disks to
complete the request. Depending on how the requests are scheduled, the two
completions of the requests could take a long time. However, although writes
generally should not take twice the time to complete for mirrored arrays, they are
probably slower than JBOD. Conversely, reads can experience a performance increase if the
array controller is dynamically load-balancing or considering spatial locality.
For RAID 5 arrays (rotated parity), small writes become four separate requests in the
typical read-modify-write scenario. In the best case, this is approximately the
equivalent of two mirrored reads plus a full rotation of the disks, if you assume that
the Read/Write pairs continue in parallel. RAID 6 incurs an even greater performance
hit for writes because each RAID 6 small write request becomes three reads plus
three writes.
You must consider the performance effect of redundant arrays on read and write
requests when you plan subsystems or analyze performance data. For example,
Perfmon might show that 50 writes per second are being processed by volume x, but
in reality this could mean 100 requests per second for a mirrored array, 200 requests
per second for a RAID 5 array, or even more than 200 requests per second if the
requests are split across stripe units.
The following are response time guidelines if no workload details are available. For a
lightly loaded system, average write response times should be less than 25 ms on
RAID 5 and less than 15 ms on non-RAID 5 disks. Average read response times should
be less than 15 ms. For a heavily loaded system that is not saturated, average write
response times should be less than 75 ms on RAID 5 and less than 50 ms on non-
RAID 5 disks. Average read response times should be less than 50 ms.
Queue Lengths
Several opinions exist about what constitutes excessive disk request queuing. This
guide assumes that the boundary between a busy disk subsystem and a saturated
one is a persistent average of two requests per physical disk. A disk subsystem is near
saturation when every physical disk is servicing a request and has at least one
queued-up request to maintain maximum concurrency—that is, to keep the data
pipeline flowing. Note that in this guideline, disk requests split into multiple requests
(because of striping or redundancy maintenance) are considered multiple requests.
This rule has caveats, because most administrators do not want all physical disks
constantly busy. But because disk workloads are generally bursty, this rule is more
likely applied over shorter periods of (peak) time. Requests are typically not uniformly
spread among all hard disks at the same time, so the administrator must consider
deviations between queues—especially for bursty workloads. Conversely, a longer
queue provides more opportunity for disk request schedulers to reduce positioning
delays or optimize for full-stripe RAID 5 writes or mirrored read selection.
Because hardware has an increased capability to queue up requests—either through
multiple queuing agents along the path or merely agents with more queuing
capability—increasing the multiplier threshold might allow more concurrency within
the hardware. This creates a potential increase in response time variance, however.
Ideally, the additional queuing time is balanced by increased concurrency and
reduced mechanical positioning times.
The following is a queue length target to use when few workload details are available.
For a lightly loaded system, the average queue length should be less than one per
physical disk, with occasional spikes of 10 or less. If the workload is write heavy, the
average queue length above a mirrored controller should be less than 0.6 per physical
disk and less than 0.3 per physical disk above a RAID 5 controller. For a heavily loaded
system that is not saturated, the average queue length should be less than 2.5 per
physical disk, with infrequent spikes up to 20. If the workload is write heavy, the
average queue length above a mirrored controller should be less than 1.5 per physical
disk and less than 1.0 per physical disk above a RAID 5 controller. For workloads of
sequential requests, larger queue lengths can be tolerated because service times
and therefore response times are much shorter than those for a random workload.
For more details on Windows storage performance, see “Disk Subsystem
Performance Analysis for Windows.”
response cache). Worker processes register for URL subspaces, and Http.sys routes
the request to the appropriate process (or set of processes for application pools).
Figure 4 shows the difference between the IIS 6.0 and IIS 7.0 process models. IIS 6.0
kept a single copy of the metabase in a global process, inetinfo.exe. IIS 7.0 no longer
uses the metabase and instead loads XML configuration files that are located
alongside Web content. Each worker process loads a unique copy of configuration.
IIS 7.0 also implements an “integrated pipeline.” The integrated pipeline model
exposes extensibility.
The IIS 7.0 process relies on the kernel-mode Web driver, Http.sys. Http.sys is
responsible for connection management and request handling. The request can be
either served from the Http.sys cache or handed to a worker process for further
handling (see Figure 5). Multiple worker processes can be configured, which provides
isolation at a reduced cost.
Http.sys includes a response cache. When a request matches an entry in the response
cache, Http.sys sends the cache response directly from kernel mode. Figure 5 shows
the request flow from the network through Http.sys (and possibly up to a worker
process). Some Web application platforms, such as ASP.NET, provide mechanisms to
enable any dynamic content to be cached in the kernel cache. The static file handler
in IIS 7.0 automatically caches frequently requested files in http.sys.
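To inspect which responses are currently held in the Http.sys response cache, you can run the following command:

netsh http show cachestate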
Kernel-Mode Tunings
Performance-related Http.sys settings fall into two broad categories: cache
management, and connection and request management. All registry settings are
stored under the following entry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters
If the HTTP service is already running, it must be stopped and restarted for the
changes to take effect.
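For example, from an elevated command prompt (a sketch; stopping HTTP also stops services that depend on it, such as the World Wide Web Publishing Service, and restarting W3SVC starts HTTP again):

net stop http /y
net start w3svc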
the limit. If memory is limited and large entries are crowding out smaller ones, it
might be helpful to lower the limit.
• UriScavengerPeriod. Default value 120 seconds.
The Http.sys cache is periodically scanned by a scavenger, and entries that are
not accessed between scavenger scans are removed. Setting the scavenger
period to a high value reduces the number of scavenger scans. However, the
cache memory usage might increase because older, less frequently accessed
entries can remain in the cache. Setting the period to too low a value causes
more frequent scavenger scans and might result in too many flushes and cache
churn.
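As a sketch, the following lengthens the scavenger period to 300 seconds; the value is in seconds, and the HTTP service must be restarted afterward as described earlier:

reg add HKLM\System\CurrentControlSet\Services\Http\Parameters /v UriScavengerPeriod /t REG_DWORD /d 300 /f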
User-Mode Settings
The settings in this section affect the IIS 7.0 worker process behavior. Most of these
settings can be found in the %SystemRoot%\system32\inetsrv\config
\applicationHost.config XML configuration file. Use either appcmd.exe or the IIS 7.0
management console to change them. Most settings are automatically detected and
do not require a restart of the IIS 7.0 worker processes or Web Application Server.
Attribute: maxResponseSize (default: 262144)
Description: Lets files up to the specified size be cached. The actual value depends on the number and size of the largest files in the dataset versus the available RAM. Caching large, frequently requested files can reduce CPU usage, disk access, and associated latencies. The default value is 256 KB.
system.webServer/urlCompression
Attribute: doStaticCompression (default: True)
Description: Specifies whether static content is compressed.

Attribute: doDynamicCompression (default: False)
Description: Specifies whether dynamic content is compressed.
Note: For IIS 7.0 servers that have low average CPU usage, consider enabling
compression for dynamic content, especially if responses are large. This should first
be done in a test environment to assess the effect on the CPU usage from the
baseline.
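As a hedged sketch, dynamic compression could be enabled from the command line with
appcmd.exe once testing confirms that the CPU cost is acceptable:

%SystemRoot%\system32\inetsrv\appcmd.exe set config -section:system.webServer/urlCompression /doDynamicCompression:"True" /commit:apphost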
system.applicationHost/log/centralBinaryLogFile
Attribute Description Default
enabled Specifies whether central binary logging is enabled. False
directory Specifies the directory where log entries are written. See description
The default directory is
%SystemDrive%\inetpub\logs\LogFiles.
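A hedged sketch of enabling central binary logging with appcmd.exe follows; the
centralLogFileMode attribute is an assumption that is not shown in the table above
and should be verified against the applicationHost.config schema on the server:

%SystemRoot%\system32\inetsrv\appcmd.exe set config -section:system.applicationHost/log /centralLogFileMode:"CentralBinary" /centralBinaryLogFile.enabled:"True" /commit:apphost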
system.applicationHost/sites/VirtualDirectoryDefault
Attribute Description Default
enabled Specifies whether IIS looks for Web.config files in True
content directories lower than the current level
(true) or ignores them (false).
When configuration is queried in the IIS 7.0
pipeline, it is not known whether a URL
(/<name>.htm) is a reference to a directory or a file
name. By default, IIS 7.0 must assume that
/<name>.htm is a reference to a directory and
search for configuration in a
/<name>.htm/web.config file. This results in an
additional file system operation that can be costly.
By imposing a simple limitation, which allows
configuration only in virtual directories, IIS 7.0 can
then know that unless /<name>.htm is a virtual
directory it should not look for a configuration file.
Skipping the additional file operations can
significantly improve performance for Web sites
that have a very large set of randomly accessed
static content.
system.webServer/asp/limits
Attribute Description Default
processorThreadMax Specifies the maximum number of worker 25
threads per processor that ASP can create.
Increase this value if the current setting is
insufficient to handle the load, which can cause
errors when requests are served or leave CPU
resources underused.
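For example, a hedged sketch of raising this limit with appcmd.exe (the value 50 is
illustrative, not a recommendation):

%SystemRoot%\system32\inetsrv\appcmd.exe set config -section:system.webServer/asp /limits.processorThreadMax:50 /commit:apphost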
system.webServer/asp/comPlus
Attribute Description Default
executeInMta Set to “true” if errors or failures are detected False
while it is serving some ASP content. This can
occur, for example, when hosting multiple
isolated sites in which each site runs under its
own worker process. Errors are typically
reported from COM+ in the event viewer. This
setting enables the multithreaded apartment
model in ASP.
ISAPI
No special tuning parameters are needed for the Internet Server API (ISAPI)
applications. If writing a private ISAPI extension, make sure that you code it efficiently
for performance and resource use. See also “Other Issues that Affect IIS
Performance” later in this guide.
When you run multiple hosts that contain ASP.NET scripts in isolated mode (one
application pool per site), monitor the memory usage. Make sure that the server has
enough RAM for the expected number of concurrently running application
pools. Consider using multiple application domains instead of multiple isolated
processes.
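For example, a minimal sketch of watching worker-process memory from the command
line (additional w3wp instances appear as w3wp#1, w3wp#2, and so on):

rem Sample the private bytes of an IIS worker process every 30 seconds.
typeperf "\Process(w3wp)\Private Bytes" -si 30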
For performance recommendations on ASP.NET, see the MSDN article “10 Tips for
Writing High-Performance Web Applications.”
hardware. The sections on networking and storage subsystems also apply to file
servers.
[Figure: SMB file client and server components (application, RDBSS.SYS, SMB file server).]
Configuration Considerations
Do not enable any services or features that your particular file server and file clients
do not require. These might include SMB signing, client-side caching, file system
minifilters, search service, scheduled tasks, NTFS encryption, NTFS compression,
IPSEC, firewall filters, and antivirus features.
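For example, a hedged sketch of disabling one such feature, assuming the Windows
Search service (WSearch) is not needed on the file server:

sc config WSearch start= disabled
sc stop WSearch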
The default is 0. This parameter disables the processing of write flush commands
from clients. If the value of this entry is 1, the server performance and client
latency for power-protected servers can improve. Workloads that resemble the
NetBench file server benchmark benefit from this behavior.
• AsynchronousCredits
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\(REG_DWORD)
The defaults are 64 and 1024, respectively. These parameters allow the server to
throttle client operation concurrency dynamically within the specified
boundaries. Some clients might achieve increased throughput with higher
concurrency limits. One example is file copy over high-bandwidth, high-latency
links.
• PagedPoolSize (no longer required for Windows Server 2008)
HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management
\(REG_DWORD)
• Disablelastaccess (no longer required for Windows Server 2008)
HKLM\System\CurrentControlSet\Control\FileSystem\(REG_DWORD)
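As an illustrative sketch of applying one of the preceding parameters, the write-flush
behavior described at the start of this list (named TreatHostAsStableStorage, as
listed later in this guide) could be enabled on a power-protected server as follows:

rem Treat the host as stable storage; appropriate only for power-protected servers.
reg add HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters /v TreatHostAsStableStorage /t REG_DWORD /d 1 /f
rem Restarting the Server service (or rebooting) is typically required.
net stop server /y
net start server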
This option is the equivalent of the /3GB boot.ini option in Windows Server 2003.
• Use an appropriate amount of RAM.
Active Directory uses the server’s RAM to cache as much of the directory
database as possible. This reduces disk access and improves performance. Unlike
Windows 2000, the Active Directory cache in Windows Server 2003 and Windows
Server 2008 is permitted to grow. However, it is still limited by the virtual address
space and how much physical RAM is on the server.
To determine whether more RAM is needed for the server, monitor the
percentage of Active Directory operations that are being satisfied from the cache
by using the Reliability and Performance Monitor. Examine the lsass.exe instance
(for Active Directory Domain Services) or the Directory instance (for Active Directory
Lightweight Directory Services) of the Database\Database Cache % Hit
performance counter. A low value indicates that many operations are not being
satisfied from the cache. Adding more RAM might improve the cache hit rate and
the performance of Active Directory. You should examine the counter after Active
Directory has been running for some time under a typical workload. The cache
starts out empty when the Active Directory service is restarted or the machine is
rebooted, so the initial hit rate is low. (A sample command for logging this counter
appears after this list.)
The use of the Database Cache % Hit counter is the preferred way to assess how
much RAM a server needs. Or, a guideline is that when the RAM on a server is
twice the physical size of the Active Directory database on disk, it likely gives
sufficient room for caching the entire database in memory. However, in many
scenarios this is an overestimation because the actual part of the database
frequently used is only a fraction of the entire database.
• Use a good disk I/O subsystem.
Ideally, the server is equipped with sufficient RAM to be able to cache the “hot”
parts of the database entirely in memory. However, the on-disk database must
still be accessed to initially populate the memory cache, when it accesses
uncached parts of the database and when it writes updates to the directory.
Therefore, appropriate selection of storage is also important to Active Directory
performance.
We recommend that the Active Directory database folder be located on a
physical volume that is separate from the Active Directory log file folder. In the
Active Directory Lightweight Directory Services installation wizard, these are
known as data files and data recovery files. Both folders should be on a physical
volume that is separate from the operating system volume. The use of drives that
support command queuing, especially SCSI or Serial Attached SCSI, might also
improve performance.
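As referenced in the RAM discussion above, a minimal sketch for logging the cache hit
counter from the command line (the instance name lsass is an assumption for Active
Directory Domain Services and should be verified in Performance Monitor):

rem Sample the directory database cache hit percentage every 15 seconds.
typeperf "\Database(lsass)\Database Cache % Hit" -si 15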
is populated. Active Directory automatically populates the cache as queries visit parts
of the directory.
CPU Configuration
CPU configuration is conceptually determined by multiplying the required CPU to
support a session by the number of sessions that the system is expected to support,
while maintaining a buffer zone to handle temporary spikes. Multiple processors and
cores can help reduce abnormal CPU congestion situations, which are usually caused
by a few overactive threads that can then be contained by a similar number of cores.
Therefore, the more cores on a system, the lower the cushion margin that must be
built into the CPU usage estimate, which results in a larger percentage of active load
per CPU. One important factor to remember is that doubling the number of CPUs
does not double CPU capacity. For more considerations, see “Performance Tuning for
Server Hardware” earlier in this guide.
Processor Architecture
In a 32-bit architecture, all system processes share a 2-GB kernel virtual address
space, which limits the maximum number of attainable Terminal Server sessions.
Because memory that the operating system allocates across all processes shares the
same 2-GB space, increasing the number of sessions and processes eventually
exhausts this resource. Significant improvements have been made in Windows
Server 2008 to better manage the 2-GB address space. Some of these improvements
include dynamic reallocation across different internal memory subareas. This
reallocation is based on actual consumption, in contrast to Windows Server 2003,
which used static allocation that could leave some fraction of the 2 GB unused,
depending on the usage scenario. The most important kernel memory areas that affect Terminal
Server capacity are system page table entries (PTEs), system cache, and paged pool.
Improvements also include reducing consumption in some critical areas such as
kernel stacks for threads. Nevertheless, either significant performance degradation or
failures can occur if the number of sessions or processes is high. Actual values vary
significantly with the usage scenario, but a good watermark is approximately 250
sessions. Using large amounts of memory (greater than 12 GB) also consumes
substantial amounts from the 2-GB space for memory management data structures,
which further accentuates the issue.
The 64-bit processor architecture provides a significantly higher kernel virtual address
space, which makes it much more suitable for systems that need large amounts of
memory. Specifically, the x64 version of the 64-bit architecture is the more workable
option for Terminal Server deployments because it provides very small overhead
when it runs 32-bit processes. The most significant performance drawback when you
migrate to 64-bit architecture is significantly greater memory usage.
Memory Configuration
It is difficult to predict the memory configuration without knowing the applications
that users employ. However, the required amount of memory can be estimated by
using the following formula:
TotalMem = OSMem + SessionMem * NS
OSMem is how much memory the operating system requires to run (such as system
binary images, data structures, and so on), SessionMem is how much memory
processes running in one session require, and NS is the target number of active
sessions. The amount of required memory for a session is mostly determined by the
private memory reference set for applications and system processes that are running
inside the session. Shared pages (code or data) have little effect because only one
copy is present on the system.
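As a purely illustrative calculation (the numbers are assumptions, not measurements):
if OSMem is approximately 2 GB and each session's private working set (SessionMem)
is approximately 100 MB, then a target of 150 active sessions gives
TotalMem = 2 GB + 100 MB x 150, or roughly 17 GB, before adding headroom for peaks.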
One interesting observation is that, assuming the disk system that is backing the
pagefile does not change, the larger the number of concurrent active sessions the
system plans to support, the bigger the per-session memory allocation must be. If the
amount of memory that is allocated per session is not increased, the number of page
faults that active sessions generate increases with the number of sessions and
eventually overwhelms the I/O subsystem. By increasing the amount of memory that
is allocated per session, the probability of incurring page faults decreases, which
helps reduce the overall rate of page faults.
Disk
Storage is one of the aspects most often overlooked when you configure a Terminal
Server system, and it can be the most common limitation on systems that are
deployed in the field.
The disk activity that is generated on a typical Terminal Server system affects the
following three areas:
• System files and application binaries.
• Pagefiles.
• User profiles and user data.
Ideally, these three areas should be backed by distinct storage devices. Using RAID
configurations or other types of high-performance storage further improves
performance. We highly recommend that you use storage adapters with battery-
backed cache that allows writeback optimizations. Controllers with writeback cache
support offer improved support for synchronous disk writes. Because all users have a
separate hive, synchronous disk writes are significantly more common on a Terminal
Server system. Registry hives are periodically saved to disk by using synchronous
write operations. To enable these optimizations, from the Disk Management console,
open the Properties dialog box for the destination disk and, on the Policies tab, select
the Enable write caching on the disk and Enable advanced performance check
boxes.
For more specific storage tunings, see the guidelines in “Performance Tuning for
Storage Subsystem” earlier in this guide.
Network
Network usage includes two main categories:
• Terminal Server connections traffic in which usage is determined almost
exclusively by the drawing patterns exhibited by the applications that are running
inside the sessions and the redirected devices I/O traffic.
For example, applications handling text processing and data input consume
bandwidth of approximately 10 to 100 Kb per second, whereas rich graphics and
video playback cause significant increases in bandwidth usage. We do not
recommend video playback over Terminal Server connections because desktop
remoting is not optimized to support the high frame rate rendering that is
associated with video playback. Frequent use of device redirection features such
as file, clipboard, printer, or audio redirection also significantly increases network
traffic. Generally, a single 1-Gb network adapter is satisfactory for most systems.
• Back-end connections such as roaming profiles, application access to file shares,
database servers, e-mail servers, and HTTP servers.
The volume and profile of network traffic is specific to each deployment.
Pagefile
Insufficient pagefile can cause memory allocation failures either in applications or
system components. A general guideline is that the combined size of the pagefiles
should be two to three times larger than the physical memory size. You can use the
Memory\Committed Bytes performance counter to monitor how much committed
virtual memory is on the system. When the value of this counter reaches close to the
total combined size of physical memory and pagefiles, memory allocation begins to
fail. Because of significant disk I/O activity that pagefile access generates, consider
using a dedicated storage device for the pagefile, ideally a high-performance one
such as a RAID array.
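For example, a minimal sketch of tracking the commit charge against the commit limit
from the command line:

rem Log committed virtual memory and the commit limit once per minute.
typeperf "\Memory\Committed Bytes" "\Memory\Commit Limit" -si 60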
Task Scheduler
Task Scheduler (which can be accessed under All Programs > Accessories >
System Tools) lets you examine the list of tasks that are scheduled for different
events. For Terminal Server, it is useful to focus specifically on the tasks that are
configured to run on idle, at user logon, or on session connect and disconnect.
Because of the specific assumptions of the deployment, many of these tasks might
be unnecessary.
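For example, a hedged sketch of reviewing and disabling a scheduled task from the
command line (the task path shown is an assumption and should be verified on the
target system):

rem List all scheduled tasks with full details.
schtasks /query /fo LIST /v
rem Disable a task that this deployment does not need (example path).
schtasks /change /tn "\Microsoft\Windows\Defrag\ScheduledDefrag" /disable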
• Color depth.
Color depth can be adjusted under Remote Session Environment >
Limit Maximum Color Depth with possible values of 8, 15, 16, and 32 bit. The
default value is 16 bit, and increasing the bit depth increases memory and
bandwidth consumption. Or, the color depth can be adjusted from TSConfig.exe
by opening the Properties dialog box for a specific connection and, on the Client
Setting tab, changing the selected value in the drop-down box under Color
Depth. The Limit Maximum Color Depth check box must be selected.
• Remote Desktop Protocol compression.
Remote Desktop Protocol (RDP) compression can be configured under
Remote Session Environment > Set compression algorithm for RDP data. Three
values are possible:
• Optimized to use less memory is the configuration that matches the default
Windows Server 2003 configuration. This uses the least amount of memory
per session but has the lowest compression ratio and therefore the highest
bandwidth consumption.
• Balances memory and network bandwidth is the default setting for
Windows Server 2008. This has reduced bandwidth consumption while
marginally increasing memory consumption (approximately 200 KB per
session).
• Optimized to use less network bandwidth further reduces network
bandwidth usage at a cost of approximately 2 MB per session. This memory is
allocated in the kernel virtual address space and can have a significant effect
on 32-bit processor-based systems that are running a fairly large number of
users. Because 64-bit systems do not have these issues, this setting is
recommended if the additional memory cost is considered acceptable. If you
want to use this setting, you should assess the maximum number of sessions
and test to that level with this setting before placing a server in production.
• Device redirection.
Device redirection can be configured under Device and Resource Redirection. Or,
it can be configured through TSConfig by opening the properties for a specific
connection such as RDP-Tcp and, on the Client Settings tab, changing Redirection
settings.
Desktop Size
Desktop size for remote sessions can be controlled either through the TS Client user
interface (on the Display tab under Remote desktop size settings) or the RDP file
(desktopwidth:i:1152 and desktopheight:i:864). The larger the desktop size, the
greater the memory and bandwidth consumption that is associated with that session.
The current maximum desktop size that a server accepts is 4096 x 2048.
The default value is 5. It specifies the number of threads that the TS Gateway
service creates to handle incoming requests.
• MaxPoolThreads
HKLM\System\CurrentControlSet\Services\InetInfo\Parameters\(REG_DWORD)
• ServerReceiveWindow
HKLM\Software\Microsoft\Rpc\ (REG_DWORD)
The default value is 64 KB. This value specifies the receive window that the
server uses for data that is received from the RPC proxy. The minimum value
is set to 8 KB, and the maximum value is set at 1 GB. If the value is not
present, then the default value is used. When changes are made to this value,
IIS must be restarted for the change to take effect.
• ClientReceiveWindow
HKLM\Software\Microsoft\Rpc\ (REG_DWORD)
The default value is 64 KB. This value specifies the receive window that the
client uses for data that is received from the RPC proxy. The minimum valid
value is 8 KB, and the maximum value is 1 GB. If the value is not present, then
the default value is used.
This section provides terminology that is used throughout the text and suggests best
practices that yield increased performance on Hyper-V servers.
Terminology
This section summarizes key terminology specific to VM technology that will be used
throughout this performance tuning guide:
child partition
Any partition (VM) that is created by the root partition.
device virtualization
A mechanism that lets a hardware resource be abstracted and shared among
multiple consumers.
emulated device
A virtualized device that mimics an actual physical hardware device so that guests
can use the typical drivers for that hardware device.
enlightenment
An optimization to a guest operating system to make it aware of VM
environments and tune its behavior for VMs.
guest
Software that is running in a partition. It can be a full-featured operating system
or a small, special-purpose kernel. The hypervisor is “guest-agnostic.”
hypervisor
A layer of software that sits just above the hardware and below one or more
operating systems. Its primary job is to provide isolated execution environments
called partitions. Each partition has its own set of hardware resources (CPU,
memory, and devices). The hypervisor controls and arbitrates access to the
underlying hardware.
logical processor
A CPU that handles one thread of execution (instruction stream). There can be
one or more logical processors per core and one or more cores per processor
socket. In effect, it is a “physical processor.”
passthrough disk access
A representation of an entire physical disk as a virtual disk within the guest. The
data and commands are “passed through” to the physical disk (through the root
partition’s native storage stack) with no intervening processing by the virtual
stack.
root partition
A partition that is created first and owns all the resources that the hypervisor
does not own including most devices and system memory. It hosts the
virtualization stack and creates and manages the child partitions.
synthetic device
A virtualized device with no physical hardware analog; guests might need a
driver (virtualization service client) to use that synthetic device. The driver can use
VMBus to communicate with the virtualized device software in the root partition.
Hyper-V Architecture
Hyper-V features a hypervisor-based architecture that is shown in Figure 7. The
hypervisor virtualizes processors and memory and provides mechanisms for the
virtualization stack in the root partition to manage child partitions (VMs) and expose
services such as I/O devices to the VMs. The root partition owns and has direct access
to the physical I/O devices. The virtualization stack in the root partition provides a
memory manager for VMs, management APIs, and virtualized I/O devices. It also
implements emulated devices such as Integrated Device Electronics (IDE) and PS/2
but supports synthetic devices for increased performance and reduced overhead.
[Figure 7. Hyper-V hypervisor-based architecture: the root partition with the virtualization stack and VSPs, child partitions (VMs), and the hypervisor layered above the hardware.]
The synthetic I/O architecture consists of VSPs in the root partition and VSCs in the
child partition. Each service is exposed as a device over VMBus, which acts as an I/O
bus and enables high-performance communication between VMs that use
mechanisms such as shared memory. Plug and Play enumerates these devices,
including VMBus, and loads the appropriate device drivers (VSCs). Services other than
I/O are also exposed through this architecture.
Windows Server 2008 features enlightenments to the operating system to optimize
its behavior when it is running in VMs. The benefits include reducing the cost of
memory virtualization, improving multiprocessor scalability, and decreasing the
background CPU usage of the guest operating system.
Server Configuration
This section describes best practices for selecting hardware for virtualization servers
and installing and setting up Windows Server 2008 for the Hyper-V server role.
Hardware Selection
The hardware considerations for Hyper-V servers generally resemble those of nonvirtualized servers,
but Hyper-V servers can exhibit increased CPU usage, consume more memory, and
need larger I/O bandwidth because of server consolidation. For more information,
refer to “Performance Tuning for Server Hardware” earlier in this guide.
• Processors.
Hyper-V in Windows Server 2008 supports up to 16 logical processors and can
use all logical processors if the number of active virtual processors matches that
of logical processors. This can reduce the rate of context switching between
virtual processors and can yield better performance overall.
• Cache.
Hyper-V can benefit from larger processor caches, especially for loads that have a
large working set in memory and in VM configurations in which the ratio of
virtual processors to logical processors is high.
• Memory.
The physical server requires sufficient memory for the root and child partitions.
Hyper-V first allocates the memory for child partitions, which should be sized
based on the needs of the expected server load for each VM. The root partition
should have sufficient available memory to efficiently perform I/Os on behalf of
the VMs and operations such as a VM snapshot.
• Networking.
If the expected loads are network intensive, the virtualization server can benefit
from having multiple network adapters or multiport network adapters. VMs can
be distributed among the adapters for better overall performance. To reduce the
CPU usage of network I/Os from VMs, Hyper-V can use hardware offloads such as
Large Send Offload (LSOv1) and TCPv4 checksum offload. For details on network
hardware considerations, see “Performance Tuning for Networking Subsystem”
earlier in this guide.
• Storage.
The storage hardware should have sufficient I/O bandwidth and capacity to meet
current and future needs of the VMs that the physical server hosts. Consider
these requirements when you select storage controllers and disks and choose the
RAID configuration. Placing VMs with highly disk-intensive workloads on different
physical disks will likely improve overall performance. For example, if four VMs
share a single disk and actively use it, each VM can yield only 25 percent of the
bandwidth of that disk. For details on storage hardware considerations and
discussion on sizing and RAID selection, see “Performance Tuning for Storage
Subsystem” earlier in this guide.
CPU Statistics
Hyper-V publishes performance counters to help characterize the behavior of the
virtualization server and break out the resource usage. The standard set of tools for
viewing performance counters in Windows includes Performance Monitor
(perfmon.exe) and logman.exe, which can display and log the Hyper-V performance
counters. The names of the relevant counter objects are prefixed with “Hyper-V.”
You should always measure the CPU usage of the physical system through the
Hyper-V Hypervisor Logical Processor performance counters. The statistics that Task
Manager and Performance Monitor report in the root and child partitions do not fully
capture the CPU usage.
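For example, a minimal sketch of sampling the hypervisor's view of physical CPU usage
from the command line:

rem Overall physical CPU usage as seen by the hypervisor, sampled every 5 seconds.
typeperf "\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time" -si 5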
Processor Performance
The hypervisor virtualizes the physical processors by time-slicing between the virtual
processors. To perform the required emulation, certain instructions and operations
require the hypervisor and virtualization stack to run. Migrating a workload into a VM
increases the CPU usage, but this guide describes best practices for minimizing that
overhead.
Integration Services
The VM integration services include enlightened drivers for the synthetic I/O devices,
which significantly reduce the CPU overhead of I/O compared to emulated devices. The
latest version should be installed in every supported guest. The services decrease the
CPU usage of the guests, from idle guests to heavily used guests, and improve the I/O
throughput. This is the first step in tuning a Hyper-V server for performance.
Enlightened Guests
The operating system kernel in Windows Vista SP1, Windows Server 2008, and later
releases features enlightenments that optimize its operation for VMs. For best
performance, we recommend that you use Windows Server 2008 as a guest
operating system. The enlightenments decrease the CPU overhead of Windows that
runs in a VM. The integration services provide additional enlightenments for I/O.
Depending on the server load, it can be appropriate to host a server application in a
Windows Server 2008 guest for better performance.
Virtual Processors
Hyper-V in Windows Server 2008 supports a maximum of four virtual processors per
VM. VMs that have loads that are not CPU intensive should be configured by using
one virtual processor. This is because of the additional overhead that is associated
with multiple virtual processors, such as additional synchronization costs in the guest
operating system. More CPU-intensive loads should be placed in 2P or 4P VMs if the
VM requires more than one CPU of processing under peak load.
Hyper-V supports Windows Server 2008 guests in 1P, 2P, or 4P VMs, and Windows
Server 2003 SP2 guests in 1P and 2P VMs. Windows Server 2008 features
enlightenments to the core operating system that improve scalability in
multiprocessor VMs. Your workloads can benefit from these scalability improvements
if they must run in 2P or 4P VMs.
Background Activity
Minimizing the background activity in idle VMs releases CPU cycles that can be used
elsewhere by other VMs or saved to reduce power consumption. Windows guests
typically use less than 1 percent of one CPU when they are idle. The following are
several best practices for minimizing the background CPU usage of a VM:
• Install the latest version of VM integration services.
• Remove the emulated network adapter through the VM settings dialog box (use a
synthetic adapter).
• Disable the screen saver or select a blank screen saver.
• Remove unused devices such as the CD-ROM and COM port, or disconnect their
media.
• Keep the Windows guest at the logon screen when it is not being used (and
disable its screen saver).
• Use Windows Server 2008 for the guest operating system.
• Disable, throttle, or stagger periodic activity such as backup and defragmentation
if appropriate.
• Review scheduled tasks and services enabled by default.
• Improve server applications to reduce periodic activity (such as timers).
The following are additional best practices for configuring a client version of Windows
in a VM to reduce the overall CPU usage:
• Disable background services such as SuperFetch and Windows Search.
• Disable scheduled tasks such as Scheduled Defrag.
• Disable AeroGlass and other user interface effects (through the System
application in Control Panel).
Memory Performance
The hypervisor virtualizes the guest physical memory to isolate VMs from each other
and provide a contiguous, zero-based memory space for each guest operating
system. Memory virtualization can increase the CPU cost of accessing memory,
especially when applications frequently modify the virtual address space in the guest
operating system because of frequent allocations and deallocations.
Enlightened Guests
Windows Server 2008 includes kernel enlightenments and optimizations to the
memory manager to reduce the CPU overhead from Hyper-V memory virtualization.
Workloads that have a large working set in memory can benefit from using Windows
Server 2008 as a guest. These enlightenments reduce the CPU cost of context
switching between processes and accessing memory. Additionally, they improve the
multiprocessor (MP) scalability of Windows Server 2008 guests.
For highly intensive storage I/O workloads that span multiple data drives, each VHD
should be attached to a separate synthetic SCSI controller for better overall
performance. In addition, each VHD should be stored on separate physical disks.
Passthrough Disks
The VHD in a VM can be mapped directly to a physical disk or logical unit number
(LUN), instead of a VHD file. The benefit is that this configuration bypasses the file
system (NTFS) in the root partition, which reduces the CPU usage of storage I/O. The
risk is that physical disks or LUNs can be more difficult to move between machines
than VHD files.
Large data drives can be prime candidates for passthrough disks, especially if they are
I/O intensive. VMs that can be migrated between virtualization servers (such as quick
migration) must also use drives that reside on a LUN of a shared storage device.
By default, both Windows Vista and Windows Server 2008 disable the last-access
time updates.
HKLM\System\CurrentControlSet\Services\VmSwitch\<Key> = (REG_DWORD)
Both storage and networking have three registry keys at the preceding StorVsp and
VmSwitch paths, respectively. Each value is a DWORD and operates as follows. We do
not recommend this advanced tuning option unless you have a specific reason to use
it. Note that these registry keys might be removed in future releases:
• IOBalance_Enabled
The balancer is enabled when set to a nonzero value and disabled when set to 0.
The default is enabled for storage and disabled for networking. Enabling the
balancing for networking can add significant CPU overhead in some scenarios.
• IOBalance_KeepHwBusyLatencyTarget_Microseconds
This controls how much work, represented by a latency value, the balancer allows
to be issued to the hardware before throttling to provide better balance. The
default is 83 ms for storage and 2 ms for networking. Lowering this value can
improve balance but will reduce some throughput. Lowering it too much
significantly affects overall throughput. Storage systems with high throughput
and high latencies can show added overall throughput with a higher value for this
parameter.
• IOBalance_AllowedPercentOverheadDueToFlowSwitching
This controls how much work the balancer issues from a VM before switching to
another VM. This setting is primarily for storage where finely interleaving I/Os
from different VMs can increase the number of disk seeks. The default is
8 percent for both storage and networking.
Offload Hardware
As with the native scenario, offload capabilities in the physical network adapter
reduce the CPU usage of network I/Os in VM scenarios. Hyper-V currently uses LSOv1
and TCPv4 checksum offload. The offload capabilities must be enabled in the driver
for the physical network adapter in the root partition. For details on offload
capabilities in network adapters, refer to “Choosing a Network Adapter” earlier in this
guide.
Drivers for certain network adapters disable LSOv1 but enable LSOv2 by default.
System administrators must explicitly enable LSOv1 by using the driver Properties
dialog box in Device Manager.
adapters, VMs with network-intensive loads can benefit from being connected to
different virtual switches to better use the physical network adapters.
Interrupt Affinity
Under certain workloads, binding the device interrupts for a single network adapter
to a single logical processor can improve performance for Hyper-V. We recommend
this advanced tuning only to address specific problems in fully using network
bandwidth. System administrators can use the IntPolicy tool to bind device interrupts
to specific processors.
VLAN Performance
The Hyper-V synthetic network adapter supports VLAN tagging. It provides
significantly better network performance if the physical network adapter supports
NDIS_ENCAPSULATION_IEEE_802_3_P_AND_Q_IN_OOB encapsulation for both large
send and checksum offload. Without this support, Hyper-V cannot use hardware
offload for packets that require VLAN tagging and network performance can be
decreased.
• TreatHostAsStableStorage
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\(REG_DWORD)
The default is 0. This parameter disables the processing of write flush commands
from clients. If the value of this entry is 1, the server performance and client
latency for power-protected servers can improve.
• ScavengerTimeLimit
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\(REG_DWORD)
• DisableByteRangeLockingOnReadOnlyFiles
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\(REG_DWORD)
• Do not post user receive buffers excessively; the first buffers that are posted
are returned before the additional buffers are needed.
• It is best to bind each set of threads to a processor (the second delimited
parameter in the “-m” option).
• Each thread creates a socket that connects (listens) on a different port.
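A hypothetical NTttcp invocation that follows these guidelines is sketched below. The
executable names (NTttcps.exe and NTttcpr.exe), the address 192.168.0.10, and the
option values are assumptions to be adapted to the test environment:

rem Receiver: 4 threads bound to processor 0 (second parameter of -m),
rem 6 outstanding asynchronous buffers, 60-second run.
ntttcpr.exe -m 4,0,192.168.0.10 -a 6 -t 60
rem Sender: matching mapping with 2 outstanding asynchronous buffers.
ntttcps.exe -m 4,0,192.168.0.10 -a 2 -t 60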
Network Adapter
Make sure that you enable all offloading features.
• Minimize the effect on CPU usage when you are running many Terminal Server
sessions by opening the MMC snap-in for Group Policies (gpedit.msc) and making
the following changes under Local Computer Policy > User Configuration >
Administrative Templates:
• Under Start Menu and Taskbar, enable
Do not keep history of recently opened documents.
• Under Start Menu and Taskbar, enable
Remove Balloon Tips on Start Menu items.
• Under Start Menu and Taskbar, enable
Remove frequent program list from Start Menu.
• Minimize the effect on the memory footprint and reduce background activity by
disabling certain Microsoft Win32® services. The following are examples from
command-line scripts to do this:
Service name                                    Syntax to stop and disable service
Desktop Window Manager Session Manager          sc config UxSms start= disabled
                                                sc stop UxSms
Windows Error Reporting service                 sc config WerSvc start= disabled
                                                sc stop WerSvc
Windows Update                                  sc config wuauserv start= disabled
                                                sc stop wuauserv
• Minimize background traffic by applying the following changes under Start >
All Programs > Administrative Tools > Server Manager, and going to
Resources and Support:
• Opt out of participating in the Customer Experience Improvement Program
(CEIP).
• Opt out of participating in Windows Error Reporting (WER).
• Apply the following changes from the Terminal Services MMC snap-in
(tsconfig.msc):
• Set the maximum color depth to 24 bits per pixel (bpp).
• Disable all device redirections.
• To set TCP/IP connection affinity, refer to “How to: Map TCP/IP Ports to
NUMA Nodes.”
• Set a fixed amount of memory that the SQL Server process will use. For example,
set the max server memory and min server memory equal and large enough to
satisfy the workload (2500 MB is a good starting value).
• Change the network packet size to 8 KB for better page alignment in SQL
environments.
• Set the recovery interval to 32767, to offset the SQL Server checkpoints while it is
running the workload.
• On a two-tier ERP SAP setup, consider enabling and using only the Named Pipes
protocol and disabling the rest of the available protocols from the SQL Server
Configuration Manager for the local SQL connections.
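A hedged sketch of applying the SQL Server settings above with sqlcmd follows. The
server name is a placeholder, and the values simply mirror the guidance in this list:

rem Enable advanced options so that the settings below can be changed.
sqlcmd -S <ServerName> -E -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;"
rem Fix the SQL Server memory allocation (example value from the guidance above).
sqlcmd -S <ServerName> -E -Q "EXEC sp_configure 'max server memory (MB)', 2500; EXEC sp_configure 'min server memory (MB)', 2500; RECONFIGURE;"
rem Offset checkpoints and set an 8-KB network packet size.
sqlcmd -S <ServerName> -E -Q "EXEC sp_configure 'recovery interval (min)', 32767; EXEC sp_configure 'network packet size (B)', 8192; RECONFIGURE;"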
Resources
Web Sites:
Windows Server 2008
http://www.microsoft.com/windowsserver2008
Windows Server Performance Team Blog
http://blogs.technet.com/winserverperformance/
SAP Global
http://www.sap.com/solutions/benchmark/sd.epx
Transaction Processing Performance Council
http://www.tpc.org
Documents:
Scalable Networking: Eliminating the Receive Processing Bottleneck—Introducing
RSS
http://download.microsoft.com/download/5/D/6/5D6EAF2B-7DDF-476B-93DC-7CF0072878E6/NDIS_RSS.doc
Disk Subsystem Performance Analysis for Windows
http://www.microsoft.com/whdc/device/storage/subsys_perf.mspx
10 Tips for Writing High-Performance Web Applications
http://go.microsoft.com/fwlink/?LinkId=98290
Performance Tuning Guidelines for Microsoft Services for Network File System
http://technet.microsoft.com/en-us/library/bb463205.aspx
Active Directory Performance for 64-bit Versions of Windows Server 2003
http://www.microsoft.com/downloads/details.aspx?FamilyID=52e7c3bd-570a-475c-96e0-316dc821e3e7
How to configure Active Directory diagnostic event logging in Windows Server
2003 and in Windows 2000 Server
http://support.microsoft.com/kb/314980
Setting Server Configuration Options
http://go.microsoft.com/fwlink/?LinkId=98291
How to: Configure SQL Server to Use Soft-NUMA
http://go.microsoft.com/fwlink/?LinkId=98292
How to: Map TCP/IP Ports to NUMA Nodes
http://go.microsoft.com/fwlink/?LinkId=98293
SAP with Microsoft SQL Server 2005:
Best Practices for High Availability, Maximum Performance, and Scalability
http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/SAP_SQL2005_Best%20Practices.doc