Best Practices For Tuning Microsoft SQL Server On HP DL980
Table of contents
Introduction
Prerequisite Reading and System Configuration
Server Recommendations
BIOS Settings
Hyper-threading Considerations
Card Placement: Understanding Processor Enumeration and I/O Slots
Recommended Software: HP Enterprise SQL Optimizer (HP ESO)
Operating System Recommendations
Windows (all versions)
Windows Server 2008 SP2 and Windows Server 2008 R2
Windows Server 2008 R2 and Windows Server 2012
Storage Recommendations
Provide Sufficient I/O and Storage to Run SQL Server
Use the Recommended Storport Driver with Fibre Channel Host Bus Adapters
Verify Maximum Queue Depth is Greater Than or Equal to the Number of Spindles
Verify Switch Port Speed is set to Maximum
Network Recommendations
Configure Receive Side Scaling (RSS)
Enable Options for Offload Processing
SQL Server Tuning Recommendations
Important Note about SQL Server 2008
Use Startup Options to Lock Pages in Memory
Use SQL Server Startup Flags
Ensure SQL Server Starts Up Immediately After System Boot
Enable Write Caching on the Log Disk
Segregate Network Processing from SQL Processing
Use Data Compression Appropriately
Application-dependent SQL Parameter Recommendations
For OLTP Workloads
For Business Intelligence Workloads
Conclusion
For More Information
Documentation Feedback
Introduction
The HP ProLiant DL980 Server stands at the pinnacle of HP's line of scale-up x86 servers. The DL980 not only
supports the most recent Microsoft Windows 64-bit operating systems, but also incorporates all of the
following technologies:
The latest Intel Xeon extended 64-bit processors
QuickPath Interconnect technology: high-speed links supporting a Non-Uniform Memory Access (NUMA)
architecture
Industry-standard I/O architecture based on PCI-e (and optionally, PCI-X) buses
HP PREMA Architecture (for more information, see the technical overview at:
http://h20195.www2.hp.com/V2/GetPDF.aspx/4AA3-0643ENW.pdf)
Because of this powerful synergy of proven technologies and engineering, the HP line of scale-up platforms is
exceptionally well-suited for workloads requiring high-performance processing, such as business intelligence
and other line-of-business applications.
Large databases at the core of these workloads quickly reach the 32-bit architectural limit of x86-based
systems. But the recent generation of HP servers implementing Windows 64-bit operating systems and
architecture offers much greater headroom and can therefore take advantage of scale-up x86 platforms. For
example, the HP ProLiant DL980 running Microsoft Windows Server 2008 R2 supports a maximum of 2
terabytes of main memory, 160 logical processors (with Intel Xeon E7 family processors), and 16 PCI cards.
The HP ProLiant DL980 is the ideal platform to capitalize on the advantages of Microsoft SQL Server 2008 (x64),
SQL Server 2008 R2, and, most recently, SQL Server 2012. Both SQL Server 2008 R2 and SQL Server 2012
deliver increased security, scalability, and availability to enterprise data and analytical applications, while
making them easier to build, deploy, and manage. Optimized for 64-bit addressing, they take advantage of
advanced memory addressing capabilities for essential resources such as buffer pools, caches, and sort heaps,
thereby reducing the need to perform multiple I/O operations to move data in and out of memory from disk.
This greater processing capacity, without the penalties of I/O latency, means greater application scalability.
Although SQL Server runs out-of-the-box on HP scale-up x64 servers, we recommend some tuning guidelines
to maximize performance and take full advantage of the capabilities of this platform. This document describes
configuration settings that represent current best practices in tuning an HP scale-up x64 system for Microsoft
Windows Server (2008, 2008 R2, or 2012) and SQL Server (2008, 2008 R2, or 2012).
Note
Windows Server 2008/R2/2012 is used throughout this document to describe
features common to Windows Server 2008, Windows Server 2008 R2, and Windows
Server 2012. SQL Server is used to describe features common to SQL Server 2008,
SQL Server 2008 R2, and SQL Server 2012. When the discussion refers to a specific
version of either, that version is explicitly stated.
Optimizer (ESO). To download the latest ISO image of the Smart Update CD, go here:
http://www.hp.com/support/DL980G7, select the Windows Operating System version, locate the
Software - CD-ROM section on this page, and then click the Download button next to Smart Update
QFE CD for x64.
In addition, we highly recommend installing the latest HP System Providers: version 9.0.5 or later is recommended
for systems running Windows Server 2008/R2, and version 9.1 or later is required for Windows Server 2012.
In these systems, the logical processors may not be assigned in APIC order,
but instead re-assigned based on system measurements of Non Uniform Memory Architecture (NUMA)
distances during kernel group formation. Installation of the HP System Providers ensures that the Windows
logical processors are set optimally during kernel group configuration. The HP System Providers are available
as a self-installing, self-extracting system update on the HP Smart Update CD.
Finally, we also encourage users to read all of the relevant Microsoft tuning whitepapers. References and links
to these papers are found at the end of this document, in the For More Information section.
Server Recommendations
The HP ProLiant DL980 scale-up x86 server has a processor-based architecture where each CPU socket is
presented to the OS as a separate NUMA node. The cores on each socket appear to the OS as separate CPUs, and
with Hyper-threading enabled, each core appears as two Logical Processors (LPs). Each CPU also has onboard
local memory controllers that manage the memory attached to that processor. Since accessing local memory is
always much faster than accessing remote memory, you must take some steps to maximize local access and
minimize remote access in order to achieve the best performance from a NUMA server.
BIOS Settings
The ROM-Based Setup Utility (RBSU) is used to set certain configuration parameters at the BIOS or hardware
level. Most of the default settings are fine and the system will run satisfactorily with them, but extensive
testing has shown that changing some of the default settings yields higher performance with certain
workloads.
You can access RBSU by pressing F9 at the ProLiant splash screen during the boot process.
Listed below are the BIOS settings that are critical for peak performance. These settings should be verified at
the first available opportunity and changed as shown, if necessary:
System Options > Processor Options > Hyper-threading > Enable or Disable
(see section on Hyper-threading below)
Power Management Options > HP Power Profile > Custom
Power Management Options > HP Power Regulator > OS Control
Power Management Options > Advanced Power Management Options > Minimum
Processor Idle Power State > C1E (reduces power when possible, for performance + power savings) or
NO C-states (when performance is highest priority; power saving functionality is ignored)
Advanced Options > Advanced Performance and Tuning Options > HW
Prefetch > Enabled
Advanced Options > Advanced Performance and Tuning Options > Adjacent Sector
Prefetch > Enabled
Advanced Options > Advanced System ROM Options > Address Mode 44-bit > Enabled
(this setting applies to Windows Server 2008 R2 and Windows Server 2012 only, and is mandatory with 1TB
of RAM or more. On Windows Server 2008 SP2, this setting should remain Disabled, since that OS uses a
40-bit address mode.)
Note
However these settings are configured, Microsoft Windows Server operating
systems always recognize the system's logical processors. The number of logical
processors is the total number of cores (when Hyper-threading is OFF), or the total
number of processor threads (when Hyper-threading is turned ON).
Hyper-threading Considerations
With Intel Xeon 65xx and 75xx processors, each NUMA node (processor socket) can contain up to 8 CPU cores,
and with the newer Intel Xeon E7 processor family, up to 10 cores. To extend processing capabilities even further, you
can enable Intel Hyper-threading, so that each core appears as two logical processors to the OS. For example, a
single physical processor with 10 cores appears as 20 logical processors to both the OS and the DBMS.
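As a rough illustration of this arithmetic, the following Python sketch computes how many logical processors the OS would see for a given socket count, core count, and Hyper-threading setting. The function name and parameters are illustrative assumptions and are not part of any HP or Microsoft tool.

```python
def logical_processors(sockets: int, cores_per_socket: int, hyper_threading: bool) -> int:
    """Return the number of logical processors (LPs) presented to the OS."""
    # With Hyper-threading enabled, each core appears as two LPs.
    threads_per_core = 2 if hyper_threading else 1
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # A single 10-core processor, then a fully populated 8-socket system.
    print(logical_processors(1, 10, hyper_threading=True))
    print(logical_processors(8, 10, hyper_threading=True))
```

For a single 10-core processor with Hyper-threading enabled, the function returns 20, which matches the example in the text.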
HP ProLiant DL980 Servers ship with Hyper-threading enabled by default. Depending on the workload, Hyper-
threading can increase system performance by up to 40% (20% is typical). But in some cases it can increase
contention and thereby cause a drop in performance. Or if the workload does not have enough parallelism,
Hyper-threading can actually increase response time, since each Hyper-threaded CPU is effectively slower than
a physical core. In short, you should always test your particular workload with and without Hyper-threading
before committing to its use.
Embedded adapters (like network card NC375i, video, and so on) are connected to processors 0 (zero) and 1.
The above figure, when used in combination with the following table, should help you determine the best
location for your adapters according to each card's characteristics (such as the number of lanes), with the
ultimate goal of distributing the load equally across all processors.
Table 1: PCIe slot, type, and capability
It is far better, performance-wise, to install cards into the slots best suited to their characteristics. For example,
if an adapter is a PCIe x8 (8 lanes), the best throughput is obtained in the available x8 slots. When that is not
possible, keep in mind that the PCIe x4 slots (with 4 lanes) are generally preferred for lower-performance adapters.
Note
The initial version, HP ESO 1.0, supports tuning of HP ProLiant DL980 servers
running Microsoft Windows Server and SQL Server 2008 or SQL Server 2008 R2. HP
ESO 2.x, released in 2012, additionally supports SQL Server 2012.
After installing HP ESO, users should run a data collection process that scans the entire solution (server, OS,
SQL Server, storage, and network infrastructure). HP ESO then evaluates the results of its audit and makes
configuration recommendations spanning the entire solution, including the following areas:
SQL Server instance parameters such as MAXDOP, CPU Affinity, Lightweight Pooling, and Min/Max Memory
Storage settings: Database File/Log/TempDB locations, RAID Levels, Number of Drives, and Drive Sizes
Network settings: Network Ports, NUMA/Interrupt CPU Affinities
Operating System options such as Power Management
I/O card configuration and placement, for optimal PCI card placement based on I/O slot capability
Software version information
HP ESO uses built-in WBEM-based instrumentation for future integration with other management tools such as
HP Systems Insight Manager. Its reporting pages give you graphical displays of various
data collection parameters for use in analysis and interpretation, and let you export these results to a
file. HP ESO can save your current configuration at any time, and because it can roll back to previously saved
configurations, you can retrieve and apply those configurations later, if necessary.
For a complete description of HP ESO features, along with instructions for installing, configuring, and using this
powerful tool, refer to the HP ESO User Guide on the following web page, under the User Guide section:
http://www.hp.com/go/proliant-DL980-docs.
HP ESO is available as a self-installing, self-extracting system update on the HP Smart Update CD. To download
the latest ISO image of this CD, go here:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=4268505&prodTypeId=15351&prodSeriesId=4231377&swLang=8&taskId=135&swEnvOID=4024
Locate the Software - CD-ROM section on this page, and then click the Download button next to Smart Update QFE CD
for x64.
Note
Windows Server 2008 supports a maximum of 64 logical processors. For this
reason, on DL980s with 8 processors and 8 cores or more, it is often better to
disable Hyper-threading. In order to fully utilize all of the processors in these
systems it is preferable to run Windows Server 2008 R2 or Windows Server
2012 instead. You can then determine whether your workload performs better
with Hyper-threading enabled or disabled on Windows Server 2008 R2 and
Windows Server 2012.
For more information about this issue, refer to the HP whitepaper, Best
Practices When Deploying Microsoft Windows Server on the HP ProLiant DL980,
at: http://www.hp.com/go/proliant-DL980-docs.
powercfg -setacvalueindex scheme_max sub_processor 5d76a2ca-e8c0-402f-a133-2158492d58ad 1
powercfg -setacvalueindex scheme_balanced sub_processor 5d76a2ca-e8c0-402f-a133-2158492d58ad 1
powercfg -setactive scheme_current
Note
To revert any power configuration command back to its previous state, rerun
the command with a 0 (zero) at the end instead of a 1. For example:
powercfg -setacvalueindex scheme_min sub_processor 5d76a2ca-e8c0-402f-a133-2158492d58ad 0
Storage Recommendations
Storage is an important factor when considering SQL Server workloads, and appropriate sizing is required. But
that in turn requires a good understanding of an application's I/O characteristics, such as the frequency of reads
and writes, and the amount of data typically moved in those operations. Specific guidelines for calculating the
optimal storage size for your particular application are beyond the scope of this document. Instead, we want to
provide you with some general, workload-dependent recommendations.
Fibre Channel transfer speeds
Fibre Channel transfer speeds are as follows:
2 Gbps transfer rate = ~180 MB/sec
4 Gbps transfer rate = 350 MB/sec
8 Gbps transfer rate = 680 MB/sec
Also be aware that for 8KB I/O rates, many fibre channel host bus adapters (HBAs) have a limit on throughput
below the fibre channel bandwidth.
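To show how these transfer rates feed into sizing, the following Python sketch estimates the minimum number of Fibre Channel ports needed to sustain a target sequential throughput. It is a minimal sketch under stated assumptions: it uses only the per-link rates listed above, ignores the lower HBA limits for small (8KB) I/O, and does not account for redundant paths. The function and constant names are hypothetical.

```python
import math

# Approximate throughput per Fibre Channel link in MB/sec, keyed by link
# speed in Gbps. Values are the ones listed in this document.
FC_MB_PER_SEC = {2: 180, 4: 350, 8: 680}


def fc_ports_needed(target_mb_per_sec: float, link_gbps: int) -> int:
    """Return the minimum number of FC ports for the target throughput."""
    per_port = FC_MB_PER_SEC[link_gbps]
    return math.ceil(target_mb_per_sec / per_port)


if __name__ == "__main__":
    # Example: a workload that must scan 2000 MB/sec over 8 Gbps links.
    print(fc_ports_needed(2000, 8))
```

Treat the result as a lower bound; the measured I/O profile of the application should drive the final design.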
Having adequate storage (HDDs or SSDs) will definitely help sustain the high I/O required by a demanding SQL
application workload.
You must also consider the characteristics of the workload. Online Transaction Processing (OLTP) workloads
typically perform small, random I/O operations, while Decision Support (DS) workloads (large queries) perform
fewer but larger I/O operations. With OLTP, you are more concerned with the I/O rate than the bandwidth;
however the opposite is true for DS. Obviously, every application is different, and the I/O loads imposed on the
system by those applications are unique.
The Windows Performance Monitor utility (perfmon.exe) provides basic data about I/O rates and throughput.
Use this utility to monitor running applications and obtain the information necessary to design your I/O
configuration.
In addition to the I/O, you must also configure the storage system. Configuration of the storage system is
beyond the scope of this document. But by keeping the preceding rules of thumb in mind, you can configure the
I/O to achieve optimum system performance and gain valuable information about your storage requirements.
Use the Recommended Storport Driver with Fibre Channel Host Bus Adapters
Although this may seem obvious, you must use the driver recommended for your storage environment to
obtain the best performance with fibre channel HBAs. Vendors generally qualify an optimized set of compatible
versions of firmware and driver components. Depending on the system layout, it is often appropriate to use
switch zoning or other methods of segmentation. Multiple paths may improve data availability and eliminate
single points of failure in SAN components, but they also require multi-path software components running on
Windows. And finally, storage vendors often develop their own Device Specific Modules (DSMs). These DSMs
should be used whenever possible because they are optimized for your storage platform.
With QLogic HBAs, use the SanSurfer utility to change the QLogic firmware BIOS setting for execution throttle in
NVRAM to be equal to or greater than the number of physical drives seen by that HBA (default = 16 decimal, or
0x10).
Be aware that these same guidelines can apply to fibre channel RAID controllers as well. Many RAID controllers
have a configuration option to return a busy status when a queue depth limit is exceeded. You should verify that
these options are appropriately configured, based upon the number of disks in the LUN.
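The queue depth rule described above can be expressed as a simple check. The following Python sketch is illustrative only: the function name is hypothetical, and it applies nothing more than the rule that the execution throttle should be equal to or greater than the number of physical drives seen by the HBA.

```python
# Default QLogic execution throttle cited in this document (16 decimal).
DEFAULT_EXECUTION_THROTTLE = 16


def recommended_execution_throttle(physical_drives: int,
                                   current: int = DEFAULT_EXECUTION_THROTTLE) -> int:
    """Return a throttle value that is >= the number of physical drives.

    The current value is never lowered; it is only raised when the HBA
    sees more drives than the current setting allows for.
    """
    return max(current, physical_drives)


if __name__ == "__main__":
    # An HBA that sees 24 physical drives needs more than the default.
    print(recommended_execution_throttle(24))
```

Before raising the value on a shared array, weigh the result against the impact on other servers that use the same storage.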
Note
While increasing the queue depth often provides benefits for a given server, it can
also have detrimental effects on other servers utilizing the same storage array. You
should refer to the HP SAN Design Guide and consider the potential impact to other
servers if you increase the queue depth beyond the values recommended in the HP
SAN Design Guide.
Network Recommendations
SQL network traffic packets are typically small, so the maximum achievable bandwidth of a network link often
cannot be fully utilized. We recommend that you not exceed 15,000 packets/second on a Gigabit link. Network
Interface Controller (NIC) teaming is an option, but does require additional overhead that could affect
performance.
With 10Gb links, the limiting factor is usually the CPU power available for handling interrupts. For this reason,
we recommend using Receive Side Scaling (RSS), which distributes receive processing across multiple processors.
Setting Up RSS
Receive-side scaling is enabled by default in Windows Server 2008, Windows Server 2008 R2, and Windows
Server 2012. In order to take advantage of RSS, network drivers must be written with RSS capabilities. At the
time of the SP1 release in February 2011, RSS could not make use of more than 64 processors on Windows Server
2008 R2, even with SP1; however, Windows Server 2012 addresses this issue and enables K-group support (for more
information about RSS under Windows Server 2012 see:
http://technet.microsoft.com/en-us/library/jj574168#bkmk_rss).
Use the netsh command to enable or disable RSS. If for some reason it is currently disabled on your server,
enter the following command to turn it back on:
netsh interface tcp set global rss=enabled
Modern network drivers are configured to use RSS through settings found in the Windows Device Manager, in
the Advanced Properties of the network interface, by following these steps:
1. Open the Device Manager and expand Network Adapters.
2. Right-click the adapter you want to configure, and select Properties.
3. On the Advanced tab, locate the Receive-side Scaling property and verify that the value is Enabled. If
not, enable it.
4. Click OK and exit the Device Manager.
Other advanced properties can be enabled too, depending upon driver implementation. These properties
correspond to the registry entries described in the following table:
Note
These settings apply only to R2 and NDIS 6.2-compliant drivers.
For 6.1-compliant drivers, use the global NDIS RSS settings documented in the
whitepaper, Receive-Side Scaling Enhancements in Windows Server 2008, found at:
http://msdn.microsoft.com/en-us/windows/hardware/gg463253
The values for these properties can vary depending on the cards and their configuration. In most cases these
properties are configured correctly by default. But it is important to verify those default assignments on servers
with several network interfaces, and override them if necessary, to better handle receive-intensive workloads.
Before making changes to these settings, however, HP advises engaging HP support or consulting
services for configuration-specific advice.
5. If you change the setting, you must reboot the server.
On the Gigabit Ethernet NIC driver, Coalescing is enabled by default. Coalesce buffers are used to copy
fragments of a transmit packet before assigning them a transmit descriptor. This reduces the number of
transmit descriptors required for each packet transmission.
For Intel NIC drivers, set the Interrupt Moderation Rate to High or Extreme. For other NICs, set the Interrupt
Moderation Rate to minimize CPU utilization at the expense of higher latencies, or lower latency for higher CPU
utilization (and more interrupts). In general, the former is recommended unless the application requires
extremely low latencies. If in doubt, a good compromise is to set the interrupt moderation to Adaptive.
On systems running Windows Server 2008 R2 in which the operating system sees more than 64 logical
processors (either with Hyper-threading on, or when each processor has 10 cores), an instance of SQL Server
2008 is limited to the K-group size in terms of processor count. For example, a DL980 with 8 processors of
10 cores each (for a total of 80 logical processors) shows two K-groups of 40 processors each if optimized;
therefore, SQL Server 2008 running on this system reports a processor count of 40 in the SQL Server
error log.
The K-group size will be 64 logical processors maximum, but can be adjusted by the HP DL980 System
Providers, or a BCDedit switch, or by altering the registry as described in the MS Knowledge Base:
http://support.microsoft.com/kb/2506384.
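The K-group arithmetic in the example above can be sketched in Python as follows. This is a simplified model, not a description of how Windows or the HP System Providers actually form groups: it assumes whole NUMA nodes are spread as evenly as possible across the smallest number of groups that each hold at most 64 logical processors. The function name is hypothetical.

```python
import math
from typing import List

MAX_GROUP_SIZE = 64  # maximum logical processors per K-group


def balanced_kgroup_sizes(numa_nodes: int, lps_per_node: int) -> List[int]:
    """Return the LP count of each K-group under the simplified model."""
    if lps_per_node > MAX_GROUP_SIZE:
        raise ValueError("a single NUMA node does not fit in one K-group")
    # Whole nodes only: how many nodes fit in one group of 64 LPs.
    max_nodes_per_group = MAX_GROUP_SIZE // lps_per_node
    groups = math.ceil(numa_nodes / max_nodes_per_group)
    # Spread the nodes as evenly as possible across the groups.
    base, extra = divmod(numa_nodes, groups)
    return [(base + (1 if i < extra else 0)) * lps_per_node
            for i in range(groups)]


if __name__ == "__main__":
    # 8 processors of 10 cores each, Hyper-threading off (80 LPs in total).
    print(balanced_kgroup_sizes(8, 10))
```

For the 8-processor, 10-core example this yields two groups of 40, which is the processor count SQL Server 2008 would then report.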
Note
To use this capability, the user running the instance of SQL Server (typically
Administrators) must have the Lock Pages in Memory capability enabled.
When Lock Pages is used for SQL Server, an entry is logged in the SQL Server error
log. You should set a max server memory value for the SQL Server instance(s) to
ensure the operating system keeps a portion of the RAM for its own operation. A
performance monitor helps to determine the best value here.
You also need to enable Trace Flag 845 with SQL Server 2008 R2 Standard Edition, in
order to use locked pages for the buffer pool, along with granting the SQL Server
service account the Lock Pages in Memory security privilege.
5. Verify the Optimize for performance radio button is selected.
6. Below that, verify that the Enable write caching on the disk and Enable advanced performance boxes
are checked (see Figure 2).
If the cache is enabled at the hardware level, the first checkbox is usually selected but the second is
not. The second checkbox is critical for maximum write performance on the log. These checkboxes can
be selected while the system is running. The performance change is immediate and does not require a
reboot.
7. Click OK, select the device(s) again, and verify that the boxes are still checked.
Note
Any change in the log disk hardware or cache configuration can cause the operating
system to deselect these checkboxes.
Also be aware that many, if not most, fibre channel RAID controllers ignore whatever
Write Back cache option is selected via the Windows interface, so you must use the
RAID controller's configuration utility to enable this option.
Note
For Windows Server 2012: The same performance related options for every disk are
located under Disk > Properties > Policies.
Note
Soft NUMA (discussed below) can also accomplish this task.
Before changing the affinity settings, keep in mind that the OS assigns deferred procedure call (DPC) activity
associated with NICs to the highest numbered processor in the system. In systems with more than one active
NIC, each additional card's activity is assigned to the next highest numbered processor. For example, an
8-processor system with two NICs has DPCs for the NICs assigned to processor 7 and processor 6.
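The default DPC assignment described above can be modeled with a few lines of Python, which may help when deciding which processors to exclude from SQL Server's affinity. This is a minimal sketch of the rule as stated in this document; the function name is hypothetical, and actual assignments should be confirmed on the running system.

```python
from typing import List


def default_nic_dpc_processors(processor_count: int, active_nics: int) -> List[int]:
    """Return the processors that receive NIC DPC activity by default.

    The first NIC uses the highest numbered processor; each additional
    NIC uses the next highest numbered processor.
    """
    if active_nics > processor_count:
        raise ValueError("more active NICs than processors")
    return [processor_count - 1 - nic for nic in range(active_nics)]


if __name__ == "__main__":
    # The 8-processor, two-NIC example from the text.
    print(default_nic_dpc_processors(8, 2))
```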
Syntax for the ALTER SERVER CONFIGURATION statement can be found at Microsoft's SQL Server Books Online:
For SQL Server 2008 R2: http://msdn.microsoft.com/en-us/library/ee210585.aspx
For SQL Server 2012: http://msdn.microsoft.com/en-us/library/ee210585(v=sql.110).aspx
The following example sets the affinity of SQL Server on a fully-loaded DL980, keeping it off CPUs 2, 4, and 28,
thereby allowing the NIC interrupts to reside there:
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0,1,3,5 TO 27,29 TO 127
Configuration changes like this take place immediately, and can be done with the workload active. Any threads
on the excluded processors will continue to run on those processors to completion, but no new threads will be
assigned to the excluded LPs.
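On a server with many logical processors, writing the CPU range list by hand is error-prone. The following Python sketch builds the list from a set of CPUs to exclude and wraps it in the statement shown in the example. The helper names are hypothetical, and the generated text should be reviewed before it is run against an instance.

```python
from typing import Iterable


def affinity_cpu_list(total_cpus: int, excluded: Iterable[int]) -> str:
    """Build a CPU range list covering every CPU that is not excluded."""
    skip = set(excluded)
    allowed = [cpu for cpu in range(total_cpus) if cpu not in skip]
    # Collapse consecutive CPU numbers into (first, last) runs.
    runs = []
    for cpu in allowed:
        if runs and cpu == runs[-1][1] + 1:
            runs[-1][1] = cpu
        else:
            runs.append([cpu, cpu])
    parts = []
    for first, last in runs:
        if first == last:
            parts.append(str(first))
        elif last == first + 1:
            parts.append(f"{first},{last}")  # a pair reads better as "0,1"
        else:
            parts.append(f"{first} TO {last}")
    return ",".join(parts)


def affinity_statement(total_cpus: int, excluded: Iterable[int]) -> str:
    """Return the full T-SQL statement as text (nothing is executed)."""
    cpu_list = affinity_cpu_list(total_cpus, excluded)
    return f"ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = {cpu_list}"


if __name__ == "__main__":
    # Keep SQL Server off CPUs 2, 4, and 28 on a 128-LP system.
    print(affinity_statement(128, [2, 4, 28]))
```

With CPUs 2, 4, and 28 excluded from 128 logical processors, the generated list is the same as in the example above.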
Application-dependent SQL Parameter Recommendations
Use the sp_configure stored procedure to optimize resources. To modify the advanced configuration
options, we recommend that you first set the Show Advanced Options property, then run RECONFIGURE and
restart the SQL Server instance, by entering the following:
sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
Note
Higher MaxDOP values can be used, of course, but OLTP performance degrades as
this value rises to higher levels. As always, you must experiment with your workload
to find the optimal setting.
You must also check the transaction log latency and write size. Since every transaction committed in SQL must
be written and committed to the log, the SQL log can easily become a bottleneck and limit system performance.
A quick check with the Windows Performance Monitor utility can easily verify this.
Log Write Service times should be very low (about 1ms). If they are not, the cache could be disabled, either
within the array or in Windows. Average Log Write Sizes of greater than 30KB can also indicate a bottleneck,
possibly caused by a disabled cache. Or it can indicate that more log disks or a larger cache size is required.
If a software RAID0 stripe across multiple arrays is used for the log, it may sometimes appear that no log
bottleneck exists, when in fact one does. If the sum of the average log queue depth across all of the log's RAID
arrays, multiplied by the average log write size, approaches (or is greater than) 64KB, you might have a log
bottleneck. Measuring this requires that the log be on separate LUNs from the rest of the database.
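The striped-log check described above amounts to one multiplication and a comparison. The following Python sketch shows it under stated assumptions: the inputs are averages you have already measured for each log array, the function names are hypothetical, and the comparison uses the 64KB figure directly, so judging how close counts as "approaching" the threshold is left to the reader.

```python
from typing import Iterable

LOG_BOTTLENECK_THRESHOLD_BYTES = 64 * 1024  # the 64KB figure from the text


def outstanding_log_bytes(avg_queue_depths: Iterable[float],
                          avg_log_write_bytes: float) -> float:
    """Sum the per-array queue depths and multiply by the write size."""
    return sum(avg_queue_depths) * avg_log_write_bytes


def possible_log_bottleneck(avg_queue_depths: Iterable[float],
                            avg_log_write_bytes: float) -> bool:
    """Return True when the product reaches or exceeds 64KB."""
    total = outstanding_log_bytes(avg_queue_depths, avg_log_write_bytes)
    return total >= LOG_BOTTLENECK_THRESHOLD_BYTES


if __name__ == "__main__":
    # Two log arrays with average queue depths of 1.5 and 2.0, 20KB writes.
    print(possible_log_bottleneck([1.5, 2.0], 20 * 1024))
```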
Remember to spread all tables over multiple files and over multiple disks (the more spindles the better). Do not
software-stripe LUNs that are already hardware-striped either. While this makes them easier to manage, it
degrades performance considerably.
Drive Considerations
Use the following guidelines regarding database storage:
As noted above, if database storage latency is greater than 2 to 3 times what it was when unloaded, you
should add more storage.
Keep random IOPS below 120/sec/spindle for 15K RPM drives and below 100/sec/spindle for 10K RPM drives.
For Gigabit NICs, keep packets below 30K/sec.
Use the sp_configure stored procedure to set Lightweight Pooling to 1.
When Lightweight Pooling is set to 1, SQL Server switches to Fibre Mode scheduling. In the event of
excessive context switching, Lightweight Pooling provides better throughput by performing context
switching inline, which reduces user/kernel mode context transitions. Note that Fibre Mode is generally
not effective with CPU utilization below 80 percent on a few or all CPUs.
By default, SQL Server uses one thread per active SPID or user process. These threads work in a pooled
configuration to keep the number of threads manageable. The advanced Lightweight Pooling
configuration option (sometimes referred to as Fibre mode) uses Windows "fibre" support to handle
several execution contexts with a single thread.
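The per-spindle IOPS guideline in the list above translates directly into a minimum drive count. The following Python sketch is a rough estimate only: it assumes the load is spread evenly across all spindles, ignores RAID write penalties and controller cache, and uses hypothetical names.

```python
# Per-spindle random IOPS ceilings from the guideline above, keyed by RPM.
MAX_IOPS_PER_SPINDLE = {15000: 120, 10000: 100}


def min_spindles(target_random_iops: int, drive_rpm: int) -> int:
    """Return the fewest spindles that keep each one below its ceiling."""
    ceiling = MAX_IOPS_PER_SPINDLE[drive_rpm]
    # Integer division plus one keeps the per-spindle rate strictly
    # below the ceiling, even when the target is an exact multiple.
    return target_random_iops // ceiling + 1


if __name__ == "__main__":
    # A workload that drives 12,000 random IOPS on 15K RPM drives.
    print(min_spindles(12000, 15000))
```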
Lightweight Pooling Restrictions
SQL Server Agent (Microsoft KB303287): When SQL Server runs in Lightweight Pooling mode (or Fibre
mode) and the DTC service is started, unexpected behavior may occur. SQL Server Agent might not
execute any jobs.
DTC (Microsoft KB303287): If DTC operations are required on the server, the SQL Server instance
should always run in Thread mode. In other words, Lightweight Pooling should be set to 0 (zero).
Microsoft strongly recommends that you run the SQL Server instance in Thread mode when DTC is
needed. If you use Lightweight Pooling/Fibre mode on a system that does not specifically need it,
performance often degrades.
sp_xml_* and sp_OA* functionality (Microsoft KB322884): Microsoft does not support the use of
Microsoft Common Language Runtime (CLR) extended stored procedures or OLE Automation with any
libraries loaded to run in the SQL Server memory space. CLR only uses thread-based scheduling and
does not support fibre-mode scheduling. In later versions of SQL, you cannot load CLR by using
extended stored procedures or sp_OA stored procedures.
For more information about the lightweight pooling option, refer to:
http://msdn.microsoft.com/en-us/library/ms178074.aspx.
Other Considerations
Use NUMA support to reduce remote memory access. This can improve performance up to 60%, especially when
combined with Connection Affinity. In general, segregate connections and thus data locality in a given node (for
example, by region or department, or any other logical division that makes sense for the application). When
NUMA support is enabled, SQL attempts to create a thread's data structures in the same NUMA node, thereby
reducing remote memory accesses.
In SQL Server, NUMA is enabled by default. All versions of SQL Server mentioned in this white paper also provide
a Soft NUMA feature, again enabled by default, which enables more precise control as described below.
Use Connection Affinity to take further advantage of SQL Server NUMA features. With Connection Affinity, a SQL
connection from the client is assigned affinity to a specific NUMA node. This assigns data structures to that
NUMA node, further enhancing the NUMA capabilities of SQL. When Connection Affinity is enabled for all
network connections, the requirement to use VIA for affinity no longer applies. Moreover, multiple connection
ports may be used on a single hardware network adapter, further increasing the flexibility.
Soft NUMA allows database administrators to configure pseudo-NUMA nodes that SQL Server treats like
hardware nodes. You can configure Soft NUMA nodes down to 1 processor, allowing fine control of connection
affinity and workload distribution. In addition, smaller x86 servers without hardware NUMA capabilities can still
run SQL employing Soft NUMA. These machines do not employ the NUMA concept of local and remote memory
access, but they do allow the SQL administrator to balance the workload down to the level of a single processor.
If SQL Server is run with no Soft NUMA nodes configured, then the hardware NUMA configuration is used. The
NUMA configuration, hard or soft, is written at startup to the SQL log.
To use these features, follow these steps:
1. Use regedit to configure the Soft NUMA nodes and port listen strings.
2. Restart SQL.
3. Set clients to use ports configured above.
It is also recommended that you install the HP System Providers v 9.0 or later on the system and use the
Optimize Logical Processor Configuration button in the HP System Management Homepage to preset your
logical processor and kernel groups (for nodes/group optimization).
A Configuration Example: How to Create 4 Soft NUMA Nodes with 2 CPUs Each
This example is for systems running SQL Server 2008 R2, and specifies the Group parameter, which applies only
to systems with more than 64 LPs. This example should be ignored for SQL Server 2008, or if the executing system is
equipped with 64 or fewer LPs.
First, run regedit and add the following entries to the system registry:
HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration
The CPUMask value is a bitmask of CPUs relative to the system. So Node0 described above includes CPUs 0 and
1, Node1 includes CPUs 2 and 3, and so on. There is no actual limit of 4 NUMA nodes; this is just an example.
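The CPUMask values for this layout can be derived mechanically. The following Python sketch computes one bitmask per Soft NUMA node, assuming that nodes are numbered from CPU 0 and that each node covers a contiguous block of CPUs; the function name is hypothetical, and the sketch does not write anything to the registry.

```python
from typing import List


def soft_numa_cpu_masks(node_count: int, cpus_per_node: int) -> List[int]:
    """Return one CPU bitmask per Soft NUMA node, lowest CPUs first."""
    if node_count * cpus_per_node > 64:
        raise ValueError("a single mask cannot describe more than 64 CPUs")
    masks = []
    for node in range(node_count):
        first_cpu = node * cpus_per_node
        # Set cpus_per_node consecutive bits, starting at bit first_cpu.
        masks.append(((1 << cpus_per_node) - 1) << first_cpu)
    return masks


if __name__ == "__main__":
    # Four Soft NUMA nodes with two CPUs each, as in this example.
    for node, mask in enumerate(soft_numa_cpu_masks(4, 2)):
        print(f"Node{node} CPUMask = 0x{mask:02x}")
```

For four nodes of two CPUs each, the masks are 0x03, 0x0c, 0x30, and 0xc0, so Node0 covers CPUs 0 and 1 and Node1 covers CPUs 2 and 3.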
Note
Although SQL allows configuration of Soft NUMA nodes that cross hardware NUMA
node boundaries, this is not recommended because it results in excessive remote
memory accessing.
Second, create the port listening strings using SQL Server Configuration Manager. Under SQL Server, select
Network Configuration > Protocols for MSSQLSERVER > TCP/IP > Properties > IPAddresses > IPAll > TCP Port.
You can also use regedit to modify the following key in the system registry:
PortNo1[SoftNumaNodeMask1],PortNo2[SoftNumaNodeMask2],...,PortNoN[SoftNumaNodeMaskN]
So the four ports for the Soft NUMA machines in this example would be:
1436[0x1],1437[0x2],1438[0x4],1439[0x8]
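The port string above follows a regular pattern: one port per Soft NUMA node, each paired with a mask that has only that node's bit set. The following Python sketch generates the string, assuming consecutive port numbers and one node per port; the function name and the starting port are illustrative.

```python
def soft_numa_listen_string(first_port: int, node_count: int) -> str:
    """Build the TCP port string that maps one port to each Soft NUMA node."""
    entries = []
    for node in range(node_count):
        node_mask = 1 << node  # bit N selects Soft NUMA node N
        entries.append(f"{first_port + node}[{hex(node_mask)}]")
    return ",".join(entries)


if __name__ == "__main__":
    # Four Soft NUMA nodes, starting at port 1436 as in this example.
    print(soft_numa_listen_string(1436, 4))
```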
Note
Unlike SQL Server 2000, which uses CPU masks directly in the listening string, SQL Server
specifies Soft NUMA masks. In other words, SQL Server employs a two-level
definition, while SQL Server 2000 uses only one level.
The port number is the point of connection between the client and a specific Soft NUMA node. So if a client
application wanted to connect to Soft NUMA Node 0, it would specify:
SQLCMD -E -Sservername,1436
Note that the default port, 1433, can also be used. This results in establishing connection affinity with all CPUs
in the system. However the heavy load imposed by these connections will degrade the maximum potential of
the system. Another alternative to SQL destination port-based client affinity selection is to use multiple NICs
and, for each one, set its own IP addresses for SQL Server to use.
Conclusion
To reiterate, just as Windows Server 2008 can only support 64 logical processors, SQL Server 2008 has the
same limitation. For this reason you should only run SQL Server 2008 on the HP DL980 when Hyper-threading is
disabled. To best utilize the full computing power of the DL980, you should run Windows Server 2008 R2
or Windows Server 2012 with SQL Server 2008 R2 or SQL Server 2012 as the Database Management
System (DBMS).
HP ProLiant DL980 Servers provide powerful computing and memory resources. To enable SQL Server to take
full advantage of these resources, we recommend hardware and software tuning. By applying some or all of the
guidelines in this document, you will achieve the highest performance from your server.
For More Information
For more information about HP ProLiant DL980 servers:
http://www.hp.com/servers/DL980
For additional best practices white papers for the HP ProLiant DL980:
http://www.hp.com/go/proliant-DL980-docs
To download the recommended components described in this document, along with other drivers and software,
visit the HP ProLiant DL980 Support web page:
http://www.hp.com/support/dl980g7
Documentation Feedback
HP welcomes your feedback. To make comments and suggestions about product documentation, send a
message to: [email protected]. Include the document title, part number, and filename found at the
end of the URL string (for example: c02861709.pdf). All submissions become the property of HP.
Copyright 2011-2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The
only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing
herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained
herein.
Microsoft, Windows, and Windows Server are U.S. trademarks of Microsoft Corporation.