Technical Report: NetApp Storage Controllers and Fibre Channel Queue Depth
Version 1.0.2
Table of Contents
1. Introduction
2. The Target Queue Depth Issue
3. Queue Depth and Performance
4. Maximum Queue Depth
5. Recommendations
6. Setting Queue Depths on Solaris Hosts
7. Solaris LUN Starving Issue
8. Setting Queue Depths on Windows Hosts
9. Setting Queue Depths on HP-UX Hosts
10. Setting Queue Depths on AIX Hosts
11. Setting Queue Depths on Linux Hosts
12. Setting Queue Depths on VMware Hosts
13. Queue Depth Example Diagrams
1. Introduction
This document is intended to help with the understanding and configuration of host HBA queues to ensure error-free operation while maximizing performance. This document's audience is Network Appliance customer-facing technical personnel (SE, PSE, and GS) who are charged with maximizing the reliability and performance of our solutions to win business in Fibre Channel environments.
2. The Target Queue Depth Issue
The number of outstanding I/O requests (SCSI commands) from the host's initiator HBA to the Storage Controller's target HBA has a direct impact on the scalability and performance of the overall solution. Each I/O request consumes a queue entry. The Storage Controller can handle only so many commands before it must reject commands with QFULL, indicating that it has no more room to queue commands for processing. The higher the number of outstanding SCSI commands, the higher the number of queue entries that are utilized. Typically, the more commands a host can have queued to the Storage Controller, the better the performance, up to a certain point. However, the higher the number of queue entries per host, the greater the chance of the Storage Controller target HBA becoming overwhelmed, creating a queue full (QFULL) condition. Because of this, if a large number of hosts are accessing a Storage Controller or the queue depths are increased, careful planning should be done to avoid QFULL conditions, as they significantly degrade system performance and can lead to errors on clustered or AIX systems.
Tuning the queue depth will not shorten the response time of individual I/O operations, but it may increase the total number of I/Os per second for the overall system environment.
In a configuration with multiple initiators (hosts), uneven queue depths can cause hosts with a small queue depth to be starved in comparison to hosts with a large queue depth accessing the same Storage Controller FC target port. For this reason it is generally recommended that all hosts accessing the Storage Controller have similar queue depths, although different queue depths among machines can be used for tuning purposes.
3. Queue Depth and Performance
It was stated earlier that the higher the number of queue entries, the better the performance. It is perhaps more accurate to say that an adequate queue depth is needed to achieve maximum performance; increasing the queue depth beyond what is needed will not increase throughput.
Little's Law (also called Little's Theorem) can help calculate the number of queue entries that are needed. Little's Law states:

    Throughput = Work in Progress / Cycle Time

Translated into storage I/O terminology, this equates to:

    IO/s = Queued Requests / Response Time

Rearranged to calculate the queue depth, this becomes:

    Needed Queue Depth = (IO/s) * (Response Time)
The following two tables show Little's Law in tabular form for two different response times. The 2 ms response time was chosen to represent accesses served directly from cache, and the 20 ms response time was chosen for non-cached (disk) accesses. A response time considerably higher than 20 ms typically leads to unsatisfactory performance, so planning a queue depth to accommodate longer response times probably does not make sense, with some exceptions such as backup systems. Testing of Storage Controller performance shows that the 2 ms and 20 ms values are certainly pessimistic, but for the purposes of planning queue depth it is better to use a slightly larger value than expected, to be on the safe side. The tables can be used to determine the total available queue depth needed to allow the desired number of IO/s at the given response times. Also displayed in the tables are rough numbers of IO/s that a particular Storage Controller model can deliver, cached and uncached.
According to Little's Law given above, Needed Queue Depth = (IO/s) * (Response Time). For example, a FAS980c delivers approximately 162,500 IO/s for cached reads; with a desired response time of 2 ms this works out to a queue depth of 325. Similarly, a FAS3050c would need a queue depth of approximately 440 (220,000 IO/s * 2 ms). The needed queue depth can be calculated, roughly, for each platform using this method and the IO/s figures for that platform. The same principle holds for calculating the needed queue depth for uncached performance.
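As a quick back-of-the-envelope check, the same calculation can be scripted. This is only an illustrative sketch; the IO/s figures are the approximate cached-read numbers quoted above, and the response time must be expressed in seconds:

    # Needed queue depth = IO/s * response time (response time in seconds; 2 ms = 0.002 s)
    awk 'BEGIN { printf "FAS980c:  %d\n", 162500 * 0.002 }'    # prints 325
    awk 'BEGIN { printf "FAS3050c: %d\n", 220000 * 0.002 }'    # prints 440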
The table shows that to get the maximum uncached performance from a 980C Storage Controller a total queue length of approximately 1250 is needed, and for maximum cached performance from a 3020C a total queue depth of approximately 275 is needed. Typically, the maximum queue depth for a single FC host port accessing the Storage Controller is 256, meaning that in order to ensure that queue depth is not limiting performance, multiple hosts or multiple connections from a single host to the Storage Controller may be needed. Note that an adequate queue depth certainly does not mean that full Storage Controller performance is obtainable; other factors such as host CPU, FC HBA throughput, and so on may well limit performance. On systems where queue depth is set per LUN or disk device file, the sum of the in-use LUN queues also needs to meet the minimum required queue depth.
The diagram below shows two different configurations for a Solaris or Windows host using an Emulex FC HBA; it is assumed that the maximum HBA target queue depth of 256 is used. In configuration #1, where the HBA can see two Storage Controller target ports, the total available queue depth is 512, since the HBA reserves 256 queue slots per Storage Controller target port. Configuration #2 has a total queue availability of 1024, since each of the two installed HBAs can see two Storage Controller target ports. Using the example above, configuration #2 is the only one that would ensure adequate queue depth. Under real-world circumstances there would most likely be multiple hosts connected to the Storage Controller, meaning that a much smaller queue depth per HBA would be needed to provide adequate total queue depth for full Storage Controller performance.
[Diagram: Configuration #1, a single host HBA zoned through an FC switch to Storage Controller target ports 5a and 5b, and Configuration #2, two host HBAs each zoned through an FC switch to target ports 5a and 5b; each target port maintains its own queue.]
4. Maximum Queue Depth
If the total queue depth available on the HBAs configured to access a Storage Controller target port exceeds the available queue slots on that target port, the host will receive QFULL error messages from the Storage Controller when the Storage Controller is under very heavy load. The QFULL condition normally occurs only under heavy load because the full queue allocation is not used under normal conditions. The Storage Controller returning QFULL status can cause the host to automatically scale down its queues, which can have a negative impact on performance. Experience has shown that Solaris handles this situation quite well, Windows and Linux handle it acceptably, and HP-UX reduces the available queues aggressively. On clustered systems where access to a quorum disk is being performed, or on AIX systems, QFULL can lead to fatal errors.
To determine whether the available queue space on the Storage Controller is being over-allocated, the sum of the queue slots on the HBAs that have an active logical path to the Storage Controller FC port simply needs to be less than or equal to the queue slots available on that Storage Controller FC port.
As an equation this can be represented as:

    Storage Controller FC port queue depth >= (HBA queue length of logical connection #1) + (HBA queue length of logical connection #2) + ... + (HBA queue length of logical connection #N)
If the sum of the LUN queue lengths actively accessing LUNs through a Storage Controller FC port is smaller than the sum of the HBA queue depths, this smaller value can be used instead:

    Storage Controller FC port queue depth >= (LUN queue length of LUN path #1) + (LUN queue length of LUN path #2) + ... + (LUN queue length of LUN path #N)

From the Storage Controller side, the QFULL column in the output of the "lun stats -o" command shows whether the target ports are being overwhelmed. The diagrams at the end of this document show the summing of the HBA queues for several different configurations. Note: For the specific queue depths of each NetApp storage controller platform, check the FC and iSCSI Configuration Guide for details.
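A hypothetical worked example (the host count and queue depths below are illustrative assumptions, not recommended values): four hosts, each with one logical connection to a target port and an HBA queue depth of 64, require the port to queue at least 4 * 64 = 256 commands. A trivial shell check of the inequality might look like this:

    # Sum the per-connection HBA queue depths and compare them to the target port queue depth
    port_qdepth=256                      # assumed Storage Controller FC port queue depth
    hba_qdepths="64 64 64 64"            # one entry per active logical connection
    sum=0; for q in $hba_qdepths; do sum=$((sum + q)); done
    [ "$sum" -le "$port_qdepth" ] && echo "OK: $sum <= $port_qdepth" || echo "Over-allocated: $sum > $port_qdepth"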
5. Recommendations
The following general recommendations can be made with regard to tuning queue depths:
- For small to midsize systems, use an HBA queue depth of 32.
- For large systems, use an HBA queue depth of 128.
- For exception cases or performance testing, use a queue depth of 256 to avoid possible queuing problems.
- All hosts should have their queue depths set to similar values to give equal access to all hosts.
- Ensure that the Storage Controller target FC port queue depth is not exceeded, to avoid performance penalties or errors.
6. Setting Queue Depths on Solaris Hosts
HBA Queue Depth
To update the queue depth of the Emulex HBA on a Solaris host, follow the procedure outlined below:
1) # cd /kernel/drv
2) # vi lpfc.conf
3) Search for /tgt-queue (/tgt-queue): tgt-queue-depth=32
4) The default from NetApp is set at 32 at install.
5) Set the desired value based on the configuration of your environment.
6) Save the file.
7) Reboot the host with the sync; sync; sync; reboot -- -r command.
LUN Queue Depth
For LUN queue depth, use the following rule: the number of LUNs in use on a host multiplied (*) by the per-LUN throttle (lun-queue-depth) must be less than or equal to (<=) the tgt-queue-depth value on that host.
For example:
    Number of LUNs on the host = 2
    lun-queue-depth = 16
    tgt-queue-depth (default) = 32
Following the rule above: 2 * 16 = 32, and 32 is equal to the tgt-queue-depth.
If you need additional LUNs, either increase the HBA queue depth or reduce the per-LUN throttle.
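A minimal sketch of the corresponding /kernel/drv/lpfc.conf entries, assuming the example values above (two LUNs, a per-LUN throttle of 16, and a target queue depth of 32); verify the exact parameter names against the installed Emulex driver:

    # /kernel/drv/lpfc.conf (excerpt)
    # Per-LUN throttle: maximum outstanding commands per LUN
    lun-queue-depth=16;
    # Per-target throttle: maximum outstanding commands per Storage Controller FC port
    tgt-queue-depth=32;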
For Qlogic HBAs
In the Qlogic (qla) driver configuration file:

    # Maximum number of commands queued on each logical unit.
    # Range: 1 - 65535
    hba0-execution-throttle=16;

NOTE: Although the Qlogic comment says "on each logical unit," only the HBA instance number is specified, so the queue depth is neither per LUN nor per target but per instance of the HBA. The granularity is at the HBA port level, not at the target/LUN level.

Queue depth in the Sun stack: Currently the Leadville drivers do not allow a per-LUN or per-target max_throttle setting at the HBA level. For now you must set the global ssd_max_throttle setting in the /etc/system file. However, you can set throttle values on a per-device-type basis in Sun's target drivers using sd.conf or ssd.conf.
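A minimal /etc/system sketch for the global Sun-stack throttle described above; the value 32 is an assumption for illustration, and a reboot is required for /etc/system changes to take effect:

    # /etc/system (excerpt): global throttle for ssd devices in the Sun (Leadville) stack
    set ssd:ssd_max_throttle=32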
7. Solaris LUN Starving Issue
On a single host, a fast load on a given LUN can be starved by one or more slow loads on other LUNs. The lun-queue-depth setting within the host HBA configuration exists to prevent some active LUNs from consuming all available target queue entries and thereby starving other LUNs.
LUN starvation is a phenomenon in which the access patterns of processes on a single host to multiple LUNs on one target cause the requests destined for one or more of those LUNs to be denied service. This situation occurs when all of the following conditions hold:
1) The sum of the lun-queue-depths for some subset of the LUNs at the target exceeds the target queue depth.
2) The number of outstanding requests to each of the LUNs in that set constantly exceeds the lun-queue-depth.
3) There exist other LUNs at the target whose number of outstanding requests (from that host) falls to zero.
Any such LUNs whose number of outstanding requests falls to zero will subsequently be denied service. This can happen in any situation where very heavy I/O is running to a set of LUNs whose combined lun-queue-depths exceed the target queue depth on that host.
You can determine whether you have LUN starvation by examining the output of the Solaris command iostat -xznI 1. This command shows the I/O activity on all active LUNs at 1-second intervals. The -I option reports the count of requests completed in each interval rather than the rates. If some set of LUNs consistently shows 0 under the columns r/i and w/i and non-zero values under the columns wait or actv, those LUNs are experiencing starvation. The iostat output shows devices with names such as c3t2d30s2; the numbers following the c, t, and d identify the HBA, target, and LUN, respectively.
If starvation is due to the situation described above, you will also see a set of devices with the same HBA and target identifiers but different LUN identifiers with constant non-zero values under the columns r/i and/or w/i and non-zero values under the columns wait and/or actv.
The solution to this problem is to tune the LUN (lun-queue-depth) and HBA target (tgt-queue-depth) queue depths in the /kernel/drv/lpfc.conf file so that the sum of the LUN queues is less than or equal to the HBA target queue length.
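As a hedged illustration (the numbers are hypothetical): with tgt-queue-depth=32 and lun-queue-depth=16, any two busy LUNs can consume the entire target queue (2 * 16 = 32) and starve the remaining LUNs, whereas reducing lun-queue-depth to 8 lets four LUNs be serviced concurrently (4 * 8 = 32) without exceeding the target queue.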
8. Setting Queue Depths on Windows Hosts
For Emulex HBAs
To update the Emulex HBA queue depths on a Windows host, follow the procedure outlined below:
1) Run the provided Emulex HBA utility LPUTILNT, located in the C:\WINNT\system32 directory.
2) In the pull-down menu on the right-hand side of LPUTILNT, select Drive Parameters.
If the QueueDepth value is set greater than 150, the following Windows Registry value also needs to be increased appropriately: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\lpxnds\Parameters\Device\NumberOfRequests
For a Qlogic HBA, invoke the SANsurfer HBA Manager utility. Click the HBA port, then Settings, then the Advanced HBA Port Settings drop-down box. The relevant parameter is Execution Throttle.
Another way is to invoke the Registry Editor to make the necessary changes. Select HKEY_LOCAL_MACHINE and follow the tree structure down to the QLogic driver as follows: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ql2300\Parameters\Device. Double-click DriverParameter: REG_SZ: qd=32. If the string "qd=" does not exist, append "; qd=32" to the end of the string. Enter a value up to 254 (0xFE). The default value is 32 (0x20).
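If no DriverParameter value exists yet, a command-line equivalent might look like the following sketch; if DriverParameter already contains other options, edit it in the Registry Editor and append "; qd=32" as described above instead:

    REM Create the DriverParameter string with a queue depth of 32 (value assumed for illustration)
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Ql2300\Parameters\Device" /v DriverParameter /t REG_SZ /d "qd=32"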
9. Setting Queue Depths on HP-UX Hosts
The LUN or device queue depth can be changed using the HP-UX kernel parameter scsi_max_qdepth (default value 8, maximum value 255). Starting with HP-UX 11, this variable can be changed dynamically on a running system using the -u option of the kmtune command. The change is effective for all devices on the system. For example, to increase the LUN queue depth to 64:

    kmtune -u -s scsi_max_qdepth=64

It is possible to change this value for an individual device file using the scsictl command, but changes made with scsictl are not persistent across system reboots. For example, to view and then change the queue depth for a particular device file:

    scsictl -a /dev/rdsk/c2t2d0
    scsictl -m queue_depth=16 /dev/rdsk/c2t2d0

The HBA queue depth can be changed using the kernel parameter max_fcp_reqs (default value 512, maximum value 1024). This is a static variable, meaning the kernel must be rebuilt and the system rebooted for any changes to take effect. To change the HBA queue depth to 256:

    kmtune -u -s max_fcp_reqs=256
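To confirm the current values before and after a change, a quick query can be used; this is a sketch, and the availability of the query option should be confirmed on the HP-UX release in question:

    # Query the current kernel parameter values
    kmtune -q scsi_max_qdepth
    kmtune -q max_fcp_reqs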
10. Setting Queue Depths on AIX Hosts
The LUN or device queue depth can be changed using the AIX chdev command. For example, to change the queue depth for the hdisk7 device:

    chdev -l hdisk7 -a queue_depth=32

Changes made using the chdev command are persistent across reboots. The default queue_depth value for NetApp device types (disk/fcp/netapplun) is 12 and the maximum value is 256.

Changes to the HBA queue depth can also be made using the chdev command. For example, to change the queue depth for the fcs0 HBA device:

    chdev -l fcs0 -a num_cmd_elems=128

Changes made using the chdev command are persistent across reboots. The default value for num_cmd_elems is 200 and the maximum value is 2048. It might be necessary to take the HBA offline, change the value, and then bring it back online using the rmdev -l fcs0 -R and mkdev -l fcs0 -P commands.
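To check the current settings before making a change, something like the following can be used; this is a sketch, and the attribute names should be confirmed on the system in question:

    # Show the current per-LUN and per-HBA queue settings
    lsattr -El hdisk7 -a queue_depth
    lsattr -El fcs0 -a num_cmd_elems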
11. Setting Queue Depths on Linux Hosts
For Linux machines, the /etc/modprobe.conf or /etc/modprobe.conf.local file must be edited and the appropriate options set, depending on the type of card installed. For an Emulex HBA the relevant parameters are lpfc_hba_queue_depth (default 8192, minimum 32, maximum 8192) and lpfc_lun_queue_depth (default 30, minimum 1, maximum 128).
HBAnyware or hbacmd can also be used to change the parameters. After the changes are made, the driver must be reloaded and a new ramdisk image built for the parameters to take effect. For a Qlogic HBA the relevant parameter is ql2xmaxqdepth; the default value is 32. Here too, the driver must be reloaded and a new ramdisk image built. Follow the specific Linux guides for details on reloading the driver and building a new ramdisk image.
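A minimal sketch of the modprobe options described above; the values shown are the defaults quoted above, and the QLogic module name (qla2xxx here) is an assumption that varies by driver version, so check which module is actually loaded on your system:

    # /etc/modprobe.conf (or /etc/modprobe.conf.local) excerpt
    options lpfc lpfc_lun_queue_depth=30 lpfc_hba_queue_depth=8192
    options qla2xxx ql2xmaxqdepth=32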
12. Setting Queue Depths on VMware Hosts
It is recommended to use the esxcfg-module command to change the HBA queue depth settings; hand-editing the esx.conf file is not recommended. The execution throttle (queue depth) is the maximum number of outstanding commands that can execute on any one HBA port. The default for ESX 3.0.1 is 32.

To set the maximum queue depth for a QLogic HBA:
1) Log on to the service console as the root user.
2) Verify which Qlogic HBA module is currently loaded:
    #vmkload_mod -l
3) For a single instance of a Qlogic HBA, run the following command. The example uses the qla2300_707 module; use the appropriate module based on the output of vmkload_mod -l.
    #esxcfg-module -s ql2xmaxqdepth=64 qla2300_707
4) Save your changes and reboot the server:
    #/usr/sbin/esxcfg-boot -b
    #reboot
5) Confirm the changes:
    #esxcfg-module -g qla2300_707
    qla2300_707 enabled = 1 options = 'ql2xmaxqdepth=64'

To change the queue depth of an Emulex HBA:
1) Log on to the service console as root.
2) Verify which Emulex HBA module is currently loaded:
    #vmkload_mod -l | grep lpfcdd
Depending on the model of the HBA, the module can be one of the following: lpfcdd_7xx, lpfcdd_732.
3) For a single instance of an Emulex HBA on the system, run the following command. The example shows the lpfcdd_7xx module; use the appropriate module based on the output of vmkload_mod -l.
    #esxcfg-module -s lpfc0_lun_queue_depth=16 lpfcdd_7xx
In this case, the HBA represented by lpfc0 will have its LUN queue depth set to 16.
4) If multiple instances of an Emulex HBA are present on the system, run the following command:
    #esxcfg-module -s "lpfc0_lun_queue_depth=16 lpfc1_lun_queue_depth=16" lpfcdd_7xx
In this case, both HBAs (lpfc0 and lpfc1) will have their LUN queue depths set to 16.
5) Save your changes and reboot the server:
    #esxcfg-boot -b
    #reboot
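As with the QLogic example above, the new options can be confirmed after the reboot. This is a sketch that assumes the lpfcdd_7xx module name from the example:

    #esxcfg-module -g lpfcdd_7xx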
13. Queue Depth Example Diagrams
Device Queues:
One queue per device file. If there are multiple paths to a LUN there are multiple queues for the LUN.
HP-UX has a default device queue depth of 8. Solaris has a default of 16. AIX has a default of 12 (with the NetApp HAK). Windows has a default of 32.
HBA Target Queues:
On Solaris/Windows (X1050A Emulex): one queue per HBA per target (i.e., Storage Controller FC port) the HBA is connected to. On HP-UX/AIX: one queue per HBA regardless of the number of targets (i.e., Storage Controller FC ports) the HBA connects to. HP-UX has a default HBA queue depth of 512.
AIX has a default HBA queue depth of 40. For larger systems a value of 128 is recommended.
[Diagram: multiple hosts (Host #1 through Host #N), each with its own HBA target queue, sharing a single Storage Controller target port queue.]
If the sum of all HBA target queues exceeds the available Storage Controller queue space, hosts can receive a SCSI QFULL response under heavy load. Windows, HP-UX, Linux, and Solaris hosts will slow down I/O dramatically if QFULL is received. QFULL on AIX equates to I/O errors! QFULL on a cluster quorum disk is fatal!
Host HBA Queue to Storage Controller Queue: Single-Headed Storage Controller, Example 1
[Diagram: Hosts #1 through #4, each with HBA #1 and HBA #2, connected to Storage Controller #1 target ports 5a, 5b, 7a, and 7b; each HBA maintains one queue (Queue1) per visible target port.]
Port5a-Queue Length >= Host#1,HBA#1,Queue1 + Host#2,HBA#1,Queue1
Port5b-Queue Length >= Host#3,HBA#1,Queue1 + Host#4,HBA#1,Queue1
Port7a-Queue Length >= Host#1,HBA#2,Queue1 + Host#2,HBA#2,Queue1
Port7b-Queue Length >= Host#3,HBA#2,Queue1 + Host#4,HBA#2,Queue1
Host HBA Queue to Storage Controller Queue: Single-Headed Storage Controller with HP-UX and Windows Hosts
[Diagram: Host #1 (HP-UX or AIX, one queue per HBA) and Host #2 (Windows, one queue per HBA per target), each with HBA #1 and HBA #2, connected to Storage Controller #1 target ports 5a, 5b, 7a, and 7b.]
Port5a-Queue Length >= Host#1,HBA#1,Queue1 + Host#1,HBA#2,Queue1
Port5b-Queue Length >= Host#2,HBA#1,Queue1 + Host#2,HBA#2,Queue1
Port7a-Queue Length >= Host#1,HBA#1,Queue1 + Host#1,HBA#2,Queue1
Port7b-Queue Length >= Host#2,HBA#1,Queue2 + Host#2,HBA#2,Queue2
Host HBA Queue to Storage Controller Queue: Clustered Storage Controller, Example 2
[Diagram: Hosts #1 and #2 (Windows or Solaris), each with HBA #1 and HBA #2, connected to clustered Storage Controllers #1 and #2 through their FC target ports (5a, 5b, 7a); each HBA maintains a separate queue per visible target port.]
Host HBA Queue to Storage Controller Queue: Clustered Storage Controller (270c), Example 3
[Diagram: Hosts #1 and #2 (Windows or Solaris), each with HBA #1 and HBA #2, connected to clustered Storage Controllers #1 and #2 through port 0c on each controller.]
Storage Controller#1-Port0c-Q >= Host#1,HBA#1,Q1 + Host#1,HBA#2,Q1 + Host#2,HBA#1,Q1 + Host#2,HBA#2,Q1
Storage Controller#2-Port0c-Q >= Host#1,HBA#1,Q2 + Host#1,HBA#2,Q2 + Host#2,HBA#1,Q1 + Host#2,HBA#2,Q1