Redbook IBM (SVC-2145) Best Practices
Jon Tate, Deon George, Thorsten Hoss, Ronda Hruby, Ian MacQuarrie, Barry Mellish, Peter Mescher
ibm.com/redbooks
International Technical Support Organization SAN Volume Controller: Best Practices and Performance Guidelines March 2008
SG24-7521-00
Note: Before using this information and the product it supports, read the information in Notices on page xi.
First Edition (March 2008) This edition applies to Version 4.2 of the IBM System Storage SAN Volume Controller.
Copyright International Business Machines Corporation 2008. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices
Trademarks
Preface
  The team that wrote this book
  Become a published author
  Comments welcome
Chapter 1. SAN fabric
  1.1 SVC SAN topology
    1.1.1 Redundancy
    1.1.2 Topology basics
    1.1.3 ISL oversubscription
    1.1.4 IBM 2109-M12/Brocade 12000 in an SVC environment
    1.1.5 Switch port layout for large edge switches
    1.1.6 Switch port layout and hardware selection for director-class core switches
    1.1.7 Single switch SVC SANs
    1.1.8 Basic core-edge topology
    1.1.9 Four-SAN core-edge topology
    1.1.10 Cisco VSANs
    1.1.11 Common topology issues
  1.2 Tape and disk on your SAN
  1.3 Switch interoperability
  1.4 Distance extension for mirroring
    1.4.1 Optical multiplexors
    1.4.2 Long-distance SFPs/XFPs
    1.4.3 Fibre Channel: IP Conversion
  1.5 Zoning
    1.5.1 Type of zoning
    1.5.2 Pre-zoning tips and shortcuts
    1.5.3 SVC cluster zone
    1.5.4 SVC: Storage zones
    1.5.5 SVC: Host zones
    1.5.6 Sample standard SVC zoning configuration
    1.5.7 Zoning with multiple SVC clusters
    1.5.8 Split controller configurations
  1.6 Switch Domain IDs
  1.7 TotalStorage Productivity Center for Fabric
Chapter 2. SAN Volume Controller cluster
  2.1 Advantages of virtualization
    2.1.1 How does the SVC fit into your environment
  2.2 Scalability of SVC clusters
    2.2.1 Advantage of multi cluster as opposed to single cluster
    2.2.2 Performance expectations by adding an SVC
    2.2.3 Growing or splitting SVC clusters
  2.3 SVC cache improves subsystem performance
    2.3.1 Cache destage operations
  2.4 Cluster upgrade
Chapter 3. Master console
  3.1 Managing the master console
    3.1.1 Managing a single master console
    3.1.2 Managing multiple master consoles
    3.1.3 Administration roles
    3.1.4 Audit logging
    3.1.5 Managing IDs and passwords
    3.1.6 Saving the SVC configuration
    3.1.7 Restoring the SVC cluster configuration
Chapter 4. I/O Groups and nodes
  4.1 Determining I/O Groups
  4.2 Node shutdown and node failure
    4.2.1 Impact when running single node I/O Groups
  4.3 Adding or upgrading SVC node hardware
Chapter 5. Storage controller
  5.1 Controller affinity and preferred path
    5.1.1 ADT for DS4000
    5.1.2 Ensuring path balance prior to MDisk discovery
  5.2 Pathing considerations for EMC Symmetrix/DMX and HDS
  5.3 LUN ID to MDisk translation
    5.3.1 ESS
    5.3.2 DS6000 and DS8000
  5.4 MDisk to VDisk mapping
  5.5 Mapping physical LBAs to Extents
  5.6 Media error logging
    5.6.1 Host encountered media errors
    5.6.2 SVC-encountered media errors
  5.7 Selecting array and cache parameters
    5.7.1 DS4000 array width
    5.7.2 Segment size
    5.7.3 DS8000
  5.8 Considerations for controller configuration
    5.8.1 Balancing workload across DS4000 controllers
    5.8.2 Balancing workload across DS8000 controllers
    5.8.3 DS8000 ranks/extent pools
    5.8.4 Mixing array sizes within an MDG
    5.8.5 Determining the number of controller ports for ESS/DS8000
    5.8.6 Determining the number of controller ports for DS4000
  5.9 LUN masking
  5.10 WWPN to physical port translation
  5.11 Using TPC to identify storage controller boundaries
  5.12 Using TPC to measure storage controller performance
    5.12.1 Approximations
    5.12.2 Establish a performance baseline
    5.12.3 Performance metric guidelines
    5.12.4 Storage controller back end
Chapter 6. MDisks
  6.1 Back-end queue depth
  6.2 MDisk transfer size
    6.2.1 Host I/O
    6.2.2 FlashCopy I/O
    6.2.3 Coalescing writes
  6.3 Selecting LUN attributes for MDisks
  6.4 Tiered storage
  6.5 Adding MDisks to existing MDGs
    6.5.1 Adding MDisks for capacity
    6.5.2 Checking access to new MDisks
    6.5.3 Persistent reserve
  6.6 Removing MDisks from existing MDGs
  6.7 Remapping managed MDisks
  6.8 Controlling extent allocation order for VDisk creation
Chapter 7. Managed disk groups
  7.1 Availability considerations for planning MDGs
    7.1.1 Performance consideration
    7.1.2 Selecting the MDisk Group
  7.2 Selecting number of LUNs per array
    7.2.1 Performance comparison of one compared to two LUNs per array
  7.3 Selecting the number of arrays per MDG
  7.4 Striping compared to sequential type
  7.5 Selecting storage controllers
Chapter 8. VDisks
  8.1 Creating VDisks
    8.1.1 Selecting the MDisk Group
    8.1.2 Changing the preferred node within an I/O Group
    8.1.3 Moving a VDisk to another I/O Group
  8.2 VDisk migration
    8.2.1 Migrating across MDGs
    8.2.2 Image type to striped type migration
    8.2.3 Migrating to image type VDisk
    8.2.4 Preferred paths to a VDisk
    8.2.5 Governing of VDisks
  8.3 Cache-disabled VDisks
    8.3.1 Using underlying controller remote copy with SVC cache-disabled VDisks
    8.3.2 Using underlying controller PiT copy with SVC cache-disabled VDisks
    8.3.3 Changing cache mode of VDisks
  8.4 VDisk performance
    8.4.1 VDisk performance
  8.5 The effect of load on storage controllers
Chapter 9. Copy services
  9.1 SAN Volume Controller Advanced Copy Services functions
    9.1.1 SVC copy service functions
    9.1.2 Using both Metro Mirror and Global Mirror between two clusters
    9.1.3 Performing three-way copy service functions
    9.1.4 Using native controller Advanced Copy Services functions
  9.2 Copy service limits
  9.3 Setting up FlashCopy copy services
    9.3.1 Steps to making a FlashCopy VDisk with application data integrity
    9.3.2 Making multiple related FlashCopy VDisks with data integrity
    9.3.3 Creating multiple identical copies of a VDisk
    9.3.4 Understanding FlashCopy dependencies
    9.3.5 Using FlashCopy with your backup application
    9.3.6 Using FlashCopy to help with migration
    9.3.7 Summary of FlashCopy rules
  9.4 Metro Mirror and Global Mirror
    9.4.1 Configuration requirements for long distance links
    9.4.2 Global mirror guidelines
    9.4.3 Migrating a Metro Mirror relationship to Global Mirror
    9.4.4 Recovering from suspended Metro Mirror or Global Mirror relationships
    9.4.5 Diagnosing and fixing 1920 errors
    9.4.6 Using Metro Mirror or Global Mirror with FlashCopy
    9.4.7 Saving bandwidth creating Metro Mirror and Global Mirror relationships
    9.4.8 Using TPC to monitor Global Mirror performance
    9.4.9 Summary of Metro Mirror and Global Mirror rules
Chapter 10. Hosts
  10.1 Configuration recommendations
    10.1.1 The number of paths
    10.1.2 Host ports
    10.1.3 Port masking
    10.1.4 Host to I/O Group mapping
    10.1.5 VDisk size as opposed to quantity
    10.1.6 Host VDisk mapping
    10.1.7 Server adapter layout
    10.1.8 Availability as opposed to error isolation
  10.2 Host pathing
    10.2.1 Preferred path algorithm
    10.2.2 Path selection
    10.2.3 Path management
    10.2.4 Dynamic reconfiguration
    10.2.5 VDisk migration between I/O Groups
  10.3 I/O queues
    10.3.1 Queue depths
  10.4 Multipath software
  10.5 Host clustering and reserves
    10.5.1 AIX
    10.5.2 SDD compared to SDDPCM
    10.5.3 Virtual I/O server
    10.5.4 Windows
    10.5.5 Linux
    10.5.6 Solaris
    10.5.7 VMWare
  10.6 Mirroring considerations
    10.6.1 Host-based mirroring
  10.7 Monitoring
    10.7.1 Automated path monitoring
    10.7.2 Load measurement and stress tools
Chapter 11. Applications
  11.1 Application workloads
    11.1.1 Transaction-based processes (IOPS)
    11.1.2 Throughput-based processes (MBps)
    11.1.3 Host considerations
  11.2 Application considerations
    11.2.1 Transaction environments
    11.2.2 Throughput environments
  11.3 Data layout overview
    11.3.1 Layers of volume abstraction
    11.3.2 Storage administrator and AIX LVM administrator roles
    11.3.3 General data layout recommendations
    11.3.4 Database strip size considerations (throughput workload)
    11.3.5 LVM volume groups and logical volumes
  11.4 When the application does its own balancing of I/Os
    11.4.1 DB2 I/O characteristics and data structures
    11.4.2 DB2 data layout example
    11.4.3 Striped VDisk recommendation
  11.5 Data layout with the AIX virtual I/O (VIO) server
    11.5.1 Overview
    11.5.2 Data layout strategies
  11.6 VDisk size
  11.7 Failure boundaries
Chapter 12. Monitoring
  12.1 Configuring TPC to analyze the SVC
  12.2 Using the TPC to verify fabric configuration
    12.2.1 Verifying SVC node ports
    12.2.2 Ensure that all SVC ports are online
    12.2.3 Verifying SVC port zones
    12.2.4 Verifying paths to storage
    12.2.5 Verifying host paths to the SVC
  12.3 Methods for collecting data
    12.3.1 Setting up TPC to collect performance information
    12.3.2 Viewing TPC-collected information
    12.3.3 Using TPC to alert on performance constraints
Chapter 13. Maintenance
  13.1 Configuration and change tracking
    13.1.1 SAN
    13.1.2 SVC
    13.1.3 Storage
    13.1.4 General inventory
    13.1.5 Change tickets and tracking
    13.1.6 Configuration archiving
  13.2 Standard operating procedures
  13.3 TotalStorage Productivity Manager
  13.4 Code upgrades
    13.4.1 Which code levels
    13.4.2 How often
    13.4.3 What order
    13.4.4 Preparing for upgrades
    13.4.5 Host code upgrades
  13.5 SAN hardware changes
    13.5.1 Cross-referencing the SDD adapter number with the WWPN
    13.5.2 Changes that result in the modification of the destination FCID
    13.5.3 Switch replacement with a like switch
    13.5.4 Switch replacement or upgrade with a different kind of switch
    13.5.5 HBA replacement
  13.6 Naming convention
    13.6.1 Hosts, zones, and SVC ports
    13.6.2 Controllers
    13.6.3 MDisks
    13.6.4 VDisks
    13.6.5 MDGs
Chapter 14. Other useful information
  14.1 Cabling
    14.1.1 General cabling advice
    14.1.2 Long distance optical links
    14.1.3 Labeling
    14.1.4 Cable management
    14.1.5 Cable routing and support
    14.1.6 Cable length
    14.1.7 Cable installation
  14.2 Power
    14.2.1 Bundled uninterruptible power supply units
    14.2.2 Rack power feeds
  14.3 Cooling
  14.4 SVC scripting
  14.5 IBM Support Notifications Service
  14.6 SVC Support Web site
  14.7 SVC-related publications and classes
    14.7.1 IBM Redbooks publications
    14.7.2 Courses
Chapter 15. Troubleshooting and diagnostics
  15.1 Common problems
    15.1.1 Host problems
    15.1.2 SVC problems
    15.1.3 SAN problems
    15.1.4 Storage subsystem problems
  15.2 Collecting data and isolating the problem
    15.2.1 Host data collection
    15.2.2 Multipathing driver: SDD data
    15.2.3 SVC data collection
    15.2.4 SAN data collection
    15.2.5 Storage subsystem data collection
  15.3 Recovering from problems
    15.3.1 Solving host problems
    15.3.2 Solving SVC problems
    15.3.3 Solving SAN problems
    15.3.4 Typical SVC storage problems
    15.3.5 Solving storage subsystem problems
    15.3.6 Common error recovery steps
  15.4 Livedump
Chapter 16. SVC 4.2 performance highlights
  16.1 SVC and continual performance enhancements
  16.2 SVC 4.2 code improvements
  16.3 Test results
    16.3.1 Performance scaling of I/O Groups
Related publications
  IBM Redbooks publications
  Other resources
  Referenced Web sites
  How to get IBM Redbooks publications
  Help from IBM
Index
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Redbooks (logo) alphaWorks pSeries AIX BladeCenter DB2 DS4000 DS6000 DS8000 Enterprise Storage Server ESCON FlashCopy GPFS HACMP IBM Redbooks System p System z System Storage Tivoli Enterprise Console Tivoli TotalStorage 1350
The following terms are trademarks of other companies: QLogic, and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered trademark in the United States. Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates. Solaris, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Visio, Windows NT, Windows Server, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM Redbooks publication captures some of the best practices based on field experience and describes the performance gains that can be achieved by implementing the IBM System Storage SAN Volume Controller. This book is intended for very experienced storage, SAN, and SVC administrators and technicians. Readers are expected to have an advanced knowledge of the SAN Volume Controller (SVC) and SAN environment, and we recommend these books as background reading:
- IBM System Storage SAN Volume Controller, SG24-6423
- Introduction to Storage Area Networks, SG24-5470
- Using the SVC for Business Continuity, SG24-7371
Technology and has worked as a Product Field Engineer supporting numerous storage products including ESS, DS6000, and DS8000. His areas of expertise include Open Systems storage solutions, multipathing software, and AIX. He is currently a member of the STG Field Assist Team (FAST) supporting clients through critical account engagements and technical advocacy. Barry Mellish is a Certified I/T Specialist and works as a Senior Storage Specialist in the United Kingdom, Ireland, and South Africa. Prior to this assignment, he spent four years on assignment as a Project Leader at the International Technical Support Organization, San Jose Center. He has co-authored sixteen IBM Redbook Publications and has taught many classes worldwide on storage subsystems. He joined IBM UK 24 years ago. Peter Mescher is a Product Engineer on the SAN Central team within the IBM Systems and Technology Group in Research Triangle Park, North Carolina. He has seven years of experience in SAN Problem Determination and SAN Architecture. Before joining SAN Central, he performed Level 2 support for network routing products. He is a co-author of the SNIA Level 3 FC Specialist Exam. This is his fourth IBM Redbooks publication. We extend our thanks to the following people for their contributions to this project. There are many people that contributed to this book. In particular, we thank the development and PFE teams in Hursley. Matt Smith was also instrumental in moving any issues along and ensuring that they maintained a high profile. Barry Whyte was instrumental in steering us in the correct direction and for providing support throughout the life of the residency. We would also like to thank the following people for their contributions: Iain Bethune Trevor Boardman Carlos Fuente Gary Jarman Colin Jewell Andrew Martin Paul Merrison Steve Randle Bill Scales Matt Smith Barry Whyte IBM Hursley Bill Wiegand IBM Advanced Technical Support Mark Balstead IBM Tucson Dan Braden IBM Dallas Lloyd Dean IBM Philadelphia Dorothy Faurot IBM Raleigh Marci Nagel
John Gressett IBM Rochester Bruce McNutt IBM Tucson Dan C Rumney IBM New York Chris Saul IBM San Jose Brian Smith IBM San Jose Sharon Wang IBM Chicago Tom Cady Deanna Polm Sangam Racherla IBM ITSO Rob Jackard Advanced Technology Services Group Tom and Jenny Chang Garden Inn Hotel, Los Gatos, California
Comments welcome
Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways: Use the online Contact us review IBM Redbooks publication form found at: ibm.com/redbooks
Send your comments in an e-mail to: [email protected] Mail your comments to: IBM Corporation, International Technical Support Organization Dept. HYTD Mail Station P099 2455 South Road Poughkeepsie, NY 12601-5400
Chapter 1. SAN fabric
The IBM SAN Volume Controller (SVC) has unique SAN fabric configuration requirements that differ from what you might be used to for other storage devices. A quality SAN configuration can go a long way toward a stable, reliable, and scalable SVC installation; conversely, a poor SAN environment can make your SVC experience considerably less pleasant. This chapter will give you the information that you need to tackle this complex topic. Note: As with any of the information in this book, you must check the IBM System Storage SAN Volume Controller Software Installation and Configuration Guide, SC23-6628, and appropriate IBM System Storage SAN Volume Controller Configuration Requirements and Limitations document, S1003093, for limitations, caveats, updates, and so on that are specific to your environment. Do not rely on this book as the last word in SVC SAN design.
You must refer to the IBM System Storage Support web page for all updated documentation before implementing your solution. (The SVC is listed under Storage Software, if you are having trouble finding it.)
http://www.storage.ibm.com/support/

Also, the official documentation (specifically, the SVC Configuration Guide) reviews special configurations that might not be covered in this chapter.

Note: All document citations in this book refer to the 4.2 versions of the documents. If you use a different version, refer to the corresponding edition of those documents.

As you read this chapter, keep in mind that this is a best practices book based on field experience. It might be possible (and supported) to do many of the things advised against here, but we (the authors) believe that they nevertheless do not represent an ideal configuration.
1.1.1 Redundancy
One of the most basic SVC SAN requirements is to create two (or more) entirely separate SANs that are not connected to each other over Fibre Channel in any way. The easiest way to do this is to construct two SANs that are mirror images of each other. Technically, the SVC supports connecting the entire environment through a single, appropriately zoned SAN, but we do not recommend this design in any production environment. In our experience, we do not recommend it in development environments either, because a stable development platform is very important to programmers, and an extended outage in the development environment can cause an expensive business impact. For a dedicated storage test platform, however, it might be acceptable.
traffic and inter-node traffic must never transit an ISL, except during migration scenarios.
High-bandwidth servers (such as tape backup servers) must also be on the same switch as the SVC. Putting them on a separate switch can cause unexpected SAN congestion problems, and putting a high-bandwidth server on an edge switch is a waste of an ISL.

If at all possible, plan for the maximum size configuration that you ever expect your SVC cluster to reach. As you will see later in this chapter, the design of the SAN can change radically for larger numbers of hosts. Modifying the SAN later to accommodate a larger-than-expected number of hosts will either produce a poorly designed SAN or be very difficult, expensive, and disruptive to your business. This does not mean that you need to purchase all of the SAN hardware initially, just that you need to lay out the SAN with the maximum size in mind.

Always deploy at least one extra ISL per switch. Not doing so exposes you to consequences ranging from complete path loss (bad) to fabric congestion (even worse).

The SVC does not permit more than three hops between the SVC and the hosts, which is typically not a problem.
located in the switch. Generally, the ISLs (or ISL trunks) need to be on separate line cards within the switch. The hosts must be spread out evenly among the remaining line cards in the switch. Remember to locate high-bandwidth hosts on the core switches directly.
1.1.6 Switch port layout and hardware selection for director-class core switches
Each switch vendor has a selection of line cards available. Some of these line cards are oversubscribed, and some of them have full bandwidth available for the attached devices. For your core switches, we suggest only using line cards where the full line speed that you expect to use will be available. You need to contact your switch vendor for full line card details. (They change too rapidly for practical inclusion in this publication). Your SVC ports, storage ports, ISLs, and high-bandwidth hosts need to be spread out evenly among your line cards in order to help prevent the failure of any one line card from causing undue impact to performance or availability.
Figure 1-2 Core-edge topology
Figure 1-3 Four-SAN core-edge topology
While some clients have chosen to simplify management by connecting the SANs together into pairs with a single ISL, we do not recommend this design. With only a single ISL connecting fabrics together, a small zoning mistake can quickly lead to severe SAN congestion.
Figure 1-4 Spread out disk paths (SVC-to-storage traffic should be zoned so that it never travels over the ISLs between the switches; both SVC-attached and non-SVC-attached hosts are shown)
If you have this type of topology, it is very important to zone the SVC so that it sees only the paths on the same switch as the SVC nodes.

Note: This means that you must have more restrictive zoning than what is detailed in 1.5.6, Sample standard SVC zoning configuration on page 16.

Because of the way that the SVC load balances traffic between the nodes and MDisks, the amount of traffic that transits your ISLs will be unpredictable and will vary significantly. If you have a Cisco fabric, this might be a place where Cisco VSANs are useful to help enforce the separation.
Figure: old and new switches connected on each SAN. SVC-to-storage traffic should be zoned and masked to never travel over the links between the old and new switches, but those links should be zoned for intracluster communications.
This is a valid configuration, but you must take certain precautions:

- As stated in Accidentally accessing storage over ISLs on page 8, zone and mask the SAN and disks so that you do not access the disk arrays over the ISLs. This means that your disk arrays need connections to both switches.
- You must have two dedicated ISLs between the two switches on each SAN with no data traffic traveling over them. The reason for this design is that if this link ever becomes congested or lost, you might experience problems with your SVC cluster if there are also issues at the same time on the other SAN.
- If you can, set a 5% traffic threshold alert on the ISLs so that you know if a zoning mistake has allowed any data traffic over the links.

Note: It is not a best practice to use this configuration to perform mirroring between I/O Groups within the same cluster. Also, you must never split the two nodes in an I/O Group between different switches. However, in a dual-fabric configuration, half of each node's ports must remain on one switch, with the other half of the ports on another switch.
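How you implement the threshold alert depends on your switch management tools. As a minimal manual spot-check only, assuming a Brocade fabric and a hypothetical ISL port number (15), you can watch the ISL throughput directly:

   # Show throughput for every port, refreshed every 5 seconds;
   # the dedicated ISL ports should show essentially no data traffic.
   portperfshow 5

   # Cumulative traffic and error counters for a single (hypothetical) ISL port
   portstatsshow 15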
plan in place. We also do not advise that you connect the QLogic/BladeCenter FCSM to Brocade at this time. In any fabric in which a BladeCenter FCSM is installed, do not perform any zoning operations from the FCSM. Perform them all from your core fabric.
With IP-based distance extension, it is imperative that you dedicate bandwidth to your Fibre Channel (FC) over IP traffic if the link is shared with the rest of your IP cloud. Do not assume that because the link between two sites currently carries little traffic, or is only used for e-mail, this will always be the case. Fibre Channel is far more sensitive to congestion than most IP applications, and you do not want a spyware problem or a spam attack to disrupt your SVC.

Also, when communicating with your organization's networking architects, make sure to distinguish between megabytes per second and megabits per second. In the storage world, bandwidth is usually specified in megabytes per second (MBps, MB/s, or MB/sec), while network engineers specify bandwidth in megabits per second (Mbps, Mb/s, or Mb/sec). If you fail to specify megabytes, you can end up with an impressive-sounding 155 Mb/sec OC-3 link that supplies only a tiny 15 MB/sec or so to your SVC. With the suggested safety margins included, this is not a very fast link at all.

The exact details of configuring these boxes are beyond the scope of this book; however, the configuration of these units for the SVC is no different from that for any other storage device.
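To illustrate the megabit/megabyte distinction with a rough worked example (the overhead allowance here is only an assumption; the actual usable throughput depends on the link and on the protocol conversion equipment):

   155 Mb/sec (OC-3) / 8 bits per byte = approximately 19 MB/sec raw
   19 MB/sec - protocol and framing overhead = roughly 15 MB/sec available to the SVC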
1.5 Zoning
Because it is so different from traditional storage devices, properly zoning the SVC into your SAN fabric is a large source of misunderstanding and errors. Despite this, it is actually not particularly complicated.

Note: Errors caused by improper SVC zoning are often fairly difficult to track down, so make sure to create your zoning configuration carefully.

Here are the basic SVC zoning steps:
1. Create the SVC cluster zone.
2. Create the SVC cluster.
3. Create the SVC storage zones.
4. Assign storage to the SVC.
5. Create the host SVC zones.
6. Create the host definitions.
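The zone-creation steps are performed with your switch vendor's zoning tools, while the cluster, storage discovery, and host definition steps are completed on the SVC itself. The following is only a hedged sketch of the SVC-side pieces (command syntax can vary between SVC code levels, and the WWPN is a placeholder):

   # Step 4: after the storage controller LUNs have been assigned to the SVC,
   # rescan the fabric and confirm that the new MDisks are visible
   svctask detectmdisk
   svcinfo lsmdisk

   # Step 6: create a host definition for one of the host's ports, then verify it
   svctask mkhost -name WinPeter -hbawwpn <WWPN of the host port>
   svcinfo lshost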
The zoning scheme that we describe next is actually slightly more restrictive than the zoning described in the IBM System Storage SAN Volume Controller Configuration Guide, SC23-6628. The reason for this is that the Configuration Guide is a statement of what is supported, but this publication is a statement of our understanding of the best way to do things, even if other ways are possible and supported.
There are multiple reasons not to use WWNN zoning. For hosts, it is a particularly bad idea, because the WWNN is often based on the WWPN of only one of the HBAs. If you have to replace that HBA, the WWNN of the box will change on both fabrics, which will result in access loss. In addition, it makes troubleshooting more difficult, because you have no consolidated list of which ports are supposed to be in which zone, and therefore, it is difficult to tell whether a port is missing.
Aliases
The biggest time-saver when creating your SVC zones is to use zoning aliases, if they are available on your particular switch. They will make your zoning much easier to configure and understand, and the likelihood of errors will be much lower. The aliases that we suggest you create here take advantage of the fact that aliases can contain multiple members, just like zones. Create aliases for each of the following:
- One alias that holds all of the SVC ports on each fabric
- One alias for each storage controller (or controller blade, in the case of DS4x00 units)
- One alias for each I/O Group port pair (that is, it needs to contain node 0, port 2, and node 1, port 2)
It is usually not necessary to create aliases for your host ports.
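For example, on a Brocade fabric (other vendors use a different syntax), the I/O Group port-pair aliases can be built from WWPNs like those shown later in this chapter; treat this as a sketch only:

   # I/O Group 0 port-pair aliases on SAN A (each contains one port from each node)
   alicreate "SVC_Group0_Port1", "50:05:07:68:01:10:37:e5; 50:05:07:68:01:10:37:dc"
   alicreate "SVC_Group0_Port3", "50:05:07:68:01:30:37:e5; 50:05:07:68:01:30:37:dc"

   # Save the zoning configuration changes
   cfgsave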
Naming convention
Refer to 13.6, Naming convention on page 251 for suggestions for an SVC naming convention. A poor naming convention can make your zoning configuration very difficult to understand and maintain.
This configuration will give you four paths to each VDisk, which is the number of paths per VDisk for which IBM Subsystem Device Driver (SDD) and the SVC have been tuned.
(Figure: example host zoning showing I/O Group 0 with two SVC nodes and ports A, B, C, and D, hosts Foo and Bar, and zones Bar_Slot2_SAN_A and Bar_Slot8_SAN_B.)
The IBM System Storage SAN Volume Controller Software Installation and Configuration Guide, SC23-6628, discusses putting many hosts into a single zone as a supported configuration under some circumstances. While this will usually work just fine, instability in one of your hosts can trigger all sorts of impossible-to-diagnose problems in the other hosts in the zone. For this reason, put only a single host in each zone. It is a supported configuration to have eight paths to each VDisk, but this provides no performance benefit (indeed, under some circumstances, it can even reduce performance), and it does not improve reliability or availability to any significant degree.
(Figure: example dual-fabric configuration with hosts Peter, Barry, Jon, Ian, Thorsten, Ronda, Deon, and Foo attached to Switch A and Switch B.)
Aliases
Unfortunately, you cannot nest aliases, so some of these WWPNs will appear in multiple aliases. Also, do not be concerned if none of your WWPNs look like the examples; we made a few of them up completely when writing this book. Note that some switch vendors (McDATA comes to mind) do not allow multiple-member aliases, but you can still create single-member aliases. While this will not reduce the size of your zoning configuration, it will still make it easier to read than a mass of raw WWPNs. For the alias names, we have appended SAN_A on the end where necessary to distinguish that these are the ports on SAN A. While this seems kind of silly, it helps keep things straight if you ever have to perform troubleshooting on both SANs at once.
SVC_Group0_Port1:
 50:05:07:68:01:10:37:e5
 50:05:07:68:01:10:37:dc
SVC_Group0_Port3:
 50:05:07:68:01:30:37:e5
 50:05:07:68:01:30:37:dc
SVC_Group1_Port1:
 50:05:07:68:01:10:1d:1c
 50:05:07:68:01:10:27:e2
SVC_Group1_Port3:
 50:05:07:68:01:30:1d:1c
 50:05:07:68:01:30:27:e2
DS4k_23K45_Blade_A_SAN_A:
 20:04:00:a0:b8:17:44:32
 20:04:00:a0:b8:17:44:33
DS4k_23K45_Blade_B_SAN_A:
 20:05:00:a0:b8:17:44:32
 20:05:00:a0:b8:17:44:33
DS8k_34912_SAN_A:
 50:05:00:63:02:ac:01:47
 50:05:00:63:02:bd:01:37
 50:05:00:63:02:7f:01:8d
 50:05:00:63:02:2a:01:fc
Zones
One thing to keep in mind when naming your zones is that they cannot have the same names as aliases. Here is our sample zone set, utilizing the aliases that we have just defined.
Cluster zone
This one is pretty simple; it only contains a single alias (which happens to contain all of the SVC ports). And yes, this zone does overlap with every single one of the storage zones. Nevertheless, it is nice to have it there as a fail-safe, given the dire consequences that will occur if your cluster nodes ever completely lose contact with one another over the SAN. See Example 1-4.
Example 1-4 SVC cluster zone
SVC_Cluster_Zone_SAN_A: SVC_Cluster_SAN_A
WinPeter_Slot3:
 21:00:00:e0:8b:05:41:bc
 SVC_Group0_Port1
WinBarry_Slot7:
 21:00:00:e0:8b:05:37:ab
 SVC_Group0_Port3
WinJon_Slot1:
 21:00:00:e0:8b:05:28:f9
 SVC_Group1_Port1
WinIan_Slot2:
 21:00:00:e0:8b:05:1a:6f
 SVC_Group1_Port3
AIXRonda_Slot6_fcs1:
 10:00:00:00:c9:32:a8:00
 SVC_Group0_Port1
AIXThorsten_Slot2_fcs0:
 10:00:00:00:c9:32:bf:c7
 SVC_Group0_Port3
AIXDeon_Slot9_fcs3:
 10:00:00:00:c9:32:c9:6f
 SVC_Group1_Port1
AIXFoo_Slot1_fcs2:
 10:00:00:00:c9:32:a8:67
 SVC_Group1_Port3
Chapter 2.
I/O Groups are the basic configuration element of a SAN Volume Controller cluster. Adding I/O Groups to the cluster is designed to linearly increase cluster performance and bandwidth. An entry-level SAN Volume Controller configuration contains a single I/O Group. The SAN Volume Controller can scale out to support four I/O Groups, and it can scale up to support 1,024 host servers. For every cluster, the SAN Volume Controller supports up to 4,096 virtual disks (VDisks). This configuration flexibility means that SAN Volume Controller configurations can start small with an attractive price to suit smaller clients or pilot projects and yet can grow to manage very large storage environments.
With the newly added I/O Group, the SVC cluster can now manage more than 130,000 IOPS. An SVC cluster can be scaled up to an eight-node cluster, which reaches a total I/O rate of more than 250,000 IOPS.
Looking at the response time over throughput curves in Figure 2-2, you can see that performance scales nearly linearly as you add SVC nodes (I/O Groups) to the cluster.
Table 2-2 Maximum SVC cluster limits

Objects                        Maximum number   Comments
SAN Volume Controller nodes    8                Arranged as four I/O Groups
Managed disks                  4,096            The maximum number of logical units that can be managed by the SVC. This number includes disks that have not been configured into Managed Disk Groups.
Virtual disks (VDisks)         4,096            The maximum requires an eight-node cluster.
Total manageable capacity      2.1 PB           If an extent size of 512 MB is used.
If you exceed one of the current maximum configuration limits for the fully deployed SVC cluster, you then scale out by adding a new SVC cluster and distribute the workload to it. Because the current maximum configuration limits can change, use the following link to get a complete table of the current SVC cluster configuration limitations:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003093

Splitting an SVC cluster or having a secondary SVC cluster provides you with the ability to implement a disaster recovery option in the environment. Having two SVC clusters in two locations allows work to continue even if one site is down. With the SVC Advanced Copy functions, you can copy data from the local primary environment to a remote secondary site. The maximum configuration limits apply here as well. Another advantage of having two clusters is that the SVC Advanced Copy functions license is based on:
- The total amount of storage (in gigabytes) that is virtualized
- The Metro Mirror and Global Mirror or FlashCopy capacity in use
In each case, the number of TBs to order for Metro Mirror and Global Mirror is the total number of source TBs and target TBs participating in the copy operations.
subsystem storage controllers, or any other maximum mentioned in the V4.2.0 Configuration Requirements and Guidelines at:

http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003093

Instead of having one SVC cluster host all I/O operations, hosts, and subsystem storage attachments, the goal here is to create a second SVC cluster so that we equally distribute all of the workload over the two SVC clusters. There are a number of approaches that you can take for splitting an SVC cluster:
- The first, and probably the easiest, way is to create a new SVC cluster, attach storage subsystems and hosts to it, and start putting workload on this new SVC cluster.
- The next options are more intensive, and they involve performing more steps:
  - Create a new SVC cluster and start moving workload onto it. To move the workload from an existing SVC cluster to a new SVC cluster, you can use the Advanced Copy features, such as Metro Mirror and Global Mirror. We describe this scenario in Chapter 9, Copy services on page 143.
    Note: This move involves an outage from the host system point of view, because the worldwide port name (WWPN) from the subsystem (SVC I/O Group) does change.
  - You can use the VDisk managed mode to image mode migration to move workload from one SVC cluster to the new SVC cluster. Migrate a VDisk from managed mode to image mode, reassign the disk (logical unit number (LUN) masking) from your storage subsystem point of view, introduce the disk to your new SVC cluster, and use the image mode to managed mode migration. We describe this scenario in Chapter 8, VDisks on page 113.
    Note: This scenario also involves an outage to your host systems and the I/O to the VDisk.
From a user perspective, the first option is the easiest way to expand your cluster workload. The second and third options are more difficult, involve more steps, and require more preparation in advance. The third option involves the longest outage to the host systems, and therefore, we do not prefer it. There is only one good reason that we can think of to reduce the existing SVC cluster by a certain number of I/O Groups: if more bandwidth is required on the secondary SVC cluster, and there is spare bandwidth available on the primary cluster.
We performed these tests in the following environment:
- Windows 2003 Server
- I/O Meter with (32 KB; 75% Read; 0% random)
- DS4000
- Brocade fabric
The overview that we show here does not provide any absolute numbers or show the best performance that you are ever likely to get. Our intent is to show that the SVC and its caching ability will undoubtedly improve performance. In Figure 2-3, you see a comparison between native storage subsystem to host attachment and storage subsystem to SVC to host attachment. The graphs, taken with TotalStorage Productivity Center (TPC), show four tests and one extent migration where we brought in a secondary MDisk:
- Test1: Storage subsystem direct-attached to the host
- Test2: Introducing an 8F4 SVC cluster in the datapath and using an image mode VDisk
- Test3: Image mode VDisk on an 8G4 SVC cluster with VDisk performance
- Extent migration: Introducing a second MDisk to the Managed Disk Group (MDG) and equally distributing the extents across the MDisks
- Test4: Striped VDisk on an 8G4 SVC cluster
Figure 2-3 Comparison between native disk to host attachment as opposed to disk to SVC to host connection
The test sequence that we have chosen here shows the typical introduction of an SVC cluster into a client environment, from natively attached storage to virtualized storage attachment. Test1, Test2, and Test3 show nearly identical subsystem performance (yellow line). Test2 and Test3 show a spike at the beginning of each test. By introducing the SVC in the datapath, we introduced a caching appliance. Therefore, host I/O no longer goes directly to the subsystem; it is first cached and then flushed down to the subsystem.
Test3 shows, in addition to the subsystem performance, the performance for the VDisk (blue line). As we explain later, there is a clear performance improvement from the host's point of view. The extent migration between Test3 and Test4 is the step where you move an image mode VDisk to a managed mode VDisk. Test4 shows the performance of a striped VDisk (blue line) and two MDisks (red and orange lines). In this section, we show you the value of the SVC cluster in your environment. For this purpose, we only compare Test1 and Test4. In the chart in Figure 2-4, we compare the total, read, and write I/Os per second (IOPS). The performance improvement that we saw here was approximately 27%. Test1 is direct-attached, and Test4 is striped.
Figure 2-4 Comparison of total, read, and write IOPS for Test1 (direct-attached) and Test4 (striped)
Figure 2-5 on page 29 shows the values for the total, read, and write MBps. Similar to the I/O rate, we saw a 27% improvement for the I/O traffic. Test1 is direct-attached, and Test4 is striped.
Figure 2-5 Comparison of total, read, and write MBps for Test1 (direct-attached) and Test4 (striped)
For both parameters, I/Ops in Figure 2-4 on page 28 and MBps in Figure 2-5, we saw a large performance improvement by using the SVC.
The cache component periodically reclaims resources used to cache blocks so that it can cache different blocks. The cache component uses a least recently used (LRU) policy for selecting those blocks that it will no longer cache. However, the cache component will not be able to reclaim resources when the cache is full of pinned data. Pinned data is modified data for an offline VDisk, that is, it cannot be destaged, because the back-end disks are unavailable for I/O. For this reason, the cache component reserves a set of short-term resources that are not used to cache. The cache can use these resources to synchronously complete any read or write command. Therefore, the cache component can proceed with new read and write commands without waiting for the processing of existing read and write commands to complete.
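The LRU policy itself is a standard caching technique. The following minimal Python sketch is a generic illustration of the principle only (it is not SVC code and ignores details such as pinned data):

from collections import OrderedDict

class LRUCache:
    # Minimal least recently used cache: when full, the entry that has
    # gone longest without being referenced is reclaimed first.
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest entry first, newest last

    def access(self, block, data=None):
        if block in self.entries:
            self.entries.move_to_end(block)        # mark as most recently used
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # reclaim the LRU entry
            self.entries[block] = data
        return self.entries[block]

cache = LRUCache(capacity=3)
for blk in ["a", "b", "c", "a", "d"]:   # "b" is reclaimed, not "a"
    cache.access(blk)
print(list(cache.entries))              # ['c', 'a', 'd']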
Cache-disabled VDisks
Cache-disabled VDisks are useful:
- To allow the use of copy services in the underlying storage controllers
- To control the allocation of cache resources. By disabling the cache for some VDisks, more cache resources are available to cache I/Os to other VDisks in the same I/O Group. This technique is particularly effective where an I/O Group is serving some VDisks that benefit from cache and other VDisks where the benefits of caching are small or nonexistent.
Currently, there is no direct way to enable the cache for a previously cache-disabled VDisk. There are three options to turn the VDisk caching mechanism back on:
- If the VDisk is an image mode VDisk, you can remove the VDisk from the SVC cluster and redefine it with cache enabled.
- Use the SVC FlashCopy function to copy the content of the cache-disabled VDisk to a new cache-enabled VDisk. After the FlashCopy has been started, change the VDisk-to-host mapping to the new VDisk. This involves an outage.
- Use the SVC Metro Mirror or Global Mirror function to mirror the data to another cache-enabled VDisk. As in the second option, you have to change the VDisk-to-host mapping after the mirror operation is done. This also involves an outage.
For more information about VDisk handling, see Chapter 8, VDisks on page 113.
Even though the SVC code update is concurrent, we recommend that you perform several steps in advance:
- Before applying a code update, ensure that there are no open problems in your SVC, SAN, or storage subsystems. Use the Run maintenance procedure on the SVC and fix the open problems first. For more information, refer to 15.3.2, Solving SVC problems on page 272.
- It is also very important to check your host dual pathing. Make sure that, from the host's point of view, all paths are available. Missing paths can lead to I/O problems during the SVC code update. Refer to Chapter 10, Hosts on page 169 for more information about hosts.
- It is wise to schedule a time for the SVC code update during low I/O activity.
- Upgrade the master console GUI first.
- Allow the SVC code update to finish before making any other changes in your environment.
- Allow at least one hour to perform the code update for a single SVC I/O Group and 30 minutes for each additional I/O Group. In a worst case scenario, an update can take up to two hours, which implies that the SVC code update is also updating the BIOS, SP, and the SVC service card.

Important: If the Concurrent Code Upgrade (CCU) appears to stop for a long time (up to an hour), this can occur because it is upgrading a low-level BIOS. Never power off during a CCU upgrade unless you have been instructed to do so by IBM Service personnel. If the upgrade encounters a problem and fails, it backs out the upgrade itself.

New features are not available until all nodes in the cluster are at the same level. Features that depend on a remote cluster (Metro Mirror or Global Mirror) might not be available until the remote cluster is at the same level. For more information, refer to 15.3.5, Solving storage subsystem problems on page 277.
Chapter 3.
Master console
In this chapter, we describe how to manage important areas of the IBM System Storage SAN Volume Controller Master Console (MC), how to manage multiple MCs, how to save configurations from the IBM System Storage SAN Volume Controller (SVC) to the MC, and how to maintain passwords and IDs from the MC and the SVC. Furthermore, we provide information about IP considerations, audit logging, and how to use the audit logs.
When you change the host name, you must also be sure that other master console applications are updated to use the new name. Perform the following steps to change the host name and to update the name in other master console applications:
1. Right-click My Computer from the desktop.
2. Click Properties.
3. Click Computer Name.
4. Click Change.
5. Type the master console host name in the Computer name field.
6. Click More.
7. Type the full path information in the Primary DNS suffix of this computer field.
8. Click OK until you are returned to the desktop.
9. Click Yes to restart the master console system so that the change to the host name is applied.
After you have finished the master console basic setup, you can start to add SVC clusters to the SVC console GUI.
Connection limitations
Each SVC cluster can host only a limited number of Secure Shell (SSH) connections. The SVC supports no more than 10 concurrent SSH processes; if this number is exceeded, no further connections are possible. The SVC currently has a maximum of 10 concurrent SSH sessions per user, which means that you can have up to a maximum of 10 connections per Admin or Service user. Each CIMOM application and host automation product, such as HACMP-XD, counts toward these limits.
There is also a limit on the number of SSH connections that can be opened per second. The current limitation is 10 SSH connections per second. Note: We recommend that you close SSH connections when they are no longer required. Use the exit command to terminate an interactive SSH session. If the maximum connection limit is reached and you cannot determine those clients that have open connections to the cluster, the SVC 4.2 cluster code level has incorporated options to help you recover from this state.
If you get this error:
1. If you still have access to the SVC console GUI, you can use the service and maintenance procedure to fix this error. This procedure allows you to reset all active connections, which terminates all SSH sessions, and clears the login count.
2. If you have no access to the SVC console GUI, there is now a direct maintenance link in the drop-down menu of the View cluster panel. Using this link, you can get directly to the service and maintenance procedures.
The following panels guide you to access and use this maintenance feature. Figure 3-3 shows you how to launch this procedure.
Figure 3-3 Launch Maintenance Procedures from the panel to view the cluster
In Figure 3-4 on page 37, you can access the Directed Maintenance Procedures. At this panel, you can review and identify all currently open SSH connections, and you are able to close all SSH connections.
You can read more information about the current SSH limitations and how to fix problems at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCFKTH&context=STCFKTW&dc=DB500&uid=ssg1S1002896&loc=en_US&cs=utf-8&lang=en
We explain some of the reasons for the SVC cluster status of No Contact and how you can fix that problem:
- The SVC console (SVCC) code level does not match the SVC code level (for example, SVCC V2.1.0.x with SVC 4.2.0). To fix this problem, you need to install the corresponding SVC console GUI code that was mentioned in SAN Volume Controller and SVC console (GUI) compatibility on page 35.
- The CIMOM cannot execute the plink.exe command (PuTTY's ssh command). To test the connection, open a command prompt (cmd.exe) and go to the PuTTY install directory. Common install directories are C:\Support Utils\Putty and C:\Program Files\Putty. Execute the following command from this directory:
  plink.exe admin@clusterIP -ssh -2 -i "c:\Program Files\IBM\svcconsole\cimom\icat.ppk"
  This is shown in Example 3-1 on page 39.
Example 3-1 Testing the plink.exe connection

C:\Program Files\PuTTY>plink.exe [email protected] -ssh -2 -i "c:\Program files\IBM\svcconsole\cimom\icat.ppk"
Using username "admin".
Last login: Fri Jul 27 11:18:48 2007 from 9.43.86.115
IBM_2145:ITSOCL1:admin>

In Example 3-1, we executed the command, and the connection is established. If the command fails, there are a few things that we can check:
- The location of the PuTTY executable does not match the SSHCLI path in setupcmdline.bat.
- The icat.ppk key needs to be in the C:\Program Files\IBM\svcconsole\cimom directory.
- The icat.ppk file found in the C:\Program Files\IBM\svcconsole\cimom directory needs to match the icat.pub key uploaded to the SVC cluster.
- The CIMOM can execute the plink.exe command, but the SVC cluster does not exist, it is offline, or the network is down. Check whether the SVC cluster is up and running (check the front panel of the SVC nodes and use the arrow keys on the node to determine whether the Ethernet on the configuration node is up). Check your local Ethernet settings and issue a ping to the SVC cluster IP address.
If, after you have performed all of these actions, the SVC cluster is still in the No Contact state, it is time to call IBM Support.
Configuration considerations
For managing two different master consoles, the same rules apply as for managing a single master console in 3.1.1, Managing a single master console on page 34. The only difference is that you have to provide different values for each master console:
- Machine name: A fully-qualified Domain Name Server (DNS) name for the master console
- Master console IP address: The address that will be used to access the master console
- Gateway IP address: The default gateway IP address used by the master console
- Subnet mask: The subnet mask for the master console
Connection limitations
The connection limitation of 10 applies to both master consoles. Each master console uses up one SSH connection for each GUI session that is launched, so the connection limit can be reached very quickly with multiple users of the same cluster. One disadvantage of using two master consoles to manage two clusters is that if one cluster is currently not operational (for example, the SVC cluster state is No Contact), ease of access to the other cluster is affected by the two-minute timeout during the launch of SVC menus while the GUI checks the status of both clusters. This timeout appears while the SVC console GUI is trying to access the missing SVC cluster.
For more information about the command structure and the commands that each user role can use, refer to the SVC Controller Command-Line Interface Users Guide, SC26-7903-01.
2. This panel provides you with information about setting up a new user role as shown in Figure 3-7.
3. This panel, Figure 3-8 on page 42, asks you to provide a new User Name and a new Password to be associated with this user.
Over the next sequence of panels, you can select a certain type of user role for your new user. You can choose from a different set of user types as listed in 3.1.3, Administration roles on page 40. 4. In the first panel, you can add a user to an administration role for a specific SVC cluster, as shown in Figure 3-9.
Figure 3-10 on page 43 shows how you can add a user to a service role in an SVC cluster.
Figure 3-11 shows how to add a user in the operator role to a corresponding SVC cluster.
6. In the last panel, Figure 3-13 on page 44, before finalizing the roles, you can review whether the changes that you are about to make are correct. At this stage, verify that the user is associated with the correct roles and clusters.
Example 3-2 Copy the new SSH key to the SVC cluster

C:\Program Files\PuTTY>pscp -load ITSOCL1 "C:\Program Files\IBM\MasterConsole\Support Utils\Putty\Thorsten\ticat.pub" [email protected]:/tmp/
ticat.pub               | 0 kB |   0.3 kB/s | ETA: 00:00:00 | 100%

2. After the new key is copied to the SVC cluster, you need to use the SVC CLI to manage the user roles. Before you add a new user, check the existing SSH keys on your SVC cluster by issuing the svcinfo lssshkeys -user all command. The output of this command, which is shown in Example 3-3, shows you the existing user IDs and keys. To add the new key that you previously added via SCP, issue the svctask addsshkey command.
Example 3-3 Create a new user
IBM_2145:ITSOCL1:admin>svcinfo lssshkeys -user all
id userid key_identifier
1  admin  admin
IBM_2145:ITSOCL1:admin>svctask addsshkey -user admin -file /tmp/ticat.pub -label testkey
IBM_2145:ITSOCL1:admin>svcinfo lssshkeys -user all
id userid key_identifier
1  admin  admin
2  admin  testkey
3. When you run svcinfo lssshkeys again, you can see that a new user ID and key have been added to the SVC cluster. With the svctask mkauth command, you can change a user's default authorization role from Monitor to either CopyOperator or Administrator. This is shown in Example 3-4.
Example 3-4 Changing the user role
IBM_2145:ITSOCL1:admin>svcinfo lsauth
id ssh_label Role
0  admin     Administrator
1  test      Administrator
IBM_2145:ITSOCL1:admin>svctask mkauth -label test -role CopyOperator
IBM_2145:ITSOCL1:admin>svcinfo lsauth
id ssh_label Role
0  admin     Administrator
1  test      CopyOperator

4. By using rmauth, the user is assigned the default operation role, which is Monitor. Using mkauth, you can assign CopyOperator or Administrator again. This is shown in Example 3-5.
Example 3-5 Manage user roles
IBM_2145:ITSOCL1:admin>svcinfo lsauth
id ssh_label Role
0  admin     Administrator
1  test      CopyOperator
IBM_2145:ITSOCL1:admin>svctask rmauth -label test
IBM_2145:ITSOCL1:admin>svcinfo lsauth
id ssh_label Role
0  admin     Administrator
1  test      Monitor
IBM_2145:ITSOCL1:admin>svctask mkauth -label test -role Administrator
IBM_2145:ITSOCL1:admin>svcinfo lsauth
id ssh_label Role
0  admin     Administrator
1  test      Administrator

This recently added user can now be used to log in via PuTTY. Open PuTTY and create a new session to your SVC cluster. For more information about how to use PuTTY, refer to IBM System Storage SAN Volume Controller, SG24-6423-05. Click SSH Auth (see Figure 3-14 on page 46) to select and use the SSH key that we added in Example 3-2 on page 44 and Example 3-3 on page 44 via the scp and addsshkey commands, and save this session. In Figure 3-14 on page 46, we give you an example of how to select and save the new SSH key to your PuTTY session.
Important: To restrict administrative access to only Monitor or CopyOperator users, you need to either delete the admin icat.ppk or make it unavailable to the Monitor user or CopyOperator user. If you fail to do so, the master console user can gain back administrative access by using the admin icat.ppk key.
IBM_2145:ITSOCL1:admin>svctask dumpauditlog
IBM_2145:ITSOCL1:admin>

The audit log entries provide the following information:
- The identity of the user who issued the action command
- The name of the action command
- The time stamp of when the action command was issued by the configuration node
- The parameters that were issued with the action command

Note: Some commands are not logged in the audit log dump. This list shows the commands that are not documented in the audit log:
- svctask dumpconfig
- svctask cpdumps
- svctask cleardumps
- svctask finderr
- svctask dumperrlog
- svctask dumpinternallog
- svcservicetask dumperrlog
- svcservicetask finderr

The audit log will also track commands that failed.
Naming convention
Each dump file name is generated automatically in the following format:

auditlog_<firstseq>_<lastseq>_<timestamp>_<clusterid>

where:
- <firstseq> is the audit log sequence number of the first entry in the log
- <lastseq> is the audit log sequence number of the last entry in the log
- <timestamp> is the time stamp of the last entry in the audit log being dumped
- <clusterid> is the cluster ID at the time the dump was created

Note: The audit log dump file names cannot be changed.

To collect the audit log, you need to log on to the SVC Console and open the Service and Maintenance panel. Click List Dumps, and you will see all available files in the List Dumps section on the right window, as shown in Figure 3-15 on page 48.
Figure 3-15 Open the Audit log file via the GUI
Click Audit Logs to open the audit log list. You will see a section where all currently available audit log files are listed (Figure 3-16). To open a file, you can simply click it, or right-click it and select Save as.
In Figure 3-17 on page 49, we show an example of the collected audit log. The audit log provides information about when the command was issued and by whom, whether the command was issued remotely by the CLI or locally by the GUI, and the actual command input itself.
Example 3-7 Dump the audit log

IBM_2145:ITSOCL1:admin>svctask dumpauditlog
IBM_2145:ITSOCL1:admin>

The lsauditlogdumps command generates a list of the audit log dumps that are available on the nodes in the cluster. After you have issued the command in Example 3-7, you will get a list of all available audit log dumps by typing the command shown in Example 3-8.
Example 3-8 List the available audit log files
IBM_2145:ITSOCL1:admin>svcinfo lsauditlogdumps
id auditlog_filename
0  auditlog_0_3516_20070604102843_0000020060806fb8
1  auditlog_0_130_20070724115258_0000020060406fca

For the naming convention, refer to Naming convention on page 47. In Example 3-9 on page 50, we show a captured audit log and a few entries in the logfile.
Example 3-9 Captured audit log

IBM_2145:ITSOCL1:admin>svcinfo catauditlog -delim : -first 3
audit_seq_no:timestamp:cluster_user:ssh_label:ssh_ip_address:icat_user:result:res_obj_id:action_cmd
126:070724102710:admin:admin:9.43.86.115:superuser:0::svctask mkvdiskhostmap -host 0 15
127:070724104853:admin:admin:9.43.86.115:superuser:0::svctask chcluster -icatip 9.43.86.115:9080
128:070724104854:admin:admin:9.43.86.115:superuser:0::svctask chcluster -icatip 9.43.86.115:9080

This output gives the reader more information about the commands being issued on the SVC cluster. In our example, this information is separated by colons, and it provides the following fields (with explanations):
- audit_seq_no: Ascending sequence number
- timestamp: Time when the command was issued
- cluster_user: User
- ssh_label: SSH user name
- ssh_ip_address: Location from where the command was issued
- icat_user: The ICAT user
- result: 0 (success) or 1 (success in progress)
- res_obj_id
- action_cmd: Shows the issued command
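If you post-process this output with a script, remember that the action command itself can contain colons (the -icatip parameter above, for example), so limit the number of splits. The following Python sketch is an illustration only:

FIELDS = ["audit_seq_no", "timestamp", "cluster_user", "ssh_label",
          "ssh_ip_address", "icat_user", "result", "res_obj_id", "action_cmd"]

def parse_audit_line(line):
    # Parse one colon-delimited line of 'svcinfo catauditlog -delim :' output.
    # The action command can itself contain colons, so split only 8 times.
    return dict(zip(FIELDS, line.split(":", len(FIELDS) - 1)))

entry = parse_audit_line(
    "127:070724104853:admin:admin:9.43.86.115:superuser:0::"
    "svctask chcluster -icatip 9.43.86.115:9080")
print(entry["cluster_user"], entry["action_cmd"])
# admin svctask chcluster -icatip 9.43.86.115:9080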
- SVC Master Console: You cannot access the Master Console. Password recovery depends on the operating system. The administrator will need to recover the lost or forgotten user and password.
- SVC Cluster: You cannot access the cluster through the SAN Volume Controller Console without this password. Allow the password reset during the cluster creation. If the password reset is not enabled, issue the svctask setpwdreset CLI command to view and change the status of the password reset feature for the SAN Volume Controller front panel. See Example 3-10.
- SVC Service mode: You cannot access the SVC cluster when it is in service mode. Reset the password in the SVC console GUI using the Maintaining Cluster Passwords feature.
- SVC CLI (PuTTY): You cannot access the SVC cluster via the CLI. Create a new private and public key pair.
- SAN Volume Controller Console: You cannot access the SVC cluster via the SVC console GUI. Remove and reinstall the SVC console GUI. Use the default user and password and change it during the first logon.
- TPC CIMOM: Same user and password as the SVC console.

When creating a cluster, be sure to select the option Allow password reset from front panel as shown in Figure 3-18. You see this option during the initial cluster creation. For additional information, see IBM System Storage SAN Volume Controller, SG24-6423-05.
This option allows access to the cluster if the admin password is lost. If the password reset feature was not enabled during the cluster creation, use the following CLI command as shown in Example 3-10 to enable it.
Example 3-10 Enable password reset via CLI
IBM_2145:ITSOCL1:admin>svctask setpwdreset -show
Password status: [0]
IBM_2145:ITSOCL1:admin>svctask setpwdreset -enable
IBM_2145:ITSOCL1:admin>svctask setpwdreset -show
Password status: [1]
IBM_2145:ITSOCL1:admin>svcconfig backup
......
CMMVC6130W Inter-cluster partnership fully_configured will not be restored
..
CMMVC6112W controller controller0 has a default name
. . .
CMMVC6112W mdisk mdisk1 has a default name
................
CMMVC6136W No SSH key file svc.config.admin.admin.key
CMMVC6136W No SSH key file svc.config.test.admin.key
CMMVC6136W No SSH key file svc.config.thorsten.admin.key
......................................
CMMVC6155I SVCCONFIG processing completed successfully
IBM_2145:ITSOCL1:admin>svcinfo ls2145dumps
id 2145_filename
0  svc.config.cron.bak_node1
1  .
.
17 ups_log.a
18 svc.config.backup.bak_Node-1
19 svc.config.backup.xml_Node-1
As is the case for the CLI, a new svc.config.backup.xml_Node-1 will appear in the list dump section.
Chapter 4. I/O Groups and nodes
Note: If you plan to shut down a node as part of scheduled downtime while leaving the surviving node running, we recommend that you:
- Select a time when there is the least amount of write I/O to shut down your node, which ensures that the SVC cache contains the least amount of outstanding write I/Os that are yet to be destaged to disk.
- Shut down your node and wait at least 10 minutes before you perform your scheduled work. If there is a fatal failure of the surviving node before the cache has been fully destaged, you might lose any uncommitted write I/Os.
We used TPC to collect performance data for the SVC cluster while the tests ran. Figure 4-1 on page 58 shows the CPU utilization while the 70 GB disk was managed by the SVC (tests B, C, and D). The blue line shows node 1, which was the preferred path for this VDisk. Before we started test D, we disabled all the ports for node 1 on the switches to which it was connected, which resulted in node 1 going offline and node 2 now being used as the alternate path to this VDisk.
There was no noticeable increase in CPU utilization in test B and test C. There was a slight reduction in CPU utilization when running test D.
Node 3 and node 4 were not used during this test. While there is a loss in performance when a node fails, this test showed that we retained 88.5% of the performance of our best result (the normal SVC usage configuration), which was still 21.3% better than direct-attached storage.
If your cluster has up to six nodes, you have these options available:
- Add the new hardware to the cluster, migrate VDisks to the new nodes, and then retire the older hardware when it is no longer managing any VDisks. This method requires a brief outage to the hosts to change the I/O Group for each VDisk.
- Swap out one node in each I/O Group at a time and replace it with the new hardware. We recommend that you engage an IBM Service Support Representative (SSR) to help you with this process. You can perform this swap without an outage to the hosts.
If your cluster has eight nodes, the options are similar:
- Swap out a node in each I/O Group one at a time and replace it with the new hardware. We recommend that you engage an IBM SSR to help you with this process. You can perform this swap without an outage to the hosts, and you need to swap a node in one I/O Group at a time. Do not change all I/O Groups in a multi-I/O Group cluster at one time.
- Move the VDisks to another I/O Group so that all VDisks are on three of the four I/O Groups. You can then remove the remaining I/O Group with no VDisks from the cluster and add the new hardware to the cluster. As each pair of new nodes is added, VDisks can then be moved to the new nodes, leaving another old I/O Group pair that can be removed. After all the old pairs are removed, the last two new nodes can be added, and if required, VDisks can be moved onto them. Unfortunately, this method requires several outages to the host, because VDisks are moved between I/O Groups. This method might not be practical unless you need to implement the new hardware over an extended period of time, and the first option is not practical for your environment.
- You can mix the previous two options. New SVC hardware provides considerable performance benefits on each release, and there have been substantial performance improvements since the first hardware release. Depending on the age of your existing SVC hardware, the performance requirements might be met by only six or fewer nodes of the new hardware. If this is the case, you might be able to utilize a mix of the previous two steps. For example, use an IBM SSR to help you upgrade one or two I/O Groups, and then move the VDisks from the remaining I/O Groups onto the new hardware.
Chapter 5. Storage controller
In this chapter, we discuss the following topics:
- Controller affinity and preferred path
- Pathing considerations for EMC Symmetrix/DMX and HDS
- Logical unit number (LUN) ID to MDisk translation
- MDisk to VDisk mapping
- Mapping physical logical block addresses (LBAs) to extents
- Media error logging
- Selecting array and cache parameters
- Considerations for controller configuration
- LUN masking
- Worldwide port name (WWPN) to physical port translation
- Using TotalStorage Productivity Center (TPC) to identify storage controller boundaries
- Using TPC to measure storage controller performance
See Chapter 15, Troubleshooting and diagnostics on page 259 for information regarding checking the back-end paths to storage controllers.
5.3.1 ESS
The ESS uses 14 bits to represent the LUN ID. The Controller LUN Number field on the SVC has the format XXXX00000000, where the first two bits of XXXX are always set to 01 and the remaining 14 bits are the LUN ID. For example, LUN ID 1723, as displayed from the ESS Storage Specialist, displays as 572300000000 in the Controller LUN Number field on the SVC from the MDisk details:

5723 = binary 0101 0111 0010 0011; dropping the fixed 01 prefix leaves the 14-bit LUN ID 1723
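A decode consistent with this example can be scripted by masking off the fixed prefix bits of the first 16-bit word of the Controller LUN Number. The following Python sketch is an illustration only:

def ess_lun_id(ctrl_lun_number):
    # Decode an ESS Controller LUN Number as shown by the SVC
    # (for example '572300000000') into the 14-bit ESS LUN ID.
    first_word = int(ctrl_lun_number[:4], 16)     # 0x5723
    return first_word & 0x3FFF                    # drop the fixed '01' prefix bits

print(f"{ess_lun_id('572300000000'):04x}")        # 1723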
From the MDisk details panel in Figure 5-2, the Controller LUN Number field is 4011400500000000, which translates to LUN ID 0x1105 (represented in Hex). We can also identify the storage controller from the Controller Name as DS8K7598654, which had been manually assigned. Note: The command line interface (CLI) references the Controller LUN Number as ctrl_LUN_#.
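Following the pattern of this example, the LUN ID can be recovered by taking the second and fourth bytes of the Controller LUN Number. The following Python sketch is an illustration based only on the example shown here:

def ds8000_lun_id(ctrl_lun_number):
    # Recover the DS8000 LUN ID from the Controller LUN Number shown by the
    # SVC, following the example above ('4011400500000000' -> '1105').
    lss = ctrl_lun_number[2:4]       # second byte
    volume = ctrl_lun_number[6:8]    # fourth byte
    return lss + volume

print(ds8000_lun_id("4011400500000000"))   # 1105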
You can use the lsvdiskextent CLI command to obtain this information. The lsmdiskextent output in Example 5-1 shows a list of VDisk IDs that have extents allocated to mdisk0 along with the number of extents. The GUI also has a drop-down option to perform the same function for VDisks and MDisks.
Example 5-1 lsmdiskextent
LABEL:           SC_DISK_ERR2
IDENTIFIER:      B6267342
Date/Time:       Thu Jul  5 10:49:35 2007
Sequence Number: 4334
Machine Id:      00C91D3B4C00
Node Id:         testnode
Class:           H
Type:            PERM
Resource Name:   hdisk34
Resource Class:  disk
Resource Type:   2145
Location:        U7879.001.DQDFLVP-P1-C1-T1-W5005076801401FEF-L4000000000000
VPD:
Manufacturer................IBM
Machine Type and Model......2145
ROS Level and ID............0000
Device Specific.(Z0)........0000043268101002
Device Specific.(Z1)........0200604
Serial Number...............60050768018100FF78000000000000F6

SENSE DATA
0A00 2800 001C 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
From the sense byte decode:
Byte 2      = SCSI Op Code (28 = 10-Byte Read)
Bytes 4-7   = LBA (Logical Block Address for the VDisk)
Byte 30     = Key
Byte 40     = Code
Byte 41     = Qualifier
Node Identifier       : Node7
Object Type           : mdisk
Object ID             : 48
Sequence Number       : 7073
Root Sequence Number  : 7073
First Error Timestamp : Thu Jul 26 17:44:13 2007 : Epoch + 1185486253
Last Error Timestamp  : Thu Jul 26 17:46:13 2007 : Epoch + 1185486373
Error Count           : 21
Error ID              : 10025 : A media error has occurred during I/O to a Managed Disk
Error Code            : 1320 : Disk I/O medium error
Status Flag           : FIXED
Type Flag             : TRANSIENT ERROR

40 6D 04 02 00 00 00 00 11 80 02 03 00 00 00 00 40 00 00 11 00 00 00 00 02 00 02 0B 00 00 00 00 00 40 00 80 00 00 00 0B 00 00 00 6D 00 00 00 00 00 00 00 59 00 00 00 00 00 00 00 58 00 00 00 00 00 00 00 00 00 00 00 04 00 00 01 00 00 00 00 00 00 00 0A 00 00 00 00 00 02 00 00 00 00 00 00 00 28 00 00 08 00 00 00 10 00 00 80 00 00 00 00 00 58 80 00 C0 00 00 00 02 59 00 00 AA 00 00 00 01
Where the sense bytes decode as:
Bytes 12     = SCSI Op Code (28 = 10-Byte Read)
Bytes 14-17  = LBA (Logical Block Address for the MDisk)
Bytes 49-51  = Key/Code/Qualifier

Caution: Attempting to locate data checks on MDisks by scanning VDisks with host applications, such as dd, or by using SVC background functions, such as VDisk migrations and FlashCopy, can cause the Managed Disk Group (MDG) to go offline as a result of error handling behavior in current levels of SVC microcode. This behavior will change in future levels of SVC microcode. Check with support prior to attempting to locate data checks by any of these means.
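Using the byte offsets listed above, the decode can be scripted. The following Python sketch is an illustration only and uses a synthetic sense buffer rather than the dump shown above:

def decode_mdisk_sense(sense):
    # Decode the fields described above from SVC error log sense data,
    # where 'sense' is the sense area as a list of byte values.
    return {
        "scsi_op_code": sense[12],                               # 0x28 = 10-byte Read
        "mdisk_lba": int.from_bytes(bytes(sense[14:18]), "big"),
        "key_code_qualifier": tuple(sense[49:52]),
    }

# Synthetic example: a 10-byte Read that failed at MDisk LBA 0x1180
# with key/code/qualifier 03/11/00 (medium error, unrecovered read error).
sense = [0] * 64
sense[12] = 0x28
sense[14:18] = [0x00, 0x00, 0x11, 0x80]
sense[49:52] = [0x03, 0x11, 0x00]
print(decode_mdisk_sense(sense))
# {'scsi_op_code': 40, 'mdisk_lba': 4480, 'key_code_qualifier': (3, 17, 0)}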
Notes:
- Media errors encountered on VDisks log error code 1320 Disk I/O Medium Error.
- VDisk migrations and FlashCopy operations that exceed the media error limit of 32 terminate and log error code 1610 Too many medium errors on Managed Disk.
workloads. A common mistake that people make when selecting array width is the tendency to focus only on the capability of a single array to perform various workloads. However, you must also consider in this decision the aggregate throughput requirements of the entire storage server. A large number of physical disks in an array can create a workload imbalance between the controllers, because only one controller of the DS4000 actively accesses a specific array. When selecting array width, you must also consider its effect on rebuild time and availability. A larger number of disks in an array increases the rebuild time for disk failures, which can have a negative effect on performance. Additionally, more disks in an array increases the probability of having a second drive fail within the same array prior to the rebuild completion of an initial drive failure, which is an inherent exposure to the RAID5 architecture. Best practice: For the DS4000, we recommend array widths of 4+p and 8+p.
5.7.3 DS8000
For the DS8000, you cannot tune the array and cache parameters. The arrays are either 6+P or 7+P, depending on whether the array site contains a spare. The segment size (the contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes, and caching for the DS8000 is done on a 64 KB track boundary.
Example 5-4 shows what this invalid configuration looks like from the CLI output of the lsarray and lsrank commands. Notice that arrays residing on the same DA pair contain the same group number (0 or 1).
Example 5-4 Command output

dscli> lsarray -l
Date/Time: Aug 8, 2007 8:54:58 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
Array State  Data   RAIDtype  arsite Rank DA Pair DDMcap(10^9B) diskclass
===================================================================================
A0    Assign Normal 5 (6+P+S) S1     R0   0       146.0         ENT
A1    Assign Normal 5 (6+P+S) S9     R1   1       146.0         ENT
A2    Assign Normal 5 (6+P+S) S17    R2   2       146.0         ENT
A3    Assign Normal 5 (6+P+S) S25    R3   3       146.0         ENT
A4    Assign Normal 5 (6+P+S) S2     R4   0       146.0         ENT
A5    Assign Normal 5 (6+P+S) S10    R5   1       146.0         ENT
A6    Assign Normal 5 (6+P+S) S18    R6   2       146.0         ENT
A7    Assign Normal 5 (6+P+S) S26    R7   3       146.0         ENT

dscli> lsrank -l
Date/Time: Aug 8, 2007 8:52:33 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 0     Normal Normal    A4    5        P4        extpool4   fb      779  779
R5 1     Normal Normal    A5    5        P5        extpool5   fb      779  779
R6 0     Normal Normal    A6    5        P6        extpool6   fb      779  779
R7 1     Normal Normal    A7    5        P7        extpool7   fb      779  779
Figure 5-5 on page 73 shows an example of a correct configuration that balances the workload across all eight DA adapters.
Example 5-5 shows what this correct configuration looks like from the CLI output of the lsrank command. The configuration from the lsarray output remains unchanged. Notice that arrays residing on the same DA pair are split between groups 0 and 1.
Example 5-5 Command output

dscli> lsrank -l
Date/Time: Aug 9, 2007 2:23:18 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 1     Normal Normal    A4    5        P5        extpool5   fb      779  779
R5 0     Normal Normal    A5    5        P4        extpool4   fb      779  779
R6 1     Normal Normal    A6    5        P7        extpool7   fb      779  779
R7 0     Normal Normal    A7    5        P6        extpool6   fb      779  779
The ESS and DS8000 populate Fibre Channel (FC) adapters across two to eight I/O enclosures, depending on configuration. Each I/O enclosure represents a separate hardware domain. Ensure that adapters configured to different SAN networks do not share the same I/O enclosure, as part of our goal of keeping redundant SAN networks isolated from each other. Best practices that we recommend:
- Configure a minimum of eight ports per DS8000
- Configure 16 ports per DS8000 when > 48 ranks are presented to the SVC cluster
- Configure a maximum of two ports per four-port DS8000 adapter
- Configure adapters across redundant SAN networks from different I/O enclosures
dscli> showvolgrp -dev IBM.2107-75ALNN1 V0
Date/Time: August 15, 2007 10:12:33 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name   SVCVG0
ID     V0
Type   SCSI Mask
Vols   1000 1001 1004 1005

Example 5-7 shows lshostconnect output from the DS8000. Here, you can see that all 16 ports of the 4-node cluster are assigned to the same volume group (V0) and, therefore, have been assigned to the same four LUNs.
Example 5-7 lshostconnect output
dscli> lshostconnect -dev IBM.2107-75ALNN1
Date/Time: August 14, 2007 11:51:31 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name     ID   WWPN             HostType Profile               portgrp volgrpID ESSIOport
============================================================================================
svcnode  0000 5005076801302B3E SVC      San Volume Controller 0       V0       all
svcnode  0001 5005076801302B22 SVC      San Volume Controller 0       V0       all
svcnode  0002 5005076801202D95 SVC      San Volume Controller 0       V0       all
svcnode  0003 5005076801402D95 SVC      San Volume Controller 0       V0       all
svcnode  0004 5005076801202BF1 SVC      San Volume Controller 0       V0       all
svcnode  0005 5005076801402BF1 SVC      San Volume Controller 0       V0       all
svcnode  0006 5005076801202B3E SVC      San Volume Controller 0       V0       all
svcnode  0007 5005076801402B3E SVC      San Volume Controller 0       V0       all
svcnode  0008 5005076801202B22 SVC      San Volume Controller 0       V0       all
svcnode  0009 5005076801402B22 SVC      San Volume Controller 0       V0       all
svcnode  000A 5005076801102D95 SVC      San Volume Controller 0       V0       all
svcnode  000B 5005076801302D95 SVC      San Volume Controller 0       V0       all
svcnode  000C 5005076801102BF1 SVC      San Volume Controller 0       V0       all
svcnode  000D 5005076801302BF1 SVC      San Volume Controller 0       V0       all
svcnode  000E 5005076801102B3E SVC      San Volume Controller 0       V0       all
svcnode  000F 5005076801102B22 SVC      San Volume Controller 0       V0       all
fd11asys 0010 210100E08BA5A4BA VMWare   VMWare                0       V1       all
fd11asys 0011 210000E08B85A4BA VMWare   VMWare                0       V1       all
(The listing continues with additional host connections assigned to volume groups V2 and V3.)
Additionally, you can see from the lshostconnect output that only the SVC WWPNs are assigned to V0.

Caution: Data corruption can occur if LUNs are assigned to both SVC nodes and non-SVC nodes, that is, direct-attached hosts.

Next, we show you how the SVC sees these LUNs if the zoning is properly configured. The Managed Disk Link Count represents the total number of MDisks presented to the SVC cluster. Figure 5-6 shows the storage controller general details output. To display this panel, we selected Work with Managed Disks -> Disk Controller Systems -> View General Details. In this case, we can see that the Managed Disk Link Count is 4, which is correct for our example.
Figure 5-7 on page 77 shows the storage controller port details. To get to this panel, we selected Work with Managed Disks -> Disk Controller Systems -> View General Details -> Ports.
Here a path represents a connection from a single node to a single LUN. Because we have four nodes and four LUNs in this example configuration, we expect to see a total of 16 paths with all paths evenly distributed across the available storage ports. We have validated that this configuration is correct, because we see eight paths on one WWPN and eight paths on the other for a total of 16 paths.
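Because each path is one node-to-LUN connection, the expected totals are simple arithmetic. The following Python sketch is a small illustration only:

def controller_path_check(svc_nodes, luns_presented, storage_ports):
    # A path is a single node to single LUN connection, so the total is
    # nodes x LUNs, and it should spread evenly across the storage ports
    # that are zoned to the SVC.
    total = svc_nodes * luns_presented
    return total, total // storage_ports

print(controller_path_check(svc_nodes=4, luns_presented=4, storage_ports=2))
# (16, 8) -> 16 paths in total, 8 per storage port WWPN, matching the panel above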
WWPN format for ESS = 5005076300XXNNNN
XX = adapter location within storage controller
NNNN = unique identifier for storage controller

Bay    Slot  XX          Bay    Slot  XX
R1-B1  H1    C4          R1-B3  H1    C8
R1-B1  H2    C3          R1-B3  H2    C7
R1-B1  H3    C2          R1-B3  H3    C6
R1-B1  H4    C1          R1-B3  H4    C5
R1-B2  H1    CC          R1-B4  H1    D0
R1-B2  H2    CB          R1-B4  H2    CF
R1-B2  H3    CA          R1-B4  H3    CE
R1-B2  H4    C9          R1-B4  H4    CD
In Example 5-9, we show the WWPN to physical port translations for the DS8000.
Example 5-9 DS8000
WWPN format for DS8000 = 50050763030XXYNNN
XX = adapter location within storage controller
Y = port number within 4-port adapter
NNN = unique identifier for storage controller

IO Bay  Slot  XX          IO Bay  Slot  XX
B1      S1    00          B5      S1    20
B1      S2    01          B5      S2    21
B1      S4    03          B5      S4    23
B1      S5    04          B5      S5    24
B2      S1    08          B6      S1    28
B2      S2    09          B6      S2    29
B2      S4    0B          B6      S4    2B
B2      S5    0C          B6      S5    2C
B3      S1    10          B7      S1    30
B3      S2    11          B7      S2    31
B3      S4    13          B7      S4    33
B3      S5    14          B7      S5    34
B4      S1    18          B8      S1    38
B4      S2    19          B8      S2    39
B4      S4    1B          B8      S4    3B
B4      S5    1C          B8      S5    3C

Port  Y
P1    0
P2    4
P3    8
P4    C
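Because the XX and Y values map directly to an I/O bay, slot, and port, the translation can be scripted. The following Python sketch builds a lookup from the table above; the WWPN in the usage example is made up purely for illustration:

# XX value -> (I/O bay, slot), generated from the table above.
XX_TO_BAY_SLOT = {}
for bay_index, bay in enumerate(["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"]):
    for slot, offset in zip(["S1", "S2", "S4", "S5"], [0x0, 0x1, 0x3, 0x4]):
        XX_TO_BAY_SLOT[f"{bay_index * 8 + offset:02X}"] = (bay, slot)

Y_TO_PORT = {"0": "P1", "4": "P2", "8": "P3", "C": "P4"}

def ds8000_wwpn_to_port(wwpn):
    # Translate a DS8000 WWPN, following the XX/Y placement described above
    # (counted from the right: ...XX Y NNN), into I/O bay, slot, and port.
    wwpn = wwpn.replace(":", "").upper()
    xx, y = wwpn[-6:-4], wwpn[-4]
    bay, slot = XX_TO_BAY_SLOT[xx]
    return bay, slot, Y_TO_PORT[y]

# Hypothetical WWPN with XX=08 and Y=0 -> bay B2, slot S1, port P1.
print(ds8000_wwpn_to_port("50050763030080123"))   # ('B2', 'S1', 'P1')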
Figure 5-9 completes the end-to-end view by mapping the MDisk through the SVC to the attached host. Click MDisk -> MDGroup -> VDisk -> host disk.
nodes and the controllers. Both rates are considered when monitoring storage controller performance. The two most important metrics when measuring I/O subsystem performance are response time in milliseconds and throughput in I/Os per second (IOPS):
- Response time is measured from where commands originate; in non-SVC environments, that is the host. With the SVC, we not only have to consider response time from the host to the SVC nodes, but also from the SVC nodes to the storage controllers.
- Throughput, however, can be measured at a variety of points along the data path, and the SVC adds additional points where throughput is of interest and measurements can be obtained.
TPC offers many disk performance reporting options that support the SVC environment well, and also the storage controller back end for a variety of storage controller types. This is a list of the most relevant storage components where performance metrics can be collected when monitoring storage controller performance:
- Subsystem
- Controller
- Array
- MDisk
- MDG
- Port

Note: In SVC environments, the SVC nodes interact with the storage controllers in the same way as a host. Therefore, the performance rules and guidelines that we discuss in this section are also applicable to non-SVC environments. References to MDisks are analogous to host-attached LUNs in a non-SVC environment.
5.12.1 Approximations
These are some of the approximations that we have made or assumed:
- Throughput for storage volumes can range from 1 IOPS to more than 1,000 IOPS, based mostly on the nature of the application. When the I/O rate for an MDisk approaches 1,000 IOPS, it is because that MDisk is encountering very good controller cache behavior; otherwise, such high I/O rates are not possible.
- A 10 millisecond response time is generally considered to be getting high; however, it might be perfectly acceptable depending on application behavior and requirements. For example, many On-Line Transaction Processing (OLTP) environments require response times in the 5 to 8 millisecond range, while batch applications with large sequential transfers operate nominally in the 15 to 30 millisecond range.
- Nominal service times for disks today are 5-7 milliseconds; however, when a disk is at 50% utilization, ordinary queuing adds a wait time roughly equal to the service time, so a 10-14 millisecond response time is a reasonable goal in most environments (see the sketch after this list).
- High controller cache hit ratios allow the back-end arrays to run at a higher utilization. A 70% array utilization produces high array response times; however, when averaged with cache hits, they produce acceptable average response times.
- High SVC read hit ratios can have the same effect on array utilization in that they allow higher MDisk utilizations and, therefore, higher array response times. Poor cache hit ratios require good back-end response times. Front-end response times typically need to be in the 5-15 millisecond range.
- Back-end response times to arrays can usually operate in the 20-25 millisecond range, up to 60 milliseconds, unless the cache hit ratio is low.
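The 50% utilization rule of thumb above follows from simple single-server queuing behavior. The following Python sketch, which assumes an M/M/1-style model and a 6 millisecond service time, is an illustration only:

def disk_response_time(service_ms, utilization):
    # Approximate response time for a single disk using a simple
    # M/M/1-style model: wait grows as utilization / (1 - utilization).
    wait_ms = service_ms * utilization / (1.0 - utilization)
    return service_ms + wait_ms

for u in (0.1, 0.5, 0.7):
    print(f"{u:.0%} busy -> {disk_response_time(6, u):.1f} ms")
# 10% busy -> 6.7 ms
# 50% busy -> 12.0 ms   (wait roughly equals the 6 ms service time)
# 70% busy -> 20.0 ms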
Array response times depend on many factors, including disk RPM and the array configuration. However, in all cases when the number of IOPS is near, or exceeds 1,000 IOPS, the array is very busy. Table 5-3 shows the upper limit for several disk speeds and array widths. Remember that while these I/O rates can be achieved, they imply considerable queuing delays and high response times.
Table 5-3 DDM speeds

DDM speed          Max Ops/sec   6+P Ops/sec   7+P Ops/sec
10K                150 - 175     900 - 1050    1050 - 1225
15K                200 - 225     1200 - 1350   1400 - 1575
7.2K (near-line)   85 - 110      510 - 660     595 - 770
These numbers can vary significantly depending on cache hit ratios, block size, and service time. Rule: 1,000 IOPS indicates a very busy array and can be impacting front-end response times.
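The array columns in Table 5-3 are simply the per-DDM range multiplied by the 6 (for 6+P) or 7 (for 7+P) figure used there. A small Python sketch that reproduces them, as an illustration only:

DDM_OPS = {"10K": (150, 175), "15K": (200, 225), "7.2K near-line": (85, 110)}

def array_ops_range(ddm_speed, data_drives):
    # Approximate array ops/sec range: per-DDM range from Table 5-3
    # multiplied by the 6 (for 6+P) or 7 (for 7+P) figure used there.
    low, high = DDM_OPS[ddm_speed]
    return low * data_drives, high * data_drives

print(array_ops_range("15K", 6))   # (1200, 1350)
print(array_ops_range("15K", 7))   # (1400, 1575)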
Chapter 6. MDisks
In this chapter, we discuss various MDisk attributes, and we provide an overview of the process of adding and removing MDisks from existing Managed Disk Groups (MDGs). In this chapter, you will find the following sections:
- Back-end queue depth
- MDisk transfer size
- Selecting logical unit number (LUN) attributes for MDisks
- Tiered storage
- Adding MDisks to existing MDGs
- Remapping managed MDisks
- Controlling extent allocation order for VDisk creation
Sequential writes
The SVC does not employ a caching algorithm for explicit sequential detect, which means that the coalescing of writes in the SVC cache has a random component to it. For example, 4 KB writes to VDisks will translate into a mix of 4 KB, 8 KB, 16 KB, 24 KB, and 32 KB transfers to the MDisks, with reducing probability as the transfer size grows. Although larger transfer sizes tend to be more efficient, this varying transfer size has no effect on the controller's ability to detect and coalesce sequential content to achieve full stride writes.
Sequential reads
The SVC uses prefetch logic for staging reads based on statistics maintained on 128 MB regions. If the sequential content is sufficiently high within a region, prefetch occurs with 32 KB reads.
be reduced to the level of the poorest performing MDisk. Likewise, all LUNs need to also possess the same availability characteristics. Remember that the SVC does not provide any RAID capabilities. Because all MDisks are placed in the same MDG, the loss of access to any one of the MDisks within the MDG will impact the entire MDG. We recommend these best practices for LUN selection:
- Must be the same type
- Must be the same RAID level
- Must be the same RAID width (number of physical disks in the array)
- Must have the same availability and fault tolerance characteristics
MDisks created on LUNs with varying performance and availability characteristics need to be placed in separate MDGs.
Restriping extents
Adding MDisks to existing MDGs can result in reduced performance across the MDG due to the extent imbalance that will occur and the potential to create hot spots within the MDG. After adding MDisks to MDGs, we recommend that you rebalance extents across all available MDisks. You can accomplish this manually with the command line interface (CLI), or you can automate rebalancing the extents across all available MDisks by using a Perl script that is available from the alphaWorks Web site:

http://www.alphaworks.ibm.com/tech/svctools

The following CLI commands can be used to identify and correct extent imbalance across MDGs:
- svcinfo lsmdiskextent
- svctask migrateexts
- svcinfo lsmigrate
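The rebalancing itself is arithmetic over the per-MDisk extent counts reported by lsmdiskextent. The following Python sketch (an illustration only, not the alphaWorks script) shows how far each MDisk in an MDG is from an even spread; positive values are donors for migrateexts and negative values are receivers:

def extent_imbalance(extents_per_mdisk):
    # Given {mdisk_name: allocated_extents} for one MDG, return how far
    # each MDisk is from an even spread (positive = donor, negative = receiver).
    total = sum(extents_per_mdisk.values())
    fair_share = total / len(extents_per_mdisk)
    return {name: count - fair_share for name, count in extents_per_mdisk.items()}

# A newly added MDisk (md007) holds no extents yet.
print(extent_imbalance({"md001": 300, "md002": 300, "md003": 300, "md007": 0}))
# {'md001': 75.0, 'md002': 75.0, 'md003': 75.0, 'md007': -225.0}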
Renaming MDisks
We recommend that you rename MDisks from their SVC-assigned name after you discover them. Using a naming convention for MDisks that associates the MDisk to the controller and array helps during problem isolation and avoids confusion that can lead to an administration error. Note that when multiple tiers of storage exist on the same SVC cluster, you might also want to indicate the storage tier in the name as well. For example, you can use R5 and R10 to differentiate RAID levels or you can use T1, T2, and so on to indicate defined tiers. Best practice: Use a naming convention for MDisks that associates the MDisk with its corresponding controller and array within the controller, for example, DS8KR512345A22.
Figure 6-1 Controller Number and UID fields from the SVC MDisk details panel
Figure 6-2 on page 92 shows an example of the Logical Drive Properties for the DS4000. Note that the DS4000 refers to UID as the Logical Drive ID.
which can be obtained from the controller profile. To view the logical drive properties, click Logical/Physical View → LUN → Open → Properties. See Figure 6-2 on page 92 for an example of the Logical Drive Properties panel for a DS4000 logical drive. This panel shows the Logical Drive ID (UID) and SSID.
Table 6-1 shows the initial discovery order of six MDisks. Note that adding these MDisks to an MDG in this order results in three contiguous extent allocations alternating between the even and odd extent pools, as opposed to alternating between extent pools for each extent.
Table 6-1 Initial discovery order
LUN ID   MDisk ID   MDisk name   Controller resource (DA pair/extent pool)
1000     1          mdisk01      DA2/P0
1001     2          mdisk02      DA6/P16
1002     3          mdisk03      DA7/P30
1100     4          mdisk04      DA0/P9
1101     5          mdisk05      DA4/P23
1102     6          mdisk06      DA5/P39
To change extent allocation so that each extent alternates between even and odd extent pools, the MDisks can be renamed after being discovered and then added to the MDG in their new order. Table 6-2 shows how the MDisks have been renamed so that when they are added to the MDG in their new order, the extent allocation will alternate between even and odd extent pools.
Table 6-2 MDisks renamed
LUN ID   MDisk ID   MDisk name (original/new)   Controller resource (DA pair/extent pool)
1000     1          mdisk01/md001               DA2/P0
1100     4          mdisk04/md002               DA0/P9
1001     2          mdisk02/md003               DA6/P16
1101     5          mdisk05/md004               DA4/P23
1002     3          mdisk03/md005               DA7/P30
1102     6          mdisk06/md006               DA5/P39
There are two options available for VDisk creation. We describe both options along with the differences between them:

Option A: Explicitly select the candidate MDisks within the MDG that will be used (via the command line interface (CLI) or GUI). When you explicitly select the MDisk list, the extent allocation round-robins across the MDisks in the order that they are represented on the list, starting with the first MDisk on the list:
Example A1: Create a VDisk with MDisks from the explicit candidate list order: md001, md002, md003, md004, md005, and md006. The VDisk extent allocations begin at md001 and alternate round-robin around the explicit MDisk candidate list. In this case, the VDisk is distributed in the following order: md001, md002, md003, md004, md005, and md006.
Example A2: Create a VDisk with MDisks from the explicit candidate list order: md003, md001, md002, md005, md006, and md004. The VDisk extent allocations begin at md003 and alternate round-robin around the explicit MDisk candidate list. In this case, the VDisk is distributed in the following order: md003, md001, md002, md005, md006, and md004.

Option B: Do not explicitly select the candidate MDisks within the MDG that will be used (via the CLI or GUI). When the MDisk list is not explicitly defined, the extents are allocated across the MDisks in the order that they were added to the MDG, and the MDisk that receives the first extent is randomly selected:
Example B1: Create a VDisk with MDisks from the candidate list order based on the order that the MDisks were added to the MDG: md001, md002, md003, md004, md005, and md006. The VDisk extent allocations begin at a random MDisk starting point (let us assume md003 is randomly selected) and alternate round-robin based on the order that the MDisks were originally added to the MDG. In this case, the VDisk is allocated in the following order: md003, md004, md005, md006, md001, and md002.

Summary: Independent of the order in which a storage subsystem's LUNs (volumes) are discovered by the SVC, recognize that renaming MDisks and changing the order in which they are added to the MDG influences how the VDisk's extents are allocated. Renaming MDisks into a particular order and then adding them to the MDG in that order allows the starting MDisk to be randomly selected for each VDisk created and, therefore, is the optimal method for balancing VDisk extent allocation across storage subsystem resources. When MDisks are added to an MDG based on the order in which the MDisks were discovered, the allocation order can be explicitly specified; however, the MDisk used for the first extent will always be the first MDisk specified on the list.

When creating VDisks from the GUI: Recognize that you are not required to select the MDisks from the Managed Disk Candidates list and click the Add button; you have the option to simply enter a capacity value in the "Type the size of the virtual disks" field and select whether you require formatting of the VDisk. With this approach, Option B is the applied methodology for how the VDisk's extents are allocated within the MDG. When a set or subset of MDisks is selected and added (by using the Add button) to the "Managed Disks Striped in this Order" column, Option A is the applied methodology, and the VDisk's extents are explicitly distributed across the selected MDisks. A sample CLI invocation of each option follows.
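The following sketch shows how the two options might look on the CLI. The MDG, VDisk, and MDisk names and the capacity are hypothetical; the -mdisk parameter supplies the explicit candidate list for Option A, while omitting it gives the Option B behavior:
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -iogrp io_grp0 -size 10 -unit gb -vtype striped -mdisk md003:md001:md002:md005:md006:md004 -name VDISK-OptionA
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -iogrp io_grp0 -size 10 -unit gb -vtype striped -name VDISK-OptionB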
Figure 6-4 on page 96 shows the attributes panel for creating VDisks.
Chapter 7.
The following best practices are geared toward availability and do not consider the potential implications on performance. Therefore, there will always be valid reasons why these best practices cannot all be adhered to in all cases. As is always the case, performance needs to be considered in terms of specific application workload characteristics and requirements. In the following sections, we examine some of the effects that these practices have on performance. Best practices for availability:
Each storage controller must be used within a single SVC cluster.
Each array must be included in only one MDG.
Each MDG must only contain MDisks from a single array controller.
Each MDG must contain MDisks from no more than 10 arrays.
Note: We highly recommend that you use Disk Magic to size the performance demand for specific workloads. You can obtain a copy of Disk Magic from: http://www.intellimagic.net
Note: It can be better to move users up the performance spectrum rather than down. People rarely complain if performance increases. So, if there is uncertainty about which pool is the correct one to use, use the pool with the lower performance and move the users up to the higher performing pool later if required.
A lot depends on the timing of when the workloads will run. If it is mainly OLTP during the day shift and the batch workloads run at night, there is no problem with mixing the workloads in the same MDG. If the two workloads run concurrently, and if the batch workload runs with no cap or throttling and requires high levels of I/O throughput, we recommend that, wherever possible, the workloads are segregated onto different MDGs that are supported by different disks, RAID arrays, and resources.
Table 7-2 Two LUNs per array
DS8000 array   LUN1   LUN2
Array1         MDG1   MDG1
Array2         MDG1   MDG1
Array3         MDG1   MDG1
Array4         MDG1   MDG1
Array5         MDG2   MDG2
Array6         MDG2   MDG2
Array7         MDG2   MDG2
Array8         MDG2   MDG2

Table 7-3 One LUN per array
DS8000 array   LUN1
Array1         MDG1
Array2         MDG1
Array3         MDG1
Array4         MDG1
Array5         MDG2
Array6         MDG2
Array7         MDG2
Array8         MDG2
Testing was performed using a 4-node cluster with two I/O Groups and eight VDisks per MDG. The following workloads were used in the testing:
Ran-R/W-50/50-0%CH
Seq-R/W-50/50-25%CH
Seq-R/W-50/50-0%CH
Ran-R/W-70/30-25%CH
Ran-R/W-50/50-25%CH
Ran-R/W-70/30-0%CH
Seq-R/W-70/30-25%CH
Seq-R/W-70/30-0%CH
Note: CH=Cache Hit; 25%CH means that 25% of all I/Os are read cache hits.
The following performance metrics were collected for a single MDG using TotalStorage Productivity Center (TPC). Figure 7-3 on page 104 and Figure 7-4 on page 105 show the IOPS and response time comparisons between Config1 and Config2.
Figure 7-3 IOPS comparison between two LUNs per array and one LUN per array
Figure 7-4 Response time comparison between two LUNs per array and one LUN per array
The test shows a small response time advantage to the two LUNs per array configuration and a small IOPS advantage to the one LUN per array configuration for sequential workloads. Overall, the performance differences between these configurations are minimal.
arrays per MDG that is appropriate for general cases. Again, when it comes to performance, there can always be exceptions.
You can see from this design that if a single array fails, all four MDGs are affected, and all VDisks that are using storage from this DS8000 fail.
An alternative to this configuration is shown in Table 7-6. Here, the arrays are divided into two LUNs each, and there are half the number of arrays for each MDG as there were in the first configuration. In this design, the failure boundary of an array failure is cut in half, because any single array failure only affects half of the MDGs.
Table 7-6 Configuration two: Each array is contained in two MDGs
DS8000 array   LUN1   LUN2
Array1         MDG1   MDG3
Array2         MDG1   MDG3
Array3         MDG1   MDG3
Array4         MDG1   MDG3
Array5         MDG2   MDG4
Array6         MDG2   MDG4
Array7         MDG2   MDG4
Array8         MDG2   MDG4
We collected the following performance metrics using TPC to compare these configurations. The first test was performed with all four MDGs evenly loaded. Figure 7-5 on page 108 and Figure 7-6 on page 109 show the IOPS and response time comparisons between Config1 and Config2 for varying workloads.
Figure 7-5 IOPS comparison of eight arrays/MDG and four arrays/MDG with all four MDGs active
Figure 7-6 Response time comparison between eight arrays/MDG and four arrays/MDG with all four MDGs active
This test shows virtually no difference between using eight arrays per MDG compared to using four arrays per MDG when all MDGs are evenly loaded (with the exception of a small advantage in IOPS for the eight array MDG for sequential workloads). We performed two additional tests to show the potential effect when MDGs are not loaded evenly. The first test was performed using only one of the four MDGs, while the other three MDGs remained idle. This test presents the worst case scenario, because the eight array MDG has the full dedicated bandwidth of all eight arrays available to it, and therefore, halving the number of arrays has a pronounced effect. This tends to be an unrealistic scenario, because it is unlikely that all host workload will be directed at a single MDG. Figure 7-7 on page 110 shows the IOPS comparison between these configurations.
Figure 7-7 IOPS comparison between eight arrays/MDG and four Arrays/MDG with a single MDG active
We performed the second test with I/O running to only two of the four MDGs.
Figure 7-8 IOPS comparison between eight arrays/MDG and four arrays/MDG with two MDGs active
Figure 7-8 shows the results from the test where only two of the four MDGs are loaded. This test shows no difference between the eight arrays per MDG configuration and the four arrays per MDG configuration for random workloads, and a small advantage to the eight arrays per MDG configuration for sequential workloads. Our conclusions are:
The performance advantage from striping across a larger number of arrays is not as pronounced as you might expect.
You must consider the number of MDisks per array along with the number of arrays per MDG to understand aggregate MDG loading effects.
Availability improvements can be achieved without compromising performance objectives.
possible with striping mode. This situation is a rare exception given the unlikely requirement to optimize for FlashCopy as opposed to online workload. Note: Electing to use sequential mode over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting system performance.
Chapter 8. VDisks
In this chapter, we discuss Virtual Disks (VDisks). We describe creating them, managing them, and migrating them across I/O Groups. We then discuss VDisk performance and how you can use TotalStorage Productivity Center (TPC) to analyze performance and help guide you to possible solutions.
IBM_2145:ITSOCL1:admin>svctask chvdisk -name TEST_NEWNAME TEST_TEST

Balance the VDisks across the I/O Groups in the cluster to balance the load across the cluster. At the time of VDisk creation, the workload to be put on the VDisk might not be known. In this case, if you are using the GUI, accept the system default of load balancing allocation. Using the CLI, you must manually specify the I/O Group (a CLI sketch follows Table 8-1). In configurations with large numbers of attached hosts, where it is not possible to zone a host to multiple I/O Groups, it might not be possible to choose to which I/O Group to attach the VDisks. The VDisk has to be created in the I/O Group to which its host belongs. For moving a VDisk across I/O Groups, see 8.1.3, Moving a VDisk to another I/O Group on page 117.

Note: Migrating VDisks across I/O Groups is a disruptive action. Therefore, it is best to get this correct at the time of VDisk creation.

By default, the preferred node, which owns a VDisk within an I/O Group, is selected on a load balancing basis. At the time of VDisk creation, the workload to be put on the VDisk might not be known, but it is important to distribute the workload evenly on the SVC nodes within an I/O Group. The preferred node cannot easily be changed. If you need to change the preferred node, see 8.1.2, Changing the preferred node within an I/O Group on page 116.

The maximum number of VDisks per I/O Group is 1024. The maximum number of VDisks per cluster is 4096 (eight-node cluster).

The smaller the extent size that you select, the finer the granularity with which the VDisk occupies space on the underlying storage controller. A VDisk occupies an integer number of extents, but its length does not need to be an integer multiple of the extent size. The length does need to be an integer multiple of the block size. Any space left over between the last logical block in the VDisk and the end of the last extent in the VDisk is unused. A small extent size can be used to minimize this unused space; however, with the decline in disk prices, capacity is cheaper than other storage considerations, such as I/O. The counter view is that the smaller the extent size, the smaller the total storage volume that the SVC can virtualize (see Table 8-1 on page 115). The extent size does not affect performance. For most clients, extent sizes of 128 MB or 256 MB give a reasonable balance between VDisk granularity and cluster capacity. There is no longer a default value; the extent size is set during Managed Disk (MDisk) Group creation.

Important: VDisks can only be migrated between Managed Disk Groups (MDGs) that have the same extent size.
Table 8-1 Extent size and maximum cluster capabilities
Extent size   Maximum cluster capacity
16 MB         64 TB
32 MB         128 TB
64 MB         256 TB
128 MB        512 TB
256 MB        1 PB
512 MB        2 PB
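As a hypothetical CLI sketch, the extent size is chosen when the MDG is created, and the I/O Group is specified when the VDisk is created. The names, the 256 MB extent size, and the capacity are illustrative only:
IBM_2145:ITSOCL1:admin>svctask mkmdiskgrp -name MDG-2 -ext 256 -mdisk mdisk2:mdisk3
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-2 -iogrp io_grp1 -size 20 -unit gb -name TEST_VDISK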
A VDisk can be created in one of three modes: striped, sequential, or image. See Table 8-2.
Table 8-2 VDisk modes
Mode: Striped
Description: When a VDisk is created using a striped policy, its extents are allocated from the specified ordered list of managed disks (MDisks). The allocation algorithm starts with the first managed disk in the ordered list and attempts to allocate an extent from it, then it moves to the next disk, and so on for each managed disk in turn. If no list is specified, the entire MDG is used. You can see this in Figure 8-1 on page 115.

Mode: Sequential
Description: When a VDisk is created using a sequential policy, its extents are allocated from a single specified MDisk. The extents must be contiguous on that disk.

Mode: Image
Description: Image mode provides a direct block-for-block translation from the MDisk to the VDisk with no virtualization. This mode is intended to allow virtualization of MDisks that already contain data that was written directly, not through an SVC. Image mode allows a client to insert the SVC into the data path of an existing storage configuration with minimal downtime.
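As a hypothetical illustration of how each mode might be requested from the CLI (the MDG, MDisk, and VDisk names are examples only; sequential and image mode both require the -mdisk parameter, and an image mode VDisk takes its size from the MDisk):
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -iogrp io_grp0 -vtype striped -size 10 -unit gb -name VDISK-Str
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -iogrp io_grp0 -vtype seq -mdisk mdisk2 -size 10 -unit gb -name VDISK-Seq
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -iogrp io_grp0 -vtype image -mdisk mdisk1 -name VDISK-Img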
With very few exceptions, you must always configure VDisks using striping mode. Note: Electing to use sequential mode over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting system performance.
FlashCopy the VDisk to a target VDisk in the same I/O Group with the preferred node that you want, using the auto-delete option. The steps to follow are:
a. Cease I/O to the VDisk.
b. Start FlashCopy.
c. When the FlashCopy completes, unmap the source VDisk from the host.
d. Map the target VDisk to the host.
e. Resume I/O operations.
f. Delete the source VDisk.
There is a fourth, non-SVC method of changing the preferred node within an I/O Group if the host operating system or logical volume manager supports disk mirroring. To do this, you have to:
1. Create a VDisk, the same size as the existing one, on the desired preferred node.
2. Mirror the data to this VDisk using host-based logical volume mirroring.
3. Remove the original VDisk from the Logical Volume Manager (LVM).
id name IO_group_id IO_group_name status mdisk_grp_id mdisk_grp_name capacity type FC_id FC_name RC_id RC_name vdisk_UID fc_map_count
0 Barry-0001 0 io_grp0 online mdg1 5.0GB striped 60050768018101BF280000000000001F 0
1 Barry-0003 0 io_grp0 online mdg1 5.0GB striped 60050768018101BF2800000000000021 0
7 Barrz-test 0 io_grp0 online mdg1 65.9GB striped 60050768018101BF2800000000000023 0
9 Barry-0004 0 io_grp0 online mdg1 5.0GB striped 60050768018101BF2800000000000022 0
Look for the FC_id and RC_id fields. If these fields are not blank, the VDisk is part of a mapping or relationship.
The procedure is:
1. Cease I/O operations to the VDisk.
2. Disconnect the VDisk from the host operating system. For example, in Windows, remove the drive letter.
3. Stop any copy operations.
4. Issue the command to move the VDisk (see Example 8-3). This command does not work while there is data in the SVC cache that is to be written to the VDisk. After two minutes, the data automatically destages if no other condition forces an earlier destaging.
5. On the host, rediscover the VDisk. For example, in Windows, run a rescan, and then either mount the VDisk or add a drive letter. See Chapter 10, Hosts on page 169.
6. Resume copy operations as required.
7. Resume I/O operations on the host.
After any copy relationships stop, you can move the VDisk across I/O Groups with a single command in an SVC:
svctask chvdisk -iogrp newiogrpname/id vdiskname/id
In this command, newiogrpname/id is the name or ID of the I/O Group to which you move the VDisk, and vdiskname/id is the name or ID of the VDisk. Example 8-3 shows the command to move the VDisk named VDISK-Image from its existing I/O Group, io_grp1, to io_grp0.
Example 8-3 Moving a VDisk to another I/O Group
IBM_2145:ITSOCL1:admin>svctask chvdisk -iogrp io_grp0 VDISK-Image

Migrating VDisks between I/O Groups can be a potential issue if the old definitions of the VDisks are not removed from the configuration prior to importing the VDisks to the host. Migrating VDisks between I/O Groups is not a dynamic configuration change. It must be done with the hosts shut down. Then, follow the procedure listed in Chapter 10, Hosts on page 169 for the reconfiguration of SVC VDisks to hosts. We recommend that you remove the stale configuration and reboot the host to reconfigure the VDisks that are mapped to a host. For details about how to dynamically reconfigure the IBM Subsystem Device Driver (SDD) for a specific host operating system, refer to Multipath Subsystem Device Driver: User's Guide, SC30-4131-01, where this procedure is described in depth.
Note: Do not move a VDisk to an offline I/O Group under any circumstances. You must ensure that the I/O Group is online before moving the VDisks to avoid any data loss.
This command does not work if there is any data in the SVC cache, which must be flushed out first. There is a -force flag; however, this flag discards the data in the cache rather than flushing it to the VDisk. If the command fails due to outstanding I/Os, it is better to wait a couple of minutes, after which the SVC automatically flushes the data to the VDisk.
Note: Using the -force flag can result in data integrity issues.
svctask migratevdisk -mdiskgrp MDG-1 -threads 3 -vdisk IOTEST
This command migrates our VDisk, IOTEST, to the MDG named MDG-1 and uses three threads while doing so. Note that instead of using the VDisk name, you can use its ID number.
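While the migration runs, you can check its progress. A simple check (the exact output format depends on your code level) is:
IBM_2145:ITSOCL1:admin>svcinfo lsmigrate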
You must perform these command line steps: 1. To determine the name of the VDisk to be moved, issue the command: svcinfo lsvdisk The output will be in the form that is shown in Example 8-5.
Example 8-5 svcinfo lsvdisk output
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk
id name IO_group_id IO_group_name status mdisk_grp_id mdisk_grp_name capacity type FC_id FC_name RC_id RC_name vdisk_UID fc_map_count
1 DEON_MASTER 0 io_grp0 online MDG-1 4.0GB striped 3 DEON_MASTER 60050768018101BF2800000000000033 1
7 DEON_0_T_0005 0 io_grp0 online MDG-1 4.0GB striped 0 DEON_MMTAPE 60050768018101BF2800000000000032 1
9 DEON_0_T_0003 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF2800000000000031 0
10 DEON_0_T_0002 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF2800000000000030 0
11 DEON_0_T_0001 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF280000000000002F 0
13 DEON_0_0005 0 io_grp0 online MDG-1 4.0GB striped many many DEON_MMTAPE 60050768018101BF280000000000002E 2
14 DEON_0_0004 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF280000000000002D 0
15 DEON_0_0003 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF280000000000002C 0
17 DEON_0_0002 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF280000000000002B 0
18 DEON_0_0001 0 io_grp0 online MDG-1 4.0GB striped 60050768018101BF280000000000002A 0
19 VDISK-Image 0 io_grp0 online MDG-1 5.0GB striped 60050768018101BF2800000000000035 0
21 VDISK-Striped1 0 io_grp0 online MDG-1 5.0GB striped 60050768018101BF2800000000000024 0
2. In order to migrate the VDisk, you need the name of the MDisk to which you will migrate it. The command that you need to issue is: svcinfo lsmdisk Example 8-6 on page 121 shows the command output.
IBM_2145:ITSOCL1:admin>svcinfo lsmdisk
id name status mode mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_# controller_name UID
0 mdisk0 online managed 0 MDG-1 600.0GB 0000000000000000 controller0 600a0b800017423300000059469cf84500000000000000000000000000000000
1 mdisk1 online unmanaged 5.0GB 0000000000000001 controller0 600a0b80001742330000005c46a62f2500000000000000000000000000000000
From this command, we can see that mdisk1 is a candidate for the image type migration, because it is an unmanaged MDisk.
3. We now have enough information to enter the command to migrate the VDisk to image type:
svctask migratetoimage -vdisk VDISKNAME -threads number -mdisk MDISKNAME -mdiskgrp MDISK_GROUP_NAME
You can see this command in Example 8-7.
Example 8-7 migratetoimage command
IBM_2145:ITSOCL1:admin>svctask migratetoimage -vdisk VDISK-Image -threads 3 -mdisk mdisk1 -mdiskgrp MDG-1
4. If there is no unmanaged MDisk to which to migrate, you can remove an MDisk from an MDisk Group. However, you can only remove an MDisk from an MDisk Group if there are enough free extents on the remaining MDisks in the group to migrate any used extents on the MDisk that you are removing. Example 8-8 shows this command.
Example 8-8 rmmdisk command
IBM_2145:ITSOCL1:admin>svctask rmmdisk -mdisk mdisk1 -force MDG-1
The -force flag is the option that automatically migrates used extents on mdisk1 to the free extents in the MDG.
then all the paths through the owning node. Therefore, a preferred path is any port on a preferred controller, assuming that SAN zoning is correct. Note: The preferred node by no means signifies absolute ownership. The data can still be accessed by the partner node in the I/O Group in the event of a failure.
By default, the SVC assigns ownership of even-numbered VDisks to one node of a caching pair and the ownership of odd-numbered VDisks to the other node. It is possible for the ownership distribution in a caching pair to become unbalanced if VDisk sizes are significantly different between the nodes, or the VDisk numbers assigned to the caching pair are predominantly even or odd. To provide some flexibility in making plans to avoid this problem, the ownership for a specific VDisk can be explicitly assigned to a specific node when the VDisk is created. A node that is explicitly assigned as an owner of a VDisk is known as the preferred node. Because it is expected that hosts will access VDisks through the preferred nodes, those nodes can become overloaded. When a node becomes overloaded, VDisks can be moved to other I/O Groups, because the ownership of a VDisk cannot be changed after the VDisk is created. We described this situation in 8.1.3, Moving a VDisk to another I/O Group on page 117. SDD is aware of the preferred paths that SVC sets per VDisk. SDD uses a load balancing and optimizing algorithm when failing over paths; that is, it tries the next known preferred path. If this effort fails and all preferred paths have been tried, it load balances on the non-preferred paths until it finds an available path. If all paths are unavailable, the VDisk goes offline. It can take some time, therefore, to perform path failover when multiple paths go offline. SDD also performs load balancing across the preferred paths where appropriate.
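If your installed CLI level supports it, the preferred node can also be requested explicitly at creation time. The following line is a hypothetical sketch only; the -node parameter and the names used here are assumptions, so verify the mkvdisk syntax for your code level before relying on it:
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -iogrp io_grp0 -node node1 -size 10 -unit gb -name VDISK-PrefN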
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk VDISK-Image
id 19
name VDISK-Image
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 0
mdisk_grp_name MDG-1
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018101BF2800000000000035
throttling 0
preferred_node_id 5
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
IBM_2145:ITSOCL1:admin>

The throttle setting of zero indicates that no throttling has been set. Having checked the VDisk, you can then run the svctask chvdisk command. The complete syntax of the command is:
svctask chvdisk [-iogrp iogrp_name|iogrp_id] [-rate throttle_rate [-unitmb]] [-name new_name_arg] [-force] vdisk_name|vdisk_id
To just modify the throttle setting, we run:
svctask chvdisk -rate 40 -unitmb VDISK-Image
Running the lsvdisk command now gives us the output shown in Example 8-10.
Example 8-10 Output of lsvdisk command
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk VDISK-Image
id 19
name VDISK-Image
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 0
mdisk_grp_name MDG-1
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018101BF2800000000000035
virtual_disk_throttling (MB) 40
preferred_node_id 5
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
IBM_2145:ITSOCL1:admin>

This example shows that the throttle setting (virtual_disk_throttling) is 40 MB/sec on this VDisk. If we had set the throttle setting to an I/O rate by using the I/O parameter, which is the default setting, we do not use the -unitmb flag:
svctask chvdisk -rate 4048 VDISK-Image
You can see in Example 8-11 that the throttle setting has no unit parameter, which means that it is an I/O rate setting.
Example 8-11 chvdisk command
IBM_2145:ITSOCL1:admin>svctask chvdisk -rate 4048 VDISK-Image
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk VDISK-Image
id 19
name VDISK-Image
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 0
mdisk_grp_name MDG-1
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018101BF2800000000000035
throttling 4048
preferred_node_id 5
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
IBM_2145:ITSOCL1:admin>

Note: An I/O governing rate of 0 (displayed as virtual_disk_throttling in the CLI output of the svcinfo lsvdisk command) does not mean that zero IOPS (or MBs per second) can be achieved. It means that no throttle is set.
8.3.1 Using underlying controller remote copy with SVC cache-disabled VDisks
Where synchronous or asynchronous remote copy is used in the underlying storage controller, the controller LUNs at both the source and destination must be mapped through the SVC as image mode disks with the SVC cache disabled. Note that it is, of course, possible to access either the source or the target of the remote copy from a host directly, rather than through the SVC. The SVC copy services can be usefully employed with the image mode VDisk representing the primary site of the controller remote copy relationship. It does not make sense to use SVC copy services with the VDisk at the secondary site, because the SVC does not see the data flowing to this LUN through the controller. Figure 8-2 on page 126 shows the relationships among the SVC, the VDisk, and the underlying storage controller for a cache-disabled VDisk.
Figure 8-2 Controller-based (synchronous or asynchronous) remote copy with SVC image mode cache-disabled VDisks: the controller performs the remote copy between controllers at different sites, the SVC image mode VDisk at the secondary site has cache disabled, and the SVC is not aware of any relationship between the VDisks or the storage controllers
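A minimal sketch of how such a LUN might be presented through the SVC as an image mode VDisk with caching disabled follows. The MDG, MDisk, and VDisk names are hypothetical:
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-IMG -iogrp io_grp0 -vtype image -mdisk mdisk1 -cache none -name VDISK-RC-Primary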
8.3.2 Using underlying controller PiT copy with SVC cache-disabled VDisks
Where point-in-time (PiT) copy is used in the underlying storage controller, the controller LUNs for both the source and target must be mapped through the SVC as image mode disks with the SVC cache disabled as shown in Figure 8-3 on page 127. Note that of course it is possible to access either the source or the target of the FlashCopy from a host directly rather than through the SVC.
Figure 8-3 Controller-based FlashCopy with SVC image mode cache-disabled VDisks: the SVC MDisks represent the controller LUNs, and the SVC is not aware of any relationship between the VDisks
IBM_2145:ITSOCL1:admin>svctask migratetoimage -vdisk VDISK-Image -threads 4 -mdisk mdisk1 -mdiskgrp MDG-1
2. Stop I/O to the VDisk.
3. Unmap the VDisk from the host.
4. Run the svcinfo lsmdisk command to check your unmanaged MDisks.
5. Remove the VDisk, which makes the MDisk on which it is created become unmanaged. See Example 8-13.
Example 8-13 Removing the VDisk VDISK-Image
IBM_2145:ITSOCL1:admin>svctask rmvdisk VDISK-Image
6. Make an image mode VDisk on the unmanaged MDisk that was just released from the SVC. Check the MDisks by running the svcinfo lsmdisk command first. See Example 8-14 on page 128.
IBM_2145:ITSOCL1:admin>svcinfo lsmdisk
id name status mode mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_# controller_name UID
0 mdisk0 online managed 0 MDG-1 600.0GB 0000000000000000 controller0 600a0b800017423300000059469cf84500000000000000000000000000000000
1 mdisk1 online unmanaged 5.0GB 0000000000000001 controller0 600a0b80001742330000005c46a62f2500000000000000000000000000000000
2 mdisk2 online managed 0 MDG-1 70.9GB 0000000000000002 controller0 600a0b800017443100000096469cf0e800000000000000000000000000000000
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -size 5 -unit gb -iogrp io_grp0 -name VDISK-Image -cache none
Virtual Disk, id [19], successfully created
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk VDISK-Image
id 19
name VDISK-Image
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 0
mdisk_grp_name MDG-1
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018101BF2800000000000043
throttling 0
preferred_node_id 6
fast_write_state empty
cache none
udid
fc_map_count 0
IBM_2145:ITSOCL1:admin>

7. If you want to create the VDisk with read/write cache, leave out the -cache parameter, because cache-enabled is the default setting. See Example 8-15.
Example 8-15 Removing VDisk and recreating with cache enabled
IBM_2145:ITSOCL1:admin>svctask rmvdisk VDISK-Image
IBM_2145:ITSOCL1:admin>svctask mkvdisk -mdiskgrp MDG-1 -size 5 -unit gb -iogrp io_grp0 -name VDISK-Image
Virtual Disk, id [19], successfully created
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk VDISK-Image
id 19
name VDISK-Image
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 0
mdisk_grp_name MDG-1
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018101BF2800000000000044
throttling 0
preferred_node_id 6
fast_write_state empty
cache readwrite
udid
fc_map_count 0
IBM_2145:ITSOCL1:admin>

8. You can then map the VDisk to the host and continue I/O operations after rescanning the host. See Example 8-16.
Example 8-16 Mapping VDISK-Image to host senegal
IBM_2145:ITSOCL1:admin>svctask mkvdiskhostmap -host senegal VDISK-Image
Virtual Disk to Host map, id [0], successfully created
IBM_2145:ITSOCL1:admin>

Note: Before removing the VDisk host mapping, it is essential that you follow the procedures in Chapter 10, Hosts on page 169 so that you can remount the disk with its access to data preserved.
mercy of the storage again, and here we tie in with the fast path with tens of microseconds of additional latency on a read miss. The chances are that this will also be a read miss on the controller, where a high-end system will respond in around 10 milliseconds. The order of magnitude of the additional latency introduced by the SVC is therefore lost in the noise.

A VDisk, like any storage device, has three basic properties: capacity, I/O rate, and throughput as measured in megabytes per second. One of these properties will be the limiting factor in your environment. Having cache and striping across large numbers of disks can help increase these numbers, but eventually the fundamental laws of physics apply; there will always be a limiting number. One of the major problems with designing a storage infrastructure is that while it is relatively easy to determine the required capacity, determining the required I/O rate and throughput is not so easy. All too often, the exact requirement is only known after the storage infrastructure has been built and the performance is inadequate. One of the advantages of the SVC is that it is possible to compensate for a lack of information at the design stage, due to the SVC's flexibility and its ability to non-disruptively migrate data to different types of back-end storage devices.

The throughput for VDisks can range from fairly small numbers (1 to 10 IOPS) to very large values (more than 1,000 IOPS). This throughput depends a lot on the nature of the application and across how many MDisks the VDisk is striped. When the I/O rate, or throughput, approaches 1,000 IOPS per VDisk, it is either because the volume is getting very good performance, usually from very good cache behavior, or because the VDisk is striped across multiple MDisks and hence usually across multiple RAID arrays on the back-end storage system. Otherwise, it is not possible to perform so many IOPS to a VDisk that is based on a single RAID array and still have a good response time.

The MDisk I/O limit depends on many factors. The primary factor is the number of disks in the RAID array on which the MDisk is built and the speed, or revolutions per minute (RPM), of the disks. But when the number of IOPS to an MDisk is near or above 1,000, the MDisk is considered extremely busy. For 15K RPM disks, the limit is a bit higher. But these high I/O rates to the back-end storage systems are not consistent with good performance; they imply that the back-end RAID arrays are operating at very high utilizations, which is indicative of considerable queuing delays. Good planning demands a solution that reduces the load on such busy RAID arrays. For more precision, we will consider the upper limit of performance for 10K and 15K RPM, enterprise class devices. Be aware that different people have different opinions about these limits, but all the numbers in Table 8-3 represent very busy disk drive modules (DDMs).
Table 8-3 DDM speeds
DDM speed   Maximum operations/second   6+P operations/second   7+P operations/second
10K         150 - 175                   900 - 1050              1050 - 1225
15K         200 - 225                   1200 - 1350             1400 - 1575
While disks might achieve these throughputs, these ranges imply a lot of queuing delay and high response times. These ranges probably represent acceptable performance only for batch-oriented applications, where throughput is the paramount performance metric. For online transaction processing (OLTP) applications, these throughputs might already have unacceptably high response times. Because 15K RPM DDMs are most commonly used in OLTP environments (where response time is at a premium), a simple rule is if the MDisk does
more than 1,000 operations per second, it is very busy, no matter what the drive's RPM is.
In the absence of additional information, we often assume, and our performance models assume, that 10 milliseconds (msec) is pretty high. But for a particular application, 10 msec might be too low or too high. Many OLTP environments require response times closer to 5 msec, while batch applications with large sequential transfers might run fine with a 20 msec response time. The appropriate value can also change between shifts or on the weekend. A response time of 5 msec might be required from 8 a.m. until 5 p.m., while 50 msec is perfectly acceptable near midnight. It is all client and application dependent. What really matters is the average front-end response time, which is what counts for the users. You can measure the average front-end response time by using TPC for Disk with its performance reporting capabilities. See Chapter 12, Monitoring on page 215 for more information. Figure 8-4 shows the overall response time of a VDisk that is under test. Additionally, TPC allows us to plot read and write response times as distinct entities if one of these response times is causing problems for the user. This response time in the 1 to 2 msec range gives an acceptable level of performance for OLTP applications.
If we look at the I/O rate on this VDisk, we see the chart in Figure 8-5 on page 132, which shows us that the I/O rate to this VDisk was in the region of 2,000 IOPS. This is normally an unacceptably high I/O rate for a LUN that is based on a single RAID array. However, in this case, the VDisk was striped across two MDisks, which gives us an I/O rate per MDisk in the order of 1,200 IOPS. This is high and normally gives a high user response time; however,
here, the SVC front-end cache mitigates the high latency at the back end, giving the user a good response time. Although there is no immediate issue with this VDisk, if the workload characteristics change and the VDisk becomes less cache friendly, you need to consider adding another MDisk to the MDG, making sure that it comes from another RAID array, and striping the VDisk across all three MDisks.
OLTP workloads
Probably the most important parameter, as far as VDisks are concerned, is the I/O response time for OLTP workloads. After you have established what VDisk response time provides good user performance, you can set TPC alerting to notify you if this number is exceeded by about 25%. Then, check the I/O rate of the MDisks on which this VDisk is built. If there are multiple MDisks per RAID array, you need to check the RAID array performance. All of this
can be done using TPC. The magic number here is 1,000 IOPS, assuming that the RAID array is 6+P. See Table 8-3 on page 130. If one of the back-end storage arrays is running at more than 1,000 IOPS and the user is experiencing poor performance because of degraded response time, this array is probably the root cause of the problem. If users complain of response time problems, yet the VDisk response as measured by TPC has not changed significantly, this situation indicates that the problem is in the SAN network between the host and the SVC. You can diagnose where the problem is with TPC. The best way to determine the location of the problem is to use the Topology Viewer to look at the host using Datapath Explorer (DPE). This view enables you to see the paths from the host to the SVC, which we show in Figure 8-6.
Figure 8-6 shows the paths from the disk as seen by the server through its host bus adapters (HBAs) to the SVC VDisk. By hovering the cursor over the switch port, the throughput of that port can be seen. You can also use TPC to produce reports showing the overall throughput of the ports, which we show in Figure 8-7 on page 134.
TPC can present the throughput of the ports graphically over time as shown in Figure 8-8 on page 135.
From this type of graph, you can identify performance bottlenecks in the SAN fabric and make the appropriate changes.
Batch workloads
With batch workloads in general, the most important parameter is the throughput rate as measured in megabytes per second. The goal rate is harder to quantify than the OLTP response figure, because throughput is heavily dependent on the block size. Additionally, high response times can be acceptable for these workloads. So, it is not possible to give a single metric to quantify performance; it really is a question of "it depends." The larger the block size, the greater the potential throughput to the SVC. Block size is often determined by the application. With TPC, you can measure the throughput of a VDisk and the
MDisks on which it is built. The important measure for the user is the time that the batch job takes to complete. If this time is too long, the following steps are a good starting point: Determine the data rate that is needed for timely completion and compare it with the storage system's capability as documented in performance white papers and Disk Magic. If the storage system is capable of greater performance:
1. Make sure that the application transfer size is as large as possible.
2. Consider increasing the number of concurrent application streams, threads, files, and partitions.
3. Make sure that the host is capable of supporting the required data rate. For example, use tests, such as dd, and use TPC to monitor the results.
4. Check whether the flow of data through the SAN is balanced by using the switch performance monitors within TPC (extremely useful).
5. Check whether all switch and host ports are operating at the maximum permitted data rate of 2 or 4 Gb per second.
6. Watch out for cases where the whole batch window stops on a single file or database getting read or written, which can be a practical exposure for obvious reasons. Unfortunately, sometimes there is nothing that can be done. However, it is worthwhile evaluating this situation to see whether, for example, the database can be divided into partitions, or the large file can be replaced by multiple smaller files. Or, the use of the SVC in combination with SDD might help with a combination of striping and added paths to multiple VDisks. These efforts can allow parallel batch streams to the VDisks and, thus, speed up batch runs.
The chart shown in Figure 8-9 on page 137 gives an indication of what can be achieved by tuning the VDisk and the application. Points A-B show the normal steady state running of the application on the VDisk built on a single MDisk. We then migrated the VDisk so that it spanned two MDisks. Points B-C show the drop in performance during the migration. When the migration was complete, points D-E show that the performance had almost doubled. The application was one with 75% reads and 75% sequential access. The application was then modified so that it was 100% sequential. The resulting gain in performance is shown between points E and F.
Figure 8-10 on page 138 shows the performance enhancements that can be achieved by modifying the number of parallel streams flowing to the VDisk. Points A-B show the performance with a single stream application. We then doubled the size of the workload but kept it in single stream. As you can see from the points C-D, there is no improvement in performance. We were then able to split the workload into two parallel streams at point E. As you can see from the graph, points E-F show that the throughput to the VDisk has increased by over 60%.
Figure 8-10 Effect of splitting a large job into two parallel streams
Mixed workloads
As discussed in 8.1.1, Selecting the MDisk Group on page 116, we usually recommend mixing workloads so that the maximum resources are available to any workload when needed. However, when there is a heavy batch workload and no VDisk throttling, we recommend that the VDisks are placed in separate MDGs. This action is illustrated by the chart in Figure 8-11 on page 139. VDisk 21 is running an OLTP workload, and VDisk 20 is running a batch job. Both VDisks were in the same MDG, sharing the same MDisks, which were spread over three RAID arrays. As you can see between points A to B, the response time for the OLTP workload is very high, averaging 10 milliseconds. At time B, we migrated VDisk 20 to another MDG, using MDisks built on different RAID arrays. As you can see, after the migration had completed, the response time (points D to E) dropped for both the batch job and, more importantly, the OLTP workload.
Then, as you define the VDisks and the FlashCopy mappings, calculate the maximum average I/O that the SVC will receive per VDisk before you start to overload your storage controller. This example assumes:
An MDisk is defined from an entire array (that is, the array only provides one LUN, and that LUN is given to the SVC as an MDisk).
Each MDisk assigned to an MDG is the same size and same RAID type and comes from a storage controller of the same type.
MDisks from a storage controller are contained entirely in the same MDG.
The raw I/O capability of the MDG is the sum of the capabilities of its MDisks. For example, for five RAID 5 MDisks with eight component disks on a typical back-end device, the I/O capability is:
5 x ( 150 x 7 ) = 5250
This raw number might be constrained by the I/O processing capability of the back-end storage controller itself.
FlashCopy copying contributes to the I/O load of a storage controller, and thus, it must be taken into consideration. The effect of a FlashCopy is effectively adding a number of loaded VDisks to the group, and thus, a weighting factor can be calculated to make allowance for this load. The effect of FlashCopy copies depends on the type of I/O taking place. For example, in a group with two FlashCopy copies and random writes to those VDisks, the weighting factor is 14 x 2 = 28. The total weighting factor for FlashCopy copies is given in Table 8-4.
Table 8-4 FlashCopy weighting
Type of I/O to the VDisk       Impact on I/O      Weight factor for FlashCopy
None/very little               Insignificant      0
Reads only                     Insignificant      0
Sequential reads and writes    Up to 2x I/Os      2 x F
Random reads and writes        Up to 15x I/O      14 x F
Random writes                  Up to 50x I/O      49 x F
Thus, to calculate the average I/O rate per VDisk before overloading the MDG, use this formula:
I/O rate = (I/O Capability) / (Number of VDisks + Weighting Factor)
So, using the example MDG as defined previously, which can sustain 5,250 IOPS, if we add 20 VDisks to the MDG and there are two FlashCopy mappings that also have random reads and writes, the maximum I/O rate per VDisk is:
5250 / ( 20 + 28 ) = 110
Note that this is an average I/O rate, so if half of the VDisks sustain 200 I/Os and the other half sustain 10 I/Os, the load is still within this limit on average.
Conclusion
As you can see from the previous examples, TPC is a very useful and powerful tool for analyzing and solving performance problems. If you want a single parameter to monitor to gain an overview of your system's performance, it is the read and write response times for both VDisks and MDisks. This parameter shows everything that you need in one view and is the key day-to-day performance validation metric. It is relatively easy to notice that a system that usually had 2 ms writes and 6 ms reads suddenly has 10 ms writes and 12 ms reads and is getting overloaded. A general monthly check of CPU usage will show you how the system is growing over time and highlight when it is time to add a new I/O Group (or cluster). In addition, there are useful rules for OLTP-type workloads, such as the maximum I/O rates for back-end storage arrays, but for batch workloads, it really is a case of "it depends."
Chapter 9. Copy services
In this chapter, we discuss:
Measuring load: by node and by I/O Group loading
Moving Virtual Disks (VDisks): when to move VDisks to another I/O Group and how to move them
Node failure impact and degradation
Measuring load between clusters
Configuration considerations
9.1.2 Using both Metro Mirror and Global Mirror between two clusters
In an SVC cluster pair relationship, Metro Mirror and Global Mirror functions can be performed in either direction using either service. For example, a source VDisk in cluster A can perform Metro Mirror to a target VDisk in cluster B at the same time that a source VDisk in cluster B performs Global Mirror to a target VDisk in cluster A. The management of the copy service relationships is always performed in the cluster where the source VDisk exists. However, you must consider the performance implications of this configuration, because write data from all mirroring relationships will be transported over the same inter-cluster links. Metro Mirror and Global Mirror respond differently to a heavily loaded, poorly performing link. Metro Mirror will usually maintain the relationships in a copying or synchronized state, meaning that primary host applications will start to see poor performance (as a result of the synchronous mirroring being used). Global Mirror, however, offers a higher level of write performance to primary host applications. With a well-performing link, writes are completed asynchronously. If link performance becomes unacceptable, the link tolerance feature automatically stops Global Mirror relationships to ensure that performance for application hosts remains within reasonable limits. Therefore, with active Metro Mirror and Global Mirror relationships between the same two clusters, Global Mirror writes might suffer degraded performance if Metro Mirror relationships consume most of the inter-cluster link's capability. If this degradation reaches a level where hosts writing to Global Mirror experience extended response times, the Global Mirror relationships can be stopped when the link tolerance threshold is exceeded. If this situation happens, refer to 9.4.5, Diagnosing and fixing 1920 errors on page 163.
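As a hypothetical sketch, the two kinds of relationship might be defined from each cluster as follows. The VDisk names and the remote cluster name ITSOCL2 are assumptions; the -global flag distinguishes a Global Mirror relationship from the default Metro Mirror:
IBM_2145:ITSOCL1:admin>svctask mkrcrelationship -master VDISK-A -aux VDISK-A-COPY -cluster ITSOCL2
IBM_2145:ITSOCL2:admin>svctask mkrcrelationship -master VDISK-B -aux VDISK-B-COPY -cluster ITSOCL1 -global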
Figure 9-1 Using a combination of SVC copy services with some image mode cache-disabled VDisks
In Figure 9-1, the Primary Site uses SVC copy services (Global or Metro Mirror) to the secondary site. Thus, in the event of a disaster at the primary site, the storage administrator enables access to the target VDisk (from the secondary site), and the business application continues processing. While the business continues processing at the secondary site, the storage controller copy services replicate to the tertiary site. In Figure 9-1, the SVC copy services can control the replication between the primary site and the secondary site, or the secondary site and the tertiary site. Where storage controller copy services are used, the VDisks must be image mode Cache-Disabled VDisks. Where the SVC owns the copy service functions and there are no storage controller copy service functions utilized (underneath) for the same VDisk, then the VDisks can be striped or sequential with cache enabled or disabled.
If that LUN was a Managed Disk (MDisk) in an MDisk group (MDG) with striped or sequential VDisks on it, the accident might cascade up and bring the MDG offline. This situation, in turn, makes all the VDisks that belong to that group offline. When defining LUNs in point-in-time copy or a remote mirror relationship, double-check that the SVC does not have visibility to the LUN (mask it so that no SVC node can see it), or if the SVC must see the LUN, ensure that it is an unmanaged MDisk. The storage controller might, as part of its Advanced Copy Services function, take a LUN offline or suspend reads or writes. The SVC does not understand why this happens; therefore, the SVC might log errors as these events occur. If you mask target LUNs to the SVC and rename your MDisks as you discover them and if the Advanced Copy Services function prohibits access to the LUN as part of its processing, the MDisk might be discarded and rediscovered with an SVC-assigned MDisk name.
Table 9-1 Copy service limits
Remote copy relationships per cluster: 1024
Remote copy consistency groups per cluster: 256
Capacity of VDisk in a remote copy relationship per I/O Group: 40 TB (This applies to the target VDisk, not the source)
FlashCopy targets with the same source VDisk (Multiple Target FlashCopy): 16
FlashCopy mappings per cluster: 3855 (An SVC cluster can manage 4096 VDisks. This is calculated by 240 source VDisks with 16 mappings each, with one more with only 15 mappings, totalling 3855)
FlashCopy consistency groups per cluster: 128
Capacity of VDisks in a FlashCopy relationship: 40 TB
FlashCopy mappings in a consistency group: 512
FlashCopy uses an internal bitmap to keep track of changes that need to be applied to the target. The maximum size of the bitmap currently limits the SVC to supporting up to 40 TB of source VDisks in one I/O Group to be flash copied to up to another 40 TB of target VDisks anywhere else in the cluster. This means that a four I/O Group cluster can support up to 160 TB of FlashCopy sources copied to up to 160 TB of FlashCopy targets anywhere else in the cluster. There can be instances where the I/O Group FlashCopy bitmap table limits the maximum capacity of FlashCopies to less than 40 TB. Internally, all maps assume a VDisk source size rounded up to an 8 GB boundary. Thus, a 24.1 GB VDisk will occupy the same mapping space as a 32 GB VDisk. In this configuration, 512 FlashCopy mappings use all available bitmap space for that I/O Group, which is less than 40 TB in total. Metro Mirror and Global Mirror's 40 TB limit includes both source and target VDisks per I/O Group. The 40 TB limit can be split into any ratio between source and target VDisks. Like FlashCopy, VDisks are rounded up to an 8 GB boundary, and therefore, the limit can be reached at less than 40 TB.
it is pointless if the operating system, or more importantly, the application, cannot use the copied disk.
Data stored to a disk from an application normally goes through these steps: 1. The application records the data using its defined application programming. Some applications might first store their data in application memory before sending it to disk at a later time. Normally, subsequent reads of the block just being written will get the block in memory if it is still there.
2. The application sends the data to a file. The file system accepting the data might buffer it in memory for a period of time.
3. The file system will send the I/O to a disk controller after a defined period of time (or even based on an event).
4. The disk controller might cache its write in memory before sending the data onto the physical drive. If the SVC is the disk controller, it will store the write in its internal cache before sending the I/O on to the real disk controller.
5. The data is stored on the drive.
At any point in time, there might be any number of unwritten blocks of data in any of these steps, waiting to go to the next step. It is also important to realize that sometimes the order of the data blocks created in step 1 might not be the same order that is used when sending the blocks to steps 2, 3, or 4. So it is possible that, at any point in time, data arriving in step 4 might be missing a vital component that has not yet been sent from step 1, 2, or 3. FlashCopy copies are normally created with data that is visible from step 4. So, to maintain application integrity when a FlashCopy is created, any I/O that is generated in step 1 must make it to step 4 before the FlashCopy is started. In other words, there must not be any outstanding write I/Os in steps 1, 2, or 3. If there were, the copy of the disk that is created at step 4 is likely to be missing those transactions, and if the FlashCopy is to be used, these missing I/Os might make it unusable.
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk
id name IO_group_id IO_group_name status mdisk_grp_id mdisk_grp_name capacity type FC_id FC_name RC_id RC_name vdisk_UID fc_map_count
19 VDISK-Image 0 io_grp0 online 0 MDG-1 5.0GB image 60050768018101BF2800000000000029 0
Figure 9-2 Using the master console to see the type of VDisks
VDisk 19, however, is an image mode VDisk, so you need to know its exact size in bytes. In Example 9-2, you use the -bytes parameter of the svcinfo lsvdisk command to find its exact size. Thus, the target VDisk must be created with a size of 5368709120 bytes, not 5 GB. Figure 9-3 on page 150 shows the exact size of an image mode VDisk using the SVC GUI.
Example 9-2 Find the exact size of an image mode VDisk using the command line interface
IBM_2145:ITSOCL1:admin>svcinfo lsvdisk -bytes 19
id 19
name VDISK-Image
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 0
mdisk_grp_name MDG-1
capacity 5368709120
type image
formatted no
mdisk_id 1
mdisk_name mdisk1
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018101BF2800000000000029
throttling 0
preferred_node_id 5
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
Figure 9-3 Find the exact size of an image mode VDisk using the SVC GUI
Figure 9-3 shows how to find out the exact size of the image mode VDisk when you click the VDisk name in the panel shown in Figure 9-2 on page 149.
3. Create a target VDisk of the required size as identified from the source above. The target VDisk can be an image, sequential, or striped mode VDisk; the only requirement is that it must be exactly the same size as the source. The target VDisk can be cache-enabled or cache-disabled.
4. Define a FlashCopy mapping, making sure that you have the source and target disks defined in the correct order. (If you use your newly created VDisk as the source and the existing host's VDisk as the target, you will destroy the data on that VDisk if you start the FlashCopy.) As part of the define step, you can specify a copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the source VDisk to the target VDisk. If you set the copy rate to 0 (NOCOPY), the SVC copies only blocks that have changed since the mapping was started, on either the source VDisk or the target VDisk (if the target VDisk is mounted read/write to a host). A sketch of the corresponding CLI commands follows these steps.
5. Prepare the FlashCopy mapping. This prepare process can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the source VDisk to the storage controller's disks. After the prepare completes, the mapping has a Prepared status, and the source VDisk behaves as though it were a cache-disabled VDisk until the FlashCopy mapping is either started or deleted.
Note: If you create a FlashCopy mapping where the source VDisk is the target VDisk of an active Metro Mirror relationship, this adds additional latency to that existing Metro Mirror relationship (and possibly affects the host that is using the source VDisk of that Metro Mirror relationship as a result). The reason for the additional latency is that the FlashCopy prepare disables the cache on the source VDisk (which is the target VDisk of the Metro Mirror relationship), and thus, all write I/Os from the Metro Mirror relationship need to commit to the storage controller before the completion is returned to the host.
6. After the FlashCopy mapping is prepared, quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process is different for each application and for each operating system. One guaranteed way to quiesce the host is to stop the application and unmount the VDisk from the host.
7. As soon as the host completes its flushing, you can then start the FlashCopy mapping. The FlashCopy starts very quickly (at most, a few seconds).
8. When the FlashCopy mapping has started, you can then unquiesce your application (or mount the volume and start the application), at which point the cache is re-enabled for the source VDisk. The FlashCopy continues to run in the background and ensures that the target VDisk is an exact copy of the source VDisk as it was when the FlashCopy mapping was started.

Steps 1 on page 148 through 5 on page 150 can be performed while the host that owns the source VDisk performs its typical daily activities (that is, no downtime). While step 5 on page 150 is running, which can last several minutes, there might be a delay in I/O throughput, because the cache on the VDisk is temporarily disabled. Step 6 must be performed when the application is down. However, these steps complete quickly, and application downtime is minimal. The target FlashCopy VDisk can now be assigned to another host and used for read or write, even though the FlashCopy process has not completed.

Note: If you intend to use the target VDisk on the same host as the source VDisk, at the same time that the source VDisk is visible to that host, you might need to perform additional preparation steps to enable the host to access VDisks that are identical.
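The following is a minimal sketch of the SVC CLI commands behind steps 3, 4, 5, and 7 for a single VDisk. The source VDisk, MDisk group, I/O Group, and size are taken from the earlier example; the target VDisk and mapping names are hypothetical and must be replaced with your own values:

svctask mkvdisk -iogrp io_grp0 -mdiskgrp MDG-1 -size 5368709120 -unit b -name VDISK-Image-T1
svctask mkfcmap -source VDISK-Image -target VDISK-Image-T1 -name FCMAP-1 -copyrate 0
svctask prestartfcmap FCMAP-1
svctask startfcmap FCMAP-1

The mkvdisk command creates the target at the exact size in bytes (step 3), mkfcmap defines the mapping with a copy rate of 0 (NOCOPY) (step 4), prestartfcmap prepares the mapping and flushes the cache (step 5), and startfcmap is issued only after the host has been quiesced (step 7).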
Thus, when performing a snap copy of the Exchange environment, all three disks need to be flashed at exactly the same time, so that if they are used during a recovery, no one information store has more recent data on it than another information store.

A UNIX relational database has several VDisks to hold different parts of the relational database. For example, two VDisks are used to hold two distinct tables, and a third VDisk holds the relational database transaction logs. Again, when a snap copy of the relational database environment is taken, all three disks need to be in sync. That way, when they are used in a recovery, the relational database is not missing any transactions that might have occurred if each VDisk was flash copied independently.

Here are the steps to ensure that data integrity is preserved when VDisks are related to each other (a sketch of the corresponding CLI commands follows these steps):
1. Your host is currently writing to the VDisks as part of its daily activities. These VDisks will become the source VDisks in our FlashCopy mappings.
2. Identify the size and type (image, sequential, or striped) of each source VDisk. If any of the source VDisks is an image mode VDisk, you need to know its size in bytes. If any are sequential or striped mode VDisks, their size as reported by the SVC master console or SVC command line is sufficient.
3. Create a target VDisk of the required size for each source identified in the previous step. The target VDisks can be image, sequential, or striped mode VDisks; the only requirement is that they must be exactly the same size as their sources. The target VDisks can be cache-enabled or cache-disabled.
4. Define a FlashCopy Consistency Group. This Consistency Group will be linked to each FlashCopy mapping that you define, so that data integrity is preserved between the VDisks.
5. Define a FlashCopy mapping for each source VDisk, making sure that you have the source and target disks defined in the correct order. (If you use any of your newly created VDisks as a source and the existing host's VDisk as the target, you will destroy the data on that VDisk if you start the FlashCopy.) When defining each mapping, make sure that you link it to the FlashCopy Consistency Group that you defined in the previous step. As part of the define step, you can specify a copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the source VDisks to the target VDisks. If you set the copy rate to 0 (NOCOPY), the SVC copies only blocks that have changed on any VDisk since the Consistency Group was started, on either the source VDisk or the target VDisk (if the target VDisk is mounted read/write to a host).
6. Prepare the FlashCopy Consistency Group. This prepare process can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the VDisks in the Consistency Group to the storage controller's disks. After the prepare completes, the Consistency Group has a Prepared status, and all source VDisks behave as though they were cache-disabled VDisks until the Consistency Group is either started or deleted.
Note: If you create a FlashCopy mapping where the source VDisk is the target VDisk of an active Metro Mirror relationship, this adds additional latency to that existing Metro Mirror relationship (and possibly affects the host that is using the source VDisk of that Metro Mirror relationship as a result). The reason for the additional latency is that the FlashCopy Consistency Group prepare disables the cache on all source VDisks (which might be target VDisks of a Metro Mirror relationship), and thus, all write I/Os from the Metro Mirror relationship need to commit to the storage controller before the completion is returned to the host.
7. After the Consistency Group is prepared, quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process differs for each application and for each operating system. One guaranteed way to quiesce the host is to stop the application and unmount the VDisks from the host.
8. As soon as the host completes its flushing, you can then start the Consistency Group. The FlashCopy start completes very quickly (at most, a few seconds).
9. When the Consistency Group has started, you can then unquiesce your application (or mount the VDisks and start the application), at which point the cache is re-enabled. The FlashCopy continues to run in the background and preserves the data that existed on the VDisks when the Consistency Group was started.

Steps 1 on page 152 through 6 on page 152 can be performed while the host that owns the source VDisks performs its typical daily duties (that is, no downtime). While step 6 on page 152 is running, which can take several minutes, there might be a delay in I/O throughput, because the cache on the VDisks is temporarily disabled. Step 7 must be performed when the application is down; however, these steps complete quickly, so application downtime is minimal. The target FlashCopy VDisks can now be assigned to another host and used for read or write even though the FlashCopy processes have not completed.

Note: If you intend to use any of the target VDisks on the same host as their source VDisks, at the same time that the source VDisks are visible to that host, you might need to perform additional preparation steps to enable the host to access VDisks that are identical.
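As a sketch only, using the relational database example above (the Consistency Group, VDisk, and mapping names are hypothetical), the Consistency Group variant of the previous commands looks like this:

svctask mkfcconsistgrp -name FCCG-DB
svctask mkfcmap -source DB-Table1 -target DB-Table1-T -consistgrp FCCG-DB -copyrate 0
svctask mkfcmap -source DB-Table2 -target DB-Table2-T -consistgrp FCCG-DB -copyrate 0
svctask mkfcmap -source DB-Logs -target DB-Logs-T -consistgrp FCCG-DB -copyrate 0
svctask prestartfcconsistgrp FCCG-DB
svctask startfcconsistgrp FCCG-DB

Preparing and starting the group, rather than the individual mappings, is what guarantees that all three targets represent the same point in time.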
In Figure 9-4 on page 154, six FlashCopy mappings have been created from the same source VDisk and started one after another. Thus, mapping T0 started first, followed by T1, then T2, and so on, and T5 was the last FlashCopy mapping started. The time between starting each FlashCopy mapping is not important in this example. The source VDisk has been divided into six units, and a write to any unit (on either the source or a target VDisk) causes that not yet copied unit to be copied to the target before the write completes. NC (not copied) in the diagram indicates that the block has not yet been copied, while C (copied) indicates that the block has been copied from the source.

Assuming that not all targets have completed yet, T5 was the latest FlashCopy mapping started, so it is at the top of the dependency tree. Any write to a block on the source VDisk that has not yet been copied to another target is written to T5. T4 then gets this block from T5, T3 gets it from T4, and so on until T0 gets the block from T1. If the next VDisk in the dependency chain does not have a block that is required, the SVC skips to the next VDisk in the chain to get the block, or finally gets it from the source if no other VDisk has it. For example, in Figure 9-4 on page 154, if a write went to target T1 block 5, the SVC in fact first copies this block from T4, because T2 and T3 do not have it yet.

Each FlashCopy mapping can have a different copy rate, so it is possible that each mapping completes at a different time. In Figure 9-4 on page 154, mappings T0, T4, and T5 have completed, because they have copied all the blocks that changed since the FlashCopy mapping started.

T5 has completed, and because it was started last, it does not depend on any other target VDisk. Normally, T4 would depend on T5, but T4 has completed also, so the FlashCopy mapping for T5 has finished. This mapping can be deleted (or is deleted automatically if that parameter was set when it was created).

T4 has also completed; however, mapping T3 has some non-copied blocks and depends on getting them from T4. T4 shows 100% and a status of copying until T3 gets all of its required blocks. If mapping T4 is stopped, the SVC immediately copies blocks 4 and 5 to T3 to fulfill the dependency; then, T4 becomes idle_or_copied. This mapping is automatically deleted if that parameter was used on the FlashCopy mapping. If mapping T4 is stopped with the -force parameter, the relationship stops as normal (because it has reached 100%) and the target is still available. T3, however, does not get its blocks from T4 and enters the stopped state, and T3's target VDisk goes offline.

Mapping T3 is both dependent on T4 and depended on by T2. This mapping is in the usual copying state. If T3 is stopped, blocks 2 and 3 from T3 are copied to T2, and T2's dependency changes so that it now depends on T4 (to get its remaining blocks). T3 then stops as normal.

Mapping T1 depends on getting its changed blocks from T2. It is not dependent on T0 (because T0 has finished), so if this mapping is stopped, it stops immediately.

Mapping T0 has completed. Because it was the first mapping started, no VDisk depends on it; therefore, it entered the idle_or_copied state as soon as it finished.

In summary, when stopping FlashCopy mappings where the source VDisk is used in other FlashCopy mappings, remember that the stop request might in fact generate some additional I/O before the FlashCopy mapping finally stops.
To use FlashCopy to help with migration (a sketch of the key CLI commands follows these steps):
1. Your hosts are using the storage from either an unsupported controller or a supported controller that you plan to retire.
2. Install the new storage into your SAN fabric and define your arrays and LUNs. Do not mask the LUNs to any host; you will mask them to the SVC later.
3. Install the SVC into your SAN fabric and create the required SAN zones for the SVC nodes and for the SVC to see the new storage.
4. Mask the LUNs from your new storage controller to the SVC, and use svctask detectmdisk on the SVC to discover the new LUNs as MDisks.
5. Place the MDisks into the appropriate MDG.
6. Zone the hosts to the SVC (while maintaining their current zones to their storage) so that you can discover and define the hosts to the SVC.
7. At an appropriate time, install the IBM SDD onto the hosts that will soon use the SVC for storage. If you have performed testing to ensure that the host can use both SDD and the original driver, this step can be done any time before the next step.
8. Quiesce or shut down the hosts so that they no longer use the old storage.
9. Change the masking on the LUNs on the old storage controller so that the SVC is now the only user of the LUNs. You can change this masking one LUN at a time so that you can discover them (in the next step) one at a time and not mix any LUNs up.
10. Use svctask detectmdisk to discover the LUNs as MDisks. We recommend that you also use svctask chmdisk to rename the MDisks to something more meaningful.
11. Define a VDisk from each LUN and note its exact size (to the number of bytes) by using the svcinfo lsvdisk command.
12. Define a FlashCopy mapping and start the FlashCopy mapping for each VDisk by using the steps in "Steps to making a FlashCopy VDisk with application data integrity" on page 148.
13. Assign the target VDisks to the hosts and then restart your hosts. Your hosts see the original data, with the exception that the storage is now an IBM SVC LUN.

With these steps, you have made a copy of the existing storage, and the SVC has not been configured to write to the original storage. Thus, if you encounter any problems with these steps, you can reverse everything that you have done, assign the old storage back to the hosts, and continue without the SVC.

By using FlashCopy in this example, any incoming writes go to the new storage subsystem, and any read requests for data that has not yet been copied to the new subsystem automatically come from the old subsystem (the FlashCopy source). You can alter the FlashCopy copy rate, as appropriate, to ensure that all the data is copied to the new controller. After the FlashCopy completes, you can delete the FlashCopy mappings and the source VDisks. After all the LUNs have been migrated across, you can remove the old storage controller from the SVC node zones and then, optionally, remove the old storage controller from the SAN fabric.

You can also use this process if you want to migrate to a new storage controller and not keep the SVC after the migration. At step 2, make sure that you create LUNs that are the same size as the original LUNs. Then, at step 11, use image mode VDisks. When the FlashCopy mappings complete, you can shut down the hosts and map the storage directly to them, remove the SVC, and continue on the new storage controller.
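A minimal sketch of the SVC CLI commands for steps 10 through 12, for one LUN, might look like the following. The MDisk, MDisk group, VDisk, and mapping names are hypothetical, the size placeholder must be replaced with the byte count reported by lsvdisk, and the copy rate of 50 is only an example:

svctask detectmdisk
svctask chmdisk -name OLDCTRL-LUN0 mdisk8
svctask mkvdisk -iogrp io_grp0 -mdiskgrp MDG-OLD -vtype image -mdisk OLDCTRL-LUN0 -name HOST1-OLD-0
svcinfo lsvdisk -bytes HOST1-OLD-0
svctask mkvdisk -iogrp io_grp0 -mdiskgrp MDG-NEW -size <size_in_bytes> -unit b -name HOST1-NEW-0
svctask mkfcmap -source HOST1-OLD-0 -target HOST1-NEW-0 -name FC-MIG-0 -copyrate 50
svctask prestartfcmap FC-MIG-0
svctask startfcmap FC-MIG-0

The image mode VDisk preserves the existing data on the old LUN, and the FlashCopy mapping copies it to a VDisk in the new MDisk group while the host reads and writes through the target.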
unusually high. The peak workload must be evaluated by considering the average write workload over a period of one minute or less, plus the required synchronization copy bandwidth.

The SVC uses part of the bandwidth for its internal inter-cluster heartbeat. The amount of traffic depends on how many nodes are in each of the two clusters. Table 9-2 shows the amount of traffic, in megabits per second, generated by different sizes of clusters. These numbers represent the total traffic between the two clusters when no I/O is taking place to mirrored VDisks. Half of the data is sent by one cluster, and half of the data is sent by the other cluster. The traffic is divided evenly over all available inter-cluster links; therefore, if you have two redundant links, half of this traffic is sent over each link during fault-free operation.
Table 9-2 SVC inter-cluster heartbeat traffic (megabits per second)

Local/remote cluster   Two nodes   Four nodes   Six nodes   Eight nodes
Two nodes              2.6         4.0          5.4         6.7
Four nodes             4.0         5.5          7.1         8.6
Six nodes              5.4         7.1          8.8         10.5
Eight nodes            6.7         8.6          10.5        12.4
If the link between the sites is configured with redundancy so that it can tolerate single failures, the link must be sized so that the bandwidth and latency statements continue to be accurate even during single failure conditions.
gmlinktolerance parameter
The gmlinktolerance parameter of the remote copy partnership must be set to an appropriate value. The default value of 300 seconds (5 minutes) is appropriate for most clients. If you plan to perform SAN maintenance that might impact SVC Global Mirror relationships, you must do one of the following:
- Pick a maintenance window where the application I/O workload is reduced for the duration of the maintenance.
- Disable the gmlinktolerance feature or increase the gmlinktolerance value (meaning that application hosts might see extended response times from Global Mirror VDisks).
- Stop the Global Mirror relationships.
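As a sketch (assuming the SVC 4.2 CLI; the value of 600 seconds is only an example), the parameter is changed at the cluster level before the maintenance and can be set back to the default afterward. On some code levels, setting the value to 0 disables the feature entirely, so check the documentation for your release before relying on that behavior:

svctask chcluster -gmlinktolerance 600
svctask chcluster -gmlinktolerance 300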
If the configuration was changed to look like Figure 9-6, all Global Mirror resources for each node are used, and SVC Global Mirror operates with better performance than that shown in Figure 9-5.
- Dedicating storage controllers to only Global Mirror and Metro Mirror VDisks
- Configuring the controller to guarantee sufficient quality of service for the disks used by Global Mirror and Metro Mirror
- Ensuring that physical disks are not shared between Global Mirror or Metro Mirror VDisks and other I/O
- Verifying that MDisks within a mirror MDisk group are similar in their characteristics (for example, RAID level, physical disk count, and disk speed)
This is particularly important if you have a Global Mirror or Metro Mirror relationship that is running and synchronized and the link fails (thus, the mirror relationship suspends). When you restart the mirror relationship, the target disk is not usable until the mirror catches up and becomes synchronized again. Depending on the amount of changes that need to be applied to the target and on your bandwidth, this situation leaves you without a usable target VDisk until the synchronization completes. To avoid this exposure, we recommend that you make a FlashCopy of the target VDisks before you restart the mirror relationship. That way, you at least have a usable target VDisk, even though it contains older data.
If write throughput increases greatly (by 30% or more) when the relationships are stopped, this indicates that the application host was attempting to perform more I/O than the link can sustain. While the Global Mirror relationships are active, the overloaded link causes higher response times to the application host, which decreases the throughput that it can achieve. After the relationships have stopped, the application host sees lower response times, and you can see the true I/O workload. In this case, the link bandwidth must be increased, the application host I/O rate must be decreased, or fewer VDisks must be mirrored using Global Mirror.

The storage controllers at the remote cluster are overloaded. If one or more of the MDisks on a storage controller provides poor service to the SVC cluster, this can cause a 1920 error if it prevents application I/O from proceeding at the rate required by the application host. If you have followed the specified back-end storage controller requirements, it is most likely that the error has been caused by a decrease in controller performance due to maintenance actions or a hardware failure of the controller. Use TPC to obtain the back-end write response time for each MDisk at the remote cluster. If the response time for any individual MDisk exhibits a sudden increase of 50 ms or more, or if the response time is higher than 100 ms, this indicates a problem:
- Check the storage controller for error conditions, such as media errors, a failed physical disk, or associated activity, such as a RAID array rebuild. If there is an error, fix the problem and restart the Global Mirror relationships.
- If there is no error, consider whether the secondary controller is capable of processing the required level of application host I/O. It might be possible to improve the performance of the controller by:
  - Adding more physical disks to a RAID array
  - Changing the RAID level of the array
  - Changing the controller's cache settings (and checking that the cache batteries are healthy, if applicable)
  - Changing other controller-specific configuration parameters
The storage controllers at the primary site are overloaded. Analyze the performance of the primary back-end storage using the same steps that you use for the remote back-end storage. The main effect of bad performance is to limit the amount of I/O that can be performed by application hosts. Therefore, back-end storage at the primary site must be monitored regardless of Global Mirror. However, if bad performance continues for a prolonged period, it is possible that a 1920 error will occur and the Global Mirror relationships will stop.

One of the SVC clusters is overloaded. Use TPC to obtain the port to local node send response time and port to local node send queue time. If the total of these statistics for either cluster is higher than 1 millisecond, this suggests that the SVC might be experiencing a very high I/O load. Also, check the SVC node CPU utilization; if this figure is in excess of 50%, this might also contribute to the problem. In either case, contact your IBM service support representative for further assistance.

FlashCopy mappings are in the prepared state. If the Global Mirror target VDisks are the sources of a FlashCopy mapping, and that mapping is in the prepared state for an extended time, performance to those VDisks can be impacted, because the cache is disabled. Starting the FlashCopy mapping re-enables the cache and improves the VDisks' performance for Global Mirror I/O.
9.4.7 Saving bandwidth creating Metro Mirror and Global Mirror relationships
If you have a large source VDisk (or a large number of source VDisks) that you want to replicate to a remote site, and your planning shows that the SVC mirror initial sync will take too long (or will be too costly, if you pay for the traffic that you use), here is a method of setting up the sync by using another medium (that might be less expensive). Another reason to use these steps is if you want to increase the size of VDisks that are currently in a Metro Mirror or Global Mirror relationship; to do this, you must delete the current mirror relationships and redefine them after you have resized the VDisks.

In this example, we use tape media as the source for the initial sync of the Metro Mirror or Global Mirror target before using the SVC to maintain the Metro Mirror or Global Mirror relationship. This does not require downtime for the hosts using the source VDisks. Here are the steps (a sketch of the corresponding CLI commands follows step 8):
1. The hosts are up and running and using their VDisks as normal. There is no Metro Mirror or Global Mirror relationship defined yet. You have identified all the VDisks that will become the source VDisks in a Metro Mirror or Global Mirror relationship.
2. You have already established the SVC cluster partnership with the target SVC.
3. Define a Metro Mirror or Global Mirror relationship for each source VDisk. When defining the relationship, ensure that you use the -sync option, which stops the SVC from performing an initial sync.
Note: If you fail to use the -sync option, all of these steps are redundant, because the SVC performs a full initial sync anyway.
4. Stop each mirror relationship by using the -access option, which enables write access to the target VDisks. We will need this access later.
5. Make a copy of the source VDisk to the alternate media by using the dd command to copy the contents of the VDisk to tape. Another option might be to use your backup tool (for example, IBM Tivoli Storage Manager) to make an image backup of the VDisk.
Note: Even though the source is being modified while you are copying the image, the SVC is tracking those changes. The image that you create might already include some of the changes and is likely to have missed some of them as well. When the relationship is restarted, the SVC applies all changes that occurred since the relationship was stopped in step 4. After all the changes are applied, you will have a consistent target image.
6. Ship your media to the remote site and apply the contents to the targets of the Metro Mirror or Global Mirror relationships. For example, you can mount the target VDisks on a UNIX server and use the dd command to copy the contents of the tape to the target VDisk. If you used your backup tool to make an image of the VDisk, follow the instructions for your tool to restore the image to the target VDisk. Do not forget to remove the mount if this is a temporary host.
Note: It does not matter how long it takes to get your media to the remote site and perform this step. However, the quicker you can get the media to the remote site and loaded, the quicker the SVC is running and maintaining the Metro Mirror or Global Mirror relationships.
7. Unmount the target VDisks from your host. When you start the Metro Mirror or Global Mirror relationships later, the SVC stops write access to the VDisk while the mirror relationship is running.
8. Start your Metro Mirror or Global Mirror relationships. While the mirror relationship catches up, the target VDisk is not usable at all. As soon as it reaches Consistent Copying, your remote VDisk is ready for use in a disaster.
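A minimal sketch of the SVC CLI commands for steps 3, 4, and 8 follows. The relationship, VDisk, and remote cluster names are hypothetical, and you add the -global option if you want Global Mirror rather than Metro Mirror:

svctask mkrcrelationship -master HOST1-DB -aux HOST1-DB-DR -cluster ITSOCL2 -name RCREL-DB -sync
svctask stoprcrelationship -access RCREL-DB
svctask startrcrelationship RCREL-DB

The copy to and from tape in steps 5 and 6 is performed at the host level (for example, with the dd command against the disk devices that represent the source and target VDisks), not with the SVC CLI.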
Time must be less than 1 ms for the primary cluster. A number in excess of this might indicate that an I/O Group is reaching its I/O throughput limit, which can limit performance.
- CPU Utilization Percentage: CPU utilization must be below 50%.
- Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at the remote cluster: Time needs to be less than 100 ms. A longer response time can indicate that the storage controller is overloaded. If the response time for a specific storage controller is outside of its specified operating range, this must be investigated for the same reason.
- Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at the primary cluster: Time must also be less than 100 ms. If the response time is greater than this, application hosts might see extended response times if the SVC's cache becomes full.
- Write Data Rate for Global Mirror MDisk groups at the remote cluster: This data rate indicates the amount of data being written by Global Mirror. If this number approaches either the inter-cluster link bandwidth or the storage controller throughput limit, be aware that further increases can cause overloading of the system, and monitor this number appropriately.
Chapter 10. Hosts
This chapter describes best practices and monitoring for host systems attached to the SAN Volume Controller (SVC). A host system is an Open Systems computer that is connected to the switch through a Fibre Channel (FC) interface.

Most of the tuning, troubleshooting, and performance work for a host attached to an SVC takes place on the host itself. There are three major areas of concern:
- Using multipathing and bandwidth (the physical capability of the SAN and back-end storage)
- Understanding how your host performs I/O and what types of I/O it performs
- Utilizing measurement and test tools to determine host performance and for tuning

This topic supplements the IBM System Storage SAN Volume Controller Host Attachment User's Guide Version 4.2.0, SC26-7905, at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCWGAV&context=STC7HAC&context=STCWGBP&dc=DA400&q1=english&uid=ssg1S7001712&loc=en_US&cs=utf-8&lang=en
[Table fragment, not fully recoverable from this extraction: relative performance deltas for the R/W test cases 50/50 R/W Miss 4K Rdm IOPS and 50/50 R/W Miss 64K Rdm MBps; the captured delta values are -8.6%, -5.0%, -5.6%, and 1.3%.]
Occasionally, a very powerful host can benefit from spreading its VDisks across I/O Groups for load balancing. Our recommendation is to start with a single I/O Group and use performance monitoring tools, such as TotalStorage Productivity Center (TPC), to determine whether the host is I/O Group-limited. If additional bandwidth is needed, you can allocate more host ports to another I/O Group. For example, start with two HBAs zoned to one I/O Group. To add bandwidth, add two more HBAs and zone them to the other I/O Group. The host object in the SVC contains both sets of HBAs. The load can then be balanced by selecting the I/O Group to which each host VDisk is allocated. Because a VDisk is allocated to only a single I/O Group, the load is spread across both I/O Groups according to the VDisk allocation.
svcinfo lsvdiskhostmap -delim : EEXCLS_HBin01
id:name:SCSI_id:host_id:host_name:wwpn:vdisk_UID
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938CFDF:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938D01F:600507680191011D4800000000000466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D65B:600507680191011D4800000000000466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D3D3:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D615:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D612:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CFBD:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CE29:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EE1D8:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EDFFE:600507680191011D4800000000000466

If you are using IBM multipathing software (IBM Subsystem Device Driver (SDD) or SDDDSM), the command datapath query device shows the vdisk_UID (unique identifier) and so enables easier management of VDisks. The SDDPCM equivalent command is pcmpath query device.
IBM_2145:ITSOCL1:admin>svcinfo lshostvdiskmap senegal
id name    SCSI_id vdisk_id wwpn             vdisk_UID
0  senegal 1       60       210000E08B89CCC2 60050768018101BF28000000000000A8
0  senegal 2       58       210000E08B89CCC2 60050768018101BF28000000000000A9
0  senegal 3       57       210000E08B89CCC2 60050768018101BF28000000000000AA
0  senegal 4       56       210000E08B89CCC2 60050768018101BF28000000000000AB
0  senegal 5       61       210000E08B89CCC2 60050768018101BF28000000000000A7
0  senegal 6       36       210000E08B89CCC2 60050768018101BF28000000000000B9
0  senegal 7       34       210000E08B89CCC2 60050768018101BF28000000000000BA
senegal 1 40 60050768018101BF28000000000000B5
senegal 2 50 60050768018101BF28000000000000B1
senegal 3 49 60050768018101BF28000000000000B2
senegal 4 42 60050768018101BF28000000000000B3
senegal 5 41 60050768018101BF28000000000000B4
Example 10-2 shows the datapath query device output for this Windows host. Note that the order of the two I/O Groups' VDisks is reversed from the host-VDisk map. VDisk s-1-8-2 is first, followed by the rest of the LUNs from the second I/O Group, then VDisk s-0-6-4 and the rest of the LUNs from the first I/O Group. Most likely, Windows discovered the second set of LUNs first. However, the relative order within an I/O Group is maintained.
Example 10-2 datapath query device for the host VDisk map
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B5
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL        0         0
   1     Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL     1342         0
   2     Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL        0         0
   3     Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL     1444         0

DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B1
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL     1405         0
   1     Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL     1387         0
   3     Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL        0         0

DEV#: 2 DEVICE NAME: Disk3 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B2
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk3 Part0  OPEN    NORMAL     1398         0
   1     Scsi Port2 Bus0/Disk3 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk3 Part0  OPEN    NORMAL     1407         0
   3     Scsi Port3 Bus0/Disk3 Part0  OPEN    NORMAL        0         0

DEV#: 3 DEVICE NAME: Disk4 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B3
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk4 Part0  OPEN    NORMAL     1504         0
   1     Scsi Port2 Bus0/Disk4 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk4 Part0  OPEN    NORMAL     1281         0
   3     Scsi Port3 Bus0/Disk4 Part0  OPEN    NORMAL        0         0
DEV#: 4 DEVICE NAME: Disk5 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B4
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk5 Part0  OPEN    NORMAL        0         0
   1     Scsi Port2 Bus0/Disk5 Part0  OPEN    NORMAL     1399         0
   2     Scsi Port3 Bus0/Disk5 Part0  OPEN    NORMAL        0         0
   3     Scsi Port3 Bus0/Disk5 Part0  OPEN    NORMAL     1391         0

DEV#: 5 DEVICE NAME: Disk6 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A8
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk6 Part0  OPEN    NORMAL     1400         0
   1     Scsi Port2 Bus0/Disk6 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk6 Part0  OPEN    NORMAL     1390         0
   3     Scsi Port3 Bus0/Disk6 Part0  OPEN    NORMAL        0         0

DEV#: 6 DEVICE NAME: Disk7 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A9
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk7 Part0  OPEN    NORMAL     1379         0
   1     Scsi Port2 Bus0/Disk7 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk7 Part0  OPEN    NORMAL     1412         0
   3     Scsi Port3 Bus0/Disk7 Part0  OPEN    NORMAL        0         0

DEV#: 7 DEVICE NAME: Disk8 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AA
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk8 Part0  OPEN    NORMAL        0         0
   1     Scsi Port2 Bus0/Disk8 Part0  OPEN    NORMAL     1417         0
   2     Scsi Port3 Bus0/Disk8 Part0  OPEN    NORMAL        0         0
   3     Scsi Port3 Bus0/Disk8 Part0  OPEN    NORMAL     1381         0

DEV#: 8 DEVICE NAME: Disk9 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AB
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk9 Part0  OPEN    NORMAL        0         0
   1     Scsi Port2 Bus0/Disk9 Part0  OPEN    NORMAL     1388         0
   2     Scsi Port3 Bus0/Disk9 Part0  OPEN    NORMAL        0         0
   3     Scsi Port3 Bus0/Disk9 Part0  OPEN    NORMAL     1413         0

DEV#: 9 DEVICE NAME: Disk10 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A7
=============================================================================
Path#    Adapter/Hard Disk             State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk10 Part0  OPEN    NORMAL     1293         0
   1     Scsi Port2 Bus0/Disk10 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk10 Part0  OPEN    NORMAL     1477         0
   3     Scsi Port3 Bus0/Disk10 Part0  OPEN    NORMAL        0         0
DEV#: 10 DEVICE NAME: Disk11 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B9
=============================================================================
Path#    Adapter/Hard Disk             State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk11 Part0  OPEN    NORMAL        0         0
   1     Scsi Port2 Bus0/Disk11 Part0  OPEN    NORMAL    59981         0
   2     Scsi Port3 Bus0/Disk11 Part0  OPEN    NORMAL        0         0
   3     Scsi Port3 Bus0/Disk11 Part0  OPEN    NORMAL    60179         0

DEV#: 11 DEVICE NAME: Disk12 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000BA
=============================================================================
Path#    Adapter/Hard Disk             State   Mode     Select    Errors
   0     Scsi Port2 Bus0/Disk12 Part0  OPEN    NORMAL    28324         0
   1     Scsi Port2 Bus0/Disk12 Part0  OPEN    NORMAL        0         0
   2     Scsi Port3 Bus0/Disk12 Part0  OPEN    NORMAL    27111         0
   3     Scsi Port3 Bus0/Disk12 Part0  OPEN    NORMAL        0         0

Sometimes, a host might discover everything correctly at initial configuration, but it does not keep up with dynamic changes in the configuration. The SCSI ID is therefore very important. 10.2.4, "Dynamic reconfiguration" on page 179 discusses this topic further.
is running on the host. The multipathing software manages the many paths that are available to the VDisk and presents a single storage device to the operating system.
Table 10-3 shows the change in throughput for the case of 16 devices and a random 4 KB read miss when using the preferred node, as opposed to the non-preferred nodes shown in Table 10-2.
Table 10-3 16-device random 4 KB read miss throughput (IOPS)

Preferred node (owner)   Non-preferred node   Delta
105,274.3                90,292.3             14,982
In Table 10-4, we show the effect of using the non-preferred paths compared to the preferred paths on read performance.
Table 10-4 Random (1 TB) 4 KB read response time (4.1 nodes, microseconds)

Preferred node (owner)   Non-preferred node   Delta
5,074                    5,147                73
Table 10-5 shows the effect of using non-preferred nodes on write performance.
Table 10-5 Random (1 TB) 4 KB write response time (4.2 nodes, microseconds)

Preferred node (owner)   Non-preferred node   Delta
5,346                    5,433                87
IBM SDD, SDDDSM, and SDDPCM software recognize the preferred nodes and utilize the preferred paths.
Removing VDisks and then later allocating new VDisks to the host
The problem surfaces when a user removes a VDisk-to-host mapping (vdiskhostmap) on the SVC as part of removing a VDisk. After a VDisk is unmapped from the host, the device becomes unavailable, and the SVC reports that there is no such disk on this port. Running datapath query device after the removal shows a closed, offline, invalid, or dead state, as shown here:

Windows host:
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018201BEE000000000000041
============================================================================
Path#    Adapter/Hard Disk            State   Mode      Select    Errors
   0     Scsi Port2 Bus0/Disk1 Part0  CLOSE   OFFLINE        0         0
   1     Scsi Port3 Bus0/Disk1 Part0  CLOSE   OFFLINE      263         0

AIX host:
DEV#: 189 DEVICE NAME: vpath189 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk    State     Mode      Select    Errors
   0     fscsi0/hdisk1654     DEAD      OFFLINE        0         0
   1     fscsi0/hdisk1655     DEAD      OFFLINE        2         0
   2     fscsi1/hdisk1658     INVALID   NORMAL         0         0
   3     fscsi1/hdisk1659     INVALID   NORMAL         1         0
The next time that a new VDisk is allocated and mapped to that host, the SCSI ID is reused if it is allowed to take the default value, and the host can possibly confuse the new device with the old device definition that is still left over in the device database or system memory. It is possible to get two devices that use identical device definitions in the device database, as in this example. Note that both vpath189 and vpath190 have the same HDisk definitions while they actually contain different device serial numbers. The path fscsi0/hdisk1654 exists in both vpaths.

DEV#: 189 DEVICE NAME: vpath189 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk    State   Mode     Select    Errors
   0     fscsi0/hdisk1654     CLOSE   NORMAL        0         0
   1     fscsi0/hdisk1655     CLOSE   NORMAL        2         0
   2     fscsi1/hdisk1658     CLOSE   NORMAL        0         0
   3     fscsi1/hdisk1659     CLOSE   NORMAL        1         0
DEV#: 190 DEVICE NAME: vpath190 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007F4
============================================================================
Path#    Adapter/Hard Disk    State   Mode     Select     Errors
   0     fscsi0/hdisk1654     OPEN    NORMAL         0         0
   1     fscsi0/hdisk1655     OPEN    NORMAL   6336260         0
   2     fscsi1/hdisk1658     OPEN    NORMAL         0         0
   3     fscsi1/hdisk1659     OPEN    NORMAL   6326954         0

The multipathing software (SDD) recognizes that there is a new device, because at configuration time it issues an inquiry command and reads the mode pages. However, if the user did not remove the stale configuration data, the ODM entries for the old HDisks and vpaths still remain and confuse the host, because the SCSI ID to device serial number mapping has changed. You can avoid this if you remove the HDisk and vpath information from the device configuration database (rmdev -dl vpath189, rmdev -dl hdisk1654, and so forth) prior to mapping new devices to the host and running discovery. Removing the stale configuration and rebooting the host is the recommended procedure for reconfiguring the VDisks mapped to a host.

Another process that might cause host confusion is expanding a VDisk. The SVC notifies the host through a SCSI check condition (mode parameters changed), but not all hosts are able to automatically discover the change and might confuse LUNs or continue to use the old size. Review the IBM System Storage SAN Volume Controller V4.2.0 - Software Installation and Configuration Guide, SC23-6628, for more details and supported hosts:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCWGAV&context=STC7HAC&context=STCWGBP&dc=DA400&q1=english&uid=ssg1S7001711&loc=en_US&cs=utf-8&lang=en
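As a sketch of the stale-device cleanup described above for an AIX host running SDD (the device names are the ones from this example; on a real host, remove every stale vpath and HDisk that belonged to the unmapped VDisk before mapping the new one):

rmdev -dl vpath189
rmdev -dl hdisk1654
rmdev -dl hdisk1655
rmdev -dl hdisk1658
rmdev -dl hdisk1659
cfgmgr

Running cfgmgr (or rebooting) afterward rediscovers the newly mapped VDisk with a clean device definition; whether an additional SDD configuration step is needed depends on the SDD release, so check the SDD documentation for your level.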
changes. If the stale configuration data is still known by the host, the host might continue to attempt I/O to the old I/O node targets during multipathing selection. Example 10-3 shows the Windows SDD host display prior to I/O Group migration.
Example 10-3 Windows SDD host display prior to I/O Group migration
C:\Program Files\IBM\Subsystem Device Driver>datapath query device

DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select     Errors
   0     Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL         0         0
   1     Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL   1873173         0
   2     Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL         0         0
   3     Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL   1884768         0

DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select     Errors
   0     Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL         0         0
   1     Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL   1863138         0
   2     Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL         0         0
   3     Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL   1839632         0

If you just quiesce the host I/O and then migrate the VDisks to the new I/O Group, you get closed offline paths for the old I/O Group and open normal paths to the new I/O Group. However, these devices do not work correctly, and there is no way to remove the stale paths without rebooting. Note the change in the pathing in Example 10-4 for device 0, SERIAL 60050768018101BF28000000000000A0.
Example 10-4 Windows VDISK moved to new I/O Group dynamically showing the closed offline paths
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#    Adapter/Hard Disk            State    Mode      Select     Errors
   0     Scsi Port2 Bus0/Disk1 Part0  CLOSED   OFFLINE        0         0
   1     Scsi Port2 Bus0/Disk1 Part0  CLOSED   OFFLINE  1873173         0
   2     Scsi Port3 Bus0/Disk1 Part0  CLOSED   OFFLINE        0         0
   3     Scsi Port3 Bus0/Disk1 Part0  CLOSED   OFFLINE  1884768         0
   4     Scsi Port2 Bus0/Disk1 Part0  OPEN     NORMAL         0         0
   5     Scsi Port2 Bus0/Disk1 Part0  OPEN     NORMAL        45         0
   6     Scsi Port3 Bus0/Disk1 Part0  OPEN     NORMAL         0         0
   7     Scsi Port3 Bus0/Disk1 Part0  OPEN     NORMAL        54         0

DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select     Errors
   0     Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL         0         0
   1     Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL   1863138         0
   2     Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL         0         0
   3     Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL   1839632         0
To change the I/O Group, you must first flush the cache within the nodes in the current I/O Group to ensure that all data is written to disk. The SVC command-line interface (CLI) guide recommends that you suspend I/O operations at the host level. The recommended way to quiesce the I/O is to take the volume groups offline, remove the saved configuration entries (such as the AIX ODM HDisks and vpaths) that are planned for removal, and then gracefully shut down the hosts. Migrate the VDisk to the new I/O Group and power up the host, which then discovers the new I/O Group. If the stale configuration data was not removed prior to the shutdown, remove it from the stored host device databases (such as the ODM on an AIX host) at this point. For Windows hosts, the stale registry information is normally ignored after a reboot. Performing VDisk migrations in this way prevents stale configuration issues.
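On the SVC side, the migration itself is a single command. As a sketch (assuming the SVC 4.2 CLI; the VDisk name and I/O Group are examples, and the command must only be run after host I/O has been quiesced as described above):

svctask chvdisk -iogrp io_grp1 HOST1-VDISK-01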
Queue depth control within the host is accomplished through limits placed by the adapter resources for handling I/Os and by setting a maximum queue depth per LUN. Multipathing software also controls queue depth using different algorithms. SDD recently made an algorithm change in this area to limit queue depth individually by LUN as opposed to an overall system queue depth limitation.

The host I/O is converted to MDisk I/O as needed. The SVC submits I/O to the back-end (MDisk) storage as any host normally does. A host allows user control of the queue depth that is maintained on a disk; the SVC does this internally for MDisk I/O without any user intervention. After the SVC has submitted I/Os and has Q I/Os outstanding for a single MDisk (that is, it is waiting for Q I/Os to complete), it does not submit any more I/O until some I/O completes. That is, any new I/O requests for that MDisk are queued inside the SVC.

The graphs in Figure 10-1 indicate the effect of host VDisk queue depth for a simple configuration of 16 VDisks and one host.
Figure 10-1 IOPS compared to queue depth for 16 disk tests using a single host
Figure 10-2 on page 184 shows another example of queue depth sensitivity for 16 disks on a single host.
Figure 10-2 MB/s compared to queue depth for 16 disk tests on a single host
Persistent reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and command options that provide SCSI initiators with the ability to establish, preempt, query, and reset a reservation policy with a specified target device. The functionality provided by the persistent reserve commands is a superset of the legacy reserve/release commands. The persistent reserve commands are incompatible with the legacy reserve/release mechanism, and target devices can only support reservations from either the legacy mechanism or the new mechanism. Attempting to mix persistent reserve commands with legacy reserve/release commands will result in the target device returning a reservation conflict error.
Legacy reserve and release mechanisms (SCSI-2) reserved the entire LUN (VDisk) for exclusive use down a single path, which prevents access from any other host or even access from the same host utilizing a different host adapter. The persistent reserve design establishes a method and interface through a reserve policy attribute for SCSI disks, which specifies the type of reservation (if any) that the OS device driver will establish before accessing data on the disk. Four possible values are supported for the reserve policy:
- No_reserve: No reservations are used on the disk.
- Single_path: Legacy reserve/release commands are used on the disk.
- PR_exclusive: Persistent reservation is used to establish exclusive host access to the disk.
- PR_shared: Persistent reservation is used to establish shared host access to the disk.

When a device is opened (for example, when the AIX varyonvg command opens the underlying HDisks), the device driver checks the ODM for a reserve_policy and a PR_key_value and opens the device appropriately. For persistent reserve, it is necessary that each host attached to the shared disk use a unique registration key value.
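As a sketch for an AIX MPIO (SDDPCM) disk, the current policy can be displayed and changed with the standard AIX attribute commands; hdisk5 is an example device, and the -P flag defers the change until the device is next configured:

lsattr -El hdisk5 -a reserve_policy -a PR_key_value
chdev -l hdisk5 -a reserve_policy=no_reserve -P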
Clearing reserves
It is possible to accidentally leave a reserve on the SVC VDisk, or even on the SVC MDisk, during migration into the SVC or when reusing disks for another purpose. Several tools are available from the hosts to clear these reserves. The easiest tools to use are the commands lquerypr (AIX SDD host) and pcmquerypr (AIX SDDPCM host). There is also a menu-driven Windows SDD/SDDDSM tool. The Windows Persistent Reserve Tool is called PRTool.exe and is installed automatically when SDD or SDDDSM is installed:
C:\Program Files\IBM\Subsystem Device Driver>PRTool.exe

It is also possible to clear SVC VDisk reserves by removing all the host-VDisk mappings when the SVC code is at 4.1.0 or higher.

Here is an example of how to determine whether there is a reserve on a device by using the AIX SDD lquerypr command on a reserved HDisk:
[root@ktazp5033]/reserve-checker-> lquerypr -vVh /dev/hdisk5
connection type: fscsi0
open dev: /dev/hdisk5
Attempt to read reservation key...
Attempt to read registration keys...
Read Keys parameter
Generation : 935
Additional Length: 32
Key0 : 7702785F
Key1 : 7702785F
Key2 : 770378DF
Key3 : 770378DF
Reserve Key provided by current host = 7702785F
Reserve Key on the device: 770378DF

This example shows that the device is reserved by a different host. The advantage of using the vV parameters is that the full persistent reserve keys on the device are shown, as well as the errors if the command fails. An example of a failing pcmquerypr command to clear the reserve shows this:
# pcmquerypr -ph /dev/hdisk232 -V
connection type: fscsi0
open dev: /dev/hdisk232
couldn't open /dev/hdisk232, errno=16

Use the AIX include file errno.h to find out what the 16 indicates. This error indicates a busy condition, which can indicate a legacy reserve or a persistent reserve from another host (or from this host through a different adapter). However, certain AIX technology levels (TLs) have a diagnostic open issue, which prevents the pcmquerypr command from opening the device to display the status or to clear a reserve. The following hint and tip gives more information about the AIX TL levels that break the pcmquerypr command:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003122&loc=en_US&cs=utf-8&lang=en
10.5.1 AIX
The following topics detail items that are specific to AIX.
Transaction-based settings
The host attachment script (devices.fcp.disk.IBM.rte or devices.fcp.disk.IBM.mpio.rte) sets the default attribute values for the SVC HDisks. These default values can be modified, but they are a very good place to start. There are also some HBA parameters that are useful to set for higher performance or for configurations with large numbers of HDisks. All changeable attribute values can be changed by using the chdev command on AIX. The AIX settings that directly affect transaction performance are the queue_depth HDisk attribute and num_cmd_elem in the HBA attributes.
queue_depth
For the logical drive, known as the HDisk in AIX, the setting is the queue_depth attribute:
# chdev -l hdiskX -a queue_depth=Y -P
In this example, X is the HDisk number, and Y is the queue_depth value that you want to set. For a high-transaction workload of small random transfers, try a queue_depth of 25 or more; for large sequential workloads, performance is better with shallow queue depths, such as 4.
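For example (a sketch only; hdisk4 and the value of 25 are assumptions for a small-block random workload), the change can be made and then verified with lsattr. The -P flag defers the change until the device is reconfigured or the host is rebooted:

chdev -l hdisk4 -a queue_depth=25 -P
lsattr -El hdisk4 -a queue_depth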
num_cmd_elem
For the HBA settings, the num_cmd_elem attribute for the fcs device represents the number of commands that can be queued to the adapter:
chdev -l fcsX -a num_cmd_elem=1024 -P
The default value is 200, and the maximum value is:
- LP9000 adapters: 2048
- LP10000 adapters: 2048
- LP11000 adapters: 2048
- LP7000 adapters: 1024

Best practice: For high transaction rates on AIX or large numbers of HDisks on the fcs adapter, we recommend that you increase num_cmd_elem to 1024 for the fcs devices being used.

The AIX settings that directly affect throughput performance with large I/O block sizes are the lg_term_dma and max_xfer_size parameters for the fcs device.
lg_term_dma
This AIX Fibre Channel adapter attribute controls the direct memory access (DMA) memory resource that an adapter driver can use. The default value of lg_term_dma is 0x200000, and the maximum value is 0x8000000. A recommended change is to increase the value of lg_term_dma to 0x400000. If you still experience poor I/O performance after changing the value to 0x400000, you can increase the value of this attribute again. If you have a dual-port Fibre Channel adapter, the maximum value of the lg_term_dma attribute is divided between the two adapter ports. Therefore, never increase lg_term_dma to the maximum value for a
dual-port Fibre Channel adapter, because this will cause the configuration of the second adapter port to fail.
max_xfer_size
This AIX Fibre Channel adapter attribute controls the maximum transfer size of the Fibre Channel adapter. Its default value is 0x100000, and the maximum value is 0x1000000. You can increase this attribute to improve performance. You can change this attribute only with AIX 5.2.0 or later. Note that setting max_xfer_size also affects the size of a memory area used for data transfer by the adapter: with the default value of max_xfer_size=0x100000, the area is 16 MB in size, and for other allowable values of max_xfer_size, the memory area is 128 MB in size.
Throughput-based settings
In the throughput-based environment, you might want to decrease the queue depth setting to a smaller value than the default from the host attachment. In a mixed application environment, do not lower the num_cmd_elem setting, because other logical drives might need this higher value to perform. In a pure high-throughput workload, this value has no effect.

Best practice: The recommended starting values for high-throughput sequential I/O environments are lg_term_dma = 0x400000 or 0x800000 (depending on the adapter type) and max_xfer_size = 0x200000.

We recommend that you test your host with the default settings first and then make these possible tuning changes to the host parameters to verify whether these suggested changes actually enhance performance for your specific host configuration and workload.
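As a sketch (fcs0 is an example adapter; the values are the starting points from the best practice above, and the -P flag means the change takes effect only after the adapter is reconfigured or the host is rebooted):

chdev -l fcs0 -a lg_term_dma=0x400000 -a max_xfer_size=0x200000 -P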
Multipathing
When the AIX operating system was first developed, multipathing was not embedded within the device drivers. Therefore, each path to an SVC VDisk was represented by an AIX HDisk. The SVC host attachment script devices.fcp.disk.ibm.rte sets up the predefined attributes within the AIX database for SVC disks, and these attributes have changed with each iteration of host attachment and AIX technology levels. Both SDD and Veritas DMP utilize the HDisks for multipathing control. The host attachment is also used for other IBM storage devices. The Host Attachment allows AIX device driver configuration methods to properly identify and configure SVC (2145), DS6000 (1750), and DS8000 (2107) LUNs:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+att achment&uid=ssg1S4000106&loc=en_US&cs=utf-8&lang=en
SDD
IBM Subsystem Device Driver (SDD) multipathing software has been designed and updated consistently over the last decade and is a very mature multipathing technology. The SDD software also supports many other IBM storage types directly connected to AIX, such as the 2107. SDD algorithms for handling multipathing have also evolved. There are throttling mechanisms within SDD that controlled overall I/O bandwidth in SDD releases 1.6.1.0 and earlier; this throttling mechanism has evolved to be specific to a single vpath and is called qdepth_enable in later releases.
SDD utilizes persistent reserve functions, placing a persistent reserve on the device in place of the legacy reserve when the volume group is varied on. However, if HACMP is installed, HACMP controls the persistent reserve usage depending on the type of varyon used. Also, the enhanced concurrent volume groups (VGs) have no reserves: varyonvg -c is used for enhanced concurrent VGs, and varyonvg for regular VGs that utilize the persistent reserve.
Datapath commands are a very powerful method for managing the SVC storage and pathing. The output shows the LUN serial number of the SVC VDisk and which vpath and HDisk represent that SVC LUN. Datapath commands can also change the multipath selection algorithm. The default is load balance, but this is programmable. The recommended best practice when using SDD is load balance using four paths. The datapath query device output shows a somewhat balanced number of selects on each preferred path to the SVC:
DEV#: 12  DEVICE NAME: vpath12  TYPE: 2145  POLICY: Optimized
SERIAL: 60050768018B810A88000000000000E0
====================================================================
Path#    Adapter/Hard Disk    State    Mode      Select   Errors
   0     fscsi0/hdisk55       OPEN     NORMAL   1390209        0
   1     fscsi0/hdisk65       OPEN     NORMAL         0        0
   2     fscsi0/hdisk75       OPEN     NORMAL   1391852        0
   3     fscsi0/hdisk85       OPEN     NORMAL         0        0
We recommend that you verify that the selects during normal operation are occurring on the preferred paths (use datapath query device -l). Also, verify that you have the correct connectivity.
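If you need to change the path selection policy, the datapath set device command can do this on a live system. The following is a sketch only; the device number is an example, and you should confirm the policy keywords that your SDD release supports (lb for load balancing and rr for round robin are typical):
datapath set device 12 policy lb
datapath query device 12
Running datapath query device afterward confirms that the POLICY field reflects the change.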
SDDPCM
As Fibre Channel technologies matured, AIX was enhanced by adding native multipathing support called Multipath I/O (MPIO). This structure allows a storage manufacturer to create software plug-ins for its specific storage. The IBM SVC version of this plug-in is called SDDPCM. It requires a different host attachment script, devices.fcp.disk.ibm.mpio.rte:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000203&loc=en_US&cs=utf-8&lang=en
SDDPCM and AIX MPIO have been continually improved since their release. We recommend that you stay at the latest release levels of this software. The preferred path indicator for SDDPCM is not displayed until after the device has been opened for the first time. This is different from SDD, which displays the preferred path immediately after being configured. SDDPCM features four types of reserve policies:
No_reserve policy
Exclusive host access single path policy
Persistent reserve exclusive host policy
Persistent reserve shared host access policy
The usage of the persistent reserve now depends on the HDisk attribute reserve_policy. Change this policy to match your storage security requirements. There are three path selection algorithms:
Failover
Round robin
Load balancing
The latest SDDPCM code of 2.1.3.0 and later has improvements in failed path reclamation by the health checker, a failback error recovery algorithm, Fibre Channel dynamic device tracking, and support for SAN boot devices on MPIO-supported storage devices.
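Because the reserve policy and the path selection algorithm are ordinary HDisk attributes under SDDPCM, they can be displayed and changed with lsattr and chdev. This is a sketch only; hdisk0 is an example device, and you should confirm the attribute values that your SDDPCM level supports:
# lsattr -El hdisk0 -a reserve_policy -a algorithm
# chdev -l hdisk0 -a reserve_policy=no_reserve -P
# chdev -l hdisk0 -a algorithm=load_balance -P
As with other attribute changes made with -P, the new values take effect after the device is reconfigured or the host is rebooted.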
SDDPCM pathing
SDDPCM pcmpath commands are the best way to understand configuration information about the SVC storage allocation. The following example shows how much can be determined from this command about the connections to the SVC from this host:
pcmpath query device
DEV#: 0  DEVICE NAME: hdisk0  TYPE: 2145  ALGORITHM: Load Balance
SERIAL: 6005076801808101400000000000037B
======================================================================
Path#    Adapter/Path Name    State    Mode      Select   Errors
   0     fscsi0/path0         OPEN     NORMAL    155009        0
   1     fscsi1/path1         OPEN     NORMAL    155156        0
In this example, both paths are being used for the SVC connections. This is not the normal select count distribution for a properly mapped SVC, nor is this an adequate number of paths. Use the -l option on pcmpath query device to check whether these are both preferred paths. If they are, one SVC node must be missing from the host view. Using the -l option shows an asterisk on both paths, indicating that a single node is visible to the host (and it is the non-preferred node for this VDisk):
   0*    fscsi0/path0         OPEN     NORMAL      9795        0
   1*    fscsi1/path1         OPEN     NORMAL      9558        0
This indicates a problem that needs to be corrected. If zoning in the switch is correct, perhaps this host was rebooted while one SVC node was missing from the fabric.
Veritas
Veritas DMP multipathing is also supported for the SVC. This support requires certain AIX APARs and the Veritas Array Support Library. It also requires a certain version of the host attachment script devices.fcp.disk.ibm.rte to recognize the 2145 devices as HDisks rather than MPIO HDisks. In addition to the normal ODM databases that contain HDisk attributes, there are several Veritas locations that contain configuration data:
/dev/vx/dmp
/dev/vx/rdmp
/etc/vxX.info
Storage reconfiguration of VDisks presented to an AIX host requires cleanup of the AIX HDisks and these Veritas entries.
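A typical cleanup sequence after a VDisk is unmapped might look like the following sketch. The HDisk name is an example only, and you should confirm that no Veritas volumes still reference the device before removing it:
# rmdev -dl hdisk10
# cfgmgr
# vxdctl enable
rmdev -dl removes the stale HDisk definition from the ODM, cfgmgr rediscovers the current devices, and vxdctl enable makes DMP rescan and rebuild its view of the devices.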
Details of the Virtual I/O Server-supported environments are at:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/home.html
Many questions about using the VIOS are answered on the following Web site:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/faq.html
One common question is how to migrate data into a VIO environment or how to reconfigure storage on a VIOS; this question is addressed in the previous link. Many clients ask, "Can SCSI LUNs be moved between the physical and virtual environment as is?" That is, given a physical SCSI device (LUN) with user data on it that resides in a SAN environment, can this device be allocated to a VIOS and then provisioned to a client partition and used by the client as is?
The answer is no, this function is not supported at this time. The device cannot be used as is. Virtual SCSI devices are new devices when created, and the data must be put onto them after creation. This typically requires some type of backup of the data in the physical SAN environment with a restoration of the data onto the VDisk.
For rootvg, the supported method is a mksysb and an install, or savevg and restvg for non-rootvg.
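As a sketch of the non-rootvg case, a volume group can be backed up on the source system with savevg and recreated on the virtual disk with restvg; the file name, volume group, and target HDisk here are placeholders only:
# savevg -f /backup/datavg.savevg datavg
# restvg -f /backup/datavg.savevg hdisk4
For rootvg, create a mksysb image of the source system and reinstall the client partition from that image.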
10.5.4 Windows
There are two multipathing driver options released for Windows 2003 Server hosts. Windows 2003 Server device driver development has concentrated on the storport.sys driver, which has significant interoperability differences from the older scsiport driver set. Additionally, Microsoft has released a native multipathing I/O option with a storage-specific plug-in. SDDDSM was designed to support these newer methods of interfacing with Windows 2003 Server. In order to release new enhancements more quickly, the newer hardware architectures (64-bit EM64T and so forth) are only tested on the SDDDSM code stream; therefore, only SDDDSM packages are available for them.
The older SDD multipathing driver works with the scsiport drivers. This version is required for Windows 2000 Server hosts, because storport.sys is not available there. The SDD software is also available for Windows 2003 Server hosts when the scsiport HBA drivers are used.
Tunable parameters
With Windows operating systems, the queue depth settings are the responsibility of the host adapters and are configured through the BIOS settings. This varies from vendor to vendor. Refer to your manufacturer's instructions about how to configure your specific cards and to the IBM System Storage SAN Volume Controller Host Attachment User's Guide Version 4.2.0, SC26-7905:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCWGAV&context=STC7HAC&context=STCWGBP&dc=DA400&q1=english&uid=ssg1S7001712&loc=en_US&cs=utf-8&lang=en
Queue depth is also controlled by the Windows application program. The application program controls how many I/O commands it allows to be outstanding before waiting for completion.
For IBM FAStT FC2-133 (and QLogic-based) HBAs, the queue depth is known as the execution throttle, which can be set with either the QLogic SANSurfer tool or in the BIOS of the QLogic-based HBA by pressing Ctrl+Q during the startup process.
10.5.5 Linux
IBM has decided to transition SVC multipathing support from IBM SDD to Linux native DM-MPIO multipathing. Refer to the V4.2.0 - Recommended Software Levels for SAN Volume Controller page to see which versions of each Linux kernel require SDD or DM-MPIO support:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003090#_Supported_Host_operating_system_Lev
If your kernel is not listed for support, contact your IBM marketing representative to request a Request for Price Quotation (RPQ) for your specific configuration. Linux clustering is not supported, and the Linux OS does not use the legacy reserve function; therefore, there are no persistent reserves used in Linux. Contact IBM marketing for RPQ support if you need Linux clustering in your specific environment.
Tunable parameters
Linux performance is influenced by HBA parameter settings and queue depth. Queue depth for Linux servers can be determined by using the formula specified in the IBM System Storage SAN Volume Controller: Software Installation and Configuration Guide, SC23-6628:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7001711
Refer to the settings for each specific HBA type and the general Linux OS tunable parameters in the IBM System Storage SAN Volume Controller V4.2.0 - Host Attachment Guide, SC26-7905:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCWGAV&context=STC7HAC&context=STCWGBP&dc=DA400&q1=english&uid=ssg1S7001712&loc=en_US&cs=utf-8&lang=en
In addition to the I/O and OS parameters, Linux also has tunable file system parameters. The tune2fs command can be used to increase file system performance based on your specific configuration: the journal mode and size can be changed, and directories can be indexed. Refer to the following open source document for details:
http://swik.net/how-to-increase-ext3-and-reiserfs-filesystems-performance
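The following tune2fs invocations are a sketch of those tuning options; /dev/sdb1 is a placeholder device, and you should test any file system tuning on non-production data first:
tune2fs -O dir_index /dev/sdb1
tune2fs -o journal_data_writeback /dev/sdb1
The first command enables hashed b-tree directory indexing (existing directories are only reindexed after running e2fsck -fD on the unmounted file system), and the second sets writeback journaling as the default mount option. Changing the journal size requires removing and re-creating the journal (tune2fs -O ^has_journal, then tune2fs -J size=64) with the file system unmounted.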
10.5.6 Solaris
There are several options for multipathing support on Solaris hosts. You can choose between IBM SDD, Symantec/VERITAS Volume Manager, or you can use Solaris MPxIO depending on the OS levels in the latest SVC software level matrix.
SAN startup support and clustering support are available for Symantec/VERITAS Volume Manager, and SAN boot support is also available for MPxIO.
Solaris MPxIO
Releases of SVC code prior to 4.2 did not support load balancing with the MPxIO software. Configure your SVC host object with the type attribute set to tpgs if you want to run MPxIO on your Sun SPARC host. For example:
svctask mkhost -name new_name_arg -hbawwpn wwpn_list -type tpgs
In this command, -type specifies the type of host. Valid entries are hpux, tpgs, or generic; the tpgs option enables extra target port unit attentions. The default is generic.
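For instance, a host definition for a two-port Sun host might be created as follows. The host name and WWPNs are fictitious and must be replaced with your own values:
svctask mkhost -name sunhost01 -hbawwpn 210000E08B05C4D9:210100E08B25C4D9 -type tpgs
You can confirm the host type afterward with svcinfo lshost sunhost01.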
For SDD passthrough mode, see:
http://support.veritas.com/docs/281321
# pkginfo -l VRTSsanvc
PKG=VRTSsanvc
BASEDIR=/etc/vx
NAME=Array Support Library for IBM SAN.VC with SDD.
PRODNAME=VERITAS ASL for IBM SAN.VC with SDD.
For native DMP, see:
http://support.veritas.com/docs/276913
# pkginfo -l VRTSsanvc
PKGINST: VRTSsanvc
NAME: Array Support Library for IBM SAN.VC in NATIVE DMP mode
To check the installed Symantec/VERITAS version:
showrev -p | grep vxvm
To check which IBM ASLs are configured into the volume manager:
vxddladm listsupport | grep -i ibm
After installing a new ASL with pkgadd, you need to either reboot or issue vxdctl enable. To list which ASLs are active, run vxddladm listsupport.
10.5.7 VMWare
Review the V4.2.0 - Recommended Software Levels for SAN Volume Controller Web site for the various ESX levels that are supported: http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003090#_VMWare Support for specific configurations for VMWare 3.01 is provided by special engineering request only. Contact your IBM marketing representative for details and the submission of an RPQ. The necessary patches and procedure to apply them will be supplied once the specific configuration is reviewed and approved.
10.7 Monitoring
A consistent set of monitoring tools is available when IBM SDD, SDDDSM, and SDDPCM are used as the multipathing software in the various OS environments. Examples earlier in this chapter showed how the datapath query device and datapath query adapter commands can be used for path monitoring. Path performance can also be monitored with the datapath query devstats command (or pcmpath query devstats). This command shows performance information for a single device, all devices, or a range of devices. Example 10-5 shows the output of datapath query devstats for two devices.
Example 10-5 datapath query devstats output
C:\Program Files\IBM\Subsystem Device Driver>datapath query devstats
Total Devices : 2

Device #: 0
=============
           Total Read   Total Write   Active Read   Active Write   Maximum
   I/O:       1755189       1749581             0              0         3
 SECTOR:     14168026     153842715             0              0       256

Transfer Size:   <= 512      <= 4k     <= 16K     <= 64K      > 64K
                    271    2337858        104    1166537          0
Device #: 1
=============
           Total Read   Total Write   Active Read   Active Write   Maximum
   I/O:      20353800       9883944             0              1         4
 SECTOR:    162956588     451987840             0            128       256

Transfer Size:   <= 512      <= 4k     <= 16K     <= 64K      > 64K
                    296   27128331        215    3108902          0
Also, an adapter-level statistics command is available: datapath query adaptstats (which maps to pcmpath query adaptstats for SDDPCM). Refer to Example 10-6 for a two-adapter example.
Example 10-6 datapath query adaptstats output
C:\Program Files\IBM\Subsystem Device Driver>datapath query adaptstats

Adapter #: 0
=============
           Total Read   Total Write   Active Read   Active Write   Maximum
   I/O:      11048415       5930291             0              1         2
 SECTOR:     88512687     317726325             0            128       256

Adapter #: 1
=============
           Total Read   Total Write   Active Read   Active Write   Maximum
   I/O:      11060574       5936795             0              0         2
 SECTOR:     88611927     317987806             0              0       256
It is possible to clear these counters so you can script the usage to cover a precise amount of time. The commands also allow you to choose devices to return as a range, single device, or all devices. The command to clear the counts is datapath clear device count.
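On a UNIX host, a minimal sketch of such an interval measurement might be the following; the 300-second window is arbitrary, and any output redirection is up to you:
datapath clear device count
sleep 300
datapath query devstats
datapath query adaptstats
Because the counters accumulate from the moment they are cleared, the reported numbers cover only that measurement window.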
Industry-standard performance benchmarking tools are available by joining the Storage Performance Council; information about how to join is available at:
http://www.storageperformance.org/home
These tools both create stress and measure the stress that was created in a standardized way, and we highly recommend them for generating stress in your test environments so that you can compare against the industry measurements. Another recommended stress tool is Iometer for Windows and Linux hosts:
http://www.iometer.org
The AIX System p community maintains wikis on performance tools and has made a set of tools available for its users:
http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/Performance+Monitoring+Tools
http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/nstress
Xdd is a tool for measuring and analyzing disk performance characteristics on single systems or clusters of systems. It was designed by Thomas M. Ruwart from I/O Performance, Inc. to provide consistent and reproducible measurements of the sustained transfer rate of an I/O subsystem. It is a command line-based tool that grew out of the UNIX world and has been ported to run in Windows environments as well. Xdd is a free software program distributed under a GNU General Public License and is available for download at:
http://www.ioperformance.com/products.htm
The Xdd distribution comes with all the source code necessary to install Xdd and the companion programs for the timeserver and the gettime utility programs. DS4000 Best Practices and Performance Tuning Guide, SG24-6363-02, has detailed descriptions of how to use these measurement and test tools:
http://www.redbooks.ibm.com/abstracts/sg246363.html?Open
Chapter 11. Applications
This chapter provides information about laying out storage for the best performance for general applications, virtual I/O (VIO) servers, and DB2 databases specifically. While this information is directed to AIX hosts, it is also relevant to other host types.
Generally, I/O (and therefore application) performance will be best when the I/O activity is evenly spread across the entire I/O subsystem.
So, what are the traits of a transaction-based application? In the following sections, we explain these traits in more detail. As mentioned earlier, you can expect to see a high number of transactions and a fairly small block size. Different databases use different I/O sizes for their logs, and these sizes vary from vendor to vendor; in all cases, the logs are generally heavy write workloads. For table spaces, most databases use between a 4 KB and 16 KB block size. In some applications, larger chunks (for example, 64 KB) are moved to host application cache memory for processing.
Understanding how your application is going to handle its I/O is critical to laying out the data properly on the storage server. In many cases, the table space is a large file made up of small blocks of data records. The records are normally accessed using small I/Os of a random nature, which can result in about a 50% cache miss ratio. For this reason, and to avoid wasting space with unused data, plan for the SAN Volume Controller (SVC) to read and write data into cache in small chunks (use striped VDisks with smaller extent sizes).
Another point to consider is whether the typical I/O is a read or a write. In most Online Transaction Processing (OLTP) environments, there is generally a mix of about 70% reads and 30% writes. However, the transaction logs of a database application have a much higher write ratio and, therefore, perform better in a different managed disk (MDisk) group (MDG). Also, you need to place the logs on a separate virtual disk (VDisk), which for best performance must be located on a different MDG that is defined to better support the heavy write need. Mail servers also frequently have a higher write ratio than read ratio.
Best practice: Database table spaces, journals, and logs must never be collocated on the same MDisk or MDG, in order to avoid placing them on the same back-end storage logical unit number (LUN) or Redundant Array of Independent Disks (RAID) array.
layers, and avoid the performance problems and hot spots that come with poor data layout. Your goal is to balance I/Os evenly across the physical disks in the back-end storage devices. You can treat sequential I/O applications the same as random I/O applications unless the sequential rate is high enough to matter. We will specifically show you how to lay out storage for DB2 applications as a good example of how an application might balance its I/Os within the application. There are also different implications for the host data layout based on whether you utilize image mode or striped mode VDisks.
Table 11-1 Extent size versus maximum storage capacity of the cluster
Extent size    Maximum storage capacity of cluster
16 MB          64 TB
32 MB          128 TB
64 MB          256 TB
128 MB         512 TB
256 MB         1 PB
512 MB         2 PB
just need to know the alignment boundary that they use. Other operating systems, however, might require manual intervention to set their start point to a value that aligns them. With an SVC managing the storage for the host as striped VDisks, aligning the partitions is easier, because the extents of the VDisk are spread across the MDisks in the MDG. The storage administrator must ensure an adequate distribution. Understanding how your host-based volume manager (if used) defines and makes use of the logical drives when they are presented is also an important part of the data layout. Volume managers are generally set up to place logical drives into usage groups for their use. The volume manager then creates volumes by carving up the logical drives into partitions (sometimes referred to as slices) and then building a volume from them by either striping or concatenating them to form the desired volume size. How the partitions are selected for use and laid out can vary from system to system. In all cases, you need to ensure that spreading the partitions is done in a manner to achieve maximum I/Os available to the logical drives in the group. Generally, large volumes are built across a number of different logical drives to bring more resources to bear. You must be careful when selecting logical drives when you do this in order to not use logical drives that will compete for resources and degrade performance.
DB2 also evenly balances I/Os across DB2 database partitions, also known as DPARs (these DPARs can exist on different AIX logical partitions (LPARs) or systems). The same I/O principles are applied to each DPAR separately. DB2 also has different options for containers, including:
Storage Managed Space (SMS) file system directories
Database Managed Space (DMS) file system files
DMS raw
Automatic Storage for DB2 8.2.2
DMS and SMS are DB2 acronyms for Database Managed Space and Storage Managed Space. Think of DMS containers as preallocated storage and SMS containers as dynamic storage.
Note that if we use SMS file system directories, it is important to have one file system (and underlying LV) per container. That is, do not have two SMS file system directory containers in the same file system. Also, for DMS file system files, it is important to have just one file per file system (and underlying LV) per container. In other words, we have only one container per LV. The reason for these restrictions is that we do not have control of where each container resides in the LV; thus, we cannot assure that the LVs are balanced across physical disks.
The simplest way to think of DB2 data layout is to assume that we are using many disks, and we create one container per disk. In general, each container has the same sustained IOPS bandwidth and resides on a set of physically independent physical disks, because each container will be accessed equally by DB2 agents.
DB2 also has different types of tablespaces and storage uses. For example, tablespaces can be created separately for table data, indexes, and DB2 temporary work areas. The principles of storage design for even I/O balance among tablespace containers apply to each of these tablespace types. Furthermore, containers for different tablespace types can be shared on the same array, thus allowing all database objects to have equal opportunity at using all I/O performance of the underlying storage subsystem and disks. Also note that different options can be used for each container type; for example, DMS file containers might be used for data tablespaces, and SMS file system directories might be used for DB2 temporary tablespace containers.
DB2 connects physical storage to DB2 tables and database structures through the use of DB2 tablespaces. Collaboration between a DB2 DBA and the AIX administrator (or storage administrator) to create the DB2 tablespace definitions can ensure that the guidance provided for the database storage design is implemented for optimal I/O performance of the storage subsystem by the DB2 database.
Use of Automatic Storage bypasses LVM entirely, and here, DB2 uses disks for containers. So in this case, each disk must have similar IOPS characteristics. We will not describe this option here.
Therefore, we recommend SVC striping even when the application does its own unless you have carefully planned and tested the application and the entire environment. This approach adds a great deal more robustness to the situation. It now becomes easy to accommodate completely new databases and tablespaces with no special planning and without disrupting the balance of work. Also, the extra level of striping ensures that the load will be balanced even if the application striping fails. Perhaps most important, this recommendation lifts a significant burden from the database administrator, because good performance can be achieved with much less care and planning.
11.5 Data layout with the AIX virtual I/O (VIO) server
The purpose of this section is to describe strategies to get the best I/O performance by evenly balancing I/Os across physical disks when using the VIO Server.
11.5.1 Overview
In setting up storage at a VIO server (VIOS), a broad range of possibilities exist for creating VDisks and serving them up to VIO clients (VIOCs). The obvious consideration is to create sufficient storage for each VIOC. Less obvious but equally important is getting the best use of the storage. Performance and availability are of paramount importance. There are typically internal Small Computer System Interface (SCSI) disks (typically used for the VIOS operating system) and SAN disks. Availability for disk is usually handled via RAID on the SAN or via SCSI RAID adapters on the VIOS. We will assume here that any internal SCSI disks are used for the VIOS operating system and possibly for the VIOCs operating systems. Furthermore, we will assume that the applications are configured so that the limited I/O will occur to the internal SCSI disks on the VIOS and to the VIOCs rootvgs. If you expect your rootvg will have a significant IOPS rate, you can configure it in the same fashion as we recommend for other application VGs later.
VIOS restrictions
There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI HDisks and logical volume (LV) VSCSI HDisks. PV VSCSI HDisks are entire LUNs from the VIOS point of view, and if you are concerned about failure of a VIOS and have configured redundant VIOS for that reason, you must use PV VSCSI HDisks. So, PV VSCSI HDisks are entire LUNs that are VDisks from the VIOC point of view. An LV VSCSI HDisk cannot be served up from multiple VIOSs. LV VSCSI HDisks reside in LVM VGs on the VIOS and cannot span PVs in that VG, nor be striped LVs.
300 IOPS (assuming an average I/O service time of 10 ms). Thus, you need to configure a sufficient number of VSCSI HDisks to get the IOPS bandwidth needed. But, at the subject levels, queue_depth for the VIOC HDisks is configurable up to 256. When possible, set the queue depth of the VIOC HDisks to match that of the VIOS HDisk to which it maps.
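As a sketch of matching the client queue depth to the server queue depth (the device names are examples only): if the backing HDisk on the VIOS reports queue_depth=20, set the corresponding virtual SCSI HDisk on the client to the same value.
From the AIX root shell on the VIOS:
lsattr -El hdisk10 -a queue_depth
On the VIO client:
chdev -l hdisk3 -a queue_depth=20 -P
The -P flag defers the change until the client HDisk is reconfigured or the client partition is rebooted.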
Chapter 12. Monitoring
The examples in this chapter were taken from TotalStorage Productivity Center (TPC) V3.3, which was released in July 2007. This chapter will not discuss how to use TPC to monitor your storage controllers, switches, and host data. In 12.1, Configuring TPC to analyze the SVC on page 216, we show you how to set up TPC to monitor your SVC environment. We assume that you already have TPC monitoring your other SAN equipment. If you have an earlier version of TPC installed, you might still be able to reproduce the reports described here.
2. When you click Save, TPC will validate the information that you have provided by testing the connection to the CIMOM. If there is an error, an alert will pop up, and you must correct the error before you can save the configuration again.
3. After the connection has been successfully configured, TPC must run a CIMOM discovery (under Administrative Services → Discovery → CIMOM) before you can set up performance monitoring or before the SVC cluster will appear in the Topology Viewer.
Note: The SVC config node (the node that owns the IP address for the cluster) has a 10-session SSH limit. TPC uses one of these sessions while interacting with the SVC. You can read more information about the session limit in Connection limitations on page 35.
Information: When we cabled our SVC, we intended to connect ports 1 and 3 to one switch (IBM_2109_F32) and ports 2 and 4 to the other switch (swd77). We thought that we were really careful about labeling our cables and configuring our ports. TPC showed us that we did not configure the ports this way, and additionally, we made two mistakes. Figure 12-2 shows that we:
Correctly configured all four nodes with port 1 to switch IBM_2109_F32
Correctly configured all four nodes with port 2 to switch swd77
Incorrectly configured two nodes with port 3 to switch swd77
Incorrectly configured two nodes with port 4 to switch IBM_2109_F32
Figure 12-2 Checking the SVC ports to ensure they are connected to the SAN fabric
TPC can also show us where our host and storage are in our fabric and which switches the I/Os will go through when I/Os are generated from the host to the SVC or from the SVC to the storage controller. For redundancy, all storage controllers must be connected to at least two fabrics, and those same fabrics need to be the ones to which the SVC is connected. Figure 12-3 on page 219 shows our DS4500 is also connected to fabrics FABRIC-2GBS and FABRIC-4GBS as we planned. Information: Our DS4500 was shared with other users, so we were only able to use two of the available four ports. The other two ports were used by a different SAN infrastructure.
Figure 12-2 on page 218 shows an example where all the SVC ports are connected and the switch ports are healthy. Figure 12-4 on page 220 shows an example where the SVC ports are not healthy. In this example, the two ports that have a black line drawn between the switch and the SVC node port are in fact down. Because TPC knew where these two ports were connected from a previous probe (and, thus, they were previously shown with a green line), the probe discovered that these ports were no longer connected, which resulted in the green line becoming a black line.
If these ports had never been connected to the switch, no lines will show for them, and we will only see six of the eight ports connected to the switch.
Our SVC will also be used in a Metro Mirror and Global Mirror relationship with another SVC cluster. In order for this configuration to be a supported configuration, we must make sure that every SVC in this cluster is zoned so that it can see every port in the remote cluster. In each fabric, we made a zone set called SVC_MM_NODE with all the node ports for all of the SVC nodes. We can check each SVC to make sure that all its ports are in fact in this zone set. Figure 12-6 on page 222 shows that we have correctly configured all ports for SVC Cluster ITSO_CL1.
Figure 12-7 Verifying the health between two objects in the SVC
Figure 12-8 Kanaga has two HBAs but is only zoned into one fabric
Using the Fabric Manager component of TPC, we can quickly fix this situation. The fixed results are shown in Figure 12-9 on page 225.
The Data Path Viewer in TPC can also be used to confirm path connectivity between a disk that an operating system sees and the VDisk that the SVC provides. Figure 12-10 on page 226 shows two diagrams for the path information relating to host KANAGA:
The top (left) diagram shows the path information before we fixed our zoning configuration. It confirms that KANAGA only has one path to the SVC VDisk vdisk4. Figure 12-8 on page 224 confirmed that KANAGA has two HBAs and that they are connected to our SAN fabrics. From this panel, we can deduce that our problem is likely to be a zoning configuration problem.
The lower (right) diagram shows the result after the zoning was fixed.
What Figure 12-10 on page 226 cannot show is that you can hover over each component to also get health and performance information, which might be useful when you perform problem determination and analysis.
When starting to analyze the performance of the SVC environment to identify a performance problem, we recommend that you identify all of the components between the two systems and verify the performance of the smaller components. Traffic between a host, the SVC nodes, and a storage controller goes through these paths:
1. The host generates the I/O and transmits it on the fabric.
2. The I/O is received on the SVC node ports.
3. If the I/O is a write I/O:
   a. The SVC node writes the I/O to the SVC node cache.
   b. The SVC node sends a copy to its partner node to write to the partner node's cache.
   c. If the I/O is part of a Metro Mirror or Global Mirror relationship, a copy needs to go to the target VDisk of the relationship.
   d. If the I/O is part of a FlashCopy and the FlashCopy block has not been copied to the target VDisk, this action needs to be scheduled.
4. If the I/O is a read I/O:
   a. The SVC checks the cache to see if the read I/O is already there.
   b. If the I/O is not in the cache, the SVC reads the data from the physical LUNs.
5. At some point, write I/Os are sent to the storage controller.
6. The SVC might also do some read-ahead I/Os to load the cache in case the next read I/O from the host is the next block.
TPC can help you report on most of these steps so that it is easier to identify where a bottleneck might exist.
An important metric in this report is the CPU utilization (in dark blue). The CPU utilization reports give you an indication of how busy the cluster CPUs are. A continually high CPU utilization rate indicates a busy cluster. If the CPU utilization remains constantly high, it might be time to grow the cluster by adding more resources. You can add cluster resources by adding another I/O Group (two nodes) to the cluster, up to the maximum of four I/O Groups per cluster. After there are four I/O Groups in a cluster and high CPU utilization is still indicated in the reports, it is time to build a new cluster and consider either migrating some storage to the new cluster or servicing new storage requests from it. We recommend that you plan additional resources for the cluster if your CPU utilization indicates a workload continually above 70%.
The cache memory resource reports provide an understanding of the utilization of the SVC cache. These reports give you an indication of whether the cache is able to service and buffer the current workload. In Figure 12-11, you will notice an increase in the Write-cache Delay Percentage and Write-cache Flush Through Percentage and a drop in the Write-cache Hits Percentage, Read Cache Hits, and Read-ahead percentage of cache hits. This change is noted about halfway through the graph. This change in these performance metrics, together with an increase in back-end response time, shows that the storage controller is heavily burdened with I/O, and at this time interval, the SVC cache is probably full of outstanding write I/Os. (We expected this result with our test run.) Host I/O activity will now be impacted by the backlog of data in the SVC cache and by any other SVC workload that is going on to the same MDisks (FlashCopy and Global/Metro Mirror).
If cache utilization is a problem, you can add additional cache to the cluster by adding an I/O Group and moving VDisks to the new I/O Group.
Figure 12-12 Port receive and send data rate for each I/O Group
Figure 12-12 and Figure 12-13 on page 232 show two versions of port rate reports. Figure 12-12 shows the overall SVC node port rates for send and receive traffic. With a 2 Gb per second fabric, these rates are well below the throughput capability of this fabric, and thus the fabric is not a bottleneck here. Figure 12-13 on page 232 shows the port traffic broken down into host, node, and disk traffic. During our busy time as reported in Figure 12-11 on page 230, we can see that host port traffic drops while disk port traffic continues. This indicates that the SVC is communicating with the storage controller, possibly flushing outstanding I/O write data in the cache and performing other non-host functions, such as FlashCopy and Metro Mirror and Global Mirror copy synchronization.
Figure 12-13 Total port to disk, host, and local node report
Figure 12-14 on page 233 shows an example TPC report looking at port rates between SVC nodes, hosts, and disk storage controllers. This report shows low queue and response times, indicating that the nodes do not have a problem communicating with each other. If this report showed unusually high queue and response times, our write activity would be affected (because each node communicates with each other node over the fabric). Unusually high numbers in this report indicate:
SVC node or port problem (unlikely)
Fabric switch congestion (more likely)
Faulty fabric ports or cables (most likely)
Figure 12-14 Port to local node send and receive response and queue times
In Figure 12-15 on page 233, we see an unusual spike in back-end response time for both read and write operations, and this spike is consistent for both of our I/O Groups. This report confirms that we are receiving poor response from our storage controller and explains our lower than expected host performance. Our cache resource reports (in Figure 12-11 on page 230) also show an unusual pattern in cache usage during the same time interval. Thus, we can attribute the cache performance to be a result of the poor back-end response time that the SVC is receiving from the storage controller.
Here is a summary of the available cluster reports in TPC 3.3:
Overall Data Rates and I/O Rates
Backend I/O Rates and Data Rates
Response Time and Backend Response Time
Transfer Size and Backend Transfer Size
Disk to Cache Transfer Rate
Queue Time
Overall Cache Hit Rates and Write Cache Delay
Readahead and Dirty Write cache
Write cache overflow, flush-through, and write-through
Port Data Rates and I/O Rates
CPU Utilization
Data Rates, I/O Rates, Response Time, and Queue Time for:
   Port to Host
   Port to Disk
   Port to Local Node
   Port to Remote Node
Several useful alert events that you should set include:
CPU utilization threshold - The CPU utilization alert tells you when your SVC nodes become too busy. If this alert is generated too often, it might be time to upgrade your cluster with additional resources.
Overall port response time threshold - The port response time alert can let you know when the SAN fabric is becoming a bottleneck. If the response times are consistently poor, perform additional analysis of your SAN fabric.
Overall back-end response time threshold - An increase in back-end response time might indicate that you are overloading your back-end storage.
Chapter 13. Maintenance
As with any piece of enterprise storage equipment, the IBM SAN Volume Controller (SVC) is not a completely hands-off device. It requires configuration changes to meet growing needs, updates to software for enhanced performance, features, and reliability, and the tracking of all the data that you used to configure your SVC.
13.1.1 SAN
Tracking how your SAN is configured is going to be pretty important.
SAN diagram
The most basic piece of SAN documentation is the SAN diagram. If you ever call support asking for help with your SAN, you can be sure that the SAN diagram is likely to be one of the first things that you are asked to produce. Maintaining a proper SAN diagram is not as difficult as it sounds. It is not necessary for the diagram to show every last host and the location of every last port; this information is more properly collected (and easier to read) in other places. To understand how difficult an overly detailed diagram is to read, refer to Figure 13-1 on page 239.
Instead, a SAN diagram needs to only include every switch, every storage device, all inter-switch links (ISLs), along with how many there are, and some representation of which switches have hosts connected to them. An example is shown in Figure 13-2 on page 240. In larger SANs with many storage devices, the diagram can still be too large to print without a large-format printer, but it can still be viewed on a panel using the zoom feature. We suggest a tool, such as Microsoft Visio, to create your diagrams. Do not worry about finding fancy stencils or official shapes, because your diagram does not need to show exactly into which port everything is plugged. You can use your port inventory for that. Your diagram can be appropriately simple. You will notice that our sample diagram just uses simple geometric shapes and standard stencils to represent a SAN. Note: These SAN diagrams are just sample diagrams. They do not necessarily depict a SAN that you actually want to deploy.
[Figure 13-2 on page 240: a simplified sample SAN diagram showing storage devices (DS8k 12345, DS8k 45678, DS8k 93782, and DS4300 8K4698) attached to core switches, edge switch pairs labeled with their Domain IDs, and the ISL counts between them]
Notice that in our simplified diagram, individual hosts do not appear; instead, we merely note which switches have connections to hosts. Also, because typically SANs are symmetrical in most installations, one diagram will suffice for both. The numbers inside the switch boxes denote the Domain IDs.
Port inventory
Along with the SAN diagram, an inventory of what is supposed to be plugged in where is also quite important. Again, you can create this inventory manually or generate it with automated tools. Before using automated tools, remember that it is important that your inventory contains not just what is currently plugged into the SAN, but also what is supposed to be attached to the SAN. If a server has lost its SAN connection, merely looking at the current status of the SAN will not tell you where it was supposed to be attached. This inventory must exist in a format that can be exported and sent to someone else and retained in an archive for long-term tracking.
The list, spreadsheet, database, or automated tool needs to contain the following information for each port in the SAN:
The name of the attached device and whether it is a storage device, host, or another switch
The port on the device to which the switch port is attached, for example, Host Slot 6 for a host connection or Switch Port 126 for an ISL
The speed of the port
If the port is not an ISL, the attached worldwide port name (WWPN)
For host ports or SVC ports, the destination aliases to which the host is zoned
Automated tools, obviously, can do a decent job of keeping this inventory up-to-date, but even with a fairly large SAN, a simple database, combined with standard operating procedures, can be equally effective. For smaller SANs, spreadsheets are a time-honored and simple method of record keeping.
Zoning
While you need snapshots of your zoning configuration, you do not really need a separate spreadsheet or database just to keep track of your zones. If you lose your zoning configuration, you can rebuild the SVC parts from your zoning snapshot, and the host zones can be rebuilt from your port inventory.
13.1.2 SVC
For the SVC, there are several important components that you need to document.
13.1.3 Storage
Actually, for the LUNs themselves, you do not need to track anything outside of what is already in your configuration documentation for the MDisks, unless the disk array is also used for direct-attached hosts.
These snapshots can include:
supportShow output from Brocade switches
show tech-support details output from Cisco switches
Data Collections from EFCM-equipped McDATA switches (EFCM will also be the future administration tool for Brocade switches)
SVC config dumps
DS4x00 subsystem profiles
DS8x00 LUN inventory commands: lsfbvol, lshostconnect, lsarray, lsrank, lsioports, lsvolgrp
Obviously, you do not need to pull DS4x00 profiles if the only thing you are modifying is SAN zoning.
Abstract: Request __ABC456__: Add new server __XYZ123__ to the SAN and allocate __200GB__ from SVC Cluster __1__
Date of Implementation: __04/01/2007__
Implementing Storage Administrator: Peter Mescher (x1234)
Server Administrator: Jon Tate (x5678)
Impact: None. This is a non-disruptive change.
Risk: Low.
Time estimate: __30 minutes__
Backout Plan: Reverse changes
Implementation Checklist:
1. ___ Verify (via phone or e-mail) that the server administrator has installed all code levels listed on the intranet site http://w3.itsoelectronics.com/storage_server_code.html
2. ___ Verify that the cabling change request, __CAB927__, has been completed.
3. ___ For each HBA in the server, update the switch configuration spreadsheet with the new server using the information below.
To decide on which SVC cluster to use: All new servers must be allocated to SVC cluster 2, unless otherwise indicated by the Storage Architect.
To decide which I/O Group to use: These must be roughly evenly distributed. Note: If this is a high-bandwidth host, the Storage Architect may give a specific I/O Group assignment, which should be noted in the abstract.
To select which node ports to use: If the last digit of the first WWPN is odd (in hexadecimal, B, D, and F are also odd), use ports 1 and 3; if even, use ports 2 and 4.
HBA A: Switch: __McD_1__ Port: __47__ WWPN: __00:11:22:33:44:55:66:77__ Port Name: __XYZ123_A__ Host Slot/Port: __5__ Targets: __SVC 1, IOGroup 2, Node Ports 1__
HBA B: Switch: __McD_2__ Port: __47__ WWPN: __00:11:22:33:44:55:66:88__ Port Name: __XYZ123_B__ Host Slot/Port: __6__ Targets: __SVC 1, IOGroup 2, Node Ports 4__
4. ___ Log in to EFCM and modify the nicknames for the new ports (using the information above).
5. ___ Collect Data Collections from both switches and attach them to this ticket with the filenames <ticket_number>_<switch_name>_old.zip
6. ___ Add new zones to the zoning configuration using the standard naming convention and the information above.
7. ___ Collect Data Collections from both switches again and attach them with the filenames <ticket_number>_<switch_name>_new.zip
8. ___ Log on to the SVC Console for Cluster __2__ and:
___ Obtain a config dump and attach it to this ticket under the filename <ticket_number>_<cluster_name>_old.zip
___ Add the new host definition to the SVC using the information above and setting the host type to __Generic__. Do not type in the WWPN. If it does not appear in the drop-down list, cancel the operation and retry. If it still does not appear, check zoning and perform other troubleshooting as necessary.
___ Create new VDisk(s) with the following parameters:
To decide on the MDisk Group: For current requests (as of 1/1/07), use ESS4_Group_5, assuming that it has sufficient free space. If it does not have sufficient free space, inform the Storage Architect prior to submitting this change ticket and request an update to these procedures.
Use striped (instead of sequential) VDisks for all requests, unless otherwise noted in the abstract.
Name: __XYZ123_1__ Size: __200GB__ IO Group: __2__ MDisk Group: __ESS4_Group_5__ Mode: __Striped__
9. ___ Map the new VDisk to the host.
10. ___ Obtain a config dump and attach it to this ticket under <ticket_number>_<cluster_name>_new.zip
11. ___ Update the SVC configuration spreadsheet using the above information and the following supplemental data: Request: __ABC456__ Project: __Foo__
12. ___ Also update the entry for the remaining free space in the MDisk Group with the information pulled from the SVC Console.
13. ___ Call the server administrator in the ticket header and request storage discovery. Ask them to obtain a path count to the new disk(s). If it is not 4, perform the necessary troubleshooting to determine why there is an incorrect number of paths.
14. ___ Request that the storage admin confirm R/W connectivity to the paths.
15. ___ Make notes on anything unusual in the implementation here: ____
Note that the example checklist does not contain pages upon pages of screen captures or "click Option A, select Option 7..." instructions. Instead, it assumes that the user of the checklist understands the basic operational steps for the environment. After the change is over, the entire checklist, along with the configuration snapshots, needs to be stored in a safe place, which is not the SVC or any other SAN-attached location. Even non-routine changes, such as migration projects, need to use detailed checklists to help the implementation go smoothly and to provide an easy-to-read record of what was done. Writing a one-use checklist might seem horribly inefficient, but if you have to review the process for a complex project a few weeks after implementation, you might discover that your memory of exactly what was done is not as good as you thought. Also, complex, one-off projects are actually more likely to have steps skipped, because they are not routine.
The exception to this rule is if you discovered that some part of your SAN is accidentally running ancient code, such as a server running a three year old copy of IBM Subsystem Device Driver (SDD).
From here, we can see that fscsi0 has the adapter ID of 3 in SDD. We will use this ID when taking the adapter offline prior to maintenance. Note how the SDD ID was 3 even though the adapter had been assigned the device name fscsi0 by the OS.
AIX
In AIX without the SDDPCM, if you do not properly deal with a destination FCID change, running cfgmgr will create brand-new hdisk devices, all of your old paths will go into a defined state, and you will have some difficulty removing them from your ODM database. There are two ways of preventing this issue in AIX.
Dynamic Tracking
This is an AIX feature present in AIX 5.2 Technology Level (TL) 1 and later. It causes AIX to bind HDisks to the WWPN instead of the destination FCID. However, this feature is not enabled by default, has extensive prerequisite requirements, and is disruptive to enable. For these reasons, we do not recommend that you rely on this feature to aid in scheduled changes. The alternate procedure is not particularly difficult, but if you are still interested in Dynamic Tracking, refer to the IBM System Storage Multipath Subsystem Device Driver Users Guide, SC30-4096, for full details. If you do choose to use Dynamic Tracking, we strongly recommend that you be at the latest available TL. If Dynamic Tracking is enabled, no special procedures are necessary to change the FCID.
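If you do decide to enable it, the attribute is set on the fscsi protocol device rather than on the fcs adapter. This is a sketch only; fscsi0 is an example device, the child devices must be unconfigured (or the host rebooted) for the deferred change to take effect, and fc_err_recov=fast_fail is a commonly paired but separate setting:
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P
lsattr -El fscsi0 -a dyntrk -a fc_err_recov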
3. If there are hosts attached to the switch, gracefully take the paths offline. In SDD, the appropriate command is datapath set adapter X offline, where X is the adapter number. While this is not technically necessary, it is nevertheless a good idea. Follow the procedure in 13.5.1, Cross-referencing the SDD adapter number with the WWPN on page 248 for details.
4. Power off the old switch. Note that the SVC will log all sorts of error messages when you power off the old switch. Perform at least a spot-check of your hosts to make sure that your access to disk still works.
5. Remove the old switch, put in the new switch, and power it up; do not attach any of the Fibre Channel ports yet.
6. If appropriate, match the code level on the new switch with the other switches in your fabric.
7. Give the new switch the same Domain ID as the old switch. You might also want to upload the configuration of the old switch into the new switch as well. In the case of a Cisco switch, it is important to upload the configuration of the old switch into the new switch if you have AIX hosts using SDD. Uploading the configuration of the old switch ensures that the FCID of the destination devices remains constant, which often is important to AIX hosts with SDD.
8. Plug the ISLs into the new switch and make sure that it merges into the fabric successfully.
9. Attach the storage ports, making sure to use the same physical ports as the old switch.
10. Attach the SVC ports and perform the appropriate maintenance procedures to bring the disk paths back online.
11. Attach the host ports and bring their paths back online.
7. Swap out the WWPNs in the SVC host definition interface.
8. Perform the device detection procedures appropriate for your OS to bring the paths back up, and verify this with your multipathing software. (Use the command datapath query adapter in SDD.)
13.6.2 Controllers
It is common to refer to disk controllers by part of their serial number, which helps facilitate troubleshooting by making the cross-referencing of logs easier. If you have a unique name, by all means, use it, but it is helpful to append the serial number to the end.
13.6.3 MDisks
The MDisk names must certainly be changed from the default of mdiskX. The name must include the serial number of the controller, the array number or name, and the volume number or name. Unfortunately, you are limited to fifteen characters. This approach builds a name similar to 23K45_A7V10 - serial number 23K45, Array 7, Volume 10.
13.6.4 VDisks
The VDisk name must indicate for what host the VDisk is intended, along with any other identifying information that might distinguish this VDisk from other VDisks.
13.6.5 MDGs
MDG names must indicate from which controller the group comes, the RAID level, and the disk size and type. For example, 23K45_R1015k300 is an MDG on 23K45, RAID 10, 15k, 300 GB drives. (As with the other names on the SVC, you are limited to 15 characters.)
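As a sketch, these objects can be renamed from the SVC CLI after they are created; the object IDs and names here are examples only, reusing names from earlier in this chapter:
svctask chmdisk -name 23K45_A7V10 mdisk12
svctask chvdisk -name XYZ123_1 vdisk7
svctask chmdiskgrp -name 23K45_R1015k300 mdiskgrp2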
Chapter 14.
14.1 Cabling
None of what we are going to tell you in the following section is SVC-specific. However, because some cabling problems can produce SVC issues that will be troublesome and tedious to diagnose, we thought that reminders of how to structure cabling might be useful.
14.1.3 Labeling
All cables must be labeled at both ends with their source and destination locations. Even in the smallest SVC installations, a lack of cable labels quickly becomes an unusable mess if you are trying to trace problems. A small SVC installation consisting of a two-port storage array, 10 hosts, and a single I/O Group requires 30 fiber-optic cables to set up.
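Counting the cables for that small example (and assuming two HBA ports per host) gives: 10 hosts x 2 HBA ports = 20 cables, plus 1 I/O Group x 2 nodes x 4 ports = 8 cables, plus 2 storage array ports = 30 fiber-optic cables in total.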
inaccessible nightmare as you try to merely reach, much less unplug, all of the appropriate cables. Things become even more difficult when you try to plug all those cables back into the proper port from which they came. If you can possibly spare the rack space, your cable management trays and guides need to take up about as much space as your switches themselves take.
14.2 Power
The SVC itself has no particularly exotic power requirements. Nevertheless, it is a source of some field issues.
14.3 Cooling
The SVC has no extraordinary cooling requirements. From the perspective of a data center designer, it is merely a pile of 1U servers. In case you need a refresher, here are a few pointers:
The SVC, and most SAN equipment (with the exception of Cisco switches), cools front-to-back. However, front and back can be something of a confusing concept, especially with some smaller switches. When installing equipment, make sure that the side of the switch with the air intake is in the front.
Fill empty spaces in your rack with filler panels, which helps to prevent recirculating hot exhaust air back into the rack intake. The most common filler panels do not even require screws to mount.
Data centers with rows of racks must be set up with hot and cold aisles. You do not ever want the hot air from one rack dumping into the intake of another rack.
In a raised-floor installation, the vent tiles must only be in the cold aisles. Vent tiles in the hot aisle can cause air recirculation problems.
If you discover yourself deploying fans on the floor to fix hot spots, you really need to reevaluate your data center cooling configuration. Fans on the floor are a poor solution that will almost certainly lead to reduced equipment life. Instead, engage IBM, or any one of a number of professional data center contractors, to come in and evaluate your cooling configuration. It might be possible to fix your cooling by reconfiguring the existing airflow without having to purchase any additional chiller units.
14.7.2 Courses
IBM offers several classes to help you learn how to implement the SVC:
SAN Volume Controller (SVC) - Planning and Implementation (SN821) or SAN Volume Controller (SVC) Planning and Implementation Workshop (SN830) - These courses provide a basic introduction to SVC implementation. The workshop version of the class also includes a hands-on lab; otherwise, the course content is identical.
IBM TotalStorage Productivity Center Implementation and Configuration (SN856) - This class is useful if you plan on using TPC to manage your SVC environment.
TotalStorage Productivity Center for Replication Workshop (SN880) - This class covers managing replication with TPC. The replication part of TPC is virtually a separate product from the rest of TPC, so it is not covered in the basic course.
Chapter 15. Troubleshooting and diagnostics
The following list contains an overview, from the SVC perspective, of the areas that you must check:
- The attached hosts (see 15.1.1, Host problems, on page 260)
- The SAN (see 15.1.3, SAN problems, on page 262)
- The attached storage subsystem (see 15.1.4, Storage subsystem problems, on page 262)

There are a few commands with which you can check the current status of the SVC and the attached storage subsystems. Before starting the complete data collection or the problem isolation on the SAN or subsystem level, we recommend that you use the following commands first and check the status from the SVC perspective. Useful command line interface (CLI) commands for checking the current environment from the SVC perspective are:
- svcinfo lscontroller controllerid: Check that multiple worldwide port names (WWPNs) matching the back-end controller ports are available, and that the path_counts are evenly distributed across each controller or distributed correctly based on the preferred controller. Use the path_count calculation: the total of all path_counts must add up to the number of managed disks (MDisks) multiplied by the number of SVC nodes. The path_counts also need to be evenly distributed across the WWPNs of the controller ports so that all nodes use the same WWPN for a single MDisk (and the preferred controller algorithm in the back end is honored). See Fixing subsystem problems in an SVC-attached environment on page 278.
- svcinfo lsmdisk: Check that all MDisks are online (neither degraded nor offline).
- svcinfo lsmdisk mdiskid: Check some of the MDisks from each controller. Are they online, and do they all have path_count equal to the number of nodes?
- svcinfo lsvdisk: Check that all virtual disks (VDisks) are online (neither degraded nor offline). If VDisks are degraded, check for stopped FlashCopy jobs; restart those jobs or delete the mappings.
- svcinfo lshostvdiskmap: Check that all VDisks are mapped to the correct host, or mapped at all. If a VDisk is not mapped, create the necessary VDisk-to-host mapping.
- svcinfo lsfabric: Use the various options, such as -controller, to check different parts of the SVC configuration and verify that multiple paths are available from each SVC node port to an attached host or controller. Confirm that all node port WWPNs are connected to the back-end storage consistently.

A scripted sketch of these checks follows.
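The following minimal ksh sketch pulls these checks together. It is only a sketch under stated assumptions: key-based SSH access to the cluster as the admin user, a placeholder cluster address (svc_cluster_ip) that you must replace, and plain egrep filtering of the command output rather than any additional CLI options.

#!/bin/ksh
# Quick SVC status sweep (sketch, not a supported tool)
CLUSTER=svc_cluster_ip                  # replace with your cluster IP address or host name

svc() {
    # Run one SVC CLI command over SSH as the admin user
    ssh admin@"$CLUSTER" "$@"
}

echo "== Controller summary =="
svc svcinfo lscontroller

echo "== MDisks that are degraded or offline =="
svc svcinfo lsmdisk | egrep "degraded|offline"

echo "== VDisks that are degraded or offline =="
svc svcinfo lsvdisk | egrep "degraded|offline"

echo "== Fabric view for controller 0 =="
svc svcinfo lsfabric -controller 0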
In addition to its problem reporting and data gathering functions, TPC offers a powerful alerting mechanism and a very powerful Topology Viewer, which enables the user to monitor the entire environment. Figure 15-1 shows a screen capture of the TPC Topology Viewer. In this panel, you can see the SVC 2145 cluster attached to a switch. Some of the lines are green, and two of the lines are black. The black lines indicate that there is no connectivity between these ports, although there was in the past. This is just one example of how TPC can help you monitor your environment and find problem areas.
If you drill down further, as shown in Figure 15-2 on page 264, you can see that all four ports of an SVC node are missing. The black lines again indicate that there is no connectivity between these ports. From a user's point of view, this lack of connectivity can be caused by switch, switch connectivity, or SVC problems. The best starting point to resolve the problem is described in 15.3.2, Solving SVC problems, on page 272; if the SVC does not help to isolate the problem, continue as explained in 15.3.3, Solving SAN problems, on page 275.
Figure 15-2 All four ports for an SVC node are missing
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#              Adapter/Hard Disk  State  Mode     Select  Errors
    0   Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL  1752399       0
    1 * Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    2   Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL  1752371       0
    3 * Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
Collect the following information from the host:
- Operating system: version and level
- HBA: driver and firmware level
- Multipathing driver level
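On an AIX host, a minimal sketch of gathering those levels might look like the following. The adapter name (fcs0), the fileset patterns, and the output file name are examples only; adjust them to your configuration.

#!/bin/ksh
# Collect host software levels for a support call (AIX sketch)
oslevel -r                                   > host_levels.txt   # operating system version and level
lscfg -vpl fcs0                             >> host_levels.txt   # HBA vital product data, including firmware
lslpp -l "devices.fcp.*"                    >> host_levels.txt   # FC device driver filesets
lslpp -l "devices.sdd*" "devices.sddpcm*"   >> host_levels.txt   # multipathing driver level
echo "Levels written to host_levels.txt"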
SDDPCM
SDDPCM has been enhanced to collect SDDPCM trace data periodically and write the trace data to the system's local hard drive. SDDPCM maintains four files for its trace data:
- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
Starting with SDDPCM 2.1.0.8, the relevant data for debugging problems is collected by running sddpcmgetdata. If this command is not found, collect the following files, which are in the /var/adm/ras directory:
- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
- The output of the pcmpath query adapter command
- The output of the pcmpath query device command

SDDPCM provides the sddpcmgetdata script to collect the information used for problem determination. The script creates a tar file in the current directory with the current date and time as part of the file name, for example:

sddpcmdata_hostname_yyyymmdd_hhmmss.tar

The variable yyyymmdd_hhmmss is the time stamp of the file creation. When you report an SDDPCM problem, it is essential to run this script and send the resulting tar file for problem determination. See Example 15-2.
Example 15-2 Use of the sddpcmgetdata script
SDDDSM
SDDDSM also provides the sddgetdata script to collect information to use for problem determination. SDDGETDATA.BAT is the batch file that generates the following files:
- sddgetdata_%host%_%date%_%time%.cab
- SDD\SDDSrv logs
- Datapath output
- Event logs
- Cluster log
- SDD-specific registry entry
- HBA information
#!/bin/ksh
export PATH=/bin:/usr/bin:/sbin
echo "y" | snap -r                                # Clean up old snaps
snap -gGfkLN                                      # Collect new; don't package yet
cd /tmp/ibmsupt/other                             # Add supporting data
cp /var/adm/ras/sdd* .
cp /var/adm/ras/pcm* .
cp /etc/vpexclude .
datapath query device > sddpath_query_device.out
datapath query essmap > sddpath_query_essmap.out
pcmpath query device > pcmpath_query_device.out
pcmpath query essmap > pcmpath_query_essmap.out
sddgetdata
sddpcmgetdata
snap -c                                           # Package snap and other data
echo "Please rename /tmp/ibmsupt/snap.pax.Z after the"
echo "PMR number and ftp to IBM."
exit 0
IBM_2145:ITSOCL1:admin>svcinfo lsnode
id  name   UPS_serial_number  WWNN              status  IO_group_id  IO_group_name  config_node  UPS_unique_id     hardware
6   Node1  1000739007         50050768010037E5  online  0            io_grp0        no           20400001C3240007  8G4
5   Node2  1000739004         50050768010037DC  online  0            io_grp0        yes          20400001C3240004  8G4
4   Node3  100068A006         5005076801001D21  online  1            io_grp1        no           2040000188440006  8F4
8   Node4  100068A008         5005076801021D22  online  1            io_grp1        no           2040000188440008  8F4
So, for all nodes except the config node, you have to run the svctask cpdumps command. There is no feedback from this command. Example 15-5 shows the command.
Example 15-5 Copy the dump files from the other nodes
IBM_2145:ITSOCL1:admin>svctask cpdumps -prefix /dumps 8
IBM_2145:ITSOCL1:admin>svctask cpdumps -prefix /dumps 4
IBM_2145:ITSOCL1:admin>svctask cpdumps -prefix /dumps 6

To collect all of the files, including the config.backup file, the trace file, the errorlog file, and more, run the svc_snap dumpall command. This command collects all of the data, including the dump files. See Example 15-6. It is sometimes better to run svc_snap without the dumpall parameter and request the dumps individually; omitting dumpall captures the data collection but excludes the dump files.

Note: Dump files are huge. Only request them if you really need them.
Example 15-6 svc_snap dumpall command
IBM_2145:ITSOCL1:admin>svc_snap dumpall
Collecting system information...
Copying files, please wait...
Copying files, please wait...

After the data collection with the dumpall command is complete, you can verify that the new snap file appears in your 2145 dumps directory by using the svcinfo ls2145dumps command. See Example 15-7.
Example 15-7 svcinfo ls2145dumps command
IBM_2145:ITSOCL1:admin>svcinfo ls2145dumps
id  2145_filename
0   svc.config.backup.bak_SVCNode_1
1   svc.config.cron.bak_SVCNode_1
2   104603.trc.old
3   svc.config.cron.bak_node5
.
.
25  snap.104603.070731.223110.tgz

To copy the file from the SVC cluster, use the PuTTY secure copy (SCP) function; a minimal example follows the note below. The PuTTY SCP function is described in more detail in Chapter 3, Master console, on page 33 of IBM System Storage SAN Volume Controller, SG24-6423-05, and also in the SVC Configuration Guide, SC23-6628-00.

Information: If there is no dump file available on the SVC cluster or for a particular SVC node, you need to contact your next level of IBM Support. The support personnel will guide you through the procedure to take a new dump.
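As a minimal sketch (assuming the PuTTY command line tools are installed on the master console, and using placeholder values for the private key file, the cluster address, the snap file name, and the target directory), the copy command looks similar to this:

pscp -i privatekey.ppk admin@<cluster_ip>:/dumps/snap.104603.070731.223110.tgz C:\temp\

Substitute the values from your own environment before running the command.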
Brocade switches
For most of the current Brocade switches, you need to issue the supportSave command to collect the support data.
McDATA
Using the Enterprise Fabric Connectivity Manager (EFCM) is the preferred way of collecting data for McDATA switches. For EFCM 8.7 and higher levels (without the group manager license), select the switch for which you want to collect data, right-click it, and launch the Element Manager. See Figure 15-3. On the Element Manager panel, choose Maintenance -> Data Collection -> Extended, and save the zip file on the local disk.
Cisco
Telnet to the switch and collect the output from the following commands: terminal length 0, show tech-support detail, and terminal length 24.
DS4000
With Storage Manager levels higher than 9.1, there is a feature called Collect All Support Data. To collect the information, open the Storage Manager and select Advanced -> Troubleshooting -> Collect All Support Data.
Software problems, such as:
- A down-level multipathing driver
- Failures in the zoning
- The wrong host-to-VDisk mapping

Example 15-8 shows only three out of four possible paths to the LUN.
Example 15-8 SDD output on the host with missing paths
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#              Adapter/Hard Disk  State  Mode     Select  Errors
    0   Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL  1752398       0
    1 * Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    2   Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL  1752370       0

Based on our field experience, we recommend that you check the hardware first:
- Check for any error light on the host or switch fiber-optic connection.
- Check that all parts are seated correctly.
- Ensure that there is no broken fiber-optic cable (if possible, swap cables to a known good fiber-optic connection).

After the hardware check, continue with the software and setup:
- Check that the HBA driver and firmware levels are at the recommended levels and are supported.
- Check the multipathing driver level and make sure that it is supported.
- Verify your switch zoning.
- Check the general switch status and health. Ensure that port status, link speed, and availability are acceptable.

In Example 15-9, we completely turned off zoning (which is not an option in most client environments). After we turned off the zoning, the missing path appeared, which implies that we had a name server issue; rebooting the switch refreshed the name server as well. In our case, you can see all six paths appearing after turning off the zoning.
Example 15-9 Output from datapath query device command
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#              Adapter/Hard Disk  State  Mode     Select  Errors
    0   Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL  1752398       0
    1 * Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    2   Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL  1752370       0
    3 * Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    4 * Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    5   Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL        0       0

C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#: 0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#              Adapter/Hard Disk  State  Mode     Select  Errors
    0   Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL  1752399       0
    1 * Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    2   Scsi Port3 Bus0/Disk1 Part0   OPEN   NORMAL  1752371       0
    3 * Scsi Port2 Bus0/Disk1 Part0   OPEN   NORMAL        0       0
    4 * Scsi Port2 Bus0/Disk1 Part0   CLOSE  OFFLINE       0       0
    5   Scsi Port2 Bus0/Disk1 Part0   CLOSE  OFFLINE       0       0

After re-enabling the zoning, we got our four paths back, and the two remaining paths show CLOSE OFFLINE.
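On hosts that run SDD, a quick way to spot this condition without reading the whole listing is to count path entries that are not open. The following one-liner is a sketch only; it assumes the datapath CLI is in the PATH, and the set of states that it matches is illustrative rather than exhaustive:

#!/bin/ksh
# Flag SDD paths that are not in the OPEN NORMAL state (sketch)
BAD=$(datapath query device -l | egrep -c "CLOSE|OFFLINE|DEAD")
if [ "$BAD" -gt 0 ]; then
    echo "$BAD degraded path entries found - check cabling, zoning, and the switch name server"
fi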
The SVC error log provides information such as all of the events on the SVC, all of the error messages, and SVC warning information. Although you can mark an error as fixed directly in the error log, we recommend that you always use the Run Maintenance Procedure, as shown in Figure 15-4. Starting with SVC code Version 4.2, the error log has a new feature called Sense Expert, as shown in Figure 15-5 on page 273. This tool translates the sense data into something more meaningful.
Another common practice is to use the SVC CLI to find problems. The following commands give you a picture of the current status of your environment:
- svctask detectmdisk: Discover changes in the back end.
- svcinfo lscluster clustername: Check the cluster status.
- svcinfo lsnode nodeid: Check the node and port status.
- svcinfo lscontroller controllerid: Check the controller status.
- svcinfo lsmdisk: Get the overall status of all controllers and MDisks.
- svcinfo lsmdiskgrp: Get the overall status of all Managed Disk Groups (MDGs).
- svcinfo lsvdisk: Check whether all VDisks are online now.

Important: Although the SVC raises error messages, most problems are not caused by the SVC. Most problems are introduced by the storage subsystems or the SAN. If the problem is caused by the SVC and you are unable to fix it either with the Run Maintenance Procedure or with the error log, collect the SVC debug data as explained in 15.2.3, SVC data collection, on page 267. If the problem is related to anything outside of the SVC, refer to the appropriate section in this guide to find and fix the problem.
Check the back-end storage configurations for SCSI ID to LUN ID mappings. Normally, a 1625 error is detected if there is a problem, but it is also worthwhile to check these mappings manually. Specifically, make sure that the SCSI ID to LUN ID mapping is the same for each node: each WWPN of the SVC must see the identical LUN mapping, so that, for example, LUN 1 corresponds to the same ESS/2107 volume serial number for every WWPN of the SVC cluster.

You can use these commands on the ESS to pull out the data and verify that the mapping is identical:
- esscli list port -d "ess=<ESS name>"
- esscli list hostconnection -d "ess=<ESS name>"
- esscli list volumeaccess -d "ess=<ESS name>"

Use the following commands for a DS8000 to check the SCSI ID to LUN ID mappings:
- lsioport -dev <DS8K name> -l -fullid
- lshostconnect -dev <DS8K name> -l
- showvolgrp -lunmap <DS8K name>
- lsfbvol -dev <DS8K name> -l -vol <SVC Vol Groups>

LUN mapping problems are not likely on a DS8000 because of the way that the volume groups are allocated; however, it is still worthwhile to verify the configuration just prior to upgrades.

For the DS4000, we also recommend that you verify that each WWPN of the SVC has the identical LUN mapping for every node. Open Storage Management for the DS4000 and use the Mappings View to verify the mapping. You can also run the data collection for the DS4000 and use the subsystem profile to check the mapping. For storage subsystems from other vendors, use the corresponding steps to verify the correct mapping.

Use the host multipathing commands, such as datapath query device, and the svcinfo lsvdisk, lsmdisk, and lscontroller commands, to verify:
- Host path redundancy
- Controller redundancy
- Controller misconfigurations

Use the Run Maintenance Procedure or Analyze Error Log function in the SVC console GUI to investigate any unfixed or uninvestigated SVC errors.

Download and execute the SAN Volume Controller Software Upgrade Test Utility:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S4000585

Review the latest flashes, hints, and tips prior to the cluster upgrade. There will be a list of flashes, hints, and tips on the SVC code download page that are directly applicable. Also, review the latest updates shown here:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002860

Note: In most cases, the SVC is not the cause of the problem, but it can certainly help to isolate the root cause of the problem.
The correct zoning needs to look like the zoning shown in Example 15-11.
Example 15-11 Correct WWPN zoning
zone:
Example 15-12 shows an unequal LUN distribution on the back-end storage controller: one WWPN has a path_count of zero, while the other WWPN carries all of the paths for both LUNs. This situation has two possible causes:
- If the back end is a controller with a preferred controller concept, both LUNs might be allocated to the same internal controller. This is likely with the DS4000 and can be fixed by redistributing the LUNs evenly on the DS4000 and then rediscovering the LUNs on the SVC.
- The WWPN with the zero path count is not visible to all of the SVC nodes through the SAN zoning. Use svcinfo lsfabric 0 to confirm.
Example 15-12 Unequal LUN distribution on the back-end controller
IBM_2145:ITSOCL1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 0
max_path_count 4
WWPN 200500A0B8174433
path_count 8
max_path_count 8

IBM_2145:ITSOCL1:admin>svctask detectmdisk

IBM_2145:ITSOCL1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 4
WWPN 200500A0B8174433
path_count 4
max_path_count 8
Upgradability
Check the following Web site to see from which level to which level the SVC code can be upgraded. Furthermore, check which level of SVC console GUI is required to run the latest SVC code:
http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/supportresources?taskind=2&brandind=5000033&familyind=5329743
Upgrade order
The following list shows a desirable upgrade order:
1. SVC Master Console GUI
2. SVC cluster code
3. SAN switches
4. Host system (HBA, OS and service packs, and multipathing driver)
In this example, 2 MDisks x 4 SVC nodes = 8, so the path_counts across the controller WWPNs must add up to 8.
Example 15-13 shows how to obtain this information using the commands svcinfo lscontroller id and svcinfo lsnode.
Example 15-13 svcinfo lscontroller 0 command
IBM_2145:ITSOCL1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8

IBM_2145:ITSOCL1:admin>svcinfo lsnode
id  name   UPS_serial_number  WWNN              status  IO_group_id  IO_group_name  config_node  UPS_unique_id     hardware
6   Node1  1000739007         50050768010037E5  online  0            io_grp0        no           20400001C3240007  8G4
5   Node2  1000739004         50050768010037DC  online  0            io_grp0        yes          20400001C3240004  8G4
4   Node3  100068A006         5005076801001D21  online  1            io_grp1        no           2040000188440006  8F4
8   Node4  100068A008         5005076801021D22  online  1            io_grp1        no           2040000188440008  8F4
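The check shown in Example 15-13 can also be scripted. The following ksh sketch sums the path_count values reported by svcinfo lscontroller and compares the total against mdisk_link_count multiplied by the number of nodes. It assumes key-based SSH access to the cluster as the admin user and that the -nohdr option is available at your SVC code level; the cluster address and controller ID are placeholders.

#!/bin/ksh
# Verify that total path_count = MDisks on the controller x number of SVC nodes (sketch)
CLUSTER=svc_cluster_ip        # replace with your cluster IP address or host name
CTRL=0                        # controller ID to check

OUT=$(ssh admin@"$CLUSTER" "svcinfo lscontroller $CTRL")
MDISKS=$(echo "$OUT" | awk '/^mdisk_link_count/ {print $2}')
PATHS=$(echo "$OUT" | awk '/^path_count/ {sum += $2} END {print sum}')
NODES=$(ssh admin@"$CLUSTER" "svcinfo lsnode -nohdr" | wc -l)

EXPECTED=$((MDISKS * NODES))
echo "path_count total: $PATHS, expected: $EXPECTED"
if [ "$PATHS" -ne "$EXPECTED" ]; then
    echo "WARNING: path counts do not match the expected value"
fi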
To run the SVC Maintenance Procedures, open the SVC console GUI. Select Service and Maintenance -> Run Maintenance Procedures. On the panel that appears on the left side, click Start Analysis.
For more information about how to use the SVC Maintenance Procedures, refer to IBM System Storage SAN Volume Controller, SG24-6423-05, or the SVC Service Guide, GC26-7901-01.

2. Check the attached storage subsystem for misconfigurations or failures:
a. Independent of the type of storage subsystem, first check whether there are any open problems on the system. Use the service or maintenance features provided with the storage subsystem to fix these problems. If needed, call support.
b. Then, check that the LUN masking is correct. When attached to the SVC, you must make sure that the LUN masking maps to the active zone set on the switch. Create a LUN mask for each HBA port that is zoned to the SVC. If you fail to do so, the SVC will raise error messages. For more information, read the SVC configuration requirements and guidelines:
http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/supportresources?taskind=3&brandind=5000033&familyind=5329743
c. Check that you have established a good LUN allocation on your storage subsystem and that the LUNs are equally distributed across all zoned subsystem controllers.

Next, we show an example of a misconfigured storage subsystem, how this misconfiguration appears in the SVC, and how to fix the problem. Running the svcinfo lscontroller ID command produces the output shown in Example 15-14. As highlighted in the example, the MDisks, and therefore the LUNs, are not equally allocated: the LUNs provided by the storage subsystem are visible through only one path (WWPN).
Example 15-14 MDisks unevenly distributed
IBM_2145:ITSOCL1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 8
max_path_count 12
WWPN 200500A0B8174433
path_count 0
max_path_count 8

d. To determine the root cause of this problem, follow the actions described in Fixing subsystem problems in an SVC-attached environment on page 278. If running the Maintenance Procedure on the SVC does not fix the problem, continue with the second step and check the storage subsystem for failures or misconfigurations. If you are unsure which of the attached MDisks has which corresponding LUN ID, use the svcinfo lsmdisk command (see Example 15-15).
Example 15-15 Determine the UID for the MDisk
IBM_2145:ITSOCL1:admin>svcinfo lsmdisk
id  name    status  mode     mdisk_grp_id  mdisk_grp_name  capacity  ctrl_LUN_#        controller_name  UID
0   mdisk0  online  managed  0             MDG-1           600.0GB   0000000000000000  controller0      600a0b800017423300000059469cf84500000000000000000000000000000000
2   mdisk2  online  managed  0             MDG-1           70.9GB    0000000000000002  controller0      600a0b800017443100000096469cf0e800000000000000000000000000000000

e. With the MDisk name and the UID, you can determine which LUN is attached to the SVC. Then, check whether the LUNs are equally distributed across the available storage controllers. If they are not, redistribute them equally over all available storage controllers.
f. Run svcinfo lscontroller ID again to check whether this action resolved the problem (see Example 15-16).
Example 15-16 Equally distributed MDisk on all available paths
IBM_2145:ITSOCL1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8

g. In our example, the problem was solved by changing the LUN allocation. If step 2 did not solve the problem, continue with step 3.

3. Check the SAN for switch problems or zoning failures. Problems in the SAN can be caused by a great variety of things. See 15.2.4, SAN data collection, on page 269 for more information.
4. Collect all support data and involve IBM support. Collect the support data for the involved SAN, SVC, or storage systems as described in 15.2, Collecting data and isolating the problem on page 262.
15.4 Livedump
SVC livedump is a procedure that IBM Support might ask clients to run. Only invoke livedump under the direction of IBM Support. Sometimes, investigations require a livedump from the configuration node in the cluster. A livedump is a lightweight dump from a node, which can be taken without impacting host I/O. The only impact is a slight reduction in system performance (due to reduced memory being available for the I/O cache) until the dump is finished. The instructions for a livedump are:

1. Prepare the node for taking a livedump:
svctask preplivedump <node id/name>
This command reserves the necessary system resources to take a livedump. The operation can take some time, because the node might have to flush data from the cache. System performance might be slightly affected after running this command, because some memory that is normally available to the cache is not available while the node is prepared for a livedump.
After the command completes, the livedump is ready to be triggered. You can confirm this by looking at the output of svcinfo lslivedump <node id/name>; the status must be reported as prepared.

2. Trigger the livedump:
svctask triggerlivedump <node id/name>
This command completes as soon as the data capture is complete, but before the dump file has been written to disk.

3. Query the status, and copy the dump off when it is complete:
svcinfo lslivedump <node id/name>
The status shows dumping while the file is being written to disk and inactive after it is completed. After the status returns to the inactive state, the livedump file can be found in /dumps on that node with a file name in the format:
livedump.<panel_id>.<date>.<time>
This file can be copied off the node like a normal dump by using the GUI or SCP. The dump must then be uploaded to IBM for analysis. A scripted sketch of the whole sequence follows.
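The following ksh sketch strings the livedump steps together. It assumes key-based SSH access to the cluster as the admin user, uses placeholder cluster and node names, and simply polls the status reported by svcinfo lslivedump; run it only when IBM Support has asked for a livedump.

#!/bin/ksh
# Livedump sequence sketch: prepare, trigger, and wait for completion
CLUSTER=svc_cluster_ip        # replace with your cluster IP address or host name
NODE=Node1                    # node ID or name to dump

svc() {
    ssh admin@"$CLUSTER" "$@"
}

svc svctask preplivedump $NODE
until svc svcinfo lslivedump $NODE | grep -q prepared; do
    sleep 30                  # preparation can take a while (cache flush)
done

svc svctask triggerlivedump $NODE
until svc svcinfo lslivedump $NODE | grep -q inactive; do
    sleep 30                  # status shows dumping while the file is written
done

echo "Livedump complete; copy the livedump.* file from /dumps on $NODE with the GUI or SCP."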
Chapter 16.
Figure 16-2 on page 286 shows the improvement in throughput. Because the SPC-2 benchmark was only introduced in 2006, this graph necessarily covers a shorter time span.
Figure 16-3 Comparison of a software only upgrade to a full upgrade of an 8F4 node (variety of workloads, I/O rate times 1000)
As you can see in Figure 16-3, significant gains can be achieved with the software-only upgrade. The 70/30 miss workload, consisting of 70 percent read misses and 30 percent write misses, is of special interest. This workload contains a mix of both reads and writes, which we ordinarily expect to see under production conditions. Figure 16-4 on page 288 presents another view of the effect of moving to the latest level of software and hardware.
Figure 16-5 presents a more detailed view of performance on this specific workload. It shows that the SVC 4.2 software-only upgrade boosts the maximum throughput for the 70/30 workload by more than 30%; thus, a significant portion of the overall throughput gain achieved with a full hardware and software replacement comes from the software enhancements.
(Chart bars compare the 4.1.0 8F4, 4.2.0 8F4, and 4.2.0 8G4 configurations.)
Figure 16-5 Comparison of a software only upgrade to a full upgrade of an 8F4 node 70/30 miss workload
Figure 16-6 OLTP workload performance with two, four, six, or eight nodes
Figure 16-7 on page 290 presents the database scalability results at a higher level by pulling together the maximum throughputs (observed at a response time of 30 milliseconds or less) for each configuration. The latter figure shows that SVC Version 4.2 performance scales in a nearly linear manner with the number of nodes.
As Figure 16-6 on page 289 and Figure 16-7 show, the tested SVC configuration is capable of delivering over 270,000 I/Os per second (IOPS) for the OLTP workload. You are encouraged to compare this result against any other disk storage product currently posted on the SPC Web site at: http://www.storageperformance.org
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.
Other resources
These publications are also relevant as further information sources:
- IBM System Storage Open Software Family SAN Volume Controller: Planning Guide, GA22-1052
- IBM System Storage Master Console: Installation and User's Guide, GC30-4090
- IBM System Storage Open Software Family SAN Volume Controller: Installation Guide, SC26-7541
- IBM System Storage Open Software Family SAN Volume Controller: Service Guide, SC26-7542
- IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide, SC26-7543
- IBM System Storage Open Software Family SAN Volume Controller: Command-Line Interface User's Guide, SC26-7544
- IBM System Storage Open Software Family SAN Volume Controller: CIM Agent Developer's Reference, SC26-7545
- IBM TotalStorage Multipath Subsystem Device Driver User's Guide, SC30-4096
- IBM System Storage Open Software Family SAN Volume Controller: Host Attachment Guide, SC26-7563
Index
Numerics
500 86 automation 35, 125 auxiliary 163 availability 5, 69, 88, 98, 176, 214, 256, 259
A
acceptance xv access 2, 22, 34, 62, 88, 106, 117, 145, 171, 216, 250 access pattern 122 accident 146 active 36, 62, 108, 144, 195, 220, 279 active state 63 adapters 69, 106, 171, 212, 224, 248 adds 80, 204 Admin 35, 205 admin password 51 administration 42, 90, 243 administrator 22, 42, 78, 145, 204, 244, 260 administrators 198, 204, 238, 284 advanced copy 22, 146 aggregate 62, 105 AIX xiv, 66, 170, 201, 248, 264 AIX host 179, 186, 266 AIX LVM admin roles 205 alert 10, 159, 216 alerts 3, 235, 257 Alias 17 alias 16 aliases 14, 241 alignment 209 allocation algorithm 115 amount of I/O 26, 101, 122, 161 analysis 78, 163, 225, 282 application availability 88, 99, 212 performance 57, 88, 98, 119, 144, 202, 227 applications 12, 22, 35, 66, 99, 122, 144, 171, 201 architecture 4, 62, 112, 184, 284 architectures 106, 193 area 183, 210, 262 areas xiv, 33, 169, 203, 260 array 2, 22, 61, 87, 98, 130, 156, 195, 204, 242, 254 arrays 2, 22, 68, 88, 98, 119, 157, 205, 238, 260 asymmetric 29 asynchronous 82, 125, 144 asynchronously 144 attached 3, 26, 58, 62, 86, 114, 169, 203, 240, 260 attention 4, 247 attributes 85 audit 33 Audit log 48 audit log 46 authorization 40 auto 116 automatically discover 180 Automation 53 Copyright IBM Corp. 2008. All rights reserved.
B
backend storage controller 140, 276 back-end storage controllers 161 background copy 159 background copy rate 159 backup 3, 52, 156, 192, 203, 256, 268 balance 15, 63, 93, 98, 114, 160, 176, 205, 257, 286 balanced 15, 63, 111, 136, 172, 209 balancing 19, 93, 114, 172, 206 band 99, 129 Bandwidth 169 bandwidth 2, 23, 71, 109, 122, 156, 172, 203, 244 bandwidth requirements 10 baseline 81, 132 Basic 5, 34 basic 2, 23, 35, 130, 170, 238, 258, 262 best practices xiii, 1, 88, 99, 114, 158, 169 between 3, 27, 59, 62, 88, 100, 114, 143, 171, 204, 216, 256, 263, 283 BIOS 31, 193 blade 14 BladeCenter 10 blades 14 block 56, 69, 114, 147, 202, 229 block size 69, 135, 204 block-for-block translation 115 blocking 2 BM System Storage SAN Volume Controller Host Attachment Users Guide Version 4.2.0 169, 193 boot 172 boot device 190 bottlenecks 135, 202 boundary crossing 208 Brocade 4, 23, 269 buffer 148, 230 buffering 56 buffers 116, 158, 170, 213 bus 22, 182, 224, 284
C
cache 2, 22, 56, 61, 86, 102, 118, 144, 170, 202204, 229, 247, 256, 281, 284 cache disabled 30, 125, 150 cache mode 127 cache-enabled 158 caching 22, 69, 87, 122, 146 cap 101 capacity 9, 22, 73, 87, 98, 114, 147, 207, 241, 280 cards 4, 62, 193 certified xiv, 11, 254
changes 3, 25, 43, 81, 87, 135, 146, 170, 212, 237, 257, 260 channel 187 chdev 187 choice 26, 58, 69, 88, 122, 177 CIMOM 35, 216 Cisco 2, 23, 243, 254, 269 classes xiv, 100, 257 CLI 65, 89, 114, 148, 182, 227, 261 commands 44, 72, 89, 273 client 191, 208 cluster 2, 21, 34, 56, 62, 86, 98, 114, 144, 171, 216, 244, 260, 288 creation 51, 114 IP address 39, 217 Cluster configuration 53 cluster ID 47 cluster partnership 52 clustering 184 clustering software 184 clusters 11, 22, 35, 143, 184, 216, 238, 277 code update 30 combination 136, 145, 243 command 30, 36, 63, 89, 114, 148, 173, 207, 216, 249, 257, 265 command prompt 38 commit 151 compatibility 30, 35, 246 complexity 7, 284 conception 12 concepts xiii concurrent 30, 35, 136, 182, 273 configuration 1, 23, 34, 57, 61, 86, 98, 115, 144, 170, 202, 216, 237, 256, 260, 289 configuration backup 52 configuration changes 180 configuration data 180, 270 configuration file 52 configuration node 39, 281 configuration parameters 164, 182 configure 10, 88, 188, 212, 216, 237 congested links 2 congestion 2, 232 control 3 connected 2, 57, 62, 169, 217, 239, 256, 261 connection 36, 77, 185, 216, 240, 271 connections 9, 35, 62, 190, 240 connectivity 189, 216, 245, 260, 289 consistency 40, 143, 197 consistency group 147 consistency groups 40, 147 consistent 29, 130, 162, 197, 234, 243 consolidation 98, 284 containers 208 control 22, 73, 93, 125, 145, 171, 205, 238, 284 controller port 86 copy 22, 44, 56, 100, 116, 144, 197, 229, 238, 256, 267 copy rate 150 copy service 144 copy services 22, 58, 116, 144
core 4, 254 core fabric 11 core switches 5 correctly configured 163, 218 corrupted 197 corruption 20, 76 cost 11, 88, 98, 146, 202, 246 counters 198, 228 create a FlashCopy 151 critical xiv, 34, 69, 90, 202 current 23, 35, 56, 68, 146, 182, 217, 240, 261, 286 CWDM 11
D
data 3, 22, 47, 63, 87, 100, 115, 144, 171, 202, 215, 237, 254, 259 consistency 151 data formats 192 data integrity 118, 148 data layout 112, 116, 204 Data layout strategies 213 data migration 156, 192 data path 80, 115 data pattern 202 data rate 101, 136, 167, 228 data structures 209 data traffic 10 database 3, 81, 122, 152, 179, 203, 228, 241, 260, 289 log 204 date 56, 217, 241, 266 DB2 container 209 DB2 I/O characteristics 209 db2logs 209 debug 77, 262 dedicate bandwidth 12 default 34, 62, 114, 160, 173, 217, 249 default values 69 defined 18, 140, 147, 204, 220, 249 degraded 133, 144, 261 delay 130, 151 delete a VDisk 117 deleted 150 demand 100, 289 dependency 111, 154 design 1, 22, 81, 99, 130, 178, 208, 255, 286 destage 29, 56, 69, 87, 129 device 2, 69, 106, 130, 146, 172, 206, 219, 237, 264 device driver 146, 185 diagnose 15, 163, 254 diagnostic 186, 270 different vendors 146 director 5 directors 4 disabled 30, 57, 125, 144, 247 disaster 25, 145, 197, 255 discovery 63, 94, 179, 245 disk 2, 22, 56, 67, 85, 98, 114, 147, 179, 202, 225, 238, 256, 269, 290 latency 29, 203
disk access profile 122 disk groups 25 Disk Magic 136 disruptive 3, 30, 114, 243 distance 11, 158, 254 limitations 11 distances 11 DMP 177 DNS 34 documentation 1, 38, 238 domain 4, 74, 98 Domain ID 20, 250 domain ID 20 domains 98 download 199, 274 downtime 57, 115, 151 driver 30, 62, 146, 185, 249, 260 drops 105, 231 DS4000 26, 62, 89, 102, 199, 203, 234, 270 DS4000 Storage Server 203 DS4100 86 DS4500 57, 218 DS4800 17, 69, 86 DS6000 xiv, 62, 89, 102, 188, 270 DS8000 xiv, 17, 62, 86, 102, 188, 207, 270 dual fabrics 14 DWDM 11
extent 25, 71, 85, 114, 204 size 114, 208 extent migration 27 extent size 114, 207 extent sizes 114, 207 extents 27, 61, 89, 114, 209
F
Fabric 20, 27, 224, 269 fabric 1, 22, 135, 157, 170, 217, 246, 260 isolation 176 login 178 fabric outage 3 fabrics 6, 171, 217 failed node 56 failover 29, 62, 122, 170, 260 failure boundaries 100, 206 FAStT 14, 194 storage 14 FAStT200 86 fault tolerant 88 FC xiv, 2, 69, 178 fcs 19, 187, 248 fcs device 187 features 22, 146, 189, 237, 256, 260 Fibre Channel 2, 56, 62, 158, 169, 248, 254, 260 ports 10, 62 routers 158 traffic 3 Fibre Channel (FC) 171 Fibre Channel ports 62, 172, 250, 275 file system 148, 194, 209 file system level 197 filesets 191 firmware 163, 247, 262 flag 118, 162 flash 56, 147 FlashCopy 23, 40, 56, 66, 87, 111, 116, 144, 229, 261 applications 68, 111 bitmap 147 mapping 30, 78, 147 prepare 150 rules 158 source 23, 67, 116, 147 Start 116 target 126, 147, 229 FlashCopy mapping 148 FlashCopy mappings 117, 147 flexibility 23, 122, 146, 184, 238 flow 3, 136 flush the cache 182 force flag 118 format 47, 77, 192, 239, 258, 282 frames 2 free extents 121 front panel 36 full bandwidth 5 function 30, 66, 89, 112, 145, 194, 214, 256, 268 functions 22, 34, 66, 144, 189, 231, 262
E
edge xv, 2 edge fabric 4 edge switch 3 edge switches 4 efficiency 121 element 23 eliminates 78 e-mail xvi, 12, 198, 244 EMC 61 EMC Symmetrix 63 enable 7, 22, 51, 119, 151, 188, 203, 227, 249 enforce 9 Enterprise 62, 235, 269 error 11, 36, 61, 90, 143, 170, 216, 238, 260 Error Code 68 error handling 68 error log 67, 247, 264 errors 11, 34, 66, 143, 170, 247, 260 ESS xiv, 62, 89, 102, 207, 274 ESS storage 63 Ethernet 2, 39 evenly balancing I/Os 212 event 3, 29, 62, 98, 122, 145, 188, 235 events 34, 146, 235, 272 exchange 151 execution throttle 194 expand 26 expansion 3, 211 extenders 158 extension 11
G
gateway 34 gateway IP address 40 GB 22, 57, 147, 242 Gb 4, 69, 231 General Public License (GNU) 199 Global 23, 40, 56, 117, 144, 221, 277 Global Mirror 23, 144, 222 Global Mirror relationship 158 gmlinktolerance 160 GNU 199 governing throttle 122 grain 87 granularity 114, 197, 286 graph 81, 135, 230, 285 graphs 27, 183 group 9, 23, 56, 72, 98, 114, 143, 172, 204, 227, 241, 254, 269 groups 10, 25, 40, 73, 85, 113, 147, 173, 206, 230, 274, 289 growth 81, 211 GUI 13, 30, 35, 63, 89, 114, 148, 177, 216, 257, 267
H
HACMP 35, 189 hardware xiii, 2, 23, 34, 58, 62, 89, 100, 163, 193, 242, 260, 284 selection 5 HBA 10, 22, 176, 187, 194, 224, 244, 260 HBAs 13, 133, 171172, 194, 203, 224, 247 health 190, 216, 271 healthy 164, 219 heartbeat 159 help xv, 9, 36, 56, 66, 101, 113, 156, 186, 204, 229, 238, 257, 260 heterogeneous 22, 262 hops 3 host 2, 22, 35, 56, 62, 86, 109, 114, 144, 169, 201, 215, 238, 260 configuration 15, 117, 158, 205, 261 creating 17 definitions 117, 180, 203 HBAs 15 information 31, 177, 225, 244, 264 showing 27 systems 26, 169, 203, 260 zone 14, 114, 171, 262 host bus adapter 193 host level 172 host mapping 129, 172, 261 host type 62, 244 host zones 17, 241
I/O response time 132 IBM storage products xiii IBM Subsystem Device Driver 62, 89, 118, 146, 189 IBM TotalStorage Productivity Center 20, 159, 217, 258, 262 ICAT 50 identification 90, 173 identify 36, 61, 89, 100, 148, 188, 229 identity 47 IEEE 192 image 23, 57, 87, 115, 147, 172, 205, 241 Image Mode 115, 145 Image mode 27, 119, 145, 206 image mode 26, 116, 149, 179, 206 image mode VDisk 28, 116, 211 Image Mode VDisks 146 image mode virtual disk 125 implement 3, 25, 59, 89, 193, 243, 258 implementing xiii, 1, 101, 184, 257 import 116 improvements 23, 59, 111, 137, 190, 283 Improves 22 in-band 129 information 1, 30, 33, 56, 63, 121, 148, 179, 201, 216, 238, 257, 260 infrastructure 100, 125, 146, 218, 246 initial configuration 176 initiate 40 initiating 78 initiators 86, 185 install 6, 38, 157, 193, 227, 246 installation 1, 88, 216, 238, 254 insufficient bandwidth 3 integrity 118, 148 Inter Switch Link 2 interface 22, 50, 148, 169, 216, 250 interoperability 10, 246 interval 163, 228 introduction 27, 79, 258, 284 iogrp 118, 171 IOPS 170, 202, 284 IP 11, 33, 216 IP address 34 IP traffic 12 ISL 2, 241 ISL oversubscription 3 ISLs 3, 239 isolated 74, 176 isolation 2, 63, 90, 101, 176, 261
J
journal 194, 204
I
I/O governing 122 I/O governing rate 124 I/O group 9, 23, 56, 114, 143, 177, 229, 244, 256 I/O groups 16, 114, 167, 180, 234 I/O performance 56, 187, 210
K
kernel 194 key 38, 122, 179, 208, 242 keys 38, 185
L
last extent 114 latency 29, 129, 151, 203 LBA 67 level 13, 23, 35, 62, 88, 131, 144, 172, 212, 233, 260, 287 storage 78, 197, 246, 261 levels 4, 34, 68, 87, 102, 132, 184, 211, 244, 262, 286 lg_term_dma 187 library 195 license 25, 269 light 101, 202, 271 limitation 36, 183, 228 limitations 1, 25, 35, 145, 203, 270 limiting factor 130 limits 22, 35, 130, 144, 183, 212 lines of business 206 link 2, 25, 36, 144, 191, 257, 271 bandwidth 12, 159 latency 158 links 2, 158, 217, 254, 270 Linux xiii, 194, 264 list 13, 22, 47, 66, 87, 115, 158, 195, 238, 260 list dump 53 livedump 281 load balance 122, 177 Load balancing 190 load balancing 114, 193 loading 71, 111, 143 LOBs 206 location 38, 77, 87, 133, 202, 238 locking 184, 286 log 46, 67, 146, 230, 247, 257, 264 logged 36, 75 Logical Block Address 67 logical drive 62, 92, 187, 204, 208 logical unit number 145 logical units 25 login 36, 122, 171 logins 171 logs 33, 152, 204, 247, 266 long distance 158 loops 70, 255 LPAR 192, 213 LU 172 LUN 12, 26, 61, 85, 102, 125, 145, 170, 204, 222, 241, 262 access 146, 185 LUN mapping 90, 172, 274 LUN masking 20, 75, 262 LUN Number 63, 90 LUN per 102, 206 LUNs 62, 86, 97, 146, 172, 205, 227, 242, 276 LVM 117, 190, 205
M
M12 4 maintaining passwords 33 maintenance 31, 36, 160, 178, 248, 260
maintenance procedures 36, 250, 278 maintenance window 160 manage 22, 33, 62, 113, 147, 170, 206, 216, 246, 258 managed disk 115, 213, 275 managed disk group 119, 213 Managed Mode 70, 119 management xiii, 7, 34, 56, 98, 144, 170, 205, 216, 254, 262, 286 capability 171, 213 port 171, 235 software 173 managing 22, 39, 58, 170, 208, 238, 258, 260 map 65, 129, 157, 173 map a VDisk 176 mapping 30, 61, 90, 106, 117, 148, 170, 206, 261 mappings 40, 117, 147, 185, 261 maps 147, 212, 279 mask 10, 34, 146, 171, 279 masking 12, 26, 75, 157, 171, 262 master 31, 34, 148 master console 34, 149 max_xfer_size 187188 maximum IOs 209 MB 12, 57, 70, 114, 188, 207 Mb 12, 25 McDATA 10, 23, 269 MDGs 85, 97, 114, 206, 273 MDisk 27, 52, 61, 85, 100, 114, 146, 177, 204, 222, 241, 261, 286 adding 89, 132 removing 186 MDisk group 116, 146, 204 media 66, 164, 227, 275 member xiii, 16 members 14, 70, 260 memory 22, 49, 147, 170, 204, 229, 245, 256, 281, 286 message 20, 36, 163, 249 messages 177, 250, 272 metric 81, 130, 166, 230 Metro 23, 40, 116, 144, 221, 277 Metro Mirror 23, 144, 230 Metro Mirror relationship 151 microcode xiii, 68 migrate 10, 59, 116, 156, 172 migrate data 119, 191 migrate VDisks 117 migration 2, 26, 66, 119, 156, 179, 245, 258 migration scenarios 9 mirrored 22, 129, 159, 197 mirroring 11, 117, 144, 190 misalignment 208 mkrcrelationship 162 Mode 70, 96, 115, 145, 174, 245, 265 mode 23, 50, 56, 87, 97, 115, 145, 171, 205, 254, 280 settings 158 monitor 20, 43, 132, 159, 215, 246, 262 monitored 81, 132, 164, 197, 260 monitoring 56, 79, 159, 169, 215, 246, 275 monitors 136, 228 mount 118, 151, 256
MPIO 189, 249 multipath drivers 89, 247 multipath software 184 multipathing xiii, 30, 62, 170, 248, 259 Multipathing software 178 multipathing software xiv, 176, 251 multiple paths 122, 176, 261 multiple vendors 10 multiplexing 11
N
Name Server 34 name server 178, 250, 271 names 16, 47, 114, 192, 251 nameserver 178 naming 14, 47, 64, 88, 114, 244 naming conventions 52 new disks 179 new MDisk 92 no virtualization 115 NOCOPY 150 node 2, 24, 39, 56, 75, 86, 101, 114, 143, 170, 217, 247, 256, 260, 284 adding 25 failure 29, 56, 122, 178 port 14, 122, 164, 171, 217, 261 nodes 3, 22, 39, 56, 74, 86, 114, 154, 171, 217, 247, 256, 261, 283 noise 130 non 7, 22, 76, 116, 155, 177, 206, 231, 243, 258, 264 non-disruptive 119 non-preferred path 121 num_cmd_elem 187188
O
offline 30, 39, 57, 68, 89, 118, 146, 177, 224, 248, 261 online xv, 89, 112, 117, 148, 219, 248, 261 OnLine Transaction Processing (OLTP) 203 online transaction processing (OLTP) 203204 open systems xiv operating system (OS) 202 operating systems 176, 208, 249, 264 Operator 40 optimize 112, 283 Oracle 190, 206 ordered list 115 organizations 12 OS 51, 170, 213, 247, 277 overlap 14 overloading 140, 167, 236 oversubscription 3 overview 27, 35, 85, 204, 259
P
parameters 29, 47, 61, 86, 122, 164, 172, 203, 245 partition 191 partitions 69, 136, 191, 208 partnership 52, 159
password 50 passwords 33 path 3, 29, 35, 57, 62, 101, 115, 170, 213, 222, 248, 260 selection 189 paths 8, 30, 62, 121, 170, 222, 245, 261 peak 3, 159 per cluster 24, 114, 147, 230 performance xiii, 3, 22, 56, 61, 87, 98, 113, 144, 169, 201, 217, 237, 260, 283 degradation 63, 102, 144 performance advantage 88, 105 performance characteristics 100, 116, 199, 213 performance improvement 26, 119, 231, 284 performance monitoring 166, 172 performance requirements 59 permanent 163 permit 3, 286 persistent 89, 184 PFE xiv physical 11, 22, 61, 87, 139, 148, 169, 202, 229, 241, 255 physical volume 191, 212 ping 39 PiT 126 planning 15, 87, 97, 130, 165, 203 plink 38 plink.exe 38 PLOGI 178 point-in-time 145 point-in-time copy 146 policies 189 policy 30, 51, 101, 115, 185, 246 pool 22, 71, 100, 160 port 2, 22, 61, 86, 133, 164, 170, 217, 238, 254, 261 types 63 port layout 4 port zoning 12 ports 2, 23, 57, 62, 86, 133, 170, 217, 241, 254, 261 power 29, 182, 250, 256, 275 PPRC 56 preferred 11, 26, 34, 56, 62, 114, 160, 171, 207, 261 preferred node 56, 114, 160, 177 preferred path 57, 62, 121, 177 preferred paths 122, 177, 264 prepare a FlashCopy 165 prepared state 164 primary 25, 39, 87, 98, 125, 144, 206 priority 36 private 38 private key 38 problems 2, 31, 35, 63, 89, 129, 157, 186, 202, 242, 254, 259 productivity xv profile xiv, 69, 93, 122, 274 progress 30, 50, 56 properties 130, 195 protect 159 protecting 70 provisioning 88, 102 pSeries 19, 76, 199 public key 38
Q
quad 4 queue depth 85, 182, 188, 193194, 212 quickly 2, 40, 78, 129, 150, 176, 224, 243, 254 quiesce 117, 151, 181
route 160 router 158 routers 158 routes 7 routing xiv, 4, 62, 255 RPQ 3, 194, 246 RSCN 178 rules 38, 80, 141, 158, 170, 262
S
SAN xiii, 1, 21, 33, 62, 114, 157, 169, 212, 215, 237, 254, 259, 284 availability 176 fabric 1, 157, 176, 218 SAN configuration 1 SAN fabric 1, 157, 171, 217, 262 SAN Volume Controller xiii, 1, 15, 22, 33, 119, 169, 257, 262 multipathing 194 SAN zoning 122, 220, 243, 260 scalability 2, 21, 289 scalable 1, 22 scale 23, 112, 289 scaling 58, 112, 283 scan 179 scripts 125, 182 SCSI 67, 121, 178, 274 commands 184, 274 SCSI disk 192 SCSI-3 184 SDD xiii, 15, 62, 89, 118, 146, 170, 189, 211, 247, 264 SDD for Linux 194 SDDDSM 173, 264 secondary 25, 39, 125, 145, 203 secondary site 25, 145 secure 50 Secure Shell 46 Security 40 security 12, 40, 190, 247 segment 69 separate zone 18 sequence 27, 42 sequential 29, 87, 97, 115, 145, 170, 203, 241, 289 sequential policy 115 serial number 64, 172, 242, 274 serial numbers 173 Server 34, 62, 191, 212, 216, 243, 265 server 3, 22, 69, 133, 151, 178, 202, 227, 240, 255, 271 Servers 192, 203 servers 3, 23, 34, 190, 201, 244, 256 service 31, 34, 56, 80, 88, 144, 212, 230, 257, 260 settings 39, 164, 186, 202, 262 setup 34, 186, 207, 215, 254, 262 share 20, 74, 88, 101, 139, 171, 209 shared 12, 162, 185, 210, 218 sharing 7, 138, 184, 203 shutdown 56, 117, 157, 179, 256 single storage device 177 site 25, 58, 67, 125, 145, 196, 228, 244, 290 slice 209 Index
R
RAID 70, 88, 119, 162, 204, 241, 289 RAID array 130, 164, 205 RAID arrays 130, 205 RAID types 205 ranges 130 RDAC 62, 89 Read cache 202 reboot 117, 182 rebooted 191 receive 95, 231, 249 recovery 25, 51, 66, 92, 119, 152, 170, 275 recovery point 159 Redbooks Web site 292 Contact us xv redundancy 2, 39, 62, 112, 159, 171, 218, 262 redundant 22, 39, 74, 159, 171, 212, 224, 260 redundant paths 171 redundant SAN 74 registry 179, 266 relationship 20, 62, 116, 144, 191, 221, 277 reliability 15, 89, 237, 284 remote cluster 31, 158, 221 remote copy 125, 147 remote mirroring 11 remotely 34 remount 129 removed 20, 59, 117, 179 rename 38, 157, 267 replicate 145 replication 144, 246, 258 reporting 80, 131, 235, 263 reports 133, 179, 215 reset 36, 178, 247, 260 resources 22, 78, 93, 98, 125, 160, 170, 209, 229, 281, 286 restart 35, 157, 256 restarting 159 restarts 178 restore 53, 166, 192 restricting access 184 rights 40 risk 78, 89, 98, 146, 257 role 40, 203 roles 40, 205 root 133, 185, 235, 248, 274 round 93, 158, 209 round-robin 94
slot number 19, 248 slots 70 snapshot 156, 241 SNIA xiii SNMP 34, 235 Software xiii, 1, 15, 180, 246, 257, 260 software xiii, 2, 34, 56, 146, 170, 212, 237, 257, 260, 286 Solaris 194, 265 solution 1, 34, 88, 130, 166, 202, 238, 256 solutions xiii, 113, 238 source 12, 23, 56, 67, 117, 144, 194, 227, 254 sources 147, 256 space 82, 87, 114, 147, 204, 245, 255 spare 3, 26, 70, 87, 255 speed 5, 22, 130, 162, 241, 271 speeds 11, 83, 130, 254, 284 split 6, 21, 73, 137, 147 SSH 35, 217 SSH keys 38 standards 10, 216, 254 start 11, 23, 35, 81, 140, 144, 172, 209, 216, 260 state 36, 63, 119, 144, 170, 247, 282 synchronized 162 statistics 87, 163, 198, 228 statistics collection 163 status 38, 66, 117, 148, 186, 216, 240, 261 storage xiii, 1, 22, 34, 61, 85, 97, 114, 145, 169, 201, 215, 237, 254, 260, 284 storage controller 14, 22, 61, 86, 99, 125, 145, 218, 276 storage controllers 14, 22, 63, 88, 102, 139, 145, 216, 280 Storage Manager 70, 166, 270 Storage Networking Industry Association xiii storage performance 81, 129, 233 storage traffic 2 streaming 111, 122, 203 strip 208 Strip Size Considerations 208 strip sizes 208 stripe 73, 98, 206 striped 28, 93, 115, 145, 179, 204 striped mode 150, 205 striped mode VDisks 207 striped VDisk 148 stripes 208 Striping 97 striping 22, 68, 89, 105, 115, 205, 209, 283 subnet 34 subnet mask 40 Subsystem Device Driver 62, 89, 118, 146, 174, 189, 249, 265 superuser 50 support xiii, 23, 34, 62, 89, 204, 238, 255, 284 surviving node 29, 56 SVC xiii, 1, 21, 33, 56, 62, 86, 98, 114, 144, 169, 204, 215, 237, 254, 259, 283 SVC cluster 3, 21, 35, 64, 86, 99, 144, 176, 216, 268 SVC configuration 52, 171, 242, 258, 279, 290 SVC installations 5, 101, 254 SVC master console 50, 152
SVC node 14, 25, 56, 146, 171, 217, 260 SVC nodes 8, 22, 58, 75, 129, 154, 171, 221, 275, 284 SVC software 172, 260 svcinfo 40, 66, 89, 117, 148, 172, 261 svcinfo lsmigrate 89 svctask 40, 63, 89, 114, 157, 195, 268 svctask dumpinternallog 47 svctask finderr 47 switch 2, 52, 57, 133, 163, 169, 217, 239, 254, 263 fabric 3, 247 failure 3, 198 interoperability 10 switch fabric 2, 171, 222 switch ports 9, 219 switches 2, 46, 57, 159, 215, 239, 254, 260, 286 Symmetrix 61 synchronization 159, 277 Synchronized 162 synchronized 144 system 26, 35, 81, 112, 114, 147, 169, 203, 225, 248, 264, 283 system performance 116, 194, 281
T
T0 155 tablespace 204, 209 tape 3, 165, 171 target 23, 56, 63, 86, 116, 144, 171, 227, 275 target ports 75, 171 targets 23, 147, 181 tasks 41, 235, 246 test 2, 27, 38, 57, 89, 105, 117, 158, 169, 211, 230 tested 23, 89, 158, 170, 212, 246, 257, 262, 290 This xiii, 1, 21, 35, 56, 62, 85, 97, 114, 146, 169, 201, 215, 238, 253, 260, 287 thread 182, 208 threshold 3, 144, 235 thresholds 129 throttle 122, 194 throttles 122 throughput 24, 57, 69, 88, 101, 130, 151, 177, 188, 202204, 231, 284 throughput based 202203 tier 88, 101 time 2, 23, 39, 57, 62, 92, 99, 114, 144, 170, 202, 217, 241, 255, 259, 284 Tivoli xiii, 166, 235 Tivoli Storage Manager (TSM) 203 tools 34, 169, 238, 262 Topology 133, 217, 263 topology 2, 217, 263 traditional 12 traffic 2, 57, 158, 177, 229 congestion 3 Fibre Channel 10 transaction 68, 152, 187, 202 transaction based 202203 Transaction log 204 transceivers 11, 254 transfer 62, 85, 122, 170, 202
transit 2 traps 34 trends 81 trigger 15, 242 troubleshooting 13, 169, 245 TSM xiii, 208 tuning 136, 169, 213
WWNN 12, 63, 180, 250, 267 WWNs 13 WWPN 12, 26, 61, 86, 217, 241, 261 WWPNs 13, 76, 171, 251, 261
Z
zone 8, 157, 171, 220, 262 zone name 19 zoned 2, 171, 221, 241, 275 zones 12, 157, 220, 241, 262 zoneset 18, 221, 279 Zoning 12, 241 zoning 7, 25, 76, 122, 171, 220, 241, 260 zoning configuration 12, 225, 241 zSeries 112
U
UID 90, 121, 280 unique identifier 77, 172 UNIX 152, 198 Unmanaged MDisk 146 unmanaged MDisk 119 unmap 116 unused space 114 upgrade 30, 56, 163, 178, 236, 246, 273, 287 upgrades 56, 178, 246, 274 upgrading 31, 35, 58, 184, 231, 248, 277, 286 upstream 2, 235 URL 34 users 4, 22, 40, 131, 179, 218, 257 using SDD 146, 189, 250 utility 89, 199
V
VDisk 15, 23, 52, 56, 61, 85, 100, 113, 144, 172, 204, 225, 241, 261 creating 89 migrating 119 modifying 137 showing 133 VDisk extents 66 VDisk migration 67 VIO clients 212 VIO server 191, 212 VIOC 191, 212 VIOS 191, 212 virtual disk 29, 95, 121, 192 Virtualization xiii virtualization 21, 78, 115, 204, 259 virtualized storage 27 virtualizing 7, 179 volume abstraction 205 volume group 75, 189 VSAN 7 VSANs 2 VSCSI 191, 212
W
Windows 2003 27, 34, 193 workload 25, 41, 56, 62, 93, 99, 114, 158, 186, 202203, 229, 287 throughput based 202 transaction based 202 workload type 203 workloads 3, 69, 88, 100, 125, 158, 170, 202, 286 writes 29, 56, 69, 87, 102, 129, 144, 170, 204, 229, 287
Back cover
Read about best practices learned from the field
Learn about SVC performance advantages
Fine-tune your SVC
This IBM Redbook captures some of the best practices based on field experience and details the performance gains that can be achieved by implementing the IBM System Storage SAN Volume Controller. This book is intended for very experienced storage, SAN, and SVC administrators and technicians. Readers are expected to have an advanced knowledge of the SVC and SAN environment, and we recommend these books as background reading:
- IBM System Storage SAN Volume Controller, SG24-6423
- Introduction to Storage Area Networks, SG24-5470
- Using the SVC for Business Continuity, SG24-7371