IBM® Edge2013 - Storage Migration Methods
My objective is simple: to teach someone here today one new concept that will make you better. Even better, if many of you learn a few new things today that will make your job easier.
Knowledge is POWER!
As Systems Administrators we don't always KNOW what we don't know about storage:
- Ask for storage, leveraging what you know
- Avoid bottlenecks
- Use tools available
- Speed problem isolation
- Make more informed architectural decisions
As Storage Administrators we don't always KNOW how the storage will be utilized:
- Make more informed architectural decisions
- Ask what is needed for best performance and IO separation
Problem statement
Storage and System Administrators often clash in the common goal to achieve data performance and availability, leading to:
- Too many logical configuration related outages
- Performance related enhancements not working to specification
Leading causes:
- Lack of understanding of configurations
- No cohesiveness between the logical and physical implementations
- Lack of communication between System and Storage Administrators
Resulting in:
A lack of data reliability and IO throughput
A deeper dive
Physical to logical makeup
No - saying SAN Volume Controller doesn't count! What is SVC? SVC provides flexibility across the entire storage infrastructure.
[Diagram: SVC sits in the SAN in front of heterogeneous storage subsystems (DS8000 15K rpm, HDS, DS4000, EMC, HP, SATA; RAID 5, RAID 1, JBOD), combines the capacity from multiple arrays and frames into storage pools, and applies copy services across the storage pool.]
[Diagram: SVC virtualization layers. Volumes vdisk0-vdisk5 (125GB, 10GB, 525GB, 1500GB, 275GB, 5GB) are striped, using extents of 16MB-2GB, across a Storage Pool of managed disks mdisk0-mdisk6 (4 x 100GB, 3 x 200GB), which in turn map to back-end LUNs (4 x EMC 100GB, 3 x IBM 200GB).]
Preferred paths for vdisk1 are SVC N1P2 & N1P3; non-preferred paths for vdisk1 are SVC N2P2 & N2P3.
Preferred paths for vdisk2 are SVC N2P2 & N2P3; non-preferred paths for vdisk2 are SVC N1P2 & N1P3.
[Diagram: vdisk1-vdisk4 and their path assignments across the two SVC nodes.]
Familiar Layout
Balance
Just like an onion - virtualization has many layers
a. Seek time - time to position the read/write head
b. Rotational delay - time waiting for the disk to spin to the proper starting point
c. Transfer time - time to transfer the data
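As a rough worked example (the rotational figure follows from the spin speed; the seek and transfer figures are assumed typical values, not taken from this deck):
Rotational delay at 15,000 rpm = (60 s / 15,000) / 2 = 2 ms on average
Assumed average seek = 3.5 ms, transfer of a small block = 0.1 ms
Service time = 3.5 + 2 + 0.1 = 5.6 ms, or roughly 175-180 random IOPs per spindle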
RAID-5 6+P
b) Logical-disk1 would provide greater sequential throughput since it is on the outer edge of the disks.
[Diagram: physical partition extents for LV1-LV4 laid out across the disk.]
[Diagram: grid storage architecture - Interface modules connected through Switching to Data Modules 1-4, illustrating data redistribution after a hardware upgrade.]
The fact that distribution is full and automatic ensures that all spindles join the effort of data re-distribution after a hardware failure or configuration change. Tremendous performance gains are seen in recovery/optimization times thanks to this fact.
Be aware of Queue Depth when planning system layout; adjust it only if necessary. Queue Depth is central to the following fundamental performance formula:
IO Rate = Number of Commands / Response Time per Command
To calculate it, the best thing to do is go to each device's Information Center (URLs listed in the link slide). For example:
IO Rate = 32 Commands / .01 Seconds (10 milliseconds) per Command = 3200 IOPs
What are the default Queue Depths? Some real-world examples (a sample command sequence follows the source note):
OS - Default Queue Depth - Expected IO Rate
AIX Standalone - 16 per LUN - 1600 IOPs per LUN
AIX VIOS - 20 per LUN - 2000 IOPs per LUN
AIX VIOC - 3 per LUN - 300 IOPs per LUN
Windows - 32 per Disk - 3200 IOPs per LUN
Source: Queue Depth content provided by Mark Chitti
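If you do need to inspect or change a queue depth on an AIX host, a minimal command sequence looks like the following (hdisk0 and the value 32 are illustrative, not recommendations from this deck):
# lsattr -El hdisk0 -a queue_depth
Shows the current queue depth for hdisk0
# chdev -l hdisk0 -a queue_depth=32 -P
Stages the new value in the ODM; it takes effect after the disk is reconfigured or the system is rebooted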
Isolation: should you ever isolate data to specific hardware resources? Name a circumstance!
- In some cases more isolation on dedicated resources may produce better I/O throughput by eliminating I/O contention
- Separate FlashCopy source and target LUNs on isolated spindles
Slide Provided by Dan Braden
Most commonly when workloads peak at the same time or log files and data files share physical spindles
[Diagram: with striped storage pools no host-level stripe is needed, while sequential pools rely on a host stripe across LUNs; each RAID array presents a LUN (logical disk / PV) to the host.]
datavg
# mklv lv1 -e x hdisk1 hdisk2 ... hdisk5
# mklv lv2 -e x hdisk3 hdisk1 ... hdisk4
Use a random order for the hdisks for each LV
- Create VGs with one LUN per array
- Create LVs that are spread across all PVs in the VG using a PP or LV strip size >= a full stripe on the RAID array
- Do application IOs equal to, or a multiple of, a full stripe on the RAID array
Avoid LV striping
Reason: can't dynamically change the stripe width for LV striping
Use PP striping
Reason: can dynamically change the stripe width for PP striping
(A sketch of both mklv forms follows.)
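A minimal sketch of both mklv forms using full syntax (the VG name datavg, the LV names, the 10-LP size and the hdisk names are illustrative):
# mklv -y lvspread -e x datavg 10 hdisk1 hdisk2 hdisk3 hdisk4
PP spreading: the maximum inter-disk allocation policy places physical partitions round-robin across the listed PVs, and the spread can later be widened after an extendvg
# mklv -y lvstriped -S 64K datavg 10 hdisk1 hdisk2 hdisk3 hdisk4
LV striping: a fixed 64 KB strip size; the stripe width is set at creation and cannot be changed dynamically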
Source: Redbook SG24-6422-00 IBM 800 Performance Monitoring and Tuning Guide
Good data layout avoids dealing with disk hot spots, an ongoing management issue and cost. Data layout must be planned in advance.
Changes are generally painful
iostat and filemon can show unbalanced IO (see the sketch after this list). Best practice: evenly balance IOs across all physical disks unless tiering.
Random IO best practice:
Spread IOs evenly across all physical disks unless dedicated resources are needed to isolate specific performance-sensitive data
For disk subsystems
- Create RAID arrays of equal size and type
- Create VGs with one LUN from every array
- Spread all LVs across all PVs in the VG
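As referenced above, a minimal sketch of checking for unbalanced IO on AIX (the interval, duration and output path are illustrative):
# iostat -D 5 3
Per-disk extended statistics, three samples at five-second intervals
# filemon -o /tmp/fmon.out -O lv,pv ; sleep 60 ; trcstop
Traces LV and PV activity for about a minute, then stops the trace and writes the report
Large skews in busy%, IO rates or service times across hdisks point to hot spots.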
Spreading
Isolation
How do I achieve SVC node to server balance? Use the SVCQTOOL listed under the tools section of this slide deck to produce a spreadsheet similar to this, or use the script found in the speaker notes of this slide and add a column for preferred node to host client (a minimal CLI sketch follows).
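This is not the script from the speaker notes, only a minimal sketch of where the preferred-node data comes from on the SVC CLI (the cluster address is illustrative):
# ssh admin@svccluster 'svcinfo lsvdisk -delim :' > vdisk_list.txt
Lists every vdisk; the detailed view for a single vdisk, svcinfo lsvdisk <vdisk_name>, reports the preferred_node_id field, which is the column to add next to each host-to-vdisk mapping when checking node balance.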
Are there any automated storage inquiry tools out there that will help me understand my setup?
Storage tools gather information such as, but not limited to:
- LUN layout
- LUN to Host mapping
- Storage Pool maps
- Fabric connectivity
DS8QTOOL - go to the following website to download the tool: http://congsa.ibm.com/~dlutz/public/ds8qtool/index.htm
Troubleshooting: What's the most common thing that changes over time?
Depending on the workload characteristics, isolating the workload may prove to be more beneficial and outperform a larger array.
There are three important principles for creating a logical configuration for the Storage Pools to optimize performance:
- Workload isolation
- Workload resource-sharing
- Workload spreading
Some examples of I/O workloads or files/datasets which may have heavy and continuous I/O access patterns are:
- Sequential workloads (especially those with large blocksize transfers)
- Log files or datasets
- Sort/work datasets or files
- Business Intelligence and Data Mining
- Disk copies (including Point-in-Time Copy background copies, remote mirroring target volumes, and tape simulation on disk)
- Video/imaging applications
- Engineering/scientific applications
- Certain batch workloads
- Data Migration
[Diagram: App A is mapped through Host A to Vdisk1 in Pool 1, App B through Host B to Vdisk2 in Pool 2; each pool is built from its own set of ranks.]
I always separate Log files from Data files for best performance.
Apps sharing the same physical spindles on traditional arrays may peak at the same time
Summary
Knowing what's inside will help you make informed decisions. Make a list of the things you don't know.
Talk to the Storage Administrator or those who do know.
5. Know where to go to get the right device drivers
6. Know why documentation matters
7. Keep topology diagrams
8. Keep disk mapping documentation
9. Be able to use storage inquiry tools to find answers
10. Understand how to troubleshoot storage performance bottlenecks
Migration Process: Evaluate -> Plan -> Execute -> Validate
Evaluate
Migrating data is always a disruptive process. Whatever the migration technique used, it always affects to some degree the normal operations of the system.
Selecting the appropriate technique depends on:
- The criticality of the data being moved
- The resources available
- Other business constraints and requirements
Note: Risks should be identified depending on the migration technique used. We strongly recommend that you select the technique that is the best compromise between efficiency and the least impact to the system users.
Host-based (LVM / LDM)
Pros:
- Generally lowest initial implementation cost
- Leverages the existing IP network
- LVM or LDM tools available
- Storage device-agnostic
- Leverages existing Operating System skills
- Migration can happen on-line during peak hours
Cons:
- Consumes host resources
- Operating system specific
- Management can become complex and time consuming
- Each host is its own island; no central management console
- May cause an initial outage to install the utility or software if it is not already on the host
Network-based (Fabric, TDMF-IP)
Pros:
- Supports heterogeneous environments, servers and storage
- Single point of management for replication services
Cons:
- Higher initial cost due to hardware and replication software
- Requires proprietary hardware and may require implementation of Storage
Application-based
Cons:
- Requires the disruption of the applications and down time
- Slow and cumbersome
SVC
Pros:
- Migration can happen on-line during peak hours
- Supports heterogeneous environments, servers and storage
- Single point of management for migration
- Does not require additional special tools, software or hardware
- Does not require additional skills or training
(An example SVC migration command is sketched after this table.)
Cons:
- Requires an initial outage to bring the host volumes on-line to SVC
- Requires the host to reboot to load or upgrade the multipathing drivers
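As a sketch of why SVC-based migration can stay on-line, a vdisk is moved between storage pools with a single CLI command while the host keeps doing IO (the pool and vdisk names are illustrative):
# svctask migratevdisk -mdiskgrp NEW_POOL -vdisk vdisk1
# svcinfo lsmigrate
The second command reports the progress of the running migration.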
Capability - Supporting methods:
- Performance: TDMF, SVC
- Primary volume / source data protection: TDMF, SVC (with limitations), LVM / LDM, tape based
- Implement tiered storage: TDMF, SVC, LVM / LDM
- Multi-vendor environments: TDMF, SVC, LVM / LDM (with possible restrictions), Fabric, tape based (all with limits)
- Application downtime: TDMF, SVC, LVM / LDM, Fabric
Applications have different levels of business criticality and therefore have varying degrees of acceptable downtime.
Don't:
- Span multiple storage frames in one LV
- Use LV striping (reason: can't dynamically change the stripe width for LV striping)
- Use pladding (striping on striping)
- Use the same spindles for data and logs
Plan
Planning phase
This functionality can be used for:
- Redistribution of LVs and their workload within or across back-end storage
- Moving workload onto newly installed storage subsystems
- Moving workload off of storage so that old/failing storage subsystems can be decommissioned
- Moving workload to re-balance a changed workload
- Migrating data from legacy back-end storage to newer managed storage
Migration planning checklist (each task tracked by Assigned / Status / Date):
- Announce the migration at least 30 days prior to the intended target migration date
- Gather information about the storage server environment and applications (lists, commands, scripts and/or drawings)
- Schedule a pre-migration rehearsal that includes all the members on the migration team and a data sampling that will enable the application groups to appropriately conduct the pre- and post-migration verification process
- Establish a Migration Status call-in process
Utilize a Migration Planning Checklist to assure that all of the pre-migration planning steps have been executed
Some examples of application availability considerations may include, but are not limited to, the following list:
- Month-end/quarterly processes
- FlashCopy or Metro/Global Mirror copy processes and their time restrictions
- Database/application refreshes
Current Layout
[Diagram: storage frame port layout - Left Controller (1) with CHA ports 1P/1Q and 1R/1S, Right Controller (2) with CHA ports 2V/2W and 2X/2Y, each serving array groups A-D, E-H, J-M and N-R; controller WWPNs 50060E80039C6202, 50060E80039C6212 and 50060E80039C6208.]
[Diagram: SAN topology - storage Frame 1 ports (WWPN 50060E80039C6218) and the four SVC nodes (each with HBA 1 and HBA 2, ports P1-P4) connect through the Brocade 1 and Brocade 2 switches to the host FC adapters fcs0 (10000000C93D6ADA), fcs1 (10000000C93D72C3), fcs2 (10000000C93830A5) and fcs3 (10000000C92D1A97).]
Topology
Speed of network
- Write and configure any automation scripts that will speed up the process (a sketch follows this list)
- Run a simple initial test plan that validates the migration process
- Implement the migration procedures and time line built in the design phase
- Verify the migration completion by checking the successful completion and status of the migration jobs
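As an illustration of the kind of automation script meant in the first item, a minimal ksh sketch (the VG, hdisk names and log path are hypothetical) that moves every LV in a volume group from an old disk to a new one:
#!/bin/ksh
# Move all LVs in datavg from the old ESS disk (hdisk2) to the new DS8000 disk (hdisk10)
for LV in $(lsvg -l datavg | awk 'NR>2 {print $1}')
do
    echo "$(date) migrating $LV" >> /tmp/migration.log
    migratepv -l $LV hdisk2 hdisk10 >> /tmp/migration.log 2>&1
done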
Execute
An example migration may go as follows. This high-level illustration is the execution of a migratepv -l based move, sketched below.
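For reference, the basic form of that command (the LV and hdisk names are placeholders, not from the slide):
# migratepv -l lv_name source_hdisk target_hdisk
This moves only the named LV's physical partitions from the source PV to the target PV while the LV stays on-line.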
1. Identify the ESS source and DS8000 targeted LUNs on the host server
2. Identify the sizes of the DS8000 target LUNs
3. Move the DS8000 LUNs into the VGs appropriately
4. Verify the DS8000 LUNs are added to the VG
5. Identify the logical volumes (LVs) to migrate
6. Copy LV data from the ESS source LUNs to the DS8000 target LUNs
lsvg -p vg_name - list the physical volumes in the VG
rmdev -dl hdisk# - remove the ESS hdisk definitions
lsdev -Cc disk - verify the device definitions are removed
In the ESS, unassign the LUNs from the host server
Command - What these commands do:
chvolgrp -dev - Identify and assign the DS8000 LUNs to the targeted AIX host server
lsvpcfg - Identify the ESS source and DS8000 targeted LUNs on the host server
bootinfo -s - Identify the sizes of the DS8000 target LUNs
extendvg - Move the DS8000 LUNs into the VGs appropriately
lsvg -p - Verify the DS8000 LUNs are added to the VG
lsvg -l, lslv -l lv_name - Identify the logical volumes (LVs) to migrate
mklv -y lvdummy
mklvcopy - Copy LV data from the ESS source LUNs to the DS8000 target LUNs
lslv -l - Verify the LV copies are made
syncvg - Synchronize the LV data from the ESS source LUNs to the DS8000 target LUNs
lsvg -l - Verify that the sync isn't showing stale; it should show as syncd
Command - What these commands do:
lslv -l - Verify the source and target LUNs for each LV
rmlvcopy - Remove the LV copies from the ESS source LUNs
lsvg -p - Verify which physical volumes remain in the VG
reducevg - Remove the ESS LUNs from the volume group
rmdev -dl hdisk# - Remove the ESS hdisk device definitions
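Pulling the two tables together, a minimal end-to-end sketch of the mirror-and-split migration (the VG datavg, LV lv_data, ESS disk hdisk2 and DS8000 disk hdisk10 are hypothetical names):
# bootinfo -s hdisk10
Confirm the size of the target LUN, reported in MB
# extendvg datavg hdisk10
Add the DS8000 LUN to the volume group
# mklvcopy lv_data 2 hdisk10
Create a second copy of the LV on the target LUN
# syncvg -l lv_data
Synchronize the new copy
# lsvg -l datavg
The LV state should show syncd, not stale
# rmlvcopy lv_data 1 hdisk2
Drop the copy on the ESS source LUN
# reducevg datavg hdisk2
Remove the ESS LUN from the volume group
# rmdev -dl hdisk2
Delete the ESS hdisk definition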
Validate
It is important to validate that you have the same data and functionality of the application after the migration. You should make sure that the application runs with the new LUNs, that performance is still adequate, and that operations and scripts work with the new system.
A sample validation list may include, but not be limited to, the following items:
- Compile migration statistics
- Prepare a report to highlight: what worked, what didn't work, lessons learned
- Share the report with all members of the migration team
These types of reports are critical in building a repeatable and consistent process through continuous process improvement, building on what worked and fixing or changing what didn't work. Further, documenting the migration process can help you train your staff, and simplify or streamline the next migration you do, reducing both expense and risk.
Overall Summary
Evaluate
- Analyze business impact
- Risks
- Business interviews
- Criticality of data being moved
- Performance
- Migration types
- Key factors
- Multi-vendor environment requirements
- Application down time
Plan
- Determine migration requirements
- Identify existing environment
- Define future environment
- Create migration plan
- Develop design requirements
- Migration types
- Create migration architecture
- Develop test plan
Execute
- Obtain software tools and licenses
- Communicate deployment plan
- Validate HW & SW requirements
- Customize migration procedures
- Install & configure
- Run pre-validation test
- Perform migration
- Verify migration completion
Validate
- Run post-validation test
- Perform knowledge transfer
- Communicate project information
- Create report on migration statistics
- Conduct migration close-out meeting
Thank you!