vSphere Availability Guide
This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document, see http://www.vmware.com/support/pubs.
EN-000108-03
You can find the most up-to-date technical documentation on the VMware Web site at: http://www.vmware.com/support/ The VMware Web site also provides the latest product updates. If you have comments about this documentation, submit your feedback to: [email protected]
Copyright © 2009-2011 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
VMware, Inc.
Contents
Reducing Planned Downtime 9
Preventing Unplanned Downtime 10
VMware HA Provides Rapid Recovery from Outages 10
VMware Fault Tolerance Provides Continuous Availability 11
How Fault Tolerance Works 31
Fault Tolerance Use Cases 32
Fault Tolerance Configuration Requirements 33
Fault Tolerance Interoperability 34
Preparing Your Cluster and Hosts for Fault Tolerance 35
Turning On Fault Tolerance for Virtual Machines 37
Viewing Information About Fault Tolerant Virtual Machines 39
Fault Tolerance Best Practices 40
VMware Fault Tolerance Configuration Recommendations 41
Troubleshooting Fault Tolerance 42
Updated Information
This vSphere Availability Guide is updated with each release of the product or when necessary. This table provides the update history of the vSphere Availability Guide.
Revision      Description
EN-000108-03  Edited note in Creating a VMware HA Cluster, on page 21, to indicate that automatic startup is not supported when used with VMware HA.
EN-000108-02  The section Failure Detection and Host Network Isolation, on page 14, is updated to change the sentence "By default, the isolated host leaves its virtual machines powered on, but you can change the host isolation response to Shut Down VM or Power Off VM." to "By default, the isolated host shuts down its virtual machines, but you can change the host isolation response to Leave powered on or Power off."
EN-000108-01  Added new information on the VMXNET3 driver and Paravirtualized SCSI (PVSCSI) adapter in Table 3-1 in the topic Other Features Incompatible with Fault Tolerance, on page 34.
EN-000108-00  Initial release.
The vSphere Availability Guide describes solutions that provide business continuity, including how to establish VMware High Availability (HA) and VMware Fault Tolerance.
Intended Audience
This book is for anyone who wants to provide business continuity through the VMware HA and Fault Tolerance solutions. The information in this book is for experienced Windows or Linux system administrators who are familiar with virtual machine technology and datacenter operations.
Document Feedback
VMware welcomes your suggestions for improving our documentation. If you have comments, send your feedback to [email protected].
vSphere Documentation
The vSphere documentation consists of the combined VMware vCenter Server and ESX/ESXi documentation set. The vSphere Availability Guide covers ESX, ESXi, and vCenter Server.
VMware Professional Services provides offerings to help you assess, plan, build, and manage your virtual environment. To access information about education classes, certification programs, and consulting services, go to http://www.vmware.com/services.
Downtime, whether planned or unplanned, brings with it considerable costs. However, solutions to ensure higher levels of availability have traditionally been costly, hard to implement, and difficult to manage. VMware software makes it simpler and less expensive to provide higher levels of availability for important applications. With vSphere, organizations can easily increase the baseline level of availability provided for all applications as well as provide higher levels of availability more easily and cost effectively. With vSphere, you can:
- Provide higher availability independent of hardware, operating system, and applications.
- Eliminate planned downtime for common maintenance operations.
- Provide automatic restart in cases of failure.
vSphere makes it possible to reduce planned downtime, prevent unplanned downtime, and recover rapidly from outages. This chapter includes the following topics:
- Reducing Planned Downtime, on page 9
- Preventing Unplanned Downtime, on page 10
- VMware HA Provides Rapid Recovery from Outages, on page 10
- VMware Fault Tolerance Provides Continuous Availability, on page 11
- Eliminate downtime for common maintenance operations.
- Eliminate planned maintenance windows.
- Perform maintenance at any time without disrupting users and services.
The VMotion and Storage VMotion functionality in vSphere makes it possible for organizations to dramatically reduce planned downtime because workloads in a VMware environment can be dynamically moved to different physical servers or to different underlying storage without service interruption. Administrators can perform faster and completely transparent maintenance operations, without being forced to schedule inconvenient maintenance windows.
- Shared storage. Eliminate single points of failure by storing virtual machine files on shared storage, such as Fibre Channel or iSCSI SAN, or NAS. SAN mirroring and replication features can be used to keep updated copies of virtual disks at disaster recovery sites.
- Network interface teaming. Provide tolerance of individual network card failures.
- Storage multipathing. Tolerate storage path failures.
In addition to these capabilities, the VMware HA and Fault Tolerance features can minimize or eliminate unplanned downtime by providing rapid recovery from outages and continuous availability, respectively.
VMware HA protects against a server failure by automatically restarting the virtual machines on other hosts within the cluster. It also protects against application failure by continuously monitoring a virtual machine and resetting it if a failure is detected.
Unlike other clustering solutions, VMware HA provides the infrastructure to protect all workloads:
- No special software needs to be installed within the application or virtual machine. All workloads are protected by VMware HA.
- After VMware HA is configured, no actions are required to protect new virtual machines. They are automatically protected.
- VMware HA can be combined with VMware Distributed Resource Scheduler (DRS) not only to protect against failures but also to provide load balancing across the hosts within a cluster.
VMware HA has a number of advantages over traditional failover solutions:

- Minimal setup. After a VMware HA cluster is set up, all virtual machines in the cluster get failover support without additional configuration.
- Reduced hardware cost and setup. The virtual machine acts as a portable container for the applications and it can be moved among hosts. Administrators avoid duplicate configurations on multiple machines. When you use VMware HA, you must have sufficient resources to fail over the number of hosts you want to protect with VMware HA. However, the vCenter Server system automatically manages resources and configures clusters.
- Increased application availability. Any application running inside a virtual machine has access to increased availability. Because the virtual machine can recover from hardware failure, all applications that start at boot have increased availability without increased computing needs, even if the application is not itself a clustered application. By monitoring and responding to VMware Tools heartbeats and resetting nonresponsive virtual machines, VMware HA also protects against guest operating system crashes.
- DRS and VMotion integration. If a host fails and virtual machines are restarted on other hosts, DRS can provide migration recommendations or migrate virtual machines for balanced resource allocation. If one or both of the source and destination hosts of a migration fail, VMware HA can help recover from that failure.
VMware HA clusters enable a collection of ESX/ESXi hosts to work together so that, as a group, they provide higher levels of availability for virtual machines than each ESX/ESXi host could provide individually. When you plan the creation and usage of a new VMware HA cluster, the options you select affect the way that cluster responds to failures of hosts or virtual machines.

Before creating a VMware HA cluster, you should be aware of how VMware HA identifies host failures and isolation and responds to these situations. You also should know how admission control works so that you can choose the policy that best fits your failover needs. After a cluster has been established, you can customize its behavior with advanced attributes and optimize its performance by following recommended best practices.

This chapter includes the following topics:
- How VMware HA Works, on page 13
- VMware HA Admission Control, on page 15
- Creating a VMware HA Cluster, on page 21
- Customizing VMware HA Behavior, on page 25
- Best Practices for VMware HA Clusters, on page 27
One of the primary hosts is also designated as the active primary host and its responsibilities include:
- Deciding where to restart virtual machines.
- Keeping track of failed restart attempts.
- Determining when it is appropriate to keep trying to restart a virtual machine.
If the active primary host fails, another primary host replaces it.
When VMware HA admission control is disabled, failover resource constraints are not passed on to DRS and VMware Distributed Power Management (DPM). The constraints are not enforced.
- DRS does evacuate virtual machines from hosts and places the hosts in maintenance mode or standby mode regardless of the impact this might have on failover requirements.
- VMware DPM does power off hosts (place them in standby mode) even if doing so violates failover requirements.
Admission control imposes constraints on resource usage and any action that would violate these constraints is not permitted. Examples of actions that could be disallowed include:
- Powering on a virtual machine.
- Migrating a virtual machine onto a host or into a cluster or resource pool.
- Increasing the CPU or memory reservation of a virtual machine.
Of the three types of admission control, only VMware HA admission control can be disabled. However, without it there is no assurance that all virtual machines in the cluster can be restarted after a host failure. VMware recommends that you do not disable admission control, but you might need to do so temporarily, for the following reasons:
- If you need to violate the failover constraints when there are not enough resources to support them (for example, if you are placing hosts in standby mode to test them for use with VMware DPM).
- If an automated process needs to take actions that might temporarily violate the failover constraints (for example, as part of an upgrade directed by VMware Update Manager).
- If you need to perform testing or maintenance operations.
Determines the Current Failover Capacity of the cluster. This is the number of hosts that can fail and still leave enough slots to satisfy all of the powered-on virtual machines.
Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user). If it is, admission control disallows the operation.
NOTE The maximum Configured Failover Capacity that you can set is four. Each cluster has up to five primary hosts, and if all fail simultaneously, failover of all virtual machines might not be successful.
- The CPU component is obtained by taking the CPU reservation of each powered-on virtual machine and selecting the largest value. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 256 MHz (this value can be changed using the das.vmCpuMinMHz advanced attribute).
- The memory component is obtained by taking the memory reservation (plus memory overhead) of each powered-on virtual machine and selecting the largest value.
If your cluster contains any virtual machines that have much larger reservations than the others, they will distort slot size calculation. To avoid this, you can specify an upper bound for the CPU or memory component of the slot size by using the das.slotCpuInMHz or das.slotMemInMB advanced attributes, respectively. When using these advanced attributes, there is a risk of resource fragmentation where virtual machines larger than the slot size are assigned multiple slots. In a cluster that is close to capacity, there might be enough slots in aggregate for a virtual machine to be failed over. However, those slots could be located on multiple hosts and are unusable by a virtual machine assigned multiple slots because a virtual machine can run on only a single ESX/ESXi host at a time.
- Slot size.
- Total slots in cluster. The sum of the slots supported by the good hosts in the cluster.
- Used slots. The number of slots assigned to powered-on virtual machines. It can be more than the number of powered-on virtual machines if you have defined an upper bound for the slot size using the advanced options.
- Available slots. The number of slots available to power on additional virtual machines in the cluster. VMware HA automatically reserves the required number of slots for failover. The remaining slots are available to power on new virtual machines.
- Total powered on VMs in cluster.
- Total hosts in cluster.
- Total good hosts in cluster. The number of hosts that are connected, not in maintenance mode, and have no VMware HA errors.
- The cluster comprises three hosts, each with a different amount of available CPU and memory resources. The first host (H1) has 9GHz of available CPU resources and 9GB of available memory, while Host 2 (H2) has 9GHz and 6GB and Host 3 (H3) has 6GHz and 6GB.
- There are five powered-on virtual machines in the cluster with differing CPU and memory requirements. VM1 needs 2GHz of CPU resources and 1GB of memory, while VM2 needs 2GHz and 1GB, VM3 needs 1GHz and 2GB, VM4 needs 1GHz and 1GB, and VM5 needs 1GHz and 1GB.
- The Host Failures Cluster Tolerates setting is one.
VMware, Inc.
17
Figure 2-1. Admission Control Example with Host Failures Cluster Tolerates Policy
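To make the slot-based calculation concrete, the following Python sketch (illustrative only, not VMware code; numbers come from the example above) computes the slot size, the slots each host supports, and the Current Failover Capacity:

```python
# Worked example of the Host Failures Cluster Tolerates slot calculation
hosts = {"H1": (9000, 9), "H2": (9000, 6), "H3": (6000, 6)}  # (CPU MHz, memory GB)
vms = {"VM1": (2000, 1), "VM2": (2000, 1), "VM3": (1000, 2),
       "VM4": (1000, 1), "VM5": (1000, 1)}                   # reservations

# Slot size: largest CPU reservation and largest memory reservation
slot_cpu = max(c for c, m in vms.values())   # 2000 MHz
slot_mem = max(m for c, m in vms.values())   # 2 GB

# Slots per host: limited by whichever resource runs out first
slots = {h: min(c // slot_cpu, m // slot_mem) for h, (c, m) in hosts.items()}
# H1 supports 4 slots, H2 and H3 support 3 slots each

# Current Failover Capacity: remove the largest hosts, one at a time, while the
# remaining slots still hold all powered-on virtual machines
remaining = sorted(slots.values())           # largest hosts are popped from the end
failures = 0
while len(remaining) > 1 and sum(remaining[:-1]) >= len(vms):
    remaining.pop()
    failures += 1
print(failures)  # 1
```

With a slot size of 2GHz and 2GB, the cluster can tolerate one host failure, which matches the configured setting of one, so admission control allows the cluster's current state.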
It uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 0MB memory and 256MHz CPU is applied. This is controlled by the same HA advanced options used for the failover level policy.
- The CPU component is obtained by summing the CPU reservations of the powered-on virtual machines. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 256 MHz (this value can be changed using the das.vmCpuMinMHz advanced attribute).
- The memory component is obtained by summing the memory reservation (plus memory overhead) of each powered-on virtual machine.
The total host resources available for virtual machines are calculated by summing the hosts' CPU and memory resources. These amounts are those contained in the host's root resource pool, not the total physical resources of the host. Resources being used for virtualization purposes are not included. Only hosts that are connected, not in maintenance mode, and have no VMware HA errors are considered. The Current CPU Failover Capacity is computed by subtracting the total CPU resource requirements from the total host CPU resources and dividing the result by the total host CPU resources. The Current Memory Failover Capacity is calculated similarly.
- The cluster comprises three hosts, each with a different amount of available CPU and memory resources. The first host (H1) has 9GHz of available CPU resources and 9GB of available memory, while Host 2 (H2) has 9GHz and 6GB and Host 3 (H3) has 6GHz and 6GB.
- There are five powered-on virtual machines in the cluster with differing CPU and memory requirements. VM1 needs 2GHz of CPU resources and 1GB of memory, while VM2 needs 2GHz and 1GB, VM3 needs 1GHz and 2GB, VM4 needs 1GHz and 1GB, and VM5 needs 1GHz and 1GB.
- The Configured Failover Capacity is set to 25%.
Figure 2-2. Admission Control Example with Percentage of Cluster Resources Reserved Policy
The total resource requirements for the powered-on virtual machines is 7GHz and 6GB. The total host resources available for virtual machines is 24GHz and 21GB. Based on this, the Current CPU Failover Capacity is 70% ((24GHz - 7GHz)/24GHz). Similarly, the Current Memory Failover Capacity is 71% ((21GB-6GB)/21GB). Because the cluster's Configured Failover Capacity is set to 25%, 45% of the cluster's total CPU resources and 46% of the cluster's memory resources are still available to power on additional virtual machines.
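The same arithmetic can be sketched in Python (illustrative only; values taken from the example above):

```python
# Worked example of the Percentage of Cluster Resources calculation
total_cpu, total_mem = 24.0, 21.0   # GHz and GB available across all good hosts
req_cpu, req_mem = 7.0, 6.0         # summed reservations of the powered-on VMs

# Current Failover Capacity = (total - required) / total, per resource
cpu_capacity = int((total_cpu - req_cpu) / total_cpu * 100)   # 70 (%)
mem_capacity = int((total_mem - req_mem) / total_mem * 100)   # 71 (%)

# With a Configured Failover Capacity of 25%, the remainder stays available
# for powering on additional virtual machines
configured = 25
spare_cpu = cpu_capacity - configured   # 45 (%)
spare_mem = mem_capacity - configured   # 46 (%)
```

Because both 70% and 71% exceed the configured 25%, admission control permits the cluster's current state.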
- Green. The host is connected, not in maintenance mode, and has no VMware HA errors. Also, no powered-on virtual machines reside on the host.
- Yellow. The host is connected, not in maintenance mode, and has no VMware HA errors. However, powered-on virtual machines reside on the host.
- Red. The host is disconnected, in maintenance mode, or has VMware HA errors.
Heterogeneity of Cluster
Clusters can be heterogeneous in terms of virtual machine resource reservations and host total resource capacities. In a heterogeneous cluster, the Host Failures Cluster Tolerates policy can be too conservative because it only considers the largest virtual machine reservations when defining slot size and assumes the largest hosts fail when computing the Current Failover Capacity. The other two admission control policies are not affected by cluster heterogeneity.
- For ESX, set up redundant service console networking.
- For ESXi, set up redundant VMkernel networking.
For information about setting up network redundancy, see Network Path Redundancy, on page 29.

Procedure
1. Select the Hosts & Clusters view.
2. Right-click the Datacenter in the Inventory tree and click New Cluster.
3. Complete the New Cluster wizard. Do not enable VMware HA (or DRS) at this time.
4. Click Finish to close the wizard and create the cluster.

You have created an empty cluster.
5. Based on your plan for the resources and networking architecture of the cluster, use the vSphere Client to add hosts to the cluster.
6. Right-click the cluster and click Edit Settings. The cluster's Settings dialog box is where you can modify the VMware HA (and other) settings for the cluster.
7. On the Cluster Features page, select Turn On VMware HA.
8. Configure the VMware HA settings as appropriate for your cluster.
Cluster Features
The first panel in the New Cluster wizard allows you to specify basic options for the cluster. In this panel you can specify the cluster name and choose one or both cluster features.

Name. Specifies the name of the cluster. This name appears in the vSphere Client inventory panel. You must specify a name to continue with cluster creation.
Turn On VMware HA. If this check box is selected, virtual machines are restarted on another host in the cluster if a host fails. You must turn on VMware HA to enable VMware Fault Tolerance on any virtual machine in the cluster.
Turn On VMware DRS. If this check box is selected, DRS balances the load of virtual machines across the cluster. DRS also places and migrates virtual machines when they are protected with HA.
- Host failures cluster tolerates
- Percentage of cluster resources reserved as failover spare capacity
- Specify a failover host
NOTE See Choosing an Admission Control Policy, on page 20 for more information about how VMware HA admission control works.
VM Restart Priority
VM restart priority determines the relative order in which virtual machines are restarted after a host failure. Such virtual machines are restarted sequentially on new hosts, with the highest priority virtual machines first and continuing to those with lower priority until all virtual machines are restarted or no more cluster resources are available. If the number of host failures or virtual machine restarts exceeds what admission control permits, the virtual machines with lower priority might not be restarted until more resources become available. Virtual machines are restarted on the failover host, if one is specified, or on the host with the highest percentage of available resources. The values for this setting are: Disabled, Low, Medium (the default), and High. If Disabled is selected, VMware HA is disabled for the virtual machine, meaning that it is not restarted on other ESX/ESXi hosts if its ESX/ESXi host fails. This setting does not affect virtual machine monitoring: if a virtual machine fails on a host that is functioning properly, that virtual machine is reset on that same host. You can change this property for individual virtual machines. The restart priority settings for virtual machines vary depending on user needs. VMware recommends that you assign higher restart priority to the virtual machines that provide the most important services.
VMware, Inc.
23
For example, in the case of a multitier application you might rank assignments according to functions hosted on the virtual machines.
- High. Database servers that will provide data for applications.
- Medium. Application servers that consume data in the database and provide results on web pages.
- Low. Web servers that receive user requests, pass queries to application servers, and return results to users.
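The restart ordering described above can be sketched as follows (an illustrative sketch with made-up virtual machine names, not VMware's implementation):

```python
# Illustrative sketch: ordering VM restarts by restart priority.
# VMs set to Disabled are never restarted by VMware HA.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

vms = [
    ("web1", "Low"), ("db1", "High"), ("app1", "Medium"),
    ("test1", "Disabled"), ("db2", "High"),
]

candidates = [(name, p) for name, p in vms if p != "Disabled"]
restart_order = [name for name, p in
                 sorted(candidates, key=lambda v: PRIORITY_ORDER[v[1]])]
print(restart_order)  # ['db1', 'db2', 'app1', 'web1']
```

Database servers come first, then application servers, then web servers; the disabled test machine is skipped entirely.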
VM Monitoring
VM Monitoring restarts individual virtual machines if their VMware Tools heartbeats are not received within a set time. You can configure the degree to which VMware HA is sensitive to such non-responsiveness. If you select Enable VM Monitoring, the VM Monitoring service (using VMware Tools) evaluates whether each virtual machine in the cluster is running by checking for regular heartbeats from the VMware Tools process running inside the guest. If no heartbeats are received, this is most likely because the guest operating system has failed or VMware Tools is not being allocated any time to complete tasks. In such a case, the VM Monitoring service determines that the virtual machine has failed and the virtual machine is rebooted to restore service. You can also configure the level of monitoring sensitivity. Highly sensitive monitoring results in a more rapid conclusion that a failure has occurred. While unlikely, highly sensitive monitoring might lead to falsely identifying failures when the virtual machine in question is actually still working, but heartbeats have not been received due to factors such as resource constraints. Low sensitivity monitoring results in longer interruptions in service between actual failures and virtual machines being reset. Select an option that is an effective compromise for your needs. After failures are detected, VMware HA resets virtual machines. This helps ensure that services remain available. To avoid resetting virtual machines repeatedly for nontransient errors, by default virtual machines will be reset only three times during a certain configurable time interval. After virtual machines have been reset three times, VMware HA makes no further attempts to reset the virtual machines after any subsequent failures until after the specified time has elapsed. You can configure the number of resets using the Maximum per-VM resets custom setting.
Occasionally, virtual machines that are still functioning properly stop sending heartbeats. To avoid unnecessarily resetting such virtual machines, the VM Monitoring service also monitors a virtual machine's I/O activity. If no heartbeats are received within the failure interval, the I/O stats interval (a cluster-level attribute) is checked. The I/O stats interval determines if any disk or network activity has occurred for the virtual machine during the previous two minutes (120 seconds). If not, the virtual machine is reset. This default value (120 seconds) can be changed using the advanced attribute das.iostatsInterval.

NOTE The VM Monitoring settings cannot be configured through advanced attributes. Modify settings in the VM Monitoring page of the cluster's Settings dialog box.

The default settings for VM Monitoring sensitivity are described in the table.

Table 2-1. VM Monitoring Settings
Setting   Failure Interval (seconds)   Reset Period
High      30                           1 hour
Medium    60                           24 hours
Low       120                          7 days
You can specify custom values for both VM Monitoring sensitivity and the I/O stats interval, as described in Customizing VMware HA Behavior, on page 25.
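The reset decision described above can be sketched as follows (assumed logic for illustration only, not VMware's actual code):

```python
# Illustrative sketch of the VM Monitoring reset decision: a VM is reset only
# if heartbeats are overdue, it showed no disk or network activity during the
# I/O stats interval, and the per-VM reset limit has not been reached.
def should_reset(seconds_since_heartbeat, failure_interval,
                 had_io_in_stats_window, resets_so_far, max_resets=3):
    if seconds_since_heartbeat < failure_interval:
        return False                   # heartbeats still arriving
    if had_io_in_stats_window:
        return False                   # VM is alive, just not heartbeating
    return resets_so_far < max_resets  # stop resetting nontransient failures

# Medium sensitivity: 60-second failure interval
print(should_reset(90, 60, had_io_in_stats_window=False, resets_so_far=0))  # True
print(should_reset(90, 60, had_io_in_stats_window=True, resets_so_far=0))   # False
```

The second call shows why the I/O stats interval exists: a virtual machine that has stopped heartbeating but is still doing disk or network I/O is not reset.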
- das.usedefaultisolationaddress
- das.failuredetectiontime
- das.failuredetectioninterval
- das.isolationShutdownTimeout
- das.slotMemInMB
- das.slotCpuInMHz
- das.vmMemoryMinMB
- das.vmCpuMinMHz
- das.iostatsInterval
NOTE If you change the value of certain advanced attributes, you must disable and then re-enable VMware HA before your changes take effect.
Procedure
1. In the cluster's Settings dialog box, select VMware HA.
2. Click the Advanced Options button to open the Advanced Options (HA) dialog box.
3. Enter each advanced attribute you want to change in a text box in the Option column and enter a value in the Value column.
4. Click OK.
The virtual machine's behavior now differs from the cluster defaults for each setting you changed.
When making changes to the network(s) that your clustered ESX/ESXi hosts are on, VMware recommends that you suspend the Host Monitoring feature. Changing your network hardware or networking settings can interrupt the heartbeats that VMware HA uses to detect host failures, and this might result in unwanted attempts to fail over virtual machines. When you change the networking configuration on the ESX/ESXi hosts themselves, for example, adding port groups, or removing vSwitches, VMware recommends that in addition to suspending Host Monitoring, you place the host in maintenance mode.
NOTE Because networking is a vital component of VMware HA, if network maintenance needs to be performed the VMware HA administrator should be informed.
On ESX hosts in the cluster, VMware HA communications travel over all networks that are designated as service console networks. VMkernel networks are not used by these hosts for VMware HA communications. On ESXi hosts in the cluster, VMware HA communications, by default, travel over VMkernel networks, except those marked for use with VMotion. If there is only one VMkernel network, VMware HA shares it with VMotion, if necessary. With ESXi 4.0, you must also explicitly enable the Management Network checkbox for VMware HA to use this network.
By default, the network isolation address is the default gateway for the host. There is only one default gateway specified, regardless of how many service console networks have been defined, so you should use the das.isolationaddress[...] advanced attribute to add isolation addresses for additional networks. For example, use das.isolationAddress2 to add an isolation address for your second network, das.isolationAddress3 for the third, and so on, up to a maximum of das.isolationAddress9 for the ninth. When you specify additional isolation addresses, VMware recommends that you increase the setting for the das.failuredetectiontime advanced attribute to 20000 milliseconds (20 seconds) or greater. A node that is isolated from the network needs time to release its virtual machines' VMFS locks if the host isolation response is to fail over the virtual machines (not to leave them powered on). This must happen before the other nodes declare the node as failed, so that they can power on the virtual machines without getting an error that the virtual machines are still locked by the isolated node. For more information on VMware HA advanced attributes, see Customizing VMware HA Behavior, on page 25.
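For example, a cluster with three heartbeat networks might add entries like these in the Advanced Options (HA) dialog box (the addresses shown are placeholders; substitute pingable isolation addresses from your own environment):

```
Option                     Value
das.isolationaddress2      192.168.2.254
das.isolationaddress3      192.168.3.254
das.failuredetectiontime   20000
```

The increased failure detection time gives an isolated host enough time to release its VMFS locks before the other hosts attempt to restart its virtual machines.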
Port Group Names and Network Labels. Use consistent port group names and network labels on VLANs for public networks. Port group names are used to reconfigure access to the network by virtual machines. If you use inconsistent names between the original server and the failover server, virtual machines are disconnected from their networks after failover. Network labels are used by virtual machines to reestablish network connectivity upon restart.
After you have added a NIC to a host in your VMware HA cluster, you must reconfigure VMware HA on that host.
You can enable VMware Fault Tolerance for your virtual machines to ensure business continuity with higher levels of availability and data protection than is offered by VMware HA. Fault Tolerance is built on the ESX/ESXi host platform (using the VMware vLockstep functionality) and it provides continuous availability by having identical virtual machines run in virtual lockstep on separate hosts. To obtain the optimal results from Fault Tolerance you should be familiar with how it works, how to enable it for your cluster and virtual machines, the best practices for its usage, and troubleshooting tips. This chapter includes the following topics:
- How Fault Tolerance Works, on page 31
- Fault Tolerance Use Cases, on page 32
- Fault Tolerance Configuration Requirements, on page 33
- Fault Tolerance Interoperability, on page 34
- Preparing Your Cluster and Hosts for Fault Tolerance, on page 35
- Turning On Fault Tolerance for Virtual Machines, on page 37
- Viewing Information About Fault Tolerant Virtual Machines, on page 39
- Fault Tolerance Best Practices, on page 40
- VMware Fault Tolerance Configuration Recommendations, on page 41
- Troubleshooting Fault Tolerance, on page 42
The Primary and Secondary VMs continuously exchange heartbeats. This allows the virtual machine pair to monitor the status of one another to ensure that Fault Tolerance is continually maintained. A transparent failover occurs if the host running the Primary VM fails, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished within a few seconds. If the host running the Secondary VM fails, it is also immediately replaced. In either case, users experience no interruption in service and no loss of data.

A fault tolerant virtual machine and its secondary copy are not allowed to run on the same host. Fault Tolerance uses anti-affinity rules, which ensure that the two instances of the fault tolerant virtual machine are never on the same host. This ensures that a host failure cannot result in the loss of both virtual machines.

Fault Tolerance avoids "split-brain" situations, which can lead to two active copies of a virtual machine after recovery from a failure. Atomic file locking on shared storage is used to coordinate failover so that only one side continues running as the Primary VM and a new Secondary VM is respawned automatically.

NOTE The anti-affinity check is performed when the Primary VM is powered on. It is possible that the Primary and Secondary VMs can be on the same host when they are both in a powered-off state. This is normal behavior and when the Primary VM is powered on, the Secondary VM is started on a different host at that time.
- Applications that need to be available at all times, especially those that have long-lasting client connections that users want to maintain during hardware failure.
- Custom applications that have no other way of doing clustering.
- Cases where high availability might be provided through custom clustering solutions, which are too complicated to configure and maintain.
Cluster Prerequisites
Unlike VMware HA, which by default protects every virtual machine in the cluster, VMware Fault Tolerance is enabled on individual virtual machines. For a cluster to support VMware Fault Tolerance, the following prerequisites must be met:
- VMware HA must be enabled on the cluster. Host Monitoring should also be enabled. If it is not, when Fault Tolerance uses a Secondary VM to replace a Primary VM, no new Secondary VM is created and redundancy is not restored.
- Host certificate checking must be enabled for all hosts that will be used for Fault Tolerance. See Enable Host Certificate Checking, on page 36.
- Each host must have a VMotion and a Fault Tolerance Logging NIC configured. See Configure Networking for Host Machines, on page 36.
- At least two hosts must have processors from the same compatible processor group. While Fault Tolerance supports heterogeneous clusters (a mix of processor groups), you get the maximum flexibility if all hosts are compatible. See the VMware knowledge base article at http://kb.vmware.com/kb/1008027 for information on supported processors.
- All hosts must have the same ESX/ESXi version and patch level.
- All hosts must have access to the virtual machines' datastores and networks.
To confirm the compatibility of the hosts in the cluster to support Fault Tolerance, run profile compliance checks.

NOTE VMware HA includes the resource usage of Fault Tolerance Secondary VMs when it performs admission control calculations. For the Host Failures Cluster Tolerates policy, a Secondary VM is assigned a slot, and for the Percentage of Cluster Resources policy, the Secondary VM's resource usage is accounted for when computing the usable capacity of the cluster. See VMware HA Admission Control, on page 15.
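The note above can be made concrete with a simplified slot calculation. This sketch is my own simplification for illustration, not VMware's exact admission control algorithm; it treats each Primary VM and each Secondary VM as consuming one slot, which is why Fault Tolerance reduces a cluster's usable capacity under the Host Failures Cluster Tolerates policy.

```python
def slots_available(host_capacities_mhz, slot_size_mhz, primaries, secondaries):
    """Simplified slot accounting for Host Failures Cluster Tolerates.

    Each host contributes floor(capacity / slot size) slots; every
    Primary VM *and* every Secondary VM occupies one slot, so each
    fault tolerant pair consumes two slots.
    """
    total_slots = sum(c // slot_size_mhz for c in host_capacities_mhz)
    used_slots = primaries + secondaries
    return total_slots - used_slots

# Three hosts of 8000 MHz each with 2000 MHz slots yield 12 slots.
# Four FT pairs (4 Primaries + 4 Secondaries) use 8 slots, leaving 4.
remaining = slots_available([8000, 8000, 8000], 2000,
                            primaries=4, secondaries=4)
```

The numbers here are invented for the example; real slot sizes derive from the largest CPU and memory reservations in the cluster.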
Host Prerequisites
A host can support fault tolerant virtual machines if it meets the following requirements.
- A host must have processors from the FT-compatible processor group. See the VMware knowledge base article at http://kb.vmware.com/kb/1008027.
- A host must be certified by the OEM as FT-capable. Refer to the current Hardware Compatibility List (HCL) for a list of FT-supported servers (see http://www.vmware.com/resources/compatibility/search.php).
- The host configuration must have Hardware Virtualization (HV) enabled in the BIOS. Some hardware manufacturers ship their products with HV disabled. The process for enabling HV varies among BIOSes; see the documentation for your hosts' BIOSes for details on how to enable HV. If HV is not enabled, attempts to power on a fault tolerant virtual machine produce an error and the virtual machine does not power on.
Review the Host Configuration Section of Fault Tolerance Best Practices, on page 40 to select host options that best support VMware Fault Tolerance.
- Virtual machine files must be stored on shared storage. Acceptable shared storage solutions include Fibre Channel, (hardware and software) iSCSI, NFS, and NAS.
- Virtual machines must be stored in virtual RDM or virtual machine disk (VMDK) files that are thick provisioned with the Cluster Features option. If a virtual machine is stored in a VMDK file that is thin provisioned or thick provisioned without clustering features enabled and an attempt is made to enable Fault Tolerance, a message appears indicating that the VMDK file must be converted. Users can accept this automatic conversion (which requires the virtual machine to be powered off), allowing the disk to be converted and the virtual machine to be protected with Fault Tolerance. The time needed for this conversion varies depending on the size of the disk and the host's processor type.
- Virtual machines must be running one of the supported guest operating systems. See the VMware knowledge base article at http://kb.vmware.com/kb/1008027 for more information.
- Snapshots. Snapshots must be removed or committed before Fault Tolerance can be enabled on a virtual machine. In addition, it is not possible to take snapshots of virtual machines on which Fault Tolerance is enabled.
- Storage VMotion. You cannot invoke Storage VMotion for virtual machines with Fault Tolerance turned on. To migrate the storage, temporarily turn off Fault Tolerance, perform the Storage VMotion action, and then turn Fault Tolerance back on.
- DRS features. A fault tolerant virtual machine is automatically configured as DRS-disabled. DRS does initially place a Secondary VM, but it does not make recommendations or load balance Primary or Secondary VMs when load balancing the cluster. The Primary and Secondary VMs can be manually migrated during normal operation.
Table 3-1. Features and Devices Incompatible with Fault Tolerance and Corrective Actions (Continued)

- USB and sound devices: Remove these devices from the virtual machine.
- N_Port ID Virtualization (NPIV): Disable the NPIV configuration of the virtual machine.
- NIC passthrough: This feature is not supported by Fault Tolerance, so it must be turned off.
- Network interfaces for legacy network hardware: While some legacy drivers are not supported, Fault Tolerance does support the VMXNET2 driver. You might need to install VMware Tools to access the VMXNET2 driver instead of vlance in certain guest operating systems.
- Virtual disks backed with thin-provisioned storage or thick-provisioned disks that do not have clustering features enabled: When you turn on Fault Tolerance, the conversion to the appropriate disk format is performed by default. The virtual machine must be in a powered-off state to take this action.
- Hot-plugging devices: The hot plug feature is automatically disabled for fault tolerant virtual machines. To hot plug devices, you must momentarily turn off Fault Tolerance, perform the hot plug, and then turn on Fault Tolerance.
- Extended Page Tables/Rapid Virtualization Indexing (EPT/RVI): EPT/RVI is automatically disabled for virtual machines with Fault Tolerance turned on.
- VMXNET3 driver: Remove the VMXNET3 driver and replace it with a supported driver, such as the e1000, VMXNET2, or VMXNET driver.
- Paravirtualized SCSI (PVSCSI) adapter: Remove the PVSCSI adapter and replace it with a supported adapter, such as the LSILogic or BusLogic adapter.
- Enable host certificate checking (if you are upgrading from a previous version of VMware Infrastructure).
- Configure networking for each host.
- Create the VMware HA cluster, add hosts, and check compliance.
After your cluster and hosts are prepared for Fault Tolerance, you are ready to turn on Fault Tolerance for your virtual machines. See Turn On Fault Tolerance for Virtual Machines, on page 38.
After you have created both a VMotion and Fault Tolerance logging virtual switch, you can add the host to the cluster and complete any steps needed to turn on Fault Tolerance.

What to do next

To confirm that you successfully enabled both VMotion and Fault Tolerance on the host, view its Summary tab in the vSphere Client. In the General pane, the VMotion Enabled and Fault Tolerance Enabled fields should show yes.

NOTE If you configure networking to support Fault Tolerance but subsequently disable it, pairs of fault tolerant virtual machines that are already powered on remain powered on. However, if a failover occurs, when the Primary VM is replaced by its Secondary VM, a new Secondary VM is not started, causing the new Primary VM to run in a Not Protected state.
- The virtual machine resides on a host that does not have a license for the feature.
- The virtual machine resides on a host that is in maintenance mode or standby mode.
- The virtual machine is disconnected or orphaned (its .vmx file cannot be accessed).
- The user does not have permission to turn the feature on.
Even if the option to turn on Fault Tolerance is available, the operation is still validated and can fail if certain requirements are not met.
- SSL certificate checking must be enabled in the vCenter Server settings.
- The host must be in a VMware HA cluster or a mixed VMware HA and DRS cluster.
- The host must have ESX/ESXi 4.0 or later installed.
- The virtual machine must not have multiple vCPUs.
- The virtual machine must not have snapshots.
- The virtual machine must not be a template.
- The virtual machine must not have VMware HA disabled.
A number of additional validation checks are performed for powered-on virtual machines (or those being powered on).
- The BIOS of the hosts where the fault tolerant virtual machines reside must have Hardware Virtualization (HV) enabled.
- The host that supports the Primary VM must have a processor that supports Fault Tolerance.
- The host that supports the Secondary VM must have a processor that supports Fault Tolerance and is in the same CPU family or model as the host that supports the Primary VM.
- The combination of the virtual machine's guest operating system and processor must be supported by Fault Tolerance (for example, 32-bit Solaris on AMD-based processors is not currently supported).
- The configuration of the virtual machine must be valid for use with Fault Tolerance (for example, it must not contain any unsupported devices).
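The power-on checks above amount to a checklist over the pair of hosts and the virtual machine. The following hypothetical sketch collects the reasons a configuration would fail validation; all field names and host values are illustrative placeholders, not a real VMware API.

```python
def validate_ft_power_on(primary_host, secondary_host, vm):
    """Collect validation failures for powering on an FT pair.

    Mirrors the checks listed in the documentation. The dictionary
    keys are illustrative placeholders, not real API fields.
    """
    errors = []
    for name, host in (("Primary", primary_host), ("Secondary", secondary_host)):
        if not host["hv_enabled"]:
            errors.append(f"{name} host does not have HV enabled in the BIOS")
        if not host["ft_capable_cpu"]:
            errors.append(f"{name} host CPU does not support Fault Tolerance")
    if primary_host["cpu_family"] != secondary_host["cpu_family"]:
        errors.append("Hosts are not in the same CPU family or model")
    if vm["has_unsupported_devices"]:
        errors.append("Virtual machine configuration contains unsupported devices")
    return errors

# Hypothetical hosts: both FT-capable, but from different CPU families.
host_a = {"hv_enabled": True, "ft_capable_cpu": True, "cpu_family": "Penryn"}
host_b = {"hv_enabled": True, "ft_capable_cpu": True, "cpu_family": "Nehalem"}
problems = validate_ft_power_on(host_a, host_b,
                                {"has_unsupported_devices": False})
```

In this sketch the only failure reported is the CPU family mismatch, which corresponds to the third check in the list above.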
When your request to turn on Fault Tolerance for a virtual machine passes the validation checks, the Secondary VM is created and the entire state of the Primary VM is copied. The placement and immediate status of the Secondary VM depend on whether the Primary VM was powered on or powered off when you turned on Fault Tolerance. If the Primary VM is powered on:
- The Secondary VM is created, placed on a separate compatible host, and powered on if it passes admission control.
- The Fault Tolerance Status displayed on the virtual machine's Summary tab in the vSphere Client is Protected.
If the Primary VM is powered off:

- The Secondary VM is immediately created and registered to a host in the cluster (it might be re-registered to a more appropriate host when it is powered on). The Secondary VM is not powered on until after the Primary VM is powered on.
- The Fault Tolerance Status displayed on the virtual machine's Summary tab in the vSphere Client is Not Protected, VM not Running.

When you attempt to power on the Primary VM after Fault Tolerance has been turned on, the additional validation checks listed above are performed. To power on properly, the virtual machine must not use paravirtualization (VMI). After these checks are passed, the Primary and Secondary VMs are powered on and placed on separate, compatible hosts, and the Fault Tolerance Status displayed on the virtual machine's Summary tab in the vSphere Client is Protected.
Connect the vSphere Client to vCenter Server using an account with cluster administrator permissions.

Procedure

1 Select the Hosts & Clusters view.
2 Right-click a virtual machine and select Fault Tolerance > Turn On Fault Tolerance.
The specified virtual machine is designated as a Primary VM and a Secondary VM is established on another host. The Primary VM is now fault tolerant.
Protected. Indicates that the Primary and Secondary VMs are powered on and running as expected.

Not Protected. Indicates that the Secondary VM is not running. Possible reasons are listed in the table.

Table 3-2. Reasons for Primary VM Not Protected Status

Starting: Fault Tolerance is in the process of starting the Secondary VM. This message is only visible for a short period of time.

Need Secondary VM: The Primary VM is running without a Secondary VM, so the Primary VM is currently not protected. This generally occurs when there is no compatible host in the cluster available for the Secondary VM. Correct this by bringing a compatible host online. If there is a compatible host online in the cluster, further investigation might be required. Under certain circumstances, disabling Fault Tolerance and then re-enabling it corrects this problem.

Disabled: Fault Tolerance is currently disabled (no Secondary VM is running). This happens when Fault Tolerance is disabled by the user or when vCenter Server disables Fault Tolerance after being unable to power on the Secondary VM.

VM not Running: Fault Tolerance is enabled but the virtual machine is powered off. Power on the virtual machine to reach Protected state.

Secondary location: The host on which the Secondary VM resides.

Total Secondary CPU: The CPU usage of the Secondary VM, displayed in MHz.

Total Secondary Memory: The memory usage of the Secondary VM, displayed in MB.

vLockstep Interval: The time interval (displayed in seconds) needed for the Secondary VM to match the current execution state of the Primary VM. Typically, this interval is less than one-half of one second.

Log Bandwidth: The amount of network capacity being used for sending VMware Fault Tolerance log information from the host running the Primary VM to the host running the Secondary VM.
Host Configuration
Observe the following best practices when configuring your hosts.
- Hosts running the Primary and Secondary VMs should operate at approximately the same processor frequencies; otherwise, the Secondary VM might be restarted more frequently. Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes.
- Apply the same instruction set extension configuration (enabled or disabled) to all hosts. The process for enabling or disabling instruction sets varies among BIOSes. See the documentation for your hosts' BIOSes for details on how to configure instruction sets.
Homogeneous Clusters
VMware Fault Tolerance can function in clusters with non-uniform hosts, but it works best in clusters with compatible nodes. When constructing your cluster, all hosts should have the following:
- Processors from the same compatible processor group.
- Common access to datastores used by the virtual machines.
- The same virtual machine network configuration.
- The same ESX/ESXi version.
- The same BIOS settings for all hosts.
Performance
To increase the bandwidth available for the logging traffic between Primary and Secondary VMs, use a 10Gbit NIC rather than a 1Gbit NIC, and enable the use of jumbo frames.
For virtual machines with Fault Tolerance enabled, you might use ISO images that are accessible only to the Primary VM. In such a case, the Primary VM is able to access the ISO, but if a failover occurs, the CD-ROM reports errors as if there is no media. This situation might be acceptable if the CD-ROM is being used for a temporary, non-critical operation such as an installation.
- In addition to non-fault tolerant virtual machines, you should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESX/ESXi host and virtual machines, all of which can vary.
- If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
- Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. Fault tolerant virtual machines use their full memory reservation. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
- VMware recommends that you use a maximum of 16 virtual disks per fault tolerant virtual machine.
- To ensure redundancy and maximum Fault Tolerance protection, VMware recommends that you have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
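The memory recommendation above can be checked with simple arithmetic. The following sketch assumes a fixed per-VM overhead figure for illustration; actual overhead memory varies with the virtual machine's configuration, and the 200 MB default below is a placeholder, not a VMware-published number.

```python
def pool_has_headroom(pool_memory_mb, ft_vm_sizes_mb, overhead_mb_per_vm=200):
    """Return True if the resource pool can cover every fault tolerant
    virtual machine's full memory reservation plus estimated overhead.

    Fault tolerant VMs use their full memory reservation, so the pool
    needs their total configured memory plus overhead on top of it.
    The 200 MB default overhead is an illustrative placeholder.
    """
    required = sum(ft_vm_sizes_mb) + overhead_mb_per_vm * len(ft_vm_sizes_mb)
    return pool_memory_mb >= required

# Four 4 GB FT virtual machines need 4*4096 + 4*200 = 17184 MB.
ok = pool_has_headroom(20480, [4096] * 4)     # pool has excess memory
tight = pool_has_headroom(16384, [4096] * 4)  # exactly the VM memory, no overhead room
```

In the second case the pool holds exactly the sum of the virtual machines' memory, which is the situation the best practice warns against: nothing is left for overhead memory.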
Too Much Activity on VMFS Volume Can Lead to Virtual Machine Failovers
When a large number of file system locking operations, virtual machine power-on and power-off operations, or VMotion migrations occur on a single VMFS volume, fault tolerant virtual machines can be failed over. A symptom that this might be occurring is receiving many warnings about SCSI reservations in the VMkernel log. To resolve this problem, reduce the number of file system operations or ensure that the fault tolerant virtual machine is on a VMFS volume that does not have an abundance of other virtual machines that are regularly being powered on, powered off, or migrated using VMotion.
This can occur for a variety of reasons: there are no other hosts in the cluster, no other hosts have HV enabled, datastores are inaccessible, there is no available capacity, or hosts are in maintenance mode. If there are insufficient hosts, add more hosts to the cluster. If there are hosts in the cluster, ensure that they support HV and that HV is enabled. The process for enabling HV varies among BIOSes; see the documentation for your hosts' BIOSes for details on how to enable HV. Check that hosts have sufficient capacity and that they are not in maintenance mode.
To resolve this problem, before you enable Fault Tolerance, power off the virtual machine and increase its timeout window by adding the following line to the vmx file of the virtual machine:
ft.maxSwitchoverSeconds = "30"
where 30 is the timeout window in seconds. Enable Fault Tolerance and power the virtual machine back on. This solution should work except under conditions of very high network activity.

NOTE If you increase the timeout to 30 seconds, the fault tolerant virtual machine might become unresponsive for a longer period of time (up to 30 seconds) when enabling FT or when a new Secondary VM is created after a failover.
You might encounter error messages when trying to use VMware Fault Tolerance (FT). The table lists some of these error messages. For each error message there is a description and information about resolving the error, if applicable. Table A-1. Fault Tolerance Error Messages
Error Message: This host contains virtual machines (VMs) with Fault Tolerance turned On; therefore, this host cannot be moved out of its current cluster. To move the host to another cluster, first migrate the VMs with Fault Tolerance turned On to a different host
Description and Solution: This host cannot be moved out of the cluster because it contains virtual machines with FT turned on. To move the host to another cluster, first migrate the fault tolerant virtual machines to a different host.

Error Message: Cannot add a host with virtual machines that have Fault Tolerance turned On to a non-HA enabled cluster
Description and Solution: FT requires the cluster to be enabled for VMware HA. Edit your cluster settings and turn on VMware HA.

Error Message: Cannot add a host with virtual machines that have Fault Tolerance turned On as a stand-alone host
Description and Solution: FT cannot be enabled on a stand-alone host. While the host is in the VMware HA-enabled cluster, right-click each virtual machine on the host and select Turn Off Fault Tolerance. Once FT is disabled, the host can be made into a stand-alone host.

Error Message: Fault Tolerance is enabled on one or more VMs on this host and must be disabled to move the host out of the current cluster
Description and Solution: This host cannot be moved out of the cluster until FT is turned off. To turn off FT, right-click the fault tolerant virtual machines and select Turn Off Fault Tolerance.

Error Message: Fault Tolerance is enabled on VM {vmName}. Disable Fault Tolerance to move the VM from the current [Resource pool, Cluster]
Description and Solution: To move the virtual machine to another cluster or to a stand-alone host, first turn off FT.

Error Message: The host {hostName} has VMs with Fault Tolerance turned On. Before disconnecting the host, the host should be put into maintenance mode or turn Off Fault Tolerance protection on these VMs
Description and Solution: This host cannot be disconnected until it is placed in maintenance mode or until FT is turned off. To turn off FT, right-click the fault tolerant virtual machines and select Turn Off Fault Tolerance.

Error Message: Virtual machines in the same Fault Tolerance pair cannot be on the same host
Description and Solution: You have attempted to VMotion a Secondary VM to the same host a Primary VM is on. A Primary VM and its Secondary VM cannot reside on the same host. Select a different destination host for the Secondary VM.
Error Message: There are configuration issues for the Fault Tolerance operation. Refer to the errors and warnings list for details
Error Message: This operation is not supported on a Secondary VM of a Fault Tolerant pair
Error Message: The Secondary VM with instanceUuid '{instanceUuid}' has already been enabled
Error Message: The Secondary VM with instanceUuid '{instanceUuid}' has already been disabled
Error Message: Cannot power On the Fault Tolerance Secondary VM for virtual machine {vmName}. Refer to the errors list for details
Error Message: Host {hostName} does not support virtual machines with Fault Tolerance turned on. This VMware product does not support Fault Tolerance
Error Message: Host {hostName} does not support virtual machines with Fault Tolerance turned on. This product supports Fault Tolerance, but the host processor does not
Error Message: Host {hostName} has some Fault Tolerance issues for virtual machine {vmName}. Refer to the errors list for details
Description and Solution: vCenter Server has detected FT issues on the host. To troubleshoot this issue, in the vSphere Client select the failed FT operation in either the Recent Tasks pane or the Tasks & Events tab and click the View details link that appears in the Details column.

Error Message: No suitable host can be found to place the Fault Tolerance Secondary VM for virtual machine {vmName}
Description and Solution: FT requires that the hosts for the Primary and Secondary VMs use the same CPU model or family and have the same ESX/ESXi host version and patch level. Enable FT on a virtual machine registered to a host with a matching CPU model or family within the cluster. If no such hosts exist, you must add one.
Error Message: The Fault Tolerance Secondary VM was not powered On because the Fault Tolerance Primary VM could not be powered On
Error Message: DRS Disabled is the only supported DRS behavior for Fault Tolerance virtual machine {vmName}
Error Message: Host CPU is incompatible with the virtual machine's requirements mismatch detected for these features: CPU does not match
Error Message: Record/Replay is not supported for Guest OS XP/PRO on this CPU
Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: HA is not enabled on the virtual machine

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: Secondary VM already exists
Description and Solution: The Primary VM already has a Secondary VM. Do not attempt to create multiple Secondary VMs for the same Primary VM.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: Template virtual machine
Description and Solution: FT cannot be enabled on virtual machines which are templates. Use a non-template virtual machine for FT.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: Virtual machine with multiple virtual CPUs
Description and Solution: FT is only supported on virtual machines with a single vCPU configured. Use a single vCPU virtual machine for FT.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: Host is inactive
Description and Solution: You must enable FT on an active host. An inactive host is one that is disconnected, in maintenance mode, or in standby mode.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: Fault Tolerance not supported by host hardware
Description and Solution: FT is only supported on specific processors and BIOS settings with Hardware Virtualization (HV) enabled. To resolve this issue, use hosts with supported CPU models and BIOS settings.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: Fault Tolerance not supported by VMware Server 2.0
Description and Solution: Upgrade to VMware ESX or ESXi 4.0 or later.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: No VMotion license or no virtual NIC configured for VMotion
Description and Solution: Verify that you have correctly configured networking on the host. See Configure Networking for Host Machines, on page 36. If it is correctly configured, you might need to acquire a VMotion license.
The "check host certificates" box is not checked in the SSL settings for vCenter Server. You must check that box. See Enable Host Certificate Checking, on page 36.
FT does not support virtual machines with snapshots. Enable FT on a virtual machine without snapshots or use the snapshot manager to delete all snapshots associated with this virtual machine.

vCenter Server has no information about the configuration of the virtual machine. Determine if it is misconfigured. You can try removing the virtual machine from the inventory and re-registering it.

Upgrade the hardware the virtual machine is running on and then turn on FT.

Error Message: The Fault Tolerance configuration of the entity {entityName} has an issue: The virtual machine's current configuration does not support Fault Tolerance
Description and Solution: This error occurs when the virtual machine does not meet all of the configuration requirements for FT, including when you attempt to turn on FT for a powered-on virtual machine. Power off the virtual machine, address the configuration issue, then select Turn On Fault Tolerance. Potential configuration issues include:
- Software virtualization with FT is unsupported.
- FT is not supported for SMP virtual machines.
- Paravirtualization (VMI) with FT is not supported.
- The VM has a device that is not supported with FT.
- The combination of guest operating system, CPU type, and configuration options is incompatible with FT.
See Fault Tolerance Interoperability, on page 34 for more details about these requirements.

Error Message: The virtual machine has {numCpu} virtual CPUs and is not supported for reason: Fault Tolerance
Description and Solution: This error occurs when you attempt to reconfigure a Primary VM with more than one vCPU. You must modify the number of vCPUs to one.

Error Message: The file backing ({backingFilename}) for device Virtual Floppy is not supported for Fault Tolerance
Description and Solution: FT is not supported on a virtual machine with a virtual floppy device that has file backing not accessible to the host on which the Secondary VM resides. To turn on FT for this virtual machine, first remove the unsupported device.

Error Message: The file backing ({backingFilename}) for device Virtual CDROM is not supported for Fault Tolerance
Description and Solution: FT is not supported on a virtual machine with a virtual CDROM device that has file backing not accessible to the host on which the Secondary VM resides. To turn on FT for this virtual machine, first remove the unsupported device.

Error Message: The file backing ({backingFilename}) for device Virtual serial port is not supported for Fault Tolerance
Description and Solution: FT is not supported on a virtual machine with a virtual serial port device that has file backing not accessible to the host on which the Secondary VM resides. To turn on FT for this virtual machine, first remove the unsupported device.

Error Message: The file backing ({backingFilename}) for device Virtual parallel port is not supported for Fault Tolerance
Description and Solution: FT is not supported on a virtual machine with a virtual parallel port device that has file backing not accessible to the host on which the Secondary VM resides. To turn on FT for this virtual machine, first remove the unsupported device.
You might be experiencing network latency that is causing the timeout. See Troubleshooting Fault Tolerance, on page 42.

Fault Tolerance has detected a difference between the Primary and Secondary VMs. This can be caused by transient events that occur due to hardware or software differences between the two hosts. FT has automatically started a new Secondary VM, and no action is required. If you see this message frequently, you should alert support to determine if there is an issue.
NOTE For errors related to CPU compatibility, see the VMware knowledge base article at http://kb.vmware.com/kb/1008027 for information on supported processors.
Index
A
admission control
  enabling 23
  policy 23
  types 15
  VMware HA 15
admission control policy
  choosing 20
  Host Failures Cluster Tolerates 15
  Percentage of Cluster Resources Reserved 18
  Specify a Failover Host 20
advanced attributes, VMware HA 25
Advanced Runtime Info 15
affinity rules 31
anti-affinity rules 31

B
best practices
  Fault Tolerance 40
  VMware HA clusters 27
  VMware HA networking 28
business continuity 9

C
cluster settings 21
cluster validity 27
compliance check, Fault Tolerance 37
Configured Failover Capacity 15, 18
configuring VMware HA advanced options 26
creating a VMware HA cluster 21
Current Failover Capacity 15, 18
Current Failover Host 20
customizing VMware HA 25

D
das.defaultfailoverhost 25
das.failuredetectioninterval 25
das.failuredetectiontime 25, 28
das.iostatsInterval 24, 25
das.isolationaddress 25, 28
das.isolationShutdownTimeout 23, 25
das.slotCpuInMHz 15, 25
das.slotMemInMB 15, 25
das.usedefaultisolationaddress 25
das.vmCpuMinMHz 15, 18, 25
das.vmMemoryMinMB 25
default gateway 28
Distributed Power Management (DPM) 13, 15
Distributed Resource Scheduler (DRS)
  and Fault Tolerance 34
  Fault Tolerance errors 45
  turning on 22
  using with VMware HA 13
downtime
  planned 9
  unplanned 10

E
educational support 7
error messages, Fault Tolerance 45
events and alarms, setting 27
Extended Page Tables (EPT) 34

F
failover host 20
Fault Tolerance
  anti-affinity rules 31
  best practices 40
  compliance check 37
  configuration recommendations 41
  continuous availability 11
  enabling 35
  error messages 45
  interoperability 34
  Log Bandwidth 39
  logging 36, 42
  networking configuration 36
  overview 31
  prerequisites 33
  restrictions for turning on 37
  secondary location 39
  Total Secondary CPU 39
  Total Secondary Memory 39
  troubleshooting 42, 43
  turning on 38
  use cases 32
  validation checks 37
  vLockstep Interval 39
  vSphere configuration 33
Fault Tolerance status
  Disabled 39
  Need Secondary VM 39
  Starting 39
  VM not Running 39
firewall ports 28
ft.maxSwitchoverSeconds 43

H
Hardware Virtualization (HV) 33, 37, 43
host certificate checking 33, 36
Host Failures Cluster Tolerates 15
Host Isolation Response setting 23
Host Monitoring 33
Host Monitoring feature 22, 28
hosts
  maintenance mode 13
  network isolation 13

I
I/O stats interval 24
interoperability, Fault Tolerance 34
iSCSI SAN 33
ISO images 40

M
Maximum per-VM resets 24
minimizing downtime 9
modifying cluster settings 21
monitoring VMware HA 27

N
N_Port ID Virtualization (NPIV) 34
network isolation address 28
network labels 28
networking configuration, Fault Tolerance 36
NIC teaming 29

O
On-Demand Fault Tolerance 32

P
paravirtualization 34
Percentage of Cluster Resources Reserved 18
planned downtime 9
planning a VMware HA cluster 13
port group names 28
PortFast 28
prerequisites, Fault Tolerance 33
primary hosts in clusters 13

R
Rapid Virtualization Indexing (RVI) 34
RDM 33, 34
resource fragmentation 20

S
secondary hosts in clusters 13
slot 15
slot size calculation 15
snapshots 34
Specify a Failover Host 20
storage
  iSCSI 33
  NAS 33, 41
  NFS 33, 41
Storage VMotion 9, 34
suspending VMware HA 22
Symmetric multiprocessor (SMP) 34

T
technical support 7
tolerating host failures 15
transparent failover 11, 31
troubleshooting Fault Tolerance 42
turning on VMware HA 22

U
unplanned downtime 10
updated information 5
upgrading hosts with FT virtual machines 41
use cases, Fault Tolerance 32

V
validation checks 37
virtual machine overrides 23, 27
Virtual Machine Startup and Shutdown feature 21
VM Monitoring 24
VM Monitoring sensitivity 24
VM Restart Priority setting 23
VMDK 33
VMFS 13, 28, 42
VMware HA
  advanced attributes 25
  advantages 10
  cluster settings 21
  customizing 25
  monitoring 27
  recovery from outages 10
  suspending 22
  turning on 22
VMware HA cluster
  admission control 15
  best practices 27
  creating 21, 37
  heterogeneity 20
  planning 13
  primary hosts 13
  secondary hosts 13
VMware HA networking
  best practices 28
  path redundancy 29
VMware Tools 24
VMware vLockstep 11, 31