Virtual SAN 6.2 Design and Sizing Guide
Contents
INTRODUCTION
HEALTH SERVICES
VIRTUAL SAN READY NODES
VMWARE VXRAIL
VIRTUAL SAN DESIGN OVERVIEW
FOLLOW THE COMPATIBILITY GUIDE (VCG) PRECISELY
Hardware, drivers, firmware
USE SUPPORTED VSPHERE SOFTWARE VERSIONS
BALANCED CONFIGURATIONS
LIFECYCLE OF THE VIRTUAL SAN CLUSTER
SIZING FOR CAPACITY, MAINTENANCE AND AVAILABILITY
SUMMARY OF DESIGN OVERVIEW CONSIDERATIONS
HYBRID AND ALL-FLASH DIFFERENCES
ALL-FLASH CONSIDERATIONS
VIRTUAL SAN LIMITS
MINIMUM NUMBER OF ESXI HOSTS REQUIRED
MAXIMUM NUMBER OF ESXI HOSTS ALLOWED
MAXIMUM NUMBER OF VIRTUAL MACHINES ALLOWED
MAXIMUM NUMBER OF VIRTUAL MACHINES PROTECTED BY VSPHERE HA
DISKS, DISK GROUP AND FLASH DEVICE MAXIMUMS
COMPONENTS MAXIMUMS
VM STORAGE POLICY MAXIMUMS
MAXIMUM VMDK SIZE
SUMMARY OF DESIGN CONSIDERATIONS AROUND LIMITS
NETWORK DESIGN CONSIDERATIONS
NETWORK INTERCONNECT - 1GB/10GB
ALL-FLASH BANDWIDTH REQUIREMENTS
NIC TEAMING FOR REDUNDANCY
MTU AND JUMBO FRAMES CONSIDERATIONS
MULTICAST CONSIDERATIONS
NETWORK QOS VIA NETWORK I/O CONTROL
SUMMARY OF NETWORK DESIGN CONSIDERATIONS
VIRTUAL SAN NETWORK DESIGN GUIDE
STORAGE DESIGN CONSIDERATIONS
DISK GROUPS
CACHE SIZING OVERVIEW
FLASH DEVICES IN VIRTUAL SAN
Client Cache
Purpose of read cache
Purpose of write cache
PCIE FLASH DEVICES VERSUS SOLID STATE DRIVES (SSDS)
Introduction
VMware® Virtual SAN™ is a hypervisor-converged, software-defined
storage platform that is fully integrated with VMware vSphere®. Virtual
SAN aggregates locally attached disks of hosts that are members of a
vSphere cluster, to create a distributed shared storage solution. Virtual
SAN enables the rapid provisioning of storage within VMware vCenter™
as part of virtual machine creation and deployment operations. Virtual
SAN is the first policy-driven storage product designed for vSphere
environments that simplifies and streamlines storage provisioning and
management. Using VM-level storage policies, Virtual SAN
automatically and dynamically matches requirements with underlying
storage resources. With Virtual SAN, many manual storage tasks are
automated - delivering a more efficient and cost-effective operational
model.
There is a wide range of options for selecting a host model, storage
controller, flash devices and magnetic disks. It is therefore
extremely important that the VMware Compatibility Guide (VCG) is
followed rigorously when selecting hardware components for a Virtual
SAN design.
Health Services
Virtual SAN 6.2 comes with the Health Services UI. This feature checks a
range of different health aspects of Virtual SAN, and provides insight
into the root cause of many potential Virtual SAN issues. The
recommendation when deploying Virtual SAN is to also deploy the
Virtual SAN Health Services at the same time. Once an issue is detected,
the Health Services highlights the problem and directs administrators to
the appropriate VMware knowledgebase article to begin problem
solving.
Please refer to the Virtual SAN Health Services Guide for further details
on how to get the Health Services components, how to install them and
how to use the feature for validating a Virtual SAN deployment and
troubleshooting common Virtual SAN issues.
VMware VxRAIL
Another option available to customers is VxRAIL™. VxRAIL combines
VMware compute, networking, and storage resources into a hyper-converged
infrastructure appliance to create a simple, easy-to-deploy,
all-in-one solution offered by our partner VCE. VxRAIL software is fully
loaded onto the partner's hardware appliance and includes VMware
Virtual SAN. Further details on VxRAIL can be found here:
http://www.vce.com/products/hyper-converged/vxrail
Balanced configurations
As a best practice, VMware recommends deploying ESXi hosts with
similar or identical configurations across all cluster members, including
similar or identical storage configurations. This ensures an even
balance of virtual machine storage components across the disks and
hosts in the cluster. While hosts that do not contribute storage can still
leverage the Virtual SAN datastore if they are part of the same vSphere
cluster, such configurations may require additional support effort if a
problem is encountered. For this reason, VMware does not recommend
unbalanced configurations.
Best practice: Similarly configured and sized ESXi hosts should be used
for the Virtual SAN cluster.
When choosing hardware for Virtual SAN, keep in mind that adding
capacity, either for hybrid configurations or all flash configurations, is
usually much easier than increasing the size of the flash devices in the
cache layer.
In hybrid clusters (which use magnetic disks for the capacity layer and
flash for the cache layer), the caching algorithm attempts to maximize
both read and write performance. 70% of the available cache is
allocated for storing frequently read disk blocks, minimizing accesses to
the slower magnetic disks. The remaining 30% of available cache is
allocated to writes. Multiple writes are coalesced and written sequentially
where possible, again maximizing magnetic disk performance.
All-flash clusters have two types of flash: very fast and durable write
cache, and more capacious and cost-effective capacity flash. Here
cache is 100% allocated for writes, as read performance from capacity
flash is more than sufficient. Many more writes are held by the cache
and written to the capacity layer only when needed, extending the life
of the capacity flash tier.
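As a rough illustration of the difference between the two cache models, the following sketch computes how a given cache device would be split under the hybrid 70/30 rule versus the all-flash write-only model. It is an illustrative calculation only, not VMware tooling, and the 400GB device size is an arbitrary example.

```python
def cache_allocation(cache_gb, all_flash=False):
    """Illustrative split of a Virtual SAN cache device.

    Hybrid: 70% read cache / 30% write buffer.
    All-flash: 100% write buffer (reads are served from capacity flash).
    """
    if all_flash:
        return {"read_cache_gb": 0.0, "write_buffer_gb": float(cache_gb)}
    return {"read_cache_gb": cache_gb * 0.7, "write_buffer_gb": cache_gb * 0.3}

print(cache_allocation(400))                  # hybrid 400GB device -> ~280GB read / ~120GB write
print(cache_allocation(400, all_flash=True))  # all-flash           -> 0GB read / 400GB write
```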
All-flash considerations
• All-flash is available in Virtual SAN 6.0 and later only; it is not supported in Virtual SAN 5.5
• It requires a 10Gb network; it is not supported with 1Gb NICs
• The maximum number of all-flash nodes is 64
• Flash devices are used for both cache and capacity
Best practice: Enable vSphere HA on the Virtual SAN cluster for the
highest level of availability.
Caution: Virtual SAN does not support the mixing of all-flash disk
groups and hybrid disk groups in the same cluster. Mixing disk group
types can lead to erratic performance.
Components maximums
Virtual machines deployed on Virtual SAN are made up of a set of
objects. For example, a VMDK is an object, a snapshot is an object, VM
swap space is an object, and the VM home namespace (where the .vmx
file, log files, etc. are stored) is also an object. Each of these objects is
comprised of a set of components, determined by capabilities placed in
the VM Storage Policy. For example, if the virtual machine is deployed
with a policy to tolerate one failure, then objects will be made up of two
replica components. If the policy contains a stripe width, the object will
be striped across multiple devices in the capacity layer. Each of the
stripes is a component of the object. The concepts of objects and
components will be discussed in greater detail later on in this guide, but
suffice to say that there is a maximum of 3,000 components per ESXi
host in Virtual SAN version 5.5, and with Virtual SAN 6.0 (with on-disk
format v2), the limit is 9,000 components per host. When upgrading
from 5.5 to 6.0, the on-disk format also needs upgrading from v1 to v2
to get the 9,000 components maximum. The upgrade procedure is
documented in the Virtual SAN Administrators Guide. Virtual SAN 6.1
introduced stretched clustering with a maximum of 45,000 witness
components.
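To get a feel for how policy settings translate into component counts against these per-host limits, the following rough sketch can help. It is a simplification that ignores witness components, the splitting of large objects and snapshot deltas, and the input numbers are assumptions for illustration only.

```python
def estimate_components(vmdk_count, ftt=1, stripe_width=1):
    """Rough per-VM component estimate: each object (VMDKs, the VM home
    namespace and the VM swap object) is striped and then mirrored into
    FTT+1 replicas. Witnesses and >255GB object splits are ignored here."""
    objects = vmdk_count + 2  # VMDKs + VM home namespace + VM swap
    return objects * (ftt + 1) * stripe_width

# 100 VMs per host, each with 2 VMDKs, FTT=1, stripe width 1:
per_host = 100 * estimate_components(vmdk_count=2, ftt=1, stripe_width=1)
print(per_host)  # 800 -> comfortably under 3,000 (5.5) or 9,000 (6.0, on-disk format v2)
```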
Design decision: Ensure there are enough hosts (and fault domains) in
the cluster to accommodate a desired NumberOfFailuresToTolerate
requirement.
VMware testing shows that using jumbo frames can reduce CPU
utilization and improve throughput, but the gains are minimal because
vSphere already uses TCP Segmentation Offload (TSO) and Large
Receive Offload (LRO) to deliver similar benefits.
In data centers where jumbo frames are already enabled in the network
infrastructure, jumbo frames are recommended for Virtual SAN
deployments. Otherwise, jumbo frames are not recommended, as the
operational cost of configuring jumbo frames throughout the network
infrastructure could outweigh the limited CPU and performance benefits.
The biggest gains for jumbo frames will be found in all-flash
configurations.
Multicast considerations
Multicast is a network requirement for Virtual SAN. Multicast is used to
discover ESXi hosts participating in the cluster as well as to keep track
of changes within the cluster. It is mandatory to ensure that multicast
traffic is allowed between all the nodes participating in a Virtual SAN
cluster.
A link to the Virtual SAN Network Design Guide can be found in the
further reading section of this guide; reviewing it is highly recommended.
Disk groups
The higher the ratio of cache to capacity, the more cache is available to
virtual machines for accelerated performance. However, this comes at
additional cost.
Client Cache
The Client Cache, introduced in Virtual SAN 6.2 and used on both hybrid
and all-flash Virtual SAN configurations, leverages DRAM local to the
virtual machine to accelerate read performance. The amount of
memory allocated is 0.4% of host memory, up to 1GB per host.
Because the cache is local to the virtual machine, it can take advantage
of the low latency of memory by avoiding having to reach out across the
network for the data. In testing with read-cache-friendly workloads, it
was able to significantly reduce read latency.
For a given virtual machine data block, Virtual SAN always reads from
the same replica/mirror. However, when there are multiple replicas (to
tolerate failures), Virtual SAN divides up the caching of the data blocks
evenly between the replica copies.
If the block being read from the first replica is not in cache, the
directory service is referenced to find if the block is in the cache of
another mirror (on another host) in the cluster. If it is found there, the
data is retrieved from there. If it isn’t in cache on the other host, then
there is a read cache miss. In that case the data is retrieved directly
from magnetic disk.
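The read path described above can be summarized as a loose sketch. The data structures below (per-mirror cache dictionaries and a dictionary standing in for the capacity tier and directory service) are invented stand-ins for illustration, not Virtual SAN internals.

```python
def read_block(block_id, replica_caches, capacity_tier):
    """Sketch of the hybrid read path: pick the mirror that owns this block,
    check its read cache, check the other mirrors' caches (the directory
    service lookup), and fall back to magnetic disk on a full miss."""
    owner = block_id % len(replica_caches)      # blocks divided evenly across mirrors
    if block_id in replica_caches[owner]:
        return replica_caches[owner][block_id]  # read cache hit on the owning mirror
    for cache in replica_caches:                # directory service: cached on another host?
        if block_id in cache:
            return cache[block_id]
    return capacity_tier[block_id]              # read cache miss -> retrieve from magnetic disk

caches = [{1: "block-1"}, {2: "block-2"}]
disk = {1: "block-1", 2: "block-2", 3: "block-3"}
print(read_block(3, caches, disk))              # miss in every cache -> served from disk
```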
The write cache, found on both hybrid and all flash configurations,
behaves as a non-volatile write buffer. This greatly improves
performance in both hybrid and all-flash configurations, and also
extends the life of flash capacity devices in all-flash configurations.
When writes are written to flash, Virtual SAN ensures that a copy of the
data is written elsewhere in the cluster. All virtual machines deployed to
Virtual SAN have a default availability policy setting that ensures at
least one additional copy of the virtual machine data is available. This
includes making sure that writes end up in multiple write caches in the
cluster.
This means that in the event of a host failure, we also have a copy of the
in-cache data and no data loss will happen to the data; the virtual
machine will simply reuse the replicated copy of the cache as well as
the replicated capacity data.
Most solid-state disks use a SATA interface. Even as the speed of flash
increases, SSDs are still tied to SATA's 6Gb/s standard. In
comparison, PCIe, or Peripheral Component Interconnect Express, is a
physical interconnect for motherboard expansion. It can provide up to
16 lanes for data transfer, at roughly 1GB/s per lane in each direction for
PCIe 3.x devices. This provides a total bidirectional bandwidth of roughly
32GB/s for PCIe devices that can use all 16 lanes.
NVMe device support was introduced in Virtual SAN 6.1. NVMe offers
low latency, higher performance, and lower CPU overhead for IO
operations.
When sizing, ensure that there is sufficient tier-1 flash cache versus
capacity (whether the capacity layer is magnetic disk or flash). Once
again cost will play a factor.
For Virtual SAN 6.0, the endurance class specification has been updated
to use Terabytes Written (TBW) over the vendor's drive warranty period.
Previously the specification was full Drive Writes Per Day (DWPD).
By changing the specification to 2 TB of writes per day, for example, both
the 200GB and 400GB drives qualify: 2 TB of writes per day is the
equivalent of 5 DWPD for the 400GB drive and 10 DWPD for the 200GB
drive.
For an All-Flash Virtual SAN running high workloads, the flash cache
device specification is 4 TB of writes per day, which is equivalent to
7300 TB written (TBW) over 5 years.
All-Flash Virtual SAN still has a write cache, and all VM writes hit this
cache device. The major algorithm change, apart from the lack of read
cache, is how the write cache is used. The write cache is now used to
hold "hot" blocks of data (data that is in a state of change). Only when
the blocks become "cold" (no longer updated/written) are they
moved to the capacity layer.
Note: In version 6.0 of Virtual SAN, if the flash device used for the
caching layer in all-flash configurations is less than 600GB, then 100% of
the flash device is used for cache. However, if the flash cache device is
larger than 600GB, only 600GB of it is used for caching. This applies on
a per-disk group basis.
In a nutshell, the objective behind the 10% tier-1 versus tier-2 flash sizing
recommendation is to try to have both tiers wear out at around the same
time. This means that the write ratio should match the endurance ratio.
What we aim for is for the two tiers to have the same "life time" (LT),
where the lifetime of a tier is its total write endurance (TW, in terabytes
written) divided by the terabytes written to it per day (TWPD). Setting the
lifetime of tier 1 equal to the lifetime of tier 2:
LT1 = LT2
TW1 / TWPD1 = TW2 / TWPD2
And to keep the objective of having both tiers wear out at the same
time, the endurance ratio must match the write ratio:
TW1 / TW2 = TWPD1 / TWPD2
So now we have the lifespan of the SSDs expressed in terms of their
capacities and endurance ratings. Let's now take a working example
using some Intel SSDs and see how this works out. Let's take the Intel
S3700 for tier 1 and the S3500 for tier 2. The ratings are as follows:
This means that for an all-flash Virtual SAN configuration where we wish
the lifetime of the cache layer and the capacity layer to be fairly similar,
we should deploy a configuration that has at least 5% cache when
compared to the capacity size. However, this is the raw ratio at the disk
group level. We have not yet factored in the Overwrite Ratio (OR), which
is the number of times that a block is overwritten in tier 1 before we
move it from the tier-1 cache layer to the tier-2 capacity layer.
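The ratings table itself did not survive in this copy, but the calculation it feeds is straightforward. The sketch below expresses it; the drive-writes-per-day figures are placeholders chosen only to reproduce the roughly 5% result and are not the vendor's published specifications.

```python
def min_cache_to_capacity_ratio(cache_tier_dwpd, capacity_tier_dwpd):
    """If (ignoring overwrites) every byte written lands on both tiers, equal
    lifetimes require each tier's daily endurance (capacity x DWPD) to match,
    so the raw cache:capacity size ratio is DWPD_capacity / DWPD_cache."""
    return capacity_tier_dwpd / cache_tier_dwpd

# Placeholder endurance figures, not actual Intel specifications:
ratio = min_cache_to_capacity_ratio(cache_tier_dwpd=10, capacity_tier_dwpd=0.5)
print(f"{ratio:.0%}")  # -> 5%, i.e. at least 5% raw cache per disk group before overwrites
```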
However, if you have better insight into your workload, additional, more
granular calculations are possible:
1. If the first tier has a much higher endurance rating than the
second tier, then a smaller first tier would be fine, and vice versa.
2. If you have a very active data set, such as a database, then you
would expect to see a large overwrite ratio before the data is
"cold" and moved to the second tier. In this case, you would need
a much bigger ratio than 5% raw, to give you both the
performance as well as better endurance protection of the first
tier.
3. If you know your IOPS requirement is quite low, but you want all-
flash to give you a low latency variance, then neither the first nor
the second tier SSDs are in danger of wearing out in a 5 year
period. In this case, one can have a much smaller first tier flash
device. VMware Horizon View may fall into this category.
4. If the maximum IOPS is known or can be predicted somewhat
reliably, customers can just calculate the cache tier1 size directly
from the endurance requirement, after taking into account write
amplification.
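As a rough illustration of point 4, the sketch below sizes the tier-1 cache directly from an endurance requirement. Every input is hypothetical, and the TBW-per-TB rating is an assumption that would be replaced with the real drive specification.

```python
def cache_size_from_endurance(write_iops, io_size_kb, write_amplification,
                              years, tbw_per_tb_rating):
    """Hypothetical sizing for point 4: total terabytes written over the
    drive's life, divided by the TB of writes each TB of flash is rated for."""
    bytes_per_day = write_iops * io_size_kb * 1024 * 86400 * write_amplification
    tb_written = bytes_per_day * 365 * years / 1e12
    return tb_written / tbw_per_tb_rating  # required cache size in TB

# e.g. 5,000 write IOPS of 4KB with 2x write amplification over 5 years,
# against a drive family rated for 3,650 TBW per TB (2 full drive writes/day):
print(round(cache_size_from_endurance(5000, 4, 2, 5, 3650), 2), "TB")  # -> 1.77 TB
```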
space, thin provisioned. However, they anticipate that over time, the
consumed storage capacity per virtual machine will be an average of
20GB.
The optimal value of the target flash capacity percentage is based upon
actual workload characteristics, such as the size of the working set of
the data on disk. 10% is a general guideline to use as the initial basis for
further refinement.
directly from the flash capacity layer unless the data block is already in
the write cache.
Once again, the cache layer will be sized to 10% of 7.5TB, implying that
750GB of flash is required at a minimum. With a 4-node cluster, this
cluster would need a flash device that is at least 187.5GB in size in each
host.
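Expressed as a quick calculation, using the same figures as the example above:

```python
def flash_cache_per_host(consumed_capacity_tb, hosts, cache_ratio=0.10):
    """10% rule: cache tier sized at 10% of anticipated consumed capacity,
    spread across the hosts contributing storage."""
    total_cache_tb = consumed_capacity_tb * cache_ratio
    return total_cache_tb, total_cache_tb / hosts

total, per_host = flash_cache_per_host(consumed_capacity_tb=7.5, hosts=4)
print(round(total * 1000), "GB of cache in total,", round(per_host * 1000, 1), "GB per host")
# -> 750 GB of cache in total, 187.5 GB per host
```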
If a vendor uses full Drive Writes Per Day (DWPD) in their specification,
by doing the conversion shown here, one can obtain the endurance in
Terabytes Written (TBW). For Virtual SAN, what matters from an
endurance perspective is how much data can be written to an SSD over
the warranty period of the drive (in this example, it is a five-year
period).
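The conversion referred to above was shown as a figure in the original layout and did not survive extraction. A sketch of the standard conversion follows; the five-year warranty period comes from the example in the text.

```python
def dwpd_to_tbw(dwpd, capacity_gb, warranty_years=5):
    """Convert full Drive Writes Per Day (DWPD) into Terabytes Written (TBW)
    over the drive's warranty period."""
    return dwpd * (capacity_gb / 1000.0) * 365 * warranty_years

print(dwpd_to_tbw(dwpd=5, capacity_gb=400))   # 3650.0 TBW over 5 years
print(dwpd_to_tbw(dwpd=10, capacity_gb=200))  # 3650.0 TBW -> the same endurance class
```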
Best practice: Check the VCG and ensure that the flash devices are (a)
supported and (b) provide the endurance characteristics that are
required for the Virtual SAN design.
The same is true if both cache and capacity are being scaled up at the
same time through the addition of a new disk group. An administrator
can simply add one new tier-1 flash device for cache, and at least one
additional magnetic disk or flash device for the capacity tier, and build
a new disk group.
also an easier approach than trying to simply update the existing flash
cache device in an existing disk group.
Magnetic Disks
Magnetic disks have two roles in hybrid Virtual SAN configurations.
They make up the capacity of the Virtual SAN datastore in hybrid
configurations.
The number of magnetic disks is also a factor for stripe width. When
stripe width is specified in the VM Storage policy, components making
up the stripe will be placed on separate disks. If a particular stripe width
is required, then there must be the required number of disks available
across hosts in the cluster to meet the requirement. If the virtual
machine also has a failure to tolerate requirement in its policy, then
additional disks will be required on separate hosts, as each of the stripe
components will need to be replicated.
capacity and price. There are three magnetic disk types supported for
Virtual SAN: SAS, NL-SAS and SATA.
SATA drives provide greater capacity than SAS drives for hybrid Virtual
SAN configurations. On the VCG for Virtual SAN currently, there are
4TB SATA drives available. The maximum size of a SAS drive at the
time of writing is 1.2TB. There is definitely a trade-off between the
number of magnetic disks required for the capacity layer and how well
the capacity layer will perform. As previously mentioned, although SATA
drives provide more capacity per drive, SAS magnetic disks should be
chosen over SATA magnetic disks in environments where performance is
desired. SATA drives tend to be less expensive, but do not offer the
performance of SAS. SATA drives typically run at 7200 RPM or slower.
SAS disks tend to be more reliable and offer more performance, but at a
cost. These are usually available at speeds up to 15K RPM (revolutions
per minute). The VCG lists the RPM (drive speeds) of supported drives.
This allows the designer to choose the level of performance required at
the capacity layer when configuring a hybrid Virtual SAN. While there is
no need to check drivers/firmware of the magnetic disks, the SAS or
SATA drives must be checked to ensure that they are supported.
Since SAS drives can perform much better than SATA, for performance
at the magnetic disk layer in hybrid configurations, serious
consideration should be given to the faster SAS drives.
Similarly, hybrid Virtual SAN configurations target a 90% read cache hit
rate. That means 10% of reads are going to be read cache misses, and
these blocks will have to be retrieved from the spinning disks in the
capacity layer. Once again, having multiple disk spindles can speed up
these read operations.
At this point, capacity is being sized for failure. However, there may be
a desire to have enough capacity so that, in the event of a failure,
Virtual SAN can rebuild the missing/failed components on the
remaining capacity in the cluster. In addition, there may be a desire to
have full availability of the virtual machines when a host is taken out of
the cluster for maintenance.
Note that this will only be possible if there are more than 3 nodes in the
cluster. If it is a 3-node cluster only, then Virtual SAN will not be able to
rebuild components in the event of a failure. Note however that Virtual
SAN will handle the failure and I/O will continue, but the failure needs to
be resolved before Virtual SAN can rebuild the components and
become fully protected again.
All the disks in a disk group are formatted with an on-disk file system. If
the on-disk format is version 1, formatting consumes a total of 750MB
to 1GB of capacity per disk. In Virtual SAN 6.0, administrators can use
either v1 (VMFS-L) or v2 (VirstoFS). Formatting overhead is the same
for on-disk format v1 in version 6.0, but the overhead for on-disk format
v2 is different and is typically 1% of the drive's capacity. This needs to be
considered when designing Virtual SAN capacity requirements. The
following table provides an estimate of the overhead required.
There is no support for the v2 on-disk format with Virtual SAN version
5.5; the v2 format is only supported on Virtual SAN version 6.0. The
overhead for v2 is very much dependent on how fragmented the user
data is on the file system. In practice, what has been observed is that the
metadata overhead is typically less than 1% of the physical disk capacity.
The v3 on-disk format, introduced with Virtual SAN 6.2, adds
deduplication. Its metadata overhead is highly variable and will depend
on your data set.
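Since the estimation table itself was lost from this copy, here is a small sketch of the estimate described in the text (v1: roughly 1GB per device; v2: typically around 1% of the device's capacity). The figures are the rough guidance above, not exact values.

```python
def format_overhead_gb(device_capacity_gb, on_disk_format="v2"):
    """Estimate on-disk format overhead per capacity device."""
    if on_disk_format == "v1":
        return 1.0                        # ~750MB-1GB per disk; assume 1GB to be safe
    return device_capacity_gb * 0.01      # v2: typically ~1% of the drive's capacity

# A disk group of 7 x 4TB capacity devices with on-disk format v2:
print(7 * format_overhead_gb(4000), "GB of formatting overhead")  # -> 280.0 GB
```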
In Virtual SAN 6.0, with the v2 on-disk format, there have been major
enhancements to the snapshot mechanism, making virtual machine
snapshots far superior to those in earlier versions. Virtual SAN 6.0 fully
supports 32 snapshots per VMDK with the v2 on-disk format. The new
snapshot mechanism on v2 uses a new "vsanSparse" format. However,
while these new snapshots outperform the earlier version, there are still
some design and sizing concerns to consider.
When sizing cache for Virtual SAN 6.0 hybrid configurations, a design
must take into account potential heavy usage of snapshots. Creating
multiple, active snapshots may exhaust cache resources quickly,
potentially impacting performance. The standard guidance of sizing
cache to be 10% of consumed capacity may need to be increased to 15%
or greater, especially with demanding snapshot usage.
Cache usage for virtual machine snapshots is not a concern for Virtual
SAN 6.0 all-flash configurations.
If the on-disk format is not upgraded to v2 when Virtual SAN has been
upgraded from version 5.5 to 6.0, and the on-disk format remains at v1,
then the older (redo log) snapshot format is used, and the
considerations in VMware KB article 1025279 continue to apply.
Virtual SAN supports multiple controllers per ESXi host. The maximum
number of disks per host is 35 (7 disks per disk group, 5 disk groups per
host). Some controllers support 16 ports and therefore up to 16 disks
can be placed behind one controller. The use of two such controllers in
one host will get close to the maximums. However, some controllers
only support 8 ports, so a total of 4 or 5 controllers would be needed to
reach the maximum.
With a single controller, all devices in the host will be behind the same
controller, even if there are multiple disks groups deployed on the host.
Therefore a failure of the controller will impact all storage on this host.
does this reduce the failure domain should a single controller fail, but
this configuration also improves performance.
Design decision: Multiple storage I/O controllers per host can reduce
the failure domain and can also improve performance.
There are two important items displayed in the VCG for storage I/O
controllers that should be noted. The first of these is the queue depth,
and the second is the "features" column.
The second important item, the "features" column, displays how the
controller can present physical disks to Virtual SAN. There
are entries referring to RAID 0 and pass-through. Pass-through means
that this controller can work in a mode that will present the magnetic
disks directly to the ESXi host. RAID 0 implies that each of the magnetic
disks will have to be configured as a RAID 0 volume before the ESXi
host can see them. There are additional considerations with RAID 0. For
example, an administrator may have to take additional manual steps
replacing a failed drive. These steps include rebuilding a new RAID 0
volume rather than simply plugging in a replacement empty disk into
the host and allowing Virtual SAN to claim it.
• Avoiding a known drive failure issue when Dell PERC H730 controller is used
with VMware Virtual SAN 5.5 or 6.0 (2135494)
• Using a Dell Perc H730 controller in an ESXi 5.5 or ESXi 6.0 host displays IO
failures or aborts, and reports unhealthy VSAN disks (2109665)
Therefore, having one very large disk group with a large flash device
and lots of capacity might mean that a considerable amount of data
needs to be rebuilt in the event of a failure. This rebuild traffic could
impact the performance of the virtual machine traffic. The length of
time to rebuild the components is also a concern because virtual
machines that have components that are being rebuilt are exposed to
another failure occurring during this time.
Oftentimes the cost of implementing multiple disk groups is not higher.
If the cost of 2 x 200GB solid-state devices is compared to that of 1 x
400GB solid-state device, the prices are very often similar. Also worth
considering is that two cache devices in two disk groups on the same
host can provide significantly higher IOPS than one cache device in one
disk group.
However, if there were multiple disk groups per host, and if there is
sufficient capacity in the other disk group on the host when the flash
cache device fails, Virtual SAN would be able to rebuild the affected
components in the remaining disk group. This is another consideration
to keep in mind when planning to deploy 2-node and 3-node Virtual SAN
clusters.
One other consideration is that although Virtual SAN might have the
aggregate space available on the cluster to accommodate this large
size VMDK object, it will depend on where this space is available and
whether or not this space can be used to meet the requirements in the
VM storage policy.
For example, in a 3-node cluster that has 200TB of free space, one
could conceivably believe that this should accommodate a VMDK of
62TB that has NumberOfFailuresToTolerate=1 (2 x 62TB = 124TB).
However, if one host has 100TB free, host two has 50TB free and host
three has 50TB free, then this Virtual SAN will not be able to
accommodate this request.
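A minimal sketch of that placement check follows. It is simplified so that each replica must fit wholly on a single, separate host; real Virtual SAN can split large objects into multiple components, but the point about free-space distribution stands.

```python
def can_place_replicas(vmdk_tb, ftt, free_tb_per_host):
    """FTT=n needs n+1 replicas, each on a different host with enough free space."""
    replicas = ftt + 1
    hosts_with_room = sum(1 for free in free_tb_per_host if free >= vmdk_tb)
    return hosts_with_room >= replicas

print(can_place_replicas(62, ftt=1, free_tb_per_host=[100, 50, 50]))  # False: only one host fits
print(can_place_replicas(62, ftt=1, free_tb_per_host=[100, 70, 30]))  # True: two hosts fit
```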
In the case of a flash cache device failure, since this impacts the whole
of the disk group, Virtual SAN will need additional capacity in the
cluster to rebuild all the components of that disk group. If there are
other disk groups on the same host, it may try to use these, but it may
also use disk groups on other hosts in the cluster. Again, the aim is for a
balanced cluster. If a disk group fails, and it has virtual machines
consuming a significant amount of disk space, a lot of spare capacity
needs to be found in order to rebuild the components to meet the
requirements placed in the VM Storage Policy.
Since the most common failure is a host failure, that is what should be
sized for from a capacity perspective.
group via the UI (which includes a disk evacuation option in version 6.0),
and then the drive can be ejected and replaced with a new one. Certain
controllers, especially when they are using RAID 0 mode rather than
pass-through mode, require additional steps to get the drive discovered
when the original is ejected and a new drive inserted. This operation
needs to be as seamless as possible, so it is important to consider
whether or not the controller chosen for the Virtual SAN design can
support plug-and-play operations.
If any physical device in the capacity layer reaches an 80% full threshold,
Virtual SAN will automatically instantiate a rebalancing procedure that
will move components around the cluster to ensure that all disks remain
below the 80% threshold. This procedure can be very I/O intensive, and
may impact virtual machine I/O while the rebalance is running.
Best practice: Try to maintain at least 30% free capacity across the
cluster to accommodate the remediation of components when a failure
occurs or a maintenance task is required. This best practice will also
avoid any unnecessary rebalancing activity.
In Virtual SAN 6.0, the way quorum is computed has changed. The
rule is no longer "more than 50% of components". Instead, in 6.0, each
component has a number of votes, which may be 1 or more. Quorum is
now calculated based on the rule that "more than 50% of votes" is
required. It then becomes possible for components to be distributed
in such a way that Virtual SAN can still guarantee the failures to tolerate
without the use of witnesses. However, many objects will still have a
witness in 6.0.
This next screenshot is taken from the VMDK (Hard disk 1). It
implements both the stripe width (RAID 0) and the failures to tolerate
(RAID 1) requirements. There are a total of 5 components making up
this object: two components are striped and then mirrored to another
two-way stripe. Finally, the object also contains a witness component
for quorum decisions.
Note: The location of the Physical Disk Placement view has changed
between versions 5.5 and 6.0. In 5.5, it is located under the Manage tab.
In 6.0, it is under the Monitor tab.
In the case of hybrid configurations, this striping would be across
magnetic disks. In the case of all-flash, the striping would be across
whatever flash devices are making up the capacity layer.
However, for the most part, VMware recommends leaving striping at the
default value of 1 unless performance issues that might be alleviated by
striping are observed. The default value for the stripe width is 1 whereas
the maximum value is 12.
There are two main sizing considerations when it comes to stripe width.
The first of these considerations is if there are enough physical devices
in the various hosts and across the cluster to accommodate the
requested stripe width, especially when there is also a
NumberOfFailuresToTolerate value to accommodate.
The second consideration is whether the value chosen for stripe width is
going to require a significant number of components and consume the
host component count. Both of these should be considered as part of
any Virtual SAN design, although considering the increase in maximum
component count in 6.0 with on-disk format v2, this realistically isn’t a
major concern anymore. Later, some working examples will be looked
at which will show how to take these factors into consideration when
designing a Virtual SAN cluster.
Previously we mentioned the 10% rule for flash cache sizing. This is
used as a read cache and write buffer in hybrid configurations, and as a
write buffer only for all-flash configurations, and is distributed fairly
amongst all virtual machines. However, through the use of VM Storage
Policy setting FlashReadCacheReservation, it is possible to dedicate a
portion of the read cache to one or more virtual machines.
For hybrid configurations, this setting defines how much read flash
capacity should be reserved for a storage object. It is specified as a
percentage of the logical size of the virtual machine disk object. It
should only be used for addressing specifically identified read
performance issues. Other virtual machine objects do not use this
reserved flash cache capacity.
Unreserved flash is shared fairly between all objects, so for this reason
VMware recommends not changing the flash reservation unless a
specific performance issue is observed. The default value is 0%,
implying the object has no read cache reserved, but shares it with other
virtual machines. The maximum value is 100%, meaning that the amount
of reserved read cache is the same size as the storage object (VMDK).
In this hybrid Virtual SAN example, the customer has set the VM
Storage Policy capability FlashReadCacheReservation to 5% for all the
virtual machine disks. Remember that 70% of flash is set aside for read
cache in hybrid configurations.
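To see how quickly such a reservation can consume the 70% of flash set aside for read cache, here is a sketch. The VM counts, sizes and flash capacities are hypothetical and are not taken from the example in the text.

```python
def read_cache_reservation_check(vm_count, vmdk_gb_per_vm, reservation_pct,
                                 flash_per_host_gb, hosts):
    """Compare the total FlashReadCacheReservation against the 70% of flash
    that a hybrid Virtual SAN sets aside for read cache."""
    reserved_gb = vm_count * vmdk_gb_per_vm * reservation_pct
    read_cache_gb = flash_per_host_gb * hosts * 0.70
    return round(reserved_gb, 1), round(read_cache_gb, 1), reserved_gb <= read_cache_gb

# Hypothetical: 100 VMs x 100GB VMDK with a 5% reservation, 4 hosts x 200GB cache devices
print(read_cache_reservation_check(100, 100, 0.05, 200, 4))
# -> (500.0, 560.0, True): 500GB of the 560GB of read cache is already reserved
```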
For "n" failures tolerated, "n+1" copies of the object are created and
"2n+1" hosts contributing storage are required. The default value for
NumberOfFailuresToTolerate is 1. This means that even if a policy is not
chosen when deploying a virtual machine, there will still be one replica
copy of the virtual machine's data. The maximum value for
NumberOfFailuresToTolerate is 3.
Virtual SAN 6.0 introduces the concept of fault domains. This allows
Virtual SAN to tolerate not just host failures, but also environmental
failures such as rack, switch and power supply failures, by locating
replica copies of data in different locations. When working with fault
domains, to tolerate "n" number of failures, "n+1" copies of the object
are once again created, but now "2n+1" fault domains are required. Each
fault domain must contain at least one host contributing storage. Fault
domains will be discussed in more detail shortly.
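The "n+1 copies, 2n+1 hosts or fault domains" rule is easy to tabulate; a quick sketch:

```python
def ftt_requirements(n):
    """NumberOfFailuresToTolerate = n with RAID-1 mirroring:
    n+1 replicas, and 2n+1 hosts or fault domains contributing storage."""
    return {"failures_tolerated": n,
            "replicas": n + 1,
            "hosts_or_fault_domains": 2 * n + 1}

for n in (1, 2, 3):
    print(ftt_requirements(n))
# FTT=1 -> 2 replicas / 3 hosts, FTT=2 -> 3 / 5, FTT=3 -> 4 / 7
```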
across the cluster. If the number is set to 2, three mirror copies are
created; if the number is set to 3, four copies are created.
Force Provisioning
Virtual SAN will attempt to find a placement that meets all requirements.
If it cannot, it will attempt a much simpler placement with requirements
reduced to FTT=0, SW=1, FRCR=0. This means Virtual SAN will attempt
to create an object with just a single mirror. Any
ObjectSpaceReservation (OSR) policy setting is still honored.
Virtual SAN does not gracefully back off by reducing only the
requirements that cannot be met when searching for a placement. For
example, if an object asks for FTT=2 and that cannot be met, Virtual SAN
will not try FTT=1, but instead immediately tries FTT=0.
Similarly, if the requirement was FTT=1, SW=10, but Virtual SAN doesn't
have enough capacity devices to accommodate SW=10, then it will fall
back to FTT=0, SW=1, even though a policy of FTT=1, SW=1 may have
succeeded.
The Monitor > Virtual SAN > Physical Disks view will display the amount
of used capacity in the cluster. This screen shot is taken from a 5.5
configuration. Similar views are available on 6.0.
There are cases where an administrator will want to limit the maximum
number of IOPS available to an object or virtual machine. There
are two key use cases for this functionality:
Prior to Virtual SAN 6.2, RAID-1 (Mirroring) was used as the failure
tolerance method. Virtual SAN 6.2 adds RAID-5/6 (Erasure Coding) to
all-flash configurations. While mirroring techniques excel in workloads
where performance is the most important factor, they are expensive in
terms of the capacity required. The RAID-5/6 (Erasure Coding) data
layout can be configured to help ensure the same levels of availability
while consuming less capacity than RAID-1 (Mirroring).
Note that the failure tolerance method in the rule set must be set to
RAID-5/6 (Erasure Coding).
For FTT=1, the storage overhead will be 1.33x rather than 2x. In this case,
a 20GB VMDK would use only about 27GB instead of the 40GB
traditionally used by RAID-1.
For FTT=2, the storage overhead will be 1.5x rather than 3x. In this case,
a 20GB VMDK will use 30GB instead of 60GB.
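The consumed-capacity arithmetic generalizes easily. The multipliers below follow the figures above, assuming RAID-1 keeps FTT+1 full copies, RAID-5 is a 3+1 layout and RAID-6 a 4+2 layout.

```python
def consumed_capacity_gb(vmdk_gb, ftt, erasure_coding=False):
    """Capacity consumed by one VMDK under RAID-1 mirroring versus RAID-5/6."""
    if not erasure_coding:
        return vmdk_gb * (ftt + 1)      # RAID-1: FTT+1 full copies
    if ftt == 1:
        return vmdk_gb * 4 / 3          # RAID-5 (3 data + 1 parity) -> 1.33x
    if ftt == 2:
        return vmdk_gb * 3 / 2          # RAID-6 (4 data + 2 parity) -> 1.5x
    raise ValueError("Erasure coding supports FTT=1 or FTT=2 only")

print(consumed_capacity_gb(20, 1), round(consumed_capacity_gb(20, 1, True), 1))  # 40 26.7
print(consumed_capacity_gb(20, 2), round(consumed_capacity_gb(20, 2, True), 1))  # 60 30.0
```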
For more guidance on which workloads will benefit from erasure coding,
see VMware Virtual SAN 6.2 Space Efficiency Technologies.
VM Home Namespace
why a minimum of three hosts with local storage is required for Virtual
SAN.
VM Swap
The virtual machine swap object also has its own default policy, which is
to tolerate a single failure. It has the default stripe width value, is thickly
provisioned, and has no read cache reservation.
The VM Swap object does not inherit any of the settings in the VM
Storage Policy. With one exception, it always uses the following settings:
Note that the VM Swap object is not visible in the UI when VM Storage
Policies are examined. Ruby vSphere Console (RVC) commands are
required to display policy and capacity information for this object.
Delta disks, which are created when a snapshot is taken of the VMDK
object, inherit the same policy settings as the base disk VMDK.
Note that delta disks are also not visible in the UI when VM Storage
Policies are examined. However the VMDK base disk is visible and one
can deduce the policy setting for the snapshot delta disk from the
policy of the base VMDK disk. This will also be an important
consideration when correctly designing and sizing Virtual SAN
deployments.
Snapshot memory
For example, Virtual SAN will not move components around hosts or
disk groups to allow for the provisioning of a new replica, even though
this might free enough space to allow the new virtual machine to be
provisioned. Therefore, even though there may be enough free space
overall in the cluster, most of the free space may be on one node, and
there may not be enough space on the remaining nodes to satisfy the
replica copies for NumberOfFailuresToTolerate.
Best practice: In Virtual SAN 5.5, always deploy virtual machines with a
policy. Do not use the default policy if at all possible. This is not a
concern for Virtual SAN 6.0, where the default policy has settings for all
capabilities.
CPU considerations
- Desired sockets per host
- Desired cores per socket
- Desired number of VMs and thus how many virtual CPUs (vCPUs)
required
- Desired vCPU-to-core ratio
- Provide for a 10% CPU overhead for Virtual SAN
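The CPU bullets above reduce to a simple core-count estimate, as in the sketch below. The VM count, vCPU sizing and consolidation ratio are hypothetical; only the 10% Virtual SAN overhead comes from the list.

```python
import math

def cores_required(vm_count, vcpus_per_vm, vcpu_to_core_ratio, vsan_overhead_pct=10):
    """Physical cores needed for the VM estate plus ~10% CPU overhead for Virtual SAN."""
    vm_cores = vm_count * vcpus_per_vm / vcpu_to_core_ratio
    return math.ceil(vm_cores * (100 + vsan_overhead_pct) / 100)

# Hypothetical: 100 VMs with 2 vCPUs each at a 5:1 vCPU-to-core ratio
print(cores_required(100, 2, 5))  # -> 44 cores across the cluster
```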
Memory considerations
- Desired memory for VMs
- A minimum of 32GB is required per ESXi host for full Virtual SAN
functionality (5 disk groups, 7 disks per disk group)
- When USB and SD devices are used for boot devices, the logs and
traces reside in RAM disks which are not persisted during reboots
o Consider redirecting logging and traces to persistent
storage when these devices are used as boot devices
o VMware does not recommend storing logs and traces on
the Virtual SAN datastore. These logs may not be
retrievable if Virtual SAN has an issue which impacts access
to the Virtual SAN datastore. This will hamper any
troubleshooting effort.
o VMware KB article 1033696 has details on how to redirect
scratch to a persistent datastore.
o To redirect Virtual SAN traces to a persistent datastore,
esxcli vsan trace set command can be used. Refer to the
vSphere command line documentation for further
information.
- Virtual SAN traces are written directly to SATADOM devices;
there is no RAM disk used when a SATADOM is the boot device.
Therefore, the recommendation is to use an SLC-class device for
performance and, more importantly, endurance.
Assume a six-node cluster in which there are 100 virtual machines
running per ESXi host, and overall they consume 2,000 components per
host. In Virtual SAN 5.5, there is a limit of 3,000 components that a host
can produce. If all hosts in the cluster were to consume components
equally, each host would consume ~2,000 components to have 100
running VMs in the above example. This will not give rise to any issues.
Now assume that in the same six-node Virtual SAN cluster, only three
hosts have disks contributing to the Virtual SAN datastore and that the
other three hosts are compute-only. Assuming Virtual SAN achieves
perfect balance, every host contributing storage would now need to
produce 4,000 components for such a configuration to work. This is
not achievable in Virtual SAN 5.5, so care must be taken when
deploying virtual machines to Virtual SAN clusters in which not all hosts
contribute storage.
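The arithmetic behind that caution can be sketched as follows, using the numbers from the example above:

```python
def components_per_storage_host(total_components, storage_hosts):
    """Components can only live on hosts that contribute storage, so
    compute-only hosts push the count up on the remaining hosts."""
    return total_components / storage_hosts

total = 6 * 2000                              # six hosts' worth of VM components
print(components_per_storage_host(total, 6))  # 2000.0 -> fine against the 3,000 limit in 5.5
print(components_per_storage_host(total, 3))  # 4000.0 -> exceeds the 3,000 limit in 5.5
```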
While the number of components per host has been raised to 9,000 in
Virtual SAN 6.0, the use of compute-only hosts can lead to unbalanced
configurations, and the inability to provision the maximum number of
virtual machines supported by Virtual SAN.
number of disk slots available on the hosts. The same is true for rack-mount
hosts, which are also limited by the number of disk slots.
Once again, if the plan is to use external storage enclosures with Virtual
SAN, ensure the VCG is adhered to with regards to versioning for these
devices.
3-node configurations
While Virtual SAN fully supports 2-node and 3-node configurations,
these configurations can behave differently than configurations with 4
or more nodes. In particular, in the event of a failure, there are no
resources to rebuild components on another host in the cluster to
tolerate another failure. Also, with 2-node and 3-node configurations,
there is no way to migrate all data from a node during maintenance.
vSphere HA considerations
Virtual SAN, in conjunction with vSphere HA, provides a highly available
solution for virtual machine workloads. If the host that fails is not
running any virtual machine compute, then there is no impact to the
virtual machine workloads. If the host that fails is running virtual
machine compute, vSphere HA will restart those VMs on the remaining
hosts in the cluster.
Administrators should note that Virtual SAN does not interoperate with
vSphere HA to ensure that there is enough free disk space on the
remaining hosts in the cluster. Instead, after a period of time (60
minutes by default) has elapsed after a host failure, Virtual SAN will try
to use all the remaining space on the remaining hosts and storage in the
cluster to make the virtual machines compliant. This could involve the
creation of additional replicas and stripes. Caution and advance planning
are imperative on Virtual SAN designs with vSphere HA, as multiple
failures in the Virtual SAN cluster may fill up all the available space on
the Virtual SAN datastore due to over-commitment of resources.
Best practice: Enable HA with Virtual SAN for the highest possible level
of availability. However, any design will need to include additional
capacity for rebuilding components.
Fault Domains
The idea behind fault domains is that we want to be able to tolerate
groups of hosts (chassis or racks) failing without requiring additional
data copies. The implementation allows Virtual SAN to save replica
copies of the virtual machine data in different domains, for example,
different racks of compute.
In Virtual SAN 6.2, the RAID-5/6 (Erasure Coding) fault tolerance method
adds additional fault domain considerations: RAID-5 requires a minimum
of four hosts or fault domains, and RAID-6 requires a minimum of six.
Take the following example where there are 8 hosts in the Virtual SAN
cluster, split across 4 racks. Let’s assume that there are 2 ESXi hosts in
each rack. When a virtual machine that tolerates 1 failure is deployed, it
is possible for both replicas to be deployed to different hosts in the
same rack.
The same holds true in Virtual SAN 6.0 when fault domains are not
enabled. However if fault domains are enabled, this allows hosts to be
grouped together to form a fault domain. This means that no two
copies/replicas of the virtual machine’s data will be placed in the same
fault domain. To calculate the number of fault domains required to
tolerate failures, use the same equation as before; when deploying a
virtual machine with a NumberOfFailuresToTolerate = 1 on a cluster with
fault domains, 2n + 1 fault domains (containing 1 or more hosts
contributing storage) are required.
Let’s consider the previous example, but now with 4 fault domains
configured.
Previously, the need to plan for 1 host failure was discussed, where 1
host's worth of additional space is needed to rebuild failed or missing
components. With fault domain failures, one additional fault domain's
worth of additional space is needed to rebuild missing components.
This is true for compute as well. In such a scenario, 1 fault domain's worth
But not all applications are cache-friendly all of the time. Examples
include a full database scan, a large database load, a large content
repository, backups and restores, and similar workload profiles.
For more information and architectural details, please refer to the paper
at http://labs.vmware.com/academic/publications/view-vmtj-winter2012
The estimation is that the Guest OS and application will consume 50%
of the storage. However, the requirement is to have enough storage to
allow VMs to consume 100% of the storage eventually.
Taking into account the considerations above, the calculation for a valid
configuration would be as follows:
Since all VMs are thinly provisioned on the Virtual SAN datastore, the
estimated storage consumption should take into account the thin
provisioning aspect before the flash requirement can be calculated:
• Required slack space: 30% (slack space should be 30% of the raw
storage requirements after the disks have been formatted)
• Raw Formatted Storage Capacity = Raw Storage Requirements + 30% Slack Space
o Or, written another way (see the short sketch after this list):
o Raw Storage Requirement = 70% of Raw Formatted Storage Capacity
o Raw Formatted Storage Capacity = Raw Storage Requirements/0.7
o Raw Formatted Storage Capacity = 21.6/0.7
o Raw Formatted Storage Capacity = 30.9TB
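The same arithmetic, expressed as a minimal Python sketch (the 21.6TB raw requirement and the 30% slack figure are taken from the example above; variable names are illustrative):

raw_storage_requirement_tb = 21.6   # raw storage requirement from the example
slack = 0.30                        # recommended slack space

# Raw Formatted Storage Capacity = Raw Storage Requirement / (1 - slack)
raw_formatted_capacity_tb = raw_storage_requirement_tb / (1 - slack)
print(round(raw_formatted_capacity_tb, 1))   # ~30.9TB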
CPU Configuration
This is more than enough for our 44 core requirement across the 3 servers.
It also meets the requirements of the virtual machines should one host
fail and all VMs need to run on just two hosts without any impact on
their CPU performance.
Memory Configuration
This also provides a 10% overhead for ESXi and Virtual SAN from a
memory perspective. Virtual SAN designers will need to ensure that the
server has enough DIMM slots for this memory requirement.
Storage Configuration
The next consideration is setting aside some space for rebuilding the
virtual machine objects and components in the event of a failure. Since
there are only 3 hosts in this cluster, components cannot be rebuilt after
a host failure because there are not enough hosts remaining. This would
definitely be a consideration for larger configurations, where rebuilding
components creates additional copies and once again allows the cluster to
tolerate further host failures. But in a 3-node cluster where one node has
already failed, another failure cannot be tolerated. To meet this
requirement, one additional host with matching capacity would need to be
added to the cluster.
At this point there is some leeway over how to configure the hosts;
design decisions include whether to use one or more disk groups and how
many magnetic disks per disk group. One should also consider whether to
use SAS, SATA or NL-SAS magnetic disks, and whether to choose PCIe flash
devices or solid-state drives. As previously mentioned, SAS and SATA
offer a performance trade-off against price. A similar argument can be
made for PCIe flash versus SSD.
• 10.5TB Magnetic Disk required => 11 x 1TB SAS 10K RPM per host
• 200GB Flash required => 2 x 100GB SAS SSD per host
Why choose 2 x 100GB flash devices rather than 1 x 200GB flash device?
The reason is that a disk group can contain a maximum of seven capacity
devices. With 11 capacity devices per host, two disk groups are needed,
and each disk group must contain its own flash device, so two smaller
flash devices are chosen.
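A short Python sketch of the disk group arithmetic described above (the 11 capacity disks, the 200GB flash requirement and the seven-device-per-disk-group maximum all come from this example; names are illustrative):

import math

capacity_disks_per_host = 11          # 11 x 1TB SAS 10K RPM
max_capacity_disks_per_group = 7      # maximum capacity devices per disk group

disk_groups = math.ceil(capacity_disks_per_host / max_capacity_disks_per_group)
flash_per_group_gb = 200 / disk_groups   # total 200GB flash requirement split across groups
print(disk_groups, flash_per_group_gb)   # 2 disk groups, 100GB flash device in each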
Component Count
The next step is to check whether or not the component count of this
configuration would exceed the 3,000 components per host maximum
in Virtual SAN 5.5, or the 9,000 components per host maximum in
Virtual SAN 6.0 (disk format v2). This 3-node Virtual SAN cluster
supports running 100 virtual machines, each virtual machine containing
a single VMDK. There is no snapshot requirement in this deployment.
This means that each virtual machine will have the following objects:
• 1 x VM Home Namespace
• 1 x VMDK
• 1 x VM Swap
• 0 x Snapshot deltas
This implies that there are 3 objects per VM. Next, we need to work out how
many components per object, considering that we are using a VM
Storage Policy setting that contains Number of Host Failures to Tolerate
= 1 (FTT). It should be noted that only the VM Home Namespace and
the VMDK inherit the FTT setting; the VM Swap object ignores this
setting but is always protected with FTT=1. Therefore, when we look at the
number of components per object on each VM, we get the following (a short
worked calculation follows this list):
• 2 x VM Home Namespace + 1 witness
• 2 x VMDK + 1 witness
• 2 x VM Swap + 1 witness
• 0 x Snapshot deltas
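The component arithmetic for this example can be sketched as follows in Python (FTT=1, no stripe width requirement, 100 VMs on 3 hosts, all taken from the example above; a rough illustration rather than the sizing tool itself):

components_per_object = 2 + 1   # FTT=1: two replicas plus one witness
objects_per_vm = 3              # VM Home Namespace, VMDK, VM Swap (no snapshots)
vms, hosts = 100, 3

total_components = vms * objects_per_vm * components_per_object   # 900
print(total_components / hosts)   # 300 per host, well under the 9,000 limit (v2 on-disk format)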
The estimation is that the Guest OS and application will consume 75% of
the storage. The VM Storage Policy setting is HostFailuresToTolerate
(FTT) set to 1 and StripeWidth set to 2. All other policy settings are left
at the defaults. The ESXi hosts will boot from disk.
Taking into account the considerations above, the calculation for a valid
configuration would be as follows:
CPU Configuration
In this example, the customer requires 100 cores overall. If we take the
10% Virtual SAN overhead, this brings the total number of cores to 110.
The customer has sourced servers that contain 12 cores per socket, and
a dual socket system provides 24 cores. That gives a total of 120 cores
across the 5-node cluster. This is more than enough for our 110 core
requirement. However, it does not meet the requirements of the
virtual machines should one host fail and all VMs need to run on just
four hosts without any impact on their CPU performance. Therefore, a
customer may decide that a 6-node cluster is preferable in this
configuration. But this will be highly dependent on whether this number
of nodes can accommodate the large storage capacity requirement in
this design.
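The same check as a minimal Python sketch (the 100-core requirement, the 10% Virtual SAN CPU overhead and the 12-core dual-socket servers are the example's own figures):

import math

cores_needed = math.ceil(100 * 1.10)      # 100 cores plus 10% Virtual SAN overhead = 110
cores_per_host = 2 * 12                   # dual socket, 12 cores per socket

print(5 * cores_per_host >= cores_needed)          # True:  120 cores across 5 hosts
print((5 - 1) * cores_per_host >= cores_needed)    # False: one host failure leaves only 96 cores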
Memory Configuration
The memory configuration chosen per server also provides a 10% overhead
for ESXi and Virtual SAN from a memory perspective. Designers will need
to ensure that the server has enough DIMM slots for this memory requirement.
A choice needs to be made: the design will need to choose between
SAS, SATA or NL-SAS for the magnetic disks. However, SATA may not be a
suitable choice if performance of the magnetic disk layer is a
requirement. Another design decision is the disk size to choose.
Finally, a server will need to be chosen that can accommodate the
number of disks needed to meet the capacity requirement.
If multiple disk groups are required, the design will need to ensure that
no limits are hit with the number of disks per disk group, or the number
of disk groups per host. Refer back to the limits section in the earlier
part of this document for the actual maximums.
In option 1, the capacity requirement for this Virtual SAN design could
be achieved by using 16 x 4TB SATA 7200 RPM drives per host. However,
these drives may not achieve the desired performance for the end
solution.
Again, there are choices to be made with regard to disk types. The options
supported for Virtual SAN are SAS, SATA or NL-SAS, as already
mentioned.
o This will mean a total of 315 x 1.2TB SAS 10K RPM drives are
needed across the cluster. This is now the most important
consideration for the design. One needs to consider how
many hosts are needed to accommodate this storage
requirement.
o With a maximum of 7 disks per disk group and 5 disk
groups per host, this equates to 35 x 1.2TB drives per
host, or 42TB of capacity per host.
o At a minimum, 10 hosts would be required to meet this
requirement. Of course, CPU and memory requirements
now need to be revisited and recalculated. This implies that
a less powerful host could be used for the cluster design.
o This many disks may entail the purchase of additional
controllers, or SAS extenders. Multiple controllers will offer
superior performance, but at a cost.
o Another design consideration is to use an external storage
enclosure to accommodate this many disks. Support for this
was introduced in version 6.0.
• 9TB cache required per cluster
o Given the fact that there are now 10 hosts in the cluster,
there will be ~1TB of flash per host distributed across the
cluster.
o Since there are 5 disk groups in each host, this requirement
could be easily met using 5 x 200GB flash devices for each
of the above disk groups.
o For future growth, consideration can be given to using
larger flash devices.
o With 5 x 200GB SSD per host, a total of 40 disk slots is now
needed per host
• ESXi hosts boot from disk
o 41 disk slots per host now required
This design now needs servers that contain 41 disk slots for this rather
large configuration. In all likelihood, this design will require
additional controllers, SAS extenders or an external storage enclosure to
meet this requirement. Support for external storage enclosures
was introduced in 6.0. The alternative is to purchase even more servers
and distribute the storage across them; however, this would require a
redesign.
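A rough Python sketch of the host and disk-slot arithmetic above (the 315 drives, 7 disks per disk group, 5 disk groups per host, 5 cache devices and the boot disk are the example's own figures; adding one host of headroom to reach the 10-host figure is an assumption, since the exact rounding is not shown):

import math

capacity_drives_needed = 315
capacity_drives_per_host = 7 * 5          # 7 disks per disk group x 5 disk groups = 35 (42TB)

hosts = math.ceil(capacity_drives_needed / capacity_drives_per_host) + 1   # 9 + 1 headroom = 10
slots_per_host = capacity_drives_per_host + 5 + 1   # 35 capacity + 5 cache + 1 ESXi boot disk
print(hosts, slots_per_host)              # 10 hosts, 41 disk slots per host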
Again, there are choices to be made with regards to flash disk types.
Options supported for Virtual SAN are SAS, SATA, PCI-Express and
NVMe.
Deduplication and compression will be used for the cluster, and RAID-
5/6 (erasure coding) will be chosen for the data disks.
The expected deduplication and compression ratios are 4X for the boot
disk, as the virtual machines are cloned from a common template, and
2X for the data disks. In this case the application provider has required
that memory overcommitment not be used, so sparse swap will be
enabled to reclaim the space used for swap objects.
• Raw Storage Requirements for the boot disk (with FTT=1 RAID-5,
dedupe and compression ratio = 4X; see the sketch after this list): *
o = 100GB*1.33 (FTT=1 RAID-5)
o = 133GB/4 (Dedupe and Compression = 4X)
o = 33.25GB
• Raw Storage Requirements for the data disk (with FTT=1 RAID-5,
dedupe and compression ratio = 2X): *
o = 200GB*1.33 (FTT=1 RAID-5)
o = 266GB/2 (Dedupe and Compression = 2X)
o = 133GB
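The same per-VM arithmetic as a minimal Python sketch (the 1.33 RAID-5 overhead factor and the 4X/2X deduplication and compression ratios are taken from the example; the helper function is illustrative):

def raw_capacity_gb(vmdk_gb, raid5_factor=1.33, dedupe_ratio=1.0):
    # FTT=1 with RAID-5 stores ~1.33x the data; dedupe/compression divides by the expected ratio
    return vmdk_gb * raid5_factor / dedupe_ratio

print(raw_capacity_gb(100, dedupe_ratio=4))   # boot disk: 33.25GB
print(raw_capacity_gb(200, dedupe_ratio=2))   # data disk: 133.0GB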
This design now needs servers that contain 7 disk slots, in keeping with
the 7-host requirement. This design would allow dense (4 servers per
2U) configurations.
Component Count
The final check is to see whether or not the component count of this
configuration would exceed the 3,000 components per host maximum
in Virtual SAN 5.5 or the 9,000 components per host maximum in
Virtual SAN 6.0.
This Virtual SAN cluster has a requirement to run 400 virtual machines,
each virtual machine containing two VMDKs (a boot disk and a data disk).
There is also a snapshot requirement in this deployment that results in
2 snapshot deltas per virtual machine.
This means that each virtual machine will have the following objects:
• 1 x VM Home Namespace
• 2 x VMDK
• 1 x VM Swap
• 2 x Snapshot deltas
This implies that there are 6 objects per VM. Next, we need to work out how
many components per object, considering that we are using a VM
Storage Policy setting that contains Number of Host Failures to Tolerate
= 1 (FTT) and Stripe Width = 2.
It should be noted that only the VM Home Namespace, the VMDK and
the snapshot deltas inherit the FTT setting; the VM Swap object ignores
this setting. Only the VMDK and the snapshot deltas inherit the full VM
Storage Policy, including the StripeWidth setting.
Taking option 1, the smallest configuration, and splitting it across the 7
hosts in the Virtual SAN cluster, the calculation shows that there are
1,943 components per host. This is well within the limits of 3,000
components per host in 5.5 and 9,000 components per host in 6.0, so
we are good.
The next step is to check whether the cluster can still handle the same
number of components in the event of a host failure. Approximately 13,600
components spread across 6 hosts implies a total of 2,267 components per
host, so the design can tolerate a host failure and a rebuild of the
missing components on the remaining 6 hosts in the cluster.
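The same check as a short Python sketch (the ~13,600 total components and the host counts are taken from the example above; the 9,000 per-host limit is the Virtual SAN 6.x figure described earlier):

import math

total_components = 13600        # 400 VMs x ~34 components each, per the calculation above
per_host_limit = 9000           # Virtual SAN 6.x with v2 on-disk format

for hosts in (7, 6):            # normal operation, then one host failed
    per_host = math.ceil(total_components / hosts)
    print(hosts, per_host, per_host <= per_host_limit)   # 1,943 and 2,267 -> both within limits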
Server choice
For Option (3), a customer's server choice is a Dell FX2. Utilizing the
FC630 paired with FD332s can meet the drive bay, core and DIMM
requirements. Additionally, if fewer hosts are to be considered, the
FC830 quad-socket system could be leveraged for denser RU/compute
configurations.
Conclusion
Although most Virtual SAN design and sizing exercises are
straightforward, careful planning at the outset can avoid problems later.
Common planning mistakes include:
• Not properly sizing cache for capacity growth (e.g. thin volumes
progressively getting fatter), resulting in declining performance
over time.
Further Information
VMware Ready Nodes
• http://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan
Key bloggers
• http://cormachogan.com/vsan/
• http://www.yellow-bricks.com/virtual-san/
• http://www.virtuallyghetto.com/category/vsan
• http://www.punchingclouds.com/tag/vsan/
• http://blogs.vmware.com/vsphere/storage
• http://www.thenicholson.com/vsan
VMware support
• https://my.vmware.com/web/vmware/login
• http://kb.vmware.com/kb/2006985 - How to file a Support
Request
• http://kb.vmware.com/kb/1021806 - Location of VMware Product
log files
• http://kb.vmware.com/kb/2032076 - Location of ESXi 5.x log file
• http://kb.vmware.com/kb/2072796 - Collecting Virtual SAN
support logs
Additional Reading
• http://blogs.vmware.com/vsphere/files/2014/09/vsan-sql-dvdstore-perf.pdf - Microsoft SQL Server Performance Study
• http://www.vmware.com/files/pdf/products/vsan/VMW-TMD-Virt-SAN-Dsn-Szing-Guid-Horizon-View.pdf - Design & Sizing Guide for Horizon View VDI
• http://www.vmware.com/files/pdf/products/vsan/VMware-Virtual-SAN-Network-Design-Guide.pdf - Virtual SAN Network Design Guide
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright © 2012 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.