Virtualization on the IBM System x3950 Server
Covers VMware ESX Server 2.5.2 & 3.0
and Microsoft Virtual Server R2
Massimo Re Ferre’
Don Pomeroy
Mats Wahlstrom
David Watts
ibm.com/redbooks
International Technical Support Organization
June 2006
SG24-7190-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page vii.
This edition applies to the IBM System x3950 and the IBM eServer xSeries 460, VMware ESX
Server 2.5.2 and 3.0, Microsoft Virtual Server R2, IBM Director 5.10, and IBM Virtual Machine
Manager 2.1.
Note: This book is based on a pre-GA version of a product and may not apply when the
product becomes generally available. We recommend that you consult the product
documentation or follow-on versions of this redbook for more current information.
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions
are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES
THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrates programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy,
modify, and distribute these sample programs in any form without payment to IBM for the purposes of
developing, using, marketing, or distributing application programs conforming to IBM's application
programming interfaces.
IPX, Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States,
other countries, or both.
Microsoft, Visual Basic, Windows server, Windows NT, Windows, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.
Intel, Itanium, Pentium, Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Between the server hardware and the operating systems that will run the
applications is a virtualization layer of software that manages the entire system.
The two main products in this field are VMware ESX Server and Microsoft Virtual
Server.
This IBM Redbook discusses the technology behind virtualization, the x3950
technology, and the two virtualization software products. We also discuss how to
manage the resulting solution as a single pool of resources with Virtual Machine
Manager, which provides a consistent management interface.
This Redbook does not compare ESX Server and Virtual Server. Instead, we
assume that you have already chosen one over the other and are interested in
learning more about the planning and implementation of each particular product.
Don Pomeroy is a Senior Systems Administrator for Avnet Inc, one of the
world's largest B2B distributors of semiconductors, interconnect, passive and
electromechanical components, enterprise network and computer equipment
from leading manufacturers, and an IBM Business Partner and customer. His
primary focus for the last two years has been server consolidation and
virtualization with VMware products. He has the following certifications: A+,
Network+, Security+, IBM Certified Specialist for xSeries, MCSA, and VCP.
Mats Wahlstrom is a Senior IT Specialist within the IBM Systems & Technology
Group in Raleigh, with over 10 years of experience within the
Intel/AMD/PowerPC and storage arena. He is a senior member of the IBM IT
Specialist accreditation board and has previously worked for IBM in Greenock,
Scotland and Hursley, England. His current areas of expertise include
BladeCenter®, and storage hardware and software solutions. Before working on
this redbook, he coauthored the first and third editions of the redbook IBM
TotalStorage SAN File System.
From IBM:
Jay Bretzmann
Marco Ferretti
Susan Goodwin
Roberta Marchini
Allen Parsons
Deva Walksfar
Bob Zuber
From VMware:
Richard Allen
Peter Escue
John Hawkins
From Microsoft®:
Jim Ni
Mike Sterling
Jeff Woolsey
Others
Eran Yona, Israeli Ministry of Defense
Jennifer McConnell, Avnet
Members of the VMTN Forum
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you'll develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
Each microprocessor has its own instruction set, the instruction set architecture
(ISA). The instruction set is the language of the processor, and different
microprocessors implement different instruction sets. This is why an executable
program can run only on a processor for which it has been compiled. Table 1-1
shows some examples.
As you can see in this table, although all of the CPUs are 64-bit and some of
them share the same architecture design, they are basically all different in terms
of the language used internally. The only exceptions are the Intel Pentium/Xeon
and the AMD Athlon/Opteron: these two are fully compatible because they both
implement the same instruction set, the industry-standard x86 ISA.
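As a small, hypothetical illustration of this point, the following Python sketch reads the e_machine field of an ELF executable header to report which ISA a file was compiled for (the file name and the short list of architectures are examples only):

import struct

# Map a few ELF e_machine values to instruction set architectures.
E_MACHINE = {3: "x86", 62: "x86-64", 21: "PowerPC 64", 50: "Itanium (IA-64)"}

def target_isa(path):
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF executable")
    little_endian = header[5] == 1                   # EI_DATA byte
    fmt = "<H" if little_endian else ">H"
    (machine,) = struct.unpack(fmt, header[18:20])   # e_machine field
    return E_MACHINE.get(machine, "unknown ({0})".format(machine))

print(target_isa("/bin/ls"))   # reports x86-64 on a 64-bit Intel or AMD Linux system

An x86-64 binary reported here cannot run on, for example, an Itanium or POWER system unless an emulation layer translates its instructions.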
Virtualization is where the instruction set that the guest operating systems
expect and the instruction set that the hardware uses are identical. The
advantage is that instead of running a single operating system on top of the
physical system, you can now run a number of operating systems on a single
physical system. The key, however, is that the operating systems and applications
still have to be designed and compiled for the ISA that the physical system uses.
On the other hand, emulation is the mechanism where the instruction set of the
guest OS is translated on the fly to the instruction set of the physical platform.
This means that not only can you run more than one operating system on a single
physical system, but you can also run operating systems that were compiled for a
different ISA.
VMware ESX Server and Microsoft Virtual Server are examples of products that
provide this type of virtualization mechanism.
These are only examples. The key point is that the combination of CPU,
virtualization software, and guest OS all need to be consistent with regard to
the ISA.
(Figure: the hosted architecture, in which the virtualization layer runs on top of a multi-purpose host operating system.)
On the other hand, a hypervisor solution does not require any underlying host OS
to support the virtual machines. The hypervisor is a thin kernel that runs
directly on the hardware and is optimized to run specific tasks such as
scheduling virtual machines.
Figure 1-4 shows the hypervisor architecture. The VMkernel that ships with
VMware ESX Server is an example of a hypervisor.
(Figure 1-4: the hypervisor architecture, with a management console running alongside the hypervisor/virtualization layer.)
As shown in Figure 1-4, this management console is not part of the hypervisor
layer. During the boot process, the hypervisor takes complete control of the
hardware and the management console is simply a more privileged virtual
machine.
1.2.3 OS virtualization
An alternative to hardware virtualization is OS virtualization. While hardware
virtualization simulates server hardware on which you can run various x86
guest operating systems, OS virtualization is a layer that simulates separate
operating system instances to the applications you want to run. This approach is
not as flexible as hardware virtualization, because all guest operating systems
must be of the same type as the master OS. OS virtualization does have
advantages, however, including performance.
Both VMware ESX Server and Microsoft Virtual Server implement full
virtualization technology.
(Figure: full virtualization compared with paravirtualization. A standard guest OS issues only standard x86 hardware calls to the hypervisor running on x86 hardware, while a paravirtualized guest OS also issues specific virtualization-aware hypervisor calls. A further diagram shows 32-bit and 64-bit guests on 64-bit hardware.)
As you can see, with this configuration, both 32-bit and 64-bit guest operating
systems can run side by side.
1.3.1 VMware
VMware Inc. is based in Palo Alto, California, and is one of the companies that
pioneered x86 server virtualization technologies. In the last few years, VMware
has grown its software offering from a simple virtualization layer product set to a
more complex product portfolio that includes management tools and add-on
features. The virtualization products that VMware currently offers are:
VMware Workstation is a hosted virtualization solution for desktop PCs and
mobile computers. Typical usage scenarios include software development
environments, demonstration and test, and similar end-user requirements.
VMware GSX Server is a hosted virtualization solution for departmental
servers. Typical usage scenarios are consolidation of servers in remote
branch offices and consolidation of islands of under-used servers in specific
departments.
VMware ESX Server is an enterprise product based on hypervisor
technologies aimed at datacenter virtualization. The server is capable of
simultaneously hosting many different operating systems and applications.
The Console OS is provided as an interface to support the VMware ESX
Server kernel.
There are a number of other products and add-on features that VMware offers,
including these:
VMware VirtualCenter is management software used to configure and
monitor the VMware ESX Server and VMware GSX Server platforms.
VMware VMotion is a feature that allows the administrator to move a running
virtual machine from one physical host to another without downtime.
1.3.2 Microsoft
Microsoft moved into the virtualization space when it acquired Connectix in 2003.
Connectix was one of the first companies to ship virtualization software offerings.
Since then Microsoft has incorporated products and assets from Connectix and
built its own virtualization product portfolio:
Microsoft Virtual PC is a hosted virtualization solution for desktop PCs and
mobile computers. Typical usage scenarios include software development
environments, demonstration and test, and similar end-user requirements.
Microsoft Virtual Server is a hosted virtualization solution for departmental
servers. Typical usage scenarios are consolidation of test and development
servers, existing Windows NT® 4.0 servers, and infrastructure services.
1.3.3 Xen
Although VMware and Microsoft are currently the two biggest players in the x86
virtualization realm, at least from a commercial point of view, it is worth
mentioning that they are not the only software vendors working in this field.
One interesting thing to note is that while both VMware and Microsoft are
implementing full virtualization and moving towards providing the choice to
customers to run in either full virtualization or paravirtualization mode, Xen has
used a different approach. Xen 3.x supports paravirtualized Linux guests (as did
Table 1-2 Summary of most common virtualization technologies (parentheses indicate future plans). The products compared are VMware Workstation, VMware GSX Server, VMware ESX Server, Microsoft Virtual PC, Microsoft Virtual Server, and Xen.
Notes:
1. The vendor has indicated that this is planned for future releases.
2. Microsoft has stated that MSVS will implement a hypervisor layer in a future release.
It is easy to agree that when using 2-socket blade servers, you are implementing
a scale-out approach, and when using a single 16-socket x86 server you are
implementing a scale-up approach. However, it is not easy to agree on which
approach we are using if we compare 4-socket to 8-socket servers. We do not
pretend to be able to give you this definition, because this is largely determined
by the size of your whole infrastructure, your company strategies, and your own
attitudes.
For this discussion, we always refer to the extremes: 2-socket blade servers for
the scale-out approach and the 16-socket x3950 for the scale-up approach.
Although 4-way servers generally fall into the scale-out category and 8-way
configurations into the scale-up category, this varies depending on your
company's characteristics and infrastructure size.
Having said this, we also want to point out that Intel segments this market into
three broad server realms:
1-way servers: Intel Pentium® 4
2-way servers: Intel Xeon
4-way+ servers: Intel Xeon MP
From this segmentation, it is easy to see that 4-way, 8-way, and 16-way
systems might all fall into the premium price range as opposed to 2-way
platforms (we do not even consider 1-socket platforms for virtualization
purposes). The point here is that the price difference between two 2-way servers
and one 4-way server is much greater than the price difference between two
4-way servers and one 8-way server. There is a common misunderstanding that
servers capable of scaling to 8 sockets and above are much more expensive than
servers that scale to a maximum of 4 sockets. While this might be true for other
industry implementations, the modularity of the System x3950 allows a pricing
policy that is very attractive even for high-end configurations.
The IBM building block for a scale-up approach is the System x3950, a modular,
super-scalable server that can potentially drive as many as 32 CPUs in a single
system image. In our case, single system image means a single instance of
VMware ESX Server with a single Console OS (one ESX Server system).
(Figure: a scale-up building block of four 4-way x3950 nodes connected to a storage server over the SAN and to a Gigabit Ethernet network.)
In this scenario, all 16 sockets are configured so that ESX Server can be
installed once and drive all of the available resources. (Note that the number of
SAN/Ethernet connections in this picture is merely an example).
Advantages: automatic in-box resource management; fewer ESX Server images to manage; VC/VMotion not strictly required; lower software costs (VMware); lower infrastructure costs (Ethernet and SAN switches).
Disadvantages: higher hardware costs (servers); large availability impact if a node fails; fewer CPUs supported per rack.
Figure 1-10 lists advantages and disadvantages upon which the industry
generally agrees. Of course, one reader might rate fewer ESX Server images to
manage as a key advantage while another reader might not be concerned about
having to deal with 30 ESX Server system images. Even if we could try to give
each of these a weight based on our experience, we realize that this will be
different from customer to customer.
With ESX Server implementations, this is not usually a concern. In fact, we are
not dealing here with a single application running on the high-end system but
rather with multiple applications (and OS images) running on the same high-end
systems. Because the virtualization layer is very efficient, if the server is not
being fully utilized, you can add more virtual machines to drive more workload
onto the server. This has proven to scale almost linearly.
All of our discussions about the IBM blades can apply to other 2-socket
traditional servers, either rack or tower, but we think that the use of blades makes
more sense in a scale-out implementation such as this.
(Figure: a scale-out configuration of 2-socket blades, each running its own ESX Server image hosting Windows and Red Hat virtual machines, connected to a storage server over the SAN and to the Ethernet LAN.)
Advantages: lower hardware costs (servers); low HA impact in case of failure of a node; more CPUs supported for each rack.
Disadvantages: manual across-boxes resource management; VC/VMotion mandatory for reasonable management; many hypervisor (ESX) images to maintain; higher software costs (VMware); higher infrastructure costs (Ethernet/SAN switches).
This approach has a number of advantages, including the high availability and
resiliency of the infrastructure. If a single module fails (be it blade hardware or
the ESX Server image), you lose only a small part of your virtual infrastructure as
opposed to what would happen if you lost an entire 16-CPU server (which is
supposed to run eight times the number of virtual machines that the 2-socket
blade supports). Another obvious advantage is server cost: eight 2-socket blades
usually cost less than a single 16-socket server, because of the premium price
associated with 4-CPU and larger systems.
We are not implying that a single 16-socket or 8-socket server can always be
configured with only a pair of redundant HBAs. We have customers running
16-socket xSeries servers with only two HBAs, because the dozens of virtual
machines running on them are typically CPU-bound and memory-bound and do
not have demanding disk bandwidth requirements; we recognize, however, that
for scalability reasons more HBAs might be required in those high-end
configurations. With low-end servers, on the other hand, you would have to
configure eight 2-socket servers, for a total of eight HBAs (16 if redundancy is a
requirement), which also means eight (or 16) SAN switch ports that would likely
be under-used. This waste of money and resources is exactly what we are trying
to avoid with virtualization technologies, which are all about simplifying the
whole infrastructure through better resource utilization, server consolidation,
and server containment.
Specifically, if you look at past trends and technologies for properly operating
an IT environment, some interesting patterns and recurring trends emerge,
generally in this order:
1. In the 1970s and the first part of the 1980s, it was common and best practice
to use powerful mainframe computers to manage an organization's IT
operations. This was the centralized era. Life was good from a management
perspective, but most organizations felt that these systems had a high cost of
acquisition and that they needed to be more flexible, whatever that meant.
2. In the second part of the 1980s and in the 1990s, there was a strong trend
toward distributing computing resources across the business units of the
organization. A new application model emerged during those years: the
client/server era. While hardware and software acquisition costs decreased
dramatically in this new model, the Total Cost of Ownership (TCO) increased
drastically. Organizations were supposed to gain more flexibility in their
infrastructure, but in turn they lost control of their IT and created application
silos that could not talk to each other.
3. As a way to improve upon the client/server model, at the beginning of the
2000s most IT analysts agreed that, because of the poor manageability of the
distributed infrastructure and the high costs of ownership associated with it,
a certain degree of consolidation was desirable for those organizations that
had embraced the client/server trend.
To clarify this concept, say your company needs to consolidate 30 servers onto a
VMware virtual infrastructure (Figure 1-13 on page 23). For the sake of the
discussion, they would all run on eight physical CPU packages based on
preliminary capacity planning. What would your options be? You could install a
single 8-socket server but that probably would not be the proper solution, mainly
for redundancy reasons.
You could install two 4-socket servers, but this would cause a 50% reduction in
computing power if a hardware or software failure occurred on either of the two
servers. However, we know many customers who find themselves in that
situation because they decided that the value of such aggressive consolidation
was worth the risks associated with the failure of an entire system. Many other
customers consider four 2-socket servers a good trade-off, because a node failure
then causes only a 25% loss of computing power in the infrastructure, which is
more than acceptable in many circumstances.
(Figure 1-13: the 30 virtual machines consolidated onto four 2-socket ESX Server systems.)
Here is a second example: say that your company needs to consolidate 300
servers. For the sake of the discussion, they would all run on 80 physical CPU
packages, based on a similar preliminary capacity planning. As you can deduce,
the absolute numbers used in the first example take on a very different meaning
here. In this situation, for example, the use of ten 8-socket ESX Server systems
would cause a 10% reduction of computing power in case of a single node failure,
which is usually acceptable given the RAS (reliability, availability, serviceability)
features of such servers and the likelihood of failure. (Remember that in the first
example the failure of a single 2-socket server causes a much larger 25% loss.)
Figure 1-14 Project scope: 300 virtual machines (with 8-socket systems)
However, the use of 2-socket server blocks means that the administrator has to
set up, administer, and update as many as 40 ESX Server systems, which could
pose a problem for regular maintenance and associated infrastructure costs.
Figure 1-15 Project scope: 300 virtual machines (with two-socket systems)
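As a rough sketch of the failure-impact arithmetic behind these two examples (the CPU counts are the ones used in the text; this is not a capacity planning tool):

def failure_impact(total_cpu_packages, cpus_per_host):
    # Fraction of the infrastructure's computing power lost when one host fails.
    hosts = total_cpu_packages // cpus_per_host
    return hosts, cpus_per_host / total_cpu_packages

# First example: 30 virtual machines sized to 8 physical CPU packages.
for cpus in (8, 4, 2):
    hosts, loss = failure_impact(8, cpus)
    print(f"{hosts} x {cpus}-socket hosts: {loss:.1%} of computing power lost per failed host")

# Second example: 300 virtual machines sized to 80 physical CPU packages.
for cpus in (8, 2):
    hosts, loss = failure_impact(80, cpus)
    print(f"{hosts} x {cpus}-socket hosts: {loss:.1%} of computing power lost per failed host")

The output reproduces the 100%, 50%, and 25% figures of the first example and the 10% figure for ten 8-socket systems in the second, alongside the 40-host alternative.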
Also take into account that with ESX Server 3.0, VMware is introducing 4-way
SMP-capable virtual machines. The bigger the VM, the bigger your host is
supposed to be. Although you could theoretically run a 4-way VM on a dual-core
2-way server (four cores), our experience with ESX Server 2.x running an n-way
VM on an n-core system shows that this is not a good practice. ESX Server 3.0 is
supposed to improve this situation, although it is too early to say for sure what
the result will be.
There are certainly good practical reasons for using a host that can support
many guests at the same time, especially if the peaks of some of these virtual
machines (though possibly not all) overlap at some point during the day. For more
information about the implications of using 4-way virtual machines on ESX
Server 3.0, refer to 3.7, “Four-way virtual machines” on page 159.
Note: Do not think in terms of absolute numbers and so-called general sweet
spots for the configuration of the servers (that is, the virtual infrastructure
building blocks), but rather put them into your own perspective.
While this might be considered a valid approach, one of its pitfalls is that you are
in fact growing the number of physical servers, the number of hypervisors (or
host OSs in hosted solutions), and so forth. In short, you are adding more objects
to the infrastructure that you need to manage, monitor, upgrade, and so on.
As you can easily imagine, the last scenario does not modify the way you manage
your infrastructure in any manner; the same number of server objects is simply
now capable of supporting twice the workload at the very same complexity as
before.
Figure 1-16 compares the basic building blocks of the Xeon MP single-core
processor (Potomac) and dual-core processor (Paxville).
(Figure 1-16: the dual-core Paxville has two processor cores, each with its own L1 instruction cache, L1 data cache, and L2 cache; the single-core Potomac has one processor core with L1 instruction and data caches, an L2 cache, and an L3 cache.)
In addition to the two cores, the dual-core processor has separate L1 instruction
and data caches for each core, as well as separate execution units (integer,
floating point, and so on), registers, issue ports, and pipelines for each core. A
dual-core processor achieves more parallelism than Hyper-Threading
Technology, because these resources are not shared between the two cores.
Estimates are that there is a 1.2 to 1.5 times improvement when comparing the
dual-core Xeon MP with current single-core Xeon MP.
With double the number of cores for the same number of sockets, it is even more
important that the memory subsystem is able to meet the demand for data
throughput. The 21 GBps peak throughput of the X3 Architecture of the x3950
with four memory cards is well-suited to dual-core processors.
For example, an x3950 has four sockets, meaning that it can have up to four
physical processors installed. All x3950 models have dual-core processors;
however, the x460, which it replaced, had models with either dual-core or
single-core processors.
The use of the term way to describe the number of processors can be confusing
because it can be interpreted as meaning cores or sockets. Throughout this
redbook we use the terms n-CPU system, n-way system, and n-socket system to
mean the same thing: we consider the physical microprocessor package, not the
cores available inside the package, to be the CPU.
Another way to look at this is the number of threads that can be executed
simultaneously on a CPU. Original Intel CPUs were able to execute only a single
thread at a time. When Intel introduced the NetBurst architecture, it duplicated
some components of the CPU, making the single CPU package appear to be two
different processors. This is Hyper-Threading. Specifically, each logical CPU has
its own architectural state, that is, its own data, segment, and control registers
and its own advanced programmable interrupt controller (APIC).
On the other hand, the logical processors share the execution resources of the
processor core, which include the execution engine, the caches, the system bus
interface, and firmware. As a result, the performance increase due to
Hyper-Threading is not in the range of 2x (as it would be with two physical
processors) but is more like a maximum 1.3x (30%) improvement.
With dual-core CPUs, we take a step forward by doubling all of the core
components on the die (including the execution resources). Again, there are a
number of things that the two cores share. One is the path to memory: while two
separate CPUs each have their own system bus interface, the cores of a dual-core
CPU share that connection, potentially limiting their bandwidth to memory and
I/O devices.
Even more interesting is that while current dual-core CPUs have a dedicated
on-board cache for each core, future multi-core processors might end up sharing
the cache. Table 1-3 on page 28 shows current and future CPU characteristics.
* Because multi-core designs have not been officially announced, these are
speculations about their characteristics.
As you can see from Figure 1-17, the notion that a core is a CPU is a bit
misleading. The gain that you can achieve using a dual-core CPU is solely
dependent on the workload being run:
If the workload is CPU-bound, then dual-core will almost double the
performance (even though the dual-core CPUs have a slightly lower clock
frequency).
If the workload being run on these systems is memory-bound, then a
dual-core CPU will not offer significant gains in performance, because both
cores share the same system bus interface.
(Figure 1-17: a single-core CPU socket compared with a dual-core socket; each core has its own cache.)
And as we move forward towards multi-core CPUs we will likely see more shared
components on the socket such as the various caches included in the processor
package (Figure 1-18).
(Figure 1-18: a possible future multi-core socket in which the cores share one or more caches.)
With the move away from processor speed as a means to improve processor
performance, CPU designers are looking to parallelism for innovation. The
implementation of dual-core and quad-core processors means that more CPU
operations can be performed per second. To achieve maximum efficiency,
however, system software must also be redesigned to better support parallelism.
For a number of reasons, only two levels are used in the x86 space. The
operating system executes its core functions at ring 0 while applications (and
higher level functions of the operating system) execute code at ring 3. The other
levels are not used. This is shown in Figure 1-19 on page 31.
(Figure 1-19: the four privilege rings of the Intel CPU; rings 1 and 2 are unused.)
Because the operating system expects to run in Ring 0, virtualizing this stack is
not easy because the virtualization layer must own the hardware, not the guest
operating system. To accomplish this, deprivileging techniques are used to run
the virtualization layer in the ring with the highest privilege, ring 0, lowering the
privilege of the OS to either ring 1 or ring 3, depending on the deprivileging
implementation chosen. This is shown in Figure 1-20.
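To make the idea concrete, here is a deliberately simplified toy model of trap-and-emulate deprivileging in Python. It is an illustration of the concept only, not a description of how ESX Server, Virtual Server, or any other product actually implements it:

# Toy model: privileged operations succeed only in ring 0, so the
# virtualization layer traps and emulates them for the deprivileged guest OS.
RING_VIRTUALIZATION_LAYER = 0
RING_GUEST_OS = 1              # deprivileged from ring 0
RING_APPLICATIONS = 3

PRIVILEGED_OPS = {"load page table", "mask interrupts"}

def execute(operation, ring):
    if operation in PRIVILEGED_OPS and ring != RING_VIRTUALIZATION_LAYER:
        return operation + ": trapped and emulated by the virtualization layer"
    return operation + ": executed directly on the hardware"

print(execute("add registers", RING_APPLICATIONS))
print(execute("load page table", RING_GUEST_OS))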
This new feature is called Intel Virtualization Technology (VT), earlier known by
its code name Vanderpool. Intel VT is generally referred to as one of the
hardware-assist technologies that enable Intel-based systems to become more
virtualization-aware.
Note: The newly announced Intel Xeon dual-core CPUs, code-named Paxville,
already ship with this feature but require a special BIOS release to enable it.
The current plan is to release this BIOS update in the first half of 2006. Some
server virtualization software products, such as Xen 3.0, already support this
feature, and both VMware and Microsoft have announced that they will
introduce support for it soon.
It is important to understand that this is one of many steps to create systems that
are fully virtualization aware. In this early stage, CPU hardware assists are more
As you can see, today we are only scratching the surface of all the scenarios that
lie ahead of us for running software in virtual environments.
The x3950 can form a multinode configuration by adding one or more x3950 E or
MXE-460 modular expansion enclosures. The following configurations are
possible:
Eight-way configuration with two nodes, one x3950 and one x3950 E
16-way configuration with four nodes, one x3950 and three x3950 Es
32-way configuration with eight nodes, one x3950 and seven x3950 Es
Table 2-1 shows the major differences between the xSeries 460 and the x3950.
Dual-core processor support: on the x460, single-core models are upgradable to
dual-core processors using the Dual Core X3 Upgrade Kit, 41Y5005 (this kit does
not include the dual-core CPUs); on the x3950, all models come standard with
dual-core processors.
2.1.1 x3950 E
The x3950 E modular expansion enclosure (formerly the MXE-460) is a system
used to extend an x3950 configuration. Like the x3950, the x3950 E contains
microprocessors, memory, disks, and PCI-X adapters. However, unlike the
x3950, the x3950 E can be used only to expand an x3950 configuration.
The x3950 E is functionally identical to the x3950 and supports the same
hardware options. The key differences between the x3950 and x3950 E are:
The x3950 E is for expansion purposes and cannot be used as the primary
node of a multi-node complex, nor can it be used as the primary node in a
partition.
With double the number of cores for the same number of sockets, it is even more
important that the memory subsystem is able to meet the demand for data
throughput. The 21 GBps peak throughput of the X3 Architecture of the x3950
with four memory cards is well suited to dual-core processors.
Figure 2-2 on page 39 compares the basic building blocks of the Xeon MP
single-core processor (Potomac) and dual-core processor (Paxville).
(Figure 2-2: the dual-core Paxville has two processor cores, each with its own L1 instruction cache, L1 data cache, and L2 cache; the single-core Potomac has one processor core with L1 instruction and data caches, an L2 cache, and an L3 cache.)
2.2.2 Hyper-Threading
Hyper-Threading technology enables a single physical processor to execute two
separate code streams (threads) concurrently. To the operating system, a
processor with Hyper-Threading appears as two logical processors, each of
which has its own architectural state: data, segment, and control registers, and
an advanced programmable interrupt controller (APIC).
(Figure: a processor without Hyper-Threading has a single architectural state sharing the processing resources and cache; a processor with Hyper-Threading presents two logical processors, each with its own architectural state, sharing the same processing resources and cache.)
2.3 X3 Architecture
The IBM X3 Architecture is the culmination of many years of research and
development and has resulted in what is currently the fastest processor and
memory controller in the Intel processor marketplace. With support for up to 32
Xeon MP processors and over 20 GBps of memory bandwidth per 64 GB of RAM
up to a maximum of 512 GB, the xSeries servers that are based on the X3
Architecture offer maximum performance and broad scale-up capabilities.
The x3950 uses the third-generation IBM XA-64e chipset. The architecture
consists of the following components:
One to four Xeon MP processors
One Hurricane Memory and I/O Controller (MIOC)
Two Calgary PCI Bridges
(Figure: block diagram of the XA-64e chipset, with the Hurricane memory and I/O controller connected to DDR2 memory through SMI2 links and to two Calgary PCI-X bridges over 6 GBps links; the bridges drive the six PCI-X slots, USB 2.0, the optional ServeRAID, and the RSA II SlimLine.)
Each memory port out of the memory controller has a peak throughput of 5.33
GBps. DIMMs are installed in matched pairs (two-way interleaving) to ensure that
each memory port is fully utilized. Peak throughput for each PC2-3200 DDR2
DIMM is 2.67 GBps. The DIMMs run at 333 MHz to remain in sync with the
throughput of the front-side bus.
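As a back-of-envelope check of these numbers (a sketch only, assuming one memory port per memory card, as in the four-memory-card configuration mentioned earlier):

# Back-of-envelope check of the memory bandwidth figures quoted in this section.
dimm_peak_gbps = 2.67        # peak throughput of one PC2-3200 DDR2 DIMM
dimms_per_port = 2           # matched pair, two-way interleaving
port_peak_gbps = dimm_peak_gbps * dimms_per_port    # about 5.33 GBps per memory port

memory_ports = 4             # with four memory cards installed
total_peak_gbps = port_peak_gbps * memory_ports     # about 21 GBps in total

print(f"per memory port: {port_peak_gbps:.2f} GBps, total: {total_peak_gbps:.1f} GBps")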
The memory controller routes all traffic from the four memory ports, two
microprocessor ports, and the two PCI bridge ports. The memory controller also
has embedded DRAM, which holds a snoop filter lookup table. This filter ensures
that snoop requests for cache lines go to the appropriate microprocessor bus
and not both of them, thereby improving performance.
One PCI bridge supplies four of the six 64-bit 266 MHz PCI-X slots on four
independent PCI-X buses. The other PCI bridge supplies the other two PCI-X
slots (also 64-bit, 266 MHz), plus all the onboard PCI devices, including the
optional ServeRAID-8i and Remote Supervisor Adapter II SlimLine daughter
cards.
Chipkill™ memory
Chipkill is integrated into the XA-64e chipset, so it does not require special
Chipkill DIMMs and is transparent to the operating system. When combining
Chipkill with Memory ProteXion and Active Memory, X3 Architecture provides
very high reliability in the memory subsystem.
When a memory chip failure occurs, Memory ProteXion transparently handles
the rerouting of data around the failed component as described previously.
However, if a further failure occurs, the Chipkill component in the memory
controller reroutes data. The memory controller provides memory protection
similar in concept to disk array striping with parity, writing the memory bits
across multiple memory chips on the DIMM. The controller is able to
reconstruct the missing bit from the failed chip and continue working as usual.
One of these additional failures can be handled for each memory port for a
total of four Chipkill recoveries.
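To illustrate the parity idea in the simplest possible terms, the hypothetical sketch below spreads the bits of a word across several chips, keeps an XOR parity bit, and reconstructs the contents of one failed chip. The actual Chipkill logic in the memory controller uses a more sophisticated ECC scheme, but the recovery principle is similar:

def xor_parity(bits):
    parity = 0
    for bit in bits:
        parity ^= bit
    return parity

# One word spread bit-by-bit across four memory chips, plus a parity chip.
data_chips = [1, 0, 1, 1]
parity_chip = xor_parity(data_chips)

# Chip 2 fails: rebuild its bit from the surviving chips plus the parity chip.
failed = 2
surviving = [bit for i, bit in enumerate(data_chips) if i != failed]
recovered = xor_parity(surviving + [parity_chip])
assert recovered == data_chips[failed]
print(f"recovered bit from failed chip {failed}: {recovered}")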
With advances in chip design, IBM has now reduced the latency of main memory
to below that of the XceL4 cache in the x445. The time it takes to access data
The next section goes into greater detail on the benefits of the XceL4v cache on
the x3950 server.
Snooping traffic is an important factor affecting performance and scaling for all
SMP systems. The overhead of this communication becomes greater as the
number of processors in the system increases. Also, faster processors result in a
greater percentage of time spent performing snooping, because the speed of this
communication does not improve as the processor clock speed increases; latency
is largely determined by the speed of the front-side bus.
It is easy to see that increasing the number of processors and using faster
processors results in greater communication overhead and memory controller
bottlenecks. But unlike traditional SMP designs, which send every request from
every processor to all other processors, greatly increasing snooping traffic, the
x3950 has a more efficient design: the XceL4v cache in the x3950 improves
performance because it filters out most snooping operations.
The IBM XceL4v cache improves scalability with more than four processors
because it also caches remote data addresses. Before any processor request is
sent across the scalability bus to a remote processor, the memory controller and
cache controller determine whether the request should be sent at all. To do this,
the XceL4v dynamic server cache keeps a directory of all the addresses of all
data stored in all remote processor caches. By checking this directory first, the
XceL4v can determine whether a data request must be sent to a remote
processor, and it sends the request only to the specific node where the processor
caching the requested data is located.
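The following Python sketch illustrates the directory idea conceptually; it is not a model of the actual hardware implementation, which keeps the directory in the memory controller's embedded DRAM:

class SnoopDirectory:
    def __init__(self):
        self.directory = {}   # cache-line address -> set of remote nodes caching it

    def record_fill(self, address, node):
        self.directory.setdefault(address, set()).add(node)

    def record_evict(self, address, node):
        self.directory.get(address, set()).discard(node)

    def nodes_to_snoop(self, address, requesting_node):
        # Forward the snoop only to nodes known to cache the line, instead of
        # broadcasting the request to every node in the complex.
        return self.directory.get(address, set()) - {requesting_node}

directory = SnoopDirectory()
directory.record_fill(0x1000, node=2)
print(directory.nodes_to_snoop(0x1000, requesting_node=0))   # {2}
print(directory.nodes_to_snoop(0x2000, requesting_node=0))   # set(): no snoop needed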
The term NUMA is not completely accurate, because not only memory but also
I/O resources can be accessed in a non-uniform manner: PCI-X and USB devices
can also be associated with nodes. The exception is earlier I/O devices, such as
diskette and CD-ROM drives, which are disabled because the classic PC
architecture precludes multiple copies of these traditional items.
The key to this type of memory configuration is to limit the number of processors
that directly access a piece of memory, thereby improving performance because
of the much shorter queue of requests. The objective of the operating system is to
ensure that memory requests are fulfilled by local memory whenever possible.
These modern operating systems attempt to allocate resources that are local to
the processors being used by each process. So when a process and its threads
start on node 1, all execution and memory access will be local to node 1. As
more processes are added to the system, the operating system will balance them
across the nodes. In this case, most memory accesses will be evenly distributed
across the multiple memory controllers, reducing remote access, greatly
reducing queuing delays, and improving performance.
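The sketch below is a simplified, hypothetical model of this behavior: new processes are placed on the least-loaded node, and their memory allocations prefer that home node so that most accesses stay local. Real operating systems and ESX Server use considerably more elaborate policies:

from collections import defaultdict

class NumaScheduler:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.load = defaultdict(int)   # number of processes per node
        self.home = {}                 # process ID -> home node

    def place(self, pid):
        # Start each new process on the least-loaded node.
        node = min(self.nodes, key=lambda n: self.load[n])
        self.home[pid] = node
        self.load[node] += 1
        return node

    def allocate_memory(self, pid):
        # Satisfy allocations from the process's home node so accesses stay local.
        return self.home[pid]

scheduler = NumaScheduler(nodes=[1, 2, 3, 4])
for pid in range(8):
    node = scheduler.place(pid)
    assert scheduler.allocate_memory(pid) == node   # memory is local to the home node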
An x3950 server can be configured together with one, three or seven x3950 Es to
form a single 8-way, 16-way or 32-way complex.
Nodes  Configuration  Maximum memory  PCI-X slots  x3950 E enclosures
1      2-way          64 GB           6            None
1      4-way          64 GB           6            None
2      8-way          128 GB          12           1
4      16-way         256 GB          24           3
8      32-way         512 GB          48           7
You can also form multinode complexes using multiple x3950s or combinations
of x3950s and x3950 Es. With these combinations, you can partition the complex
as described in 2.5.1, “Partitioning” on page 50.
(Figure: possible multinode configurations. A single x3950 provides a 2-way or 4-way system with up to 64 GB of RAM; an x3950 plus one x3950 E (two chassis) provides an 8-way system with up to 128 GB; an x3950 plus three x3950 Es (four chassis) provides a 16-way system with up to 256 GB; and an x3950 plus seven x3950 Es (eight chassis) provides a 32-way system with up to 512 GB.)
2.5.1 Partitioning
As discussed in 2.5, “Multi-node configurations” on page 46, the complex can be
configured as one scalable partition with two, four or eight nodes. Alternatively it
is also possible to split this complex into multiple independent partitions. For
example, an eight-node configuration can be split into two 4-node systems by
changing the configuration without changes to the cabling.
The decision about whether partitioning is required must be made during the
planning stage of a multi-node system, because the primary node in a multi-node
complex must always be an x3950. It is not supported to configure multiple
partitions on a complex that consists of one x3950 with one, three, or seven
x3950 Es attached: you must have one x3950 as the primary node for every
partition you create.
See 2.6.3, “Partitioning example” on page 63 for an example of how to partition a
4-node complex into two separate 2-node partitions.
(Figure: a 4-node complex split into two partitions. Partition 1 consists of chassis 1 (x3950) and chassis 2 (x3950 E); partition 2 consists of chassis 3 (x3950) and chassis 4 (x3950 E).)
Note: These cables are not compatible with the x440 and x445 equivalent
cables.
To build your multinode complex, you must order the appropriate number of each
type of scalability cable for your specific configuration, as in Table 2-3. The x3950
and x3950 E do not come with scalability cables.
32-way (8 chassis): eight 2.3 m cables and four 2.9 m cables
There are three different cabling schemes, depending on the number of nodes
used in the complex. These are shown in the following diagrams. In a 2-node
configuration, two scalability cables are used to connect both chassis. The
second cable provides redundancy for the chassis interconnect as well as a
slight performance benefit.
In any multi-node configuration, any one scalability cable can fail without impact
on the server's operation. In this situation, a warning LED on the light path
diagnostics panel is lit and an event is logged in the RSA event log, as shown in
Figure 2-7.
SMI reporting Scalability Event:Double Wide Link Down.Chassis Number = 1. Port Number = 0.
SMI reporting Scalability Event:Link Down.Chassis Number = 1. Port Number = 1.
SMI reporting Scalability Event:Double Wide Link Down.Chassis Number = 2. Port Number = 0.
SMI reporting Scalability Event:Link Down.Chassis Number = 2. Port Number = 1.
Figure 2-7 Messages in the RSA II event log when a scalability cable fails
Figure 2-8 depicts the scalability cabling plan for a 2-node / 8-way configuration.
(Figure 2-8: 8-way configuration. The x460 (primary) and the MXE-460 are connected through their scalability ports with 2.3 m cables (13M7414), and each RSA II is connected to the Ethernet management network.)
Figure 2-9 on page 53 depicts the scalability cabling plan for a 4-node / 16-way
configuration. This uses just the short (2.3 m) cables.
(Figure 2-9: 16-way configuration. The x460 (primary) and three MXE-460 nodes are interconnected through their scalability ports using 2.3 m cables (13M7414), and each RSA II is connected to the Ethernet management network.)
Figure 2-10 depicts the scalability cabling plan for an 8-node / 32-way
configuration. This one uses a combination of short (2.3 m) and long (2.9 m)
cables.
(Figure 2-10: 32-way configuration. The x460 (primary) and seven MXE-460 nodes are interconnected through their scalability ports using a combination of 2.3 m cables (13M7414) and 2.9 m cables (13M7416), and each RSA II is connected to the Ethernet management network.)
The multi-node configuration data is stored on the RSA in all chassis that are
members of the scalable partition. Figure 2-12 on page 57 depicts the RSA Web
interface for the scalable partition configuration.
To configure the RSA II SlimLine to use another static address, you can use one
of two methods:
Access the adapter from a client computer, for example using a notebook
computer connected to the RSA with a crossover Ethernet cable. Open a
Web browser and point it to the adapter’s IP address (192.168.70.125).
Tip: The default logon credentials are USERID and PASSW0RD (all characters
are uppercase, and the 0 in PASSW0RD is a zero, not the letter O).
The RSA's MAC address is displayed as RSA II MAC Address, which can be
helpful for network switch configuration.
In the OS USB Selection field, select the appropriate value for your operating
system. This setting determines how the RSA presents the emulated keyboard
and mouse to the operating system in a remote control session.
For Microsoft Windows, select Other OS.
For VMware ESX Server select Linux OS.
The Periodic SMI Generation setting is set to Disabled by default and should
not be changed on the X3 Architecture servers. This feature was intended for
support of older operating systems that did not include adequate checking of
CPU states. Modern operating systems poll for CPU machine checks without this
feature. No function is lost by disabling it.
Communication between the RSA adapters is handled through the systems
management Ethernet connections. Therefore, it is very important to ensure
secure and reliable network connections. We recommend that you connect all
RSAs to a separate management network that is used exclusively for
management and is not shared with the production or campus LAN.
2.6.1, “RSA II SlimLine setup” on page 54 describes how to prepare the RSA II
adapter.
Note: Scalable partitions can be created only using the primary node’s
RSA interface. It is not possible to create, delete, start or stop a scalable
partition from a secondary node’s RSA interface.
5. Enter the IP address or host name for each node to be part of the Scalable
System and click Assign (see Figure 2-13). If you intend to use host names
instead of IP addresses for the RSA adapters, note that only fully qualified
names can be used.
Note: In order to assign the IDs, the systems all must be powered off. An
error pop-up window opens if a chassis is not powered off. To delete a
configuration, just assign new IDs and click the Assign button.
6. Click the Step 2: Create Scalable Partition Settings section in the RSA menu.
The window shown in Figure 2-14 opens.
10. Before using the partition, you must move the new partition to be the current
partition. To do this, click Scalable Partitioning → Control Partition(s) and
then click Move New Scalable Partition to Current Scalable Partition, as
shown in Figure 2-16 on page 62.
Note: If you have a current partition running and want the new partition to
become active at a later time, you can select On Next Restart Move New
Scalable Partition to Current Scalable Partition and schedule a reboot to
take place when you want the new partition to become the current partition.
11. Click OK in the dialog box shown in Figure 2-17. This is your last chance to
cancel: after you click OK, any currently running partition is replaced with the
new partition. You can have only one current partition and one new partition
at a time.
12. Check your status again by clicking Scalable Partitioning → Status. You can
see that your partition has moved to Current Scalable Partition, as shown in
Figure 2-18 on page 63.
7. Click Scalable Partitioning → Control Partition(s) and then click Move the
new Scalable partition configuration to Current Partition configuration.
8. To start the current partition, click Scalable Partitioning → Control
Partition(s), then click Start Current Scalable Partition (see Figure 2-16 on
page 62). This powers on both servers in the partition. Click OK in the
dialog box shown in Figure 2-19 on page 63.
9. Now we create the second partition. Log on to the RSA Web GUI of chassis 3;
in our example it is 192.168.70.102.
The two partitions now behave like two independent 8-way servers. Each
partition can be controlled independently with the RSA in its primary node. This
means that they can be powered on and off without any impact on each other;
for example, powering off partition 1 does not affect partition 2. Though they are
still wired as a 4-node complex, no data is transmitted between the partitions.
Tip: The key concept is that you are not actually adding two nodes, but
instead creating a new four-node partition.
1. Shut down the existing system and cable all four nodes as illustrated in
Figure 2-9 on page 53.
2. Log on to the RSA Web GUI of chassis 1.
3. In the RSA main menu click Scalable Partitioning → Create Partition. Enter
the IP address or host name for all four nodes to be part of the Scalable
System and click Assign. See Figure 2-22 on page 67.
4. Click Step 2: Create Scalable Partition Settings section in the RSA menu.
5. Enter the following parameters:
– Partition Merge timeout minutes: Use this field to specify how many
minutes BIOS/POST waits for the merge of the scalable nodes to complete.
Allow at least 8 seconds for each GB of memory in the scalable partition
(see the sizing sketch after step 6). Merge status for POST/BIOS can be
viewed on the monitor attached to the primary node.
The default value is 5 minutes; we do not recommend setting it lower than
3 minutes.
– On merge failure, attempt partial merge?: Use this field to specify whether
BIOS/POST should attempt a partial merge if an error is detected during a
full merge in a 4-node or 8-node configuration. If your application is
sensitive to load order or requires full scalable partition support, you might
not want to allow a partial merge.
6. Check the status by clicking Scalable Partitioning → Status. You can see a
screen similar to Figure 2-24 on page 69.
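Below is a minimal sketch of the merge timeout sizing rule from step 5. The 8-seconds-per-GB guideline, the 5-minute default, and the 3-minute floor come from the text; the memory figure in the example is arbitrary:

import math

def merge_timeout_minutes(partition_memory_gb, floor_minutes=3):
    # Allow at least 8 seconds for each GB of memory in the scalable partition,
    # and never go below the recommended 3-minute floor.
    seconds = 8 * partition_memory_gb
    return max(floor_minutes, math.ceil(seconds / 60))

# Example: a 4-node partition with 32 GB of memory per chassis.
print(merge_timeout_minutes(4 * 32))   # 18 minutes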
Note: In our lab environment we connected two 4-way x3950s along with two
2-way x3950s in order to demonstrate how to configure a four-node system.
That is why Figure 2-25 shows 12 installed processors.
This is not a supported configuration. For support, all four nodes must have
processors of identical speed, number, and type installed. Also, for
performance reasons, we recommend that all nodes have the same amount of
memory installed.
Both VMware ESX Server 3 and Microsoft Virtual Server have features that can
help minimize the impact of a node failure.
ESX Server 3, when combined with VirtualCenter 2 and VMware High
Availability (HA), offers the ability to restart virtual machines automatically on
another host in the event of a node failure. More information about HA is
available in 3.6, “Load balancing and fault tolerance considerations” on
page 137.
You configure the memory subsystem in the server’s BIOS Setup menu by
selecting Advanced Settings → Memory → Memory Array Setting, shown in
Figure 2-26.
(Figure 2-26: the Memory Settings panel, listing Memory Card 1 through Memory Card 4 and the Memory Array Setting field, here set to RBS (Redundant Bit Steering).)
The memory configuration mode you select depends on what memory features
you want to use:
Redundant Bit Steering (RBS):
This option enables Memory ProteXion and is the default/standard setting.
Select RBS if you are not using mirroring, hot-swap or hot-add. See 2.3.1, “X3
Architecture memory features” on page 42 for more information about how
RBS works.
Full Array Memory Mirroring (FAMM)
Select FAMM to enable memory mirroring (and to enable hot-swap).
FAMM reduces the amount of addressable memory by half on each chassis
in the partition, but provides complete redundancy of all addressable memory.
RBS is available in this mode, as is automatic failover.
See 2.3.1, “X3 Architecture memory features” on page 42 for more
information.
Hot-Add Memory (HAM)
Select HAM to enable the use of hot-add in the future.
HAM provides an array layout that supports runtime hot memory add within
an OS that supports that feature. This setting has lower performance and can
also restrict the amount of memory that can be installed in each chassis,
because addressable ranges must be reserved on each chassis for the hot-add
function. RBS is available in this mode. See 2.3.1, “X3 Architecture memory
features” on page 42 for more information.
High Performance Memory Array (HPMA)
HPMA optimizes the installed memory array on each chassis in the partition
for maximum memory performance. Hardware correction (ECC) of a single
correctable error per chip select group (CSG) is provided, but RBS is not
available.
We recommend that you do not select the HPMA setting in a production
environment because this disables Memory ProteXion.
CPU Options
On x3950 multi-node configurations, all the nodes in a partition must have the
same Hyper-Threading setting. You must set this individually on each node.
Clustering Technology
For certain operating systems, it is necessary to configure how the routing of
processor interrupts in a multi-processor system is handled. This is a low-level
setting that selects a multi-processor interrupt communication protocol (XAPIC).
The settings are functional only and do not affect performance.
Clustering here refers to the ability of the x3950 to have CPUs across multiple
processor buses. The processors are clustered into pairs of processors, one pair
for each bus. Each server has two processor buses, and each additional node in
an x3950 multi-node complex has two extra processor buses.
Note: The term clustering here does not refer to the cluster technology
provided by services such as Microsoft Cluster Service.
The choices are Logical Mode, Physical Mode, and Special Mode. Logical Mode is
the default mode for the system and can be used by Windows, Linux, and ESX
Server.
This setting may affect performance, depending on the application running on the
server and memory bandwidth utilization. Typically it will affect certain
benchmarks by a few percent, although in most real applications it will be
negligible. This control is provided for benchmark users that wish to fine-tune
configurations and settings.
If this feature is enabled and provided the operating system has marked the
memory segment as containing data, then the processor will not execute any
code in the segment.
(Figure: the Memory Settings panel showing the Memory Array Setting field, here set to RBS (Redundant Bit Steering).)
Note: While changing this setting can result in a small memory performance
increase, you will lose the redundancy provided by RBS.
If you plan to run applications that do not take advantage of prefetch, such as
Java, file/print, or a Web server, then you can gain 10% to 20% by disabling
prefetch. To disable prefetch, go to Advanced Setup → CPU Options and set
Processor Hardware Prefetcher to Disabled.
Note: This chapter is based on ESX Server 3.0 beta 1. The windows and steps
described here may be slightly different in beta 2 and in the final, generally
available version.
New features such as VMware High Availability (VMware HA, formerly known as
Distributed Availability Services or DAS), Distributed Resource Scheduler (DRS),
and Consolidated Backup provide higher availability, guaranteed service
level agreements, and quicker recovery from failures than was previously
possible, coming close to the availability you get from more expensive and
complicated alternatives such as physically clustered servers. The System
x3950 server, with its scale-up abilities, is uniquely positioned to take advantage
of the larger workloads that can now be virtualized.
The new features in ESX Server 3 and VirtualCenter 2 include the following:
NAS and iSCSI support: The ability to store virtual machines on lower cost
NAS and iSCSI storage should allow more companies to take full advantage
of virtualization.
Note: This Redbook was written with early beta versions of ESX Server 3 and
VirtualCenter 2. Features and specifications are subject to change.
(Diagram: Virtual Infrastructure components: the ESX Server host running hostd, the vpxa agent, and Tomcat; the VirtualCenter server running vpxd and Tomcat; the WebCenter client accessed through a Web browser; and the Consolidated Backup server.)
The only components that must run on a physical server are the ESX Server host
and the Consolidated Backup server. VirtualCenter, as well as the Licensing
Server, can be installed on a physical server or in a virtual machine. The
Licensing Server holds the license information for the other components. There
is nothing to install on the client side because clients connect to the virtual
machines with a Web browser. HA and DRS are features that can be unlocked by
entering the licensing information into the Licensing Server. There is no
additional software associated with these components. Consolidated Backup
requires a physical Windows system to act as the backup proxy server for all
your virtual machines. There is no additional software to install on the ESX
Server or VirtualCenter systems for Consolidated Backup.
See 3.4, “Architecture and design” on page 111 for a more detailed discussion.
As you can see, folders and the other new objects give you much more flexibility
in how you organize your virtual infrastructure. Figure 3-2 on page 87 shows a
view of how some of the different objects relate.
Table 3-1 Maximum CPU configurations with ESX Server (nodes refers to the number of x3950 servers)
ESX Server version | Single-core CPUs, HT disabled | Single-core CPUs, HT enabled | Dual-core CPUs, HT disabled | Dual-core CPUs, HT enabled
There are three ways of describing the number of processors. They are based
on:
The socket (the physical connection the chip has to the system board)
The core (the compute engine and associated circuitry in the chip)
The logical processor using Hyper-Threading Technology
(Figure 3-3 diagram: the HT logical processors, cores, and sockets of a single x3950 grouped into one NUMA node.)
The hardware in Figure 3-3 is a single x3950 server and represents a single
NUMA node. So in a multi-node ESX Server configuration, we have a number of
NUMA nodes equal to the number of the x3950 chassis connected together.
Figure 3-4 on page 89 shows a two-node configuration.
Home nodes
ESX Server assigns each VM a home node when the VM begins running. A VM
will only run on processors within its home node, and newly allocated memory
comes from the home node as well.
The rebalancer selects an appropriate VM and changes its home node to the
least-loaded node. When possible, the rebalancer attempts to move a VM that
already has some memory located on the destination node. From that point on,
the VM allocates memory on its new home node, unless it is moved again. It only
runs on processors within the new home node.
When a VM moves to a new node, ESX Server immediately begins to migrate its
memory in this fashion. It adaptively manages the migration rate to avoid
overtaxing the system, particularly when the VM has very little remote memory
remaining or when the destination node has little free memory available. The
memory migration algorithm also ensures that it will not move memory
needlessly if a VM is moved to a new node for only a short period of time.
VMware ESX Server provides two sets of controls for NUMA placement, so that
administrators can control both memory and processor placement of a VM. The
ESX Server web-based Management User Interface (MUI) allows you to indicate
that a VM should only use the processors on a given node through the Only Use
Processors option and that it should only allocate memory on the desired node
through the Memory Affinity option. If both of these are set before a VM starts, it
only runs on the desired node and all of its memory is allocated locally. An
administrator can also manually move a VM to another node after the VM has
started running. In this case, the Page Migration Rate of the VM should also be
set manually, so that memory from the VM’s previous node can be moved to its
new home node.
Note that manual NUMA placement may interfere with the ESX Server resource
management algorithms, which attempt to give each VM a fair share of the
system’s processor resources. For example, if ten VMs with processor-intensive
workloads are manually placed on one node, and only two VMs are manually
placed on another node, then it is impossible for the system to give all twelve
VMs equal shares of the system’s resources. You should take these issues into
account when using manual placement.
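The same placement can also be expressed directly in a virtual machine's configuration (.vmx) file. The two lines below are a minimal sketch only, assuming a VM that should be confined to the second node of a two-node, 8-way, single-core x3950; the CPU numbering and option behavior should be verified against the ESX Server documentation for your version:
sched.cpu.affinity = "4,5,6,7"
sched.mem.affinity = "1"
The first option restricts the VM to the listed physical CPUs, and the second restricts its memory allocation to the listed NUMA node.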
Conclusion
ESX Server’s rich manual and automatic NUMA optimizations allow you to fully
exploit the advanced scalability features of the Enterprise X-Architecture
platform. By providing a dynamic, self-optimizing NUMA load balancer in
conjunction with patented memory migration techniques, ESX Server can
maintain excellent memory performance even in the face of changing workloads.
If you need more information about the technical details of NUMA system
configuration, consult the ESX Server documentation.
We created a 4-way virtual machine on a 2-node, 8-way system with single-core
CPUs and Hyper-Threading enabled. We then stressed the 4-way virtual machine
to simulate a heavy workload and push it toward full utilization of its assigned
resources. Figure 3-5 on page 93 shows the utility esxtop.
As you can see from this basic example the workload being generated by the
4-way virtual machine is kept local in a single NUMA node (NUMA node 1 in this
case).
ESX Server has been Hyper-Threading aware since V2.1 and the VMkernel is
capable of scheduling workloads on the system taking into account this feature
and its limitations. ESX Server implements algorithms and options for proper
handling of threads running on Hyper-Threading enabled processors. These
algorithms can also be set at the VM level.
For ESX Server 2.5.2, you can do this in the Web interface in the Startup Profile,
while on ESX Server 3.0 you do it in the processor configuration panel.
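The policy can also be set as a per-VM configuration option. As a hedged illustration (the documented values for this option are any, internal, and none; check the VMware Hyper-Threading paper referenced below before relying on it), a virtual machine's .vmx file could contain:
sched.cpu.htsharing = "internal"
The internal setting lets the virtual CPUs of an SMP virtual machine share a Hyper-Threaded core with each other but not with other VMs, while none gives each virtual CPU a whole core to itself.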
For more information about how ESX Server 2.x works with
Hyper-Threading Technology, refer to this document:
http://www.vmware.com/pdf/esx21_hyperthreading.pdf
So on a 4-way x3950 with dual-core CPUs and Hyper-Threading turned off, the
output from esxtop would be similar to Figure 3-7, showing eight physical CPUs
(PCPU).
Figure 3-7 esxtop on an x3950 4-way dual-core with Hyper-Threading turned off
Figure 3-8 on page 96 shows a 4-way x3950 system with dual-core CPUs and
Hyper-Threading turned on.
At the time of writing, however, this display can cause confusion because, under
certain circumstances, you cannot tell from the interface the exact configuration
of your system. Figure 3-9 on page 97 is an example of a potentially misleading
screen.
However using esxtop, you can easily determine whether those logical
processors are actual cores or the result of Hyper-Threading. Cores will show up
as PCPUs (physical CPUs) while Hyper-Threaded logical CPUs will show up as
LCPUs (logical CPUs).
Note: The discussion in this section applies to both ESX 2.5.x and ESX 3.0.
There is no difference in how the two versions deal with these different
processor layers.
These recommendations are being made specifically for the x3950 server in a
multi-node configuration, although they may apply to other servers as well.
Note: For multi-node x3950 configurations with single-core CPUs you must
use ESX 2.5.1 upgrade 1 or higher. For dual-core CPUs, you must use ESX
2.5.2 or higher.
http://www.pc.ibm.com/support?page=MIGR-53046
For ESX Server 2.5.x, we recommend the following BIOS settings be changed.
Disable Hardware Prefetch
To disable prefetch, go to Advanced Setup → CPU Options and set
Processor Hardware Prefetcher to Disabled.
Disable Hyper-Threading
ESX Server 2.5.x supports up to 16 logical processors. Because each
processor with Hyper-Threading enabled appears as two logical
processors to ESX, you need to disable Hyper-Threading on a 16-way, 4-node
x3950, or on an 8-way, 2-node x3950 using dual-core CPUs. From the main
BIOS menu, select Advanced Setup → CPU Options. Figure 3-10 on
page 99 appears. With Hyper-Threading Technology selected, press the right
arrow key to change the value to Disabled.
CPU considerations
The x3950 and all x3950 E chassis must each have four processors installed, and
all processors must have the same speed, number of cores, and cache size.
Note: This is to illustrate basic network configuration and does not take
into account more advanced topics such as backup networks and DMZ
networks.
Hard drives
We recommend you install ESX Server on a RAID 1 array and add a hot spare
drive for increased fault tolerance. The size of hard drives you need will depend
on how much RAM you have in the server and how many virtual machines you
plan to run.
We recommend you configure your server with three 72.3 GB hard drives, two
for a RAID 1 array and one hot spare. This will provide enough space for a
configuration up to a 4-node x3950 with 64 GB of RAM and 64 virtual machines
running. This is assuming all the virtual machines are running on the SAN and
not local disk.
Disk partitioning
Disk partition size depends on a number of factors including the number of virtual
machines that will be running and the amount of RAM installed. The service
console swap should be twice the amount of RAM assigned to the service
console, and the VMFS2 volume used for the VMkernel swap should be at least
as large as the amount of physical RAM installed in the server.
Table 3-3 on page 102 shows an example of how to partition the disks for a
two-node, 8-way x3950 with 32 GB of RAM designed to run 32 virtual machines,
assuming a 72.3 GB local RAID-1 array and with virtual machines stored on a
SAN.
swap (1 GB): Swap file for the service console; should be twice the amount of
RAM assigned to the service console. Should be created as a primary partition.
/var (1 GB): Various ESX Server logs are stored in this partition. This size should
be sufficient to not run out of space. This partition is also used if you plan to use
the VMware method of a scripted install.
/home (512 MB): Virtual machine configuration files are stored here. They are
small, and this will be enough space regardless of how many virtual machines
you have running.
core dump (100 MB): In the event of an ESX Server crash, a log is put in the core
dump partition to send to VMware support.
/tmp (1 GB): Optional. Some people like to create a partition for temp files.
You should also take note of the minimum CPU value assigned to the service
console. By default, when you install ESX Server 2.5.x, it will allocate 8% of
CPU0 as the minimum for the service console. This is based on the assumption
that no additional applications were to be installed in the service console.
Remember, these minimum values are only enforced if the service console
needs the additional CPU cycles, and there is contention for resources. Under
most circumstances the service console will use less CPU than the minimum
listed here, and the unused processor capacity is available to virtual machines.
Network configuration
We recommend you start with all the network controllers set to auto negotiate
for their speed and duplex settings. Our experience is that this is the best setting
for the onboard Broadcom NICs that are in the x3950.
If you experience network-related performance issues you can try changing the
NIC settings to 1000/Full. These settings are in the MUI under Options →
Network Connections. See Table 3-2 on page 100 for our recommended
configuration using the onboard NICs. This is a basic configuration. For more
information about advanced networking topics, VMware has several white
papers about networking, available from:
http://www.vmware.com/vmtn/resources/esx_resources.html
Time synchronization
It is important that ESX Server keep accurate time. To sync ESX Server with a
NTP server follow the directions as outlined in VMware KB Answer ID# 1339:
http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1339
VMware also recommends that you sync your virtual machine’s time with the
ESX Server’s time. This is a function of the VMware Tools that are installed in the
virtual machines. For more detailed information about timekeeping, see the
VMware white paper Timekeeping in VMware Virtual Machines available from:
http://www.vmware.com/pdf/vmware_timekeeping.pdf
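As an illustration only (the KB article above is the authoritative procedure), the usual service console steps look like the following; the NTP server name is a placeholder:
# add your time source, for example ntp.example.com, to /etc/ntp.conf and /etc/ntp/step-tickers, then:
service ntpd restart
chkconfig ntpd on
hwclock --systohc
The last command writes the corrected system time back to the hardware clock so that the server boots with accurate time.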
Storage configuration
There are a couple of changes that can be made to optimize your storage I/O.
Fibre Channel queue depth on QLogic HBAs
VMware recommends in high I/O environments that you increase the HBA’s
maximum queue depth. This can be done in the Service Console by editing
RSA II configuration
Set the OS USB Selection to Linux OS. See 2.6.1, “RSA II SlimLine setup” on
page 54 for instructions on how to configure the RSA II adapter.
Memory configuration
Follow the recommendation in 2.9.1, “Optimal memory module installation” on
page 76. Remember ESX Server 3.0 supports up to 64 GB of RAM per server.
VMware recommends that you burn in your memory for 48 hours before putting a
new server into production. They suggest a free tool such as Memtest86+ be
used.
See 3.4.2, “ESX Server 3 design” on page 119 for more information.
We expect that after ESX Server 3 is released, VMware and its user
community will develop a more comprehensive list of best practices for the
extensive new networking features in ESX Server 3.
With ESX Server 3, disk partitioning is no longer dictated by how much RAM is
installed in the physical server and how many virtual machines you have running.
Table 3-4 shows the recommended partitioning scheme for ESX Server 3 on a
multi-node x3950.
It is still recommended that the swap partition be twice the amount of RAM
allocated to the service console. The other partitions will not need to change size
based upon the number of virtual machines, amount of RAM, and so forth.
Network configuration
As we discuss in 3.4.2, “ESX Server 3 design” on page 119, there have been a
number of changes to networking in ESX Server 3. It will take time for VMware
and the user community to develop best practices regarding all the new network
features. Initially we can recommend the following:
Add your additional NICs to the first virtual switch that was created during
install and add ports to this switch for your virtual machines, VMotion, NAS
storage, and so forth (Figure 3-14).
Change the default security policy on the vSwitch to Reject for all three:
Promiscuous, MAC Change, and Forged Source (Figure 3-15 on page 110).
Unless you have a need for any of these in your organization, it is probably
best for security reasons to disallow them.
Storage configuration
One recommendation we can make at this time is that if you plan on using iSCSI
storage, you should use the hardware-based initiator with the QLogic QLA4010
HBA and not the software-based initiator built into ESX Server. The iSCSI HBA
has a TCP Offload Engine that provides higher performance with much less
overhead than the software-based initiator.
Time synchronization
It is important that ESX Server keep accurate time. To sync ESX Server with a
NTP server follow the directions as outlined in VMware KB Answer ID# 1339:
http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1339
VMware also recommends that you sync your virtual machine’s time with the
ESX Server’s time. This is done as a function of the VMware Tools that are
installed in the virtual machines. For more detailed information about
timekeeping see the VMware white paper Timekeeping in VMware Virtual
Machines available from:
http://www.vmware.com/pdf/vmware_timekeeping.pdf
For initial testing purposes or troubleshooting, you might want to disable the
firewall temporarily. You can do this by logging on to the service console and
entering the following command:
esxcfg-firewall -u
This will disable the firewall until the next reboot. Figure 3-16 shows the options
that can be used with the esxcfg-firewall command. This figure displays the
output of the esxcfg-firewall -help command.
esxcfg-firewall <options>
-q|--query Lists current settings.
-q|--query <service> Lists setting for the
specified service.
-q|--query incoming|outgoing Lists setting for non-required
incoming/outgoing ports.
-s|--services Lists known services.
-l|--load Loads current settings.
-r|--resetDefaults Resets all options to defaults
-e|--enableService <service> Allows specified service
through the firewall.
-d|--disableService <service> Blocks specified service
-o|--openPort <port,tcp|udp,in|out,name> Opens a port.
-c|--closePort <port,tcp|udp,in|out> Closes a port previously opened
via --openPort.
--blockIncoming Block all non-required incoming
ports (default value).
--blockOutgoing Block all non-required outgoing
ports (default value).
--allowIncoming Allow all incoming ports.
--allowOutgoing Allow all outgoing ports.
-h|--help Show this message.
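As a usage illustration based on the options above (the exact service names vary by build, so list them first with -s):
esxcfg-firewall -s
esxcfg-firewall -e ntpClient
esxcfg-firewall -o 8080,tcp,in,testWeb
esxcfg-firewall -q
The first command lists the known service names, the second allows the NTP client service through the firewall, the third opens an arbitrary inbound TCP port with a label of your choosing, and the last confirms the resulting settings.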
See Table 5-2 on page 227 for a list of what firewall ports you must open for
IBM Director.
For the latest list of supported guest operating systems and qualified hardware
see the Systems Compatibility Guide:
http://www.vmware.com/vmtn/resources/esx_resources.html
http://www.pc.ibm.com/support?page=MIGR-62310
ESX Server runs on a physical server, while VirtualCenter can either run on a
separate physical server or in a virtual machine. One thing to consider if you
choose to run VirtualCenter in a VM is that if the parent ESX Server system goes
offline, you will not have access to VirtualCenter until the server is back online or
you restart the virtual machine on another host. vSMP and VMotion are features
already installed and are unlocked with a license key.
VMware offers a Virtual Infrastructure Node (VIN) license that includes the
following software licenses:
ESX Server license
Virtual SMP license
VirtualCenter Agent license
VMotion license
The VIN license offers considerable savings over buying all the individual
licenses separately.
Both methods provide very similar results in the number of virtual machines you
could support on an 8-way x3950 server. Our experience is that in most
organizations, these two methods usually result in a similar number of virtual
machines per host. Therefore to save yourself some time, we recommend you
use the first method for initial sizing of your ESX servers.
The exact mix of virtual machines and applications running will affect how many
virtual machines you can run. Unfortunately there is no one formula that will
calculate exactly how many virtual machines you can run. The low end of the
recommendations we illustrated here should provide a realistic and conservative
target for most organizations. In reality you could end up supporting more or
fewer virtual machines.
To avoid this cycle, one recommendation is that you follow the same purchasing,
approval, and change management procedures for virtual machines as you do
for physical systems. While the process can usually be streamlined and
shortened for virtual machines, having a formal process in place to request
virtual machines, as well as a way to associate costs with each new virtual
machine, gives you much better control over your virtual machine growth and a
better idea of future growth.
Sometimes there are significant changes to processors in the same family that
have different extended features, such as 64-bit extensions and SSE3. In these
cases VMotion might not work, even though the CPUs are in the same processor
family. CPU speed and cache level are not an issue, but the extended features
will cause problems or VMotion failure if they are different on the target and host
servers.
For example, because the x366 and x260 use the same processors as the
x3950, these servers would be suitable candidates for joining some x3950s in a
VMotion configuration. However other xSeries servers with different processors
will not.
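One quick way to compare the extended features of two candidate hosts is to look at the CPU flags reported in the service console; for example, the lm flag indicates 64-bit extensions and pni indicates SSE3 (these flag names are Linux kernel conventions, not VMware-specific):
grep flags /proc/cpuinfo | uniq
Run this on both the source and target hosts and compare the output before relying on VMotion between them.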
VMotion also requires its own dedicated Ethernet controller and network. A
Gigabit Ethernet network is listed as required for VMotion by VMware, but it is
possible to use VMotion with a 100 Mbps network if that were your only option,
although migration times will increase significantly.
Another important requirement for VMotion is shared storage. The ESX Server
systems that you are going to run VMotion across need to be zoned so that all
LUNs are visible to all hosts.
Because VMotion requires shared storage, the upper limit would be 16 ESX
Server systems per farm. You might want to create smaller farms for
a number of reasons. The lower limit is two servers, assuming you are using
VMotion.
Storage sizing
Like server sizing, there is no one universal answer that can be applied to every
organization. The previous section lists that there should not be more than 32
I/O-intensive virtual machines per VMFS volume, and staying within this limit
should reduce any resource contention or SCSI locking issues.
There are a number of ways to determine the most appropriate size of your
VMFS volume. Here is one of the easier ways.
Say you have decided that two 8-way x3950 servers with 32 virtual machines on
each server will meet your processing requirements. Using the 32 virtual
machines per LUN guideline, this would mean that two LUNs are needed. If you
are creating new virtual machines, you can estimate the average size of the
virtual disks. If we use 20 GB of disk per VM, this would give us 640 GB per LUN.
Consider adding a little additional space for growth; ten percent or a bit more is a
good rule of thumb, bringing us to about 720 GB. If you are planning on using
redo logs, you might want to add additional space for that as well.
In ESX Server 2.5.x there are three basic components you should consider.
Service console
It is recommended that the service console have its own dedicated NIC for
performance and security reasons. If you have a separate management
network in your data center, then this is where you want to locate the service
console NIC.
In a default configuration, a 100 Mbps Ethernet controller is sufficient
bandwidth for the service console. If you are planning on also using the
service console for backups or other high bandwidth functions, then a Gigabit
NIC is recommended.
This is a brief overview of networking with ESX Server 2.5.x. Advanced topics
such as backup networks, DMZ networks, traffic shaping, and detailed
configuration steps are beyond the scope of this Redbook. For in depth
information about networking and configuration steps, see the documentation on
the VMware Web site:
http://www.vmware.com/support/pubs/esx_pubs.html
http://www.vmware.com/vmtn/resources/esx_resources.html
Virtual Infrastructure
See 3.1.2, “Virtual Infrastructure Overview” on page 84 to review the basic
components of Virtual Infrastructure with ESX Server 3 and VirtualCenter 2.
Without any definitive answers at this time, the guideline of 4-5 virtual CPUs
per single-core CPU and 6-8 per dual-core CPU is a good minimum expectation
for ESX Server 3.
You might want to create multiple VMotion boundaries within a datacenter. For
example, you could have a datacenter called Boston and in it a folder called Intel
and one called AMD because these CPUs would not allow VMotion between
them. The same basic rules of VMotion still apply in VirtualCenter 2: the physical
servers must have similar CPUs (as we discussed in “VMotion considerations” on
page 116) as well as sharing the same storage.
As of this writing VMware did not have a best practices document available, but
assured us that there would not be any decreases over what is supported in ESX
Server 2.5.x.
There are a few changes in the VMFS 3 file system that are worth noting here:
VMFS 3 supports 3300 files and subdirectories per VMFS volume.
VMFS 3 supports 233 files and subdirectories per subdirectory.
Given the increased abilities of ESX Server 3 and the VMFS 3 file system, it is
likely the maximum number of ESX Server hosts per shared LUN and the number
of virtual machines per LUN will increase, but these numbers had not yet been
determined by VMware. We can assume the numbers provided in 3.4.1, “ESX
Server 2.5.x design” on page 112 would be the minimum starting point for ESX
Server 3.
Storage sizing
As we mentioned in the previous section, with the increased capacities of the
VMFS3 file system, we expect increases in the number of ESX Server hosts per
LUN and number of virtual machines per VMFS volume.
Because the upper limits had yet to be determined while we were writing, we
could not make any additional recommendations over the guidelines for storage
sizing in ESX Server 2.5.x in “Storage sizing” on page 117.
There are four important changes to consider in ESX Server 3 that will affect how
you size your LUNs and shared storage:
Swap files
The virtual machine’s swap files are now stored on the shared storage and
not on the local disk. Instead of one giant swap file for all the virtual machines,
there is now one swap file per virtual machine equal to the size of RAM
allocated to the virtual machine. Each virtual machine has its own folder on
the VMFS volume and the swap file is in that folder.
The result of these changes is that you must increase the size of your VMFS
volume to accommodate the additional storage requirements of the swap
files. For example if you have 20 virtual machines with an average of 1 GB of
RAM allocated to each one, you need 20 GB of additional storage on the
VMFS3 volume to accommodate the swap files.
Configuration files
The virtual machine’s configuration (.vmx) and NVRAM file are now stored on
the shared storage in the same folder as the virtual machine and its swap file,
and no longer on the local disks of ESX Server.
This change was made largely to enable the HA feature of VirtualCenter 2, but
also to ease the management of the virtual infrastructure.
These files do not take much additional space, but you might want to add a
small amount of storage per virtual machine.
Snapshots
VMware ESX Server 3 supports multiple snapshots, similar to the feature in the
VMware Workstation product. Because these snapshots will be located on
the shared storage along with the virtual machine’s disk and configuration
files, you must account for additional space if you will be using this feature.
You still must have one Gigabit NIC for each 10-20 virtual machines, one for
VMotion, and one for the service console. The difference now is that they can all
be connected to the same virtual switch, which provides redundancy for all the
port groups against the failure of a single NIC.
There can still be reasons to assign a physical NIC to just the service console.
For example, if your company mandates that all management interfaces be
isolated on a management network, you can do so; it is just no longer a
requirement.
5. Select which NIC you want to use for the service console. Enter your IP
address information, DNS servers, and host name for your server as shown in
Figure 3-23 on page 129. As you can see, you now have the option to use any
NIC in the system for the service console during the install. Click Next to
continue.
6. After the networking portion, the installer reboots and the ESX Server
installation is complete.
7. You should see a screen similar to Figure 3-24 on page 130. You can use the
VirtualCenter Client to connect directly to ESX server because there is no
longer a MUI.
Note: It was not clear at the time of this writing if the Licensing Server will
remain a separate installer, or be part of the VirtualCenter installation.
1. Run vlsetup.exe to start the install of the licensing server. You will see the
license server install wizard.
2. Accept the license agreement and click Next to continue.
3. Enter User Name and Organization in the Customer Information screen and
click Next to continue.
4. Select the destination folder for installation and click Next to continue.
5. Provide a path to your licensing file. Click Next to continue. See Figure 3-25
on page 131.
11.Click Install on the ready to install screen. Click Back to make any changes.
12.When the installation is complete, click Finish to exit.
Note: You can also use the IP address instead of a server name.
You can also use the advanced VMware features such as VMware High
Availability (formerly Distributed Availability Services) and Distributed Resource
Scheduler (DRS).
Tip: VMware High Availability is the new name for Distributed Availability
Services.
Figure 3-32 provides an example of a cluster composed of six System x3950
servers, all running ESX Server 3.0.
(Figure 3-32 diagram: six ESX Server images, each running multiple virtual machines.)
Note: One of the major differences between ESX Server 2 and ESX Server 3 is
that now all the virtual machine files, including the .vmx configuration file, are
located on the SAN, resulting in even easier management of the infrastructure
and easier recovery in the event of a disaster.
Figure 3-33 HA moves the VMs from the failing node to the surviving nodes
In this case, all virtual machines running on the host that failed in this manner
stop as well, but HA will redistribute them on the surviving nodes and restart
them as shown in Figure 3-34 on page 140.
Note: When the physical host goes down, the virtual machines on that host go
down with it. These VMs will be restarted on other hosts, but the end user will
experience some downtime. Depending on the application or the service
running in the virtual machines, the downtime might be a few minutes.
Also notice that because all the relevant files of the virtual machines are now on
the SAN, the HA service only needs to recatalog the virtual machines on the
proper hosts without restoring any files. This addresses one of the major
concerns of ESX Server 2.x, whose architecture required the VM configuration
files to be owned, typically on local storage, by the ESX Server host that was
originally running the virtual machine.
Configuring HA
Perform the following steps to configure VMware HA.
1. Define the cluster boundaries using the VirtualCenter management interface
as shown in Figure 3-35 on page 141. As you can see, we have a datacenter
that already contains an ESX Server 3.0 system.
2. Now create a new cluster in the new datacenter that we have defined as
shown in Figure 3-35. To create the cluster, right-click the datacenter and click
New Cluster.
3. Enter the name you want to assign to the cluster as well as whether this
cluster should support DRS, HA or both algorithms. We have chosen to
support both by adding check marks as shown in Figure 3-36 on page 142.
Click Next.
4. For the DRS feature, specify the level of automation you want to provide.
DRS can be configured in three different modes:
– Manual: VirtualCenter suggests whether to create or move virtual
machines, but full control is left to the administrator.
– Automated: VirtualCenter automatically assigns a host, based on
workload, to a newly created virtual machine but only suggests to the
administrator how to rebalance the workload at run time.
– Fully automated: VirtualCenter decides where to host a newly created
virtual machine as well as decides where and when to move virtual
machines across the cluster, by means of the VMotion feature, when the
workload demands it.
For the purpose of our test, we set the DRS feature to act in manual mode.
Click Next.
5. The next step is to configure HA, but our beta version did not have a
configurable HA service. So in our case, the HA window only stated that it
would be enabled.
The new cluster is essentially an empty logical entity that is filled with physical
servers running ESX Server and that will be subject to the DRS and HA
algorithms we have defined in this cluster.
2. Enter the information for the host you want to add: host name (or IP address)
and the password for root. Click Next.
If the information provided is correct, VirtualCenter is granted the ability to
connect to the host and a summary page on the host’s status is presented,
similar to Figure 3-40 on page 146.
3. Click Next to display further summary pages. The host is then added to the
cluster. In Figure 3-41 on page 147, we added an 8-way x3950 to cluster
CLUS1.
4. Repeat the same steps to add a second host to the cluster. Figure 3-42 on
page 148 shows the result of adding a 4-way x3950 host to CLUS1.
Using HA
Now that the cluster has been set up, we can test failover.
Figure 3-43 on page 149 shows our setup when we tested HA. As you can see,
both HA and DRS functions are enabled. Two hosts comprise this cluster, and
the figure shows a summary of the resources available cluster-wide in terms of
CPU and memory.
Each of the two hosts supports a certain number of running virtual machines.
Figure 3-44 on page 150 shows the first host.
To simulate the crash of a server, we shut down the first host in the farm. As you
can see in Figure 3-46 on page 151, VirtualCenter recognizes that the host is no
longer responding.
Figure 3-47 on page 152 shows a point in the middle of the failover process
where virtual machine vm1 is restarted on the surviving node.
After a few seconds, the failover is completed and all virtual machines are
running on the remaining node available on the cluster, Figure 3-48 on page 153.
The initial configuration of the cluster is the same as shown in the Figure 3-32 on
page 138.
The idea behind DRS is that if the resources of a host in the cluster become
saturated, depending on its configuration DRS will either suggest or initiate a
VMotion move onto a host in the cluster that has more resources available. So if
one of your hosts is driving resource utilization at 90%, for example, and you have
other hosts running at a lower utilization rate, DRS will try to rebalance the usage
of the resources across all nodes, as you can see in Figure 3-50 on page 155.
(Diagram for Figure 3-50: six ESX Server hosts at 60%, 70%, 40%, 50%, 90%, and 50% CPU utilization; the host at 90% triggers the move.)
Figure 3-50 DRS initiates or recommends the transfer of the virtual machine.
(Diagram: after the VMotion move, the six hosts run at 60%, 70%, 60%, 50%, 70%, and 50% CPU utilization.)
Using DRS
Now that we have introduced the concepts of DRS, we use it in this section. We
use the same cluster setup as described in 3.6.1, “VMware High Availability” on
page 138.
In this case the virtual machines running on the first host, an 8-way x3950, are
basically idle and the overall system utilization is very low, as you can see in
Figure 3-52 on page 157.
On the other hand, virtual machine vm2, a 4-way SMP virtual machine running
on the second 4-way host in the cluster, has a heavy workload, as shown in
Figure 3-53 on page 158.
This system is being pushed hard in terms of CPU and, even more importantly,
memory utilization. This is especially true if you consider that the 8-way system
in the cluster is idle most of the time. As a result of this, after a few minutes, the
DRS algorithms suggest the system administrator move vm2 from the second
host to the first one using VMotion. This recommendation is shown in Figure 3-54
on page 159.
Note: If we had set DRS to fully automated mode, VMotion would have moved
the virtual machine without any confirmation from the administrator, and the
action would simply be recorded in the VirtualCenter event log.
The System x3950 is not only a superb platform for running standard ESX Server
workloads; especially in multi-node configurations, it is also the reference
platform for running scalable enterprise workloads inside virtual machines.
There are at least two reasons for this, discussed in the
following sections:
3.7.1, “Big virtual machines require big hosts” on page 160
3.7.2, “NUMA-enabled 4-way virtual machines” on page 163
Dual-core processors do help in this trend, but they do not solve all the problems
associated with running multiple enterprise workloads on the same physical
server, especially when the utilization peaks of the virtual machines overlap.
One option would be to create a high priority resource pool on the server where
all the enterprise virtual machines can be placed. Then they can benefit from all
the resources of the server whenever needed. You can also create a low priority
resource pool on the same x3950 where all the other virtual machines get their
resources from so that when the high priority virtual machines are not fighting for
resources, the other low priority virtual machines can get them if needed.
This scenario does not require any manual user intervention and it is provided by
the basic VMkernel scheduling algorithms.
The message is that these setups could give you the confidence to run
multiple 4-way virtual machines that peak at the same time. On a 2-node x3950
single-core configuration, you could have, for example, five or ten 4-way virtual
machines.
For example, if you have a 4-way single core x3950 (the first row in the table),
you can only allow a single 4-way VM to peak at any given point in time. At that
point, the other virtual machines are basically not given any resources. With
more cores (as with a multi-node x3950 configuration), you will have more
flexibility because if you have four 4-way virtual machines on a 8-way dual core
x3950 complex, it is likely the virtual machines will not peak at the same time and
would therefore allow the other VMs to continue to be scheduled.
Note: For the sake of streamlining the discussion, we have not considered
enabling Hyper-Threading on this configuration and counting the HT logical
processors for this exercise.
Whenever you can find enterprise workloads that have different load patterns
and therefore are not hostile to each other in terms of resource utilization for longer
periods of time, those workloads are good candidates for consolidation on a
multi-node x3950 configuration.
The NUMA algorithms of the VMkernel have the goal of containing a virtual
machine within a single NUMA node. Should the VMkernel find that the number
of virtual CPUs assigned to a virtual machine exceeds the capabilities of the
NUMA node in the system, the VMkernel will disable the NUMA algorithms for
that specific virtual machine. This means that this virtual machine will be
assigned memory as though it were a flat SMP system. This can negatively affect
the performance of the virtual machine. Fortunately, this does not occur with the
x3950, as shown in Table 3-7.
The Intel Xeon architecture is the only x86 architecture currently available that
satisfies the requirement of having a NUMA node larger than a 4-way virtual
machine. This means that on the x3950, the VMkernel will use the
NUMA algorithms to optimize CPU and memory locality for all 4-way virtual
machines that get created on the system.
Table 3-7 summarizes the size of the NUMA node in terms of the logical
processors available, depending on the configuration:
(Diagram: 100 VMs on six single-node x3950 ESX Server hosts sharing a SAN, with Console OS and Production networks.)
By adding x3950 (or x3950 E) nodes to the existing x3950 servers, as shown in
Figure 3-57 on page 165, we are able to support twice the number of virtual
machines without changing the architecture of the farms or clusters.
Figure 3-57 Adding nodes to the existing x3950 servers doubles the supported VMs
The configuration can be extended further by simply adding more nodes to the
existing servers, upgrading some ESX Servers systems from 8-way (two nodes)
to 16-way (four nodes), as shown in Figure 3-58.
(Diagram: 300 VMs across the six upgraded ESX Server hosts sharing a SAN.)
Figure 3-58 Upgrading some ESX Server systems to 16-way (four nodes)
You might have noticed in this example that we have not considered upgrading
the Ethernet and Fibre Channel connections. Depending on the pattern of the
workloads being deployed in the virtual machines, you should consider adding
physical Ethernet connections to the virtual switches as well as Fibre Channel
adapters to increase the I/O throughput.
Table 3-8 shows examples of situations in which many systems administrators
might find themselves. Here, we assume that the current configuration is based
on single-node, 4-way x3950 servers.
Situation: I have 300 virtual machines running on six 4-way x3950 systems. I
need to grow my infrastructure by adding new virtual machines, but I am short on
CPU capacity and RAM across my farms and clusters. Network and storage
throughput are not a problem at the moment. What do I do?
Suggested solution: Update your six servers by adding an extra x3950 (or
x3950 E) node to each. This will provide you the ability to host more virtual
machines without changing or adjusting anything in the way you operate your
infrastructure today.
Situation: I have 300 virtual machines running on six 4-way x3950 systems. I
need to grow my infrastructure by adding new virtual machines, but I am short on
CPU capacity, RAM, and I/O throughput across my farms and clusters. What do
I do?
Suggested solution: Update your six servers by adding an extra x3950 (or
x3950 E) node to each. In each new node, connect the onboard Gigabit Ethernet
controllers and install and connect a Fibre Channel adapter. This will provide you
the ability to host more virtual machines.
Situation: I have 300 virtual machines running on six 4-way x3950 systems. I
need to grow my infrastructure by adding new virtual machines, but I am short on
I/O throughput across my farms and clusters. CPU and RAM are not a problem at
the moment. What do I do?
Suggested solution: Add new I/O cards to your existing 4-way systems.
Figure 3-59 on page 167 shows the CPU and memory of the existing 4-way
configuration.
In our example, we are using the two integrated Gigabit Ethernet controllers. As
you can see from Figure 3-60, the first connection is used to support the Console
OS, while the second connection is shared between the VMotion port group and
the production port group. Note that this is not an optimal networking
configuration, simply a test configuration.
Figure 3-61 ESX Server reboots so the hardware reconfiguration can take effect
5. After the two-node system has rebooted, the memory and CPU configuration
is correctly updated, as shown in Figure 3-62 on page 170.
Virtual Server offers cost savings through virtual machines with advanced
levels of scalability, manageability, and reliability. It is designed to deliver cost
savings in software test and development, legacy rehosting, and server
consolidation.
The following operating systems are supported as the host for Microsoft Virtual
Server:
32-bit operating systems
– Windows Server 2003 Standard Edition or later
– Windows Server 2003 Enterprise Edition or later
– Windows Server 2003 Datacenter Edition or later
– Windows Small Business Server 2003 Standard Edition or later
– Windows Small Business Server 2003 Premium Edition or later
– Windows XP Professional (for non-production use only)
64-bit operating systems (64-bit support only for Virtual Server 2005 R2)
– Windows Server 2003 x64 Standard Edition or later
– Windows Server 2003 x64 Enterprise Edition or later
– Windows Server 2003 x64 Datacenter Edition or later
– Windows XP Professional x64 (for non-production use only)
The guest operating systems running on the virtual machines can only be 32-bit
(64-bit virtual machines are currently not supported). The following operating
systems are supported:
Windows Server 2003, Standard, Enterprise, and Web Editions.
Note: Windows Server 2003 SP1 will run as a guest only in Virtual Server
2005 R2.
Windows Small Business Server 2003, Standard & Premium Editions
Windows 2000 Server and Advanced Server
Windows NT Server 4.0, Service Pack 6a
Windows XP SP2 (only on Virtual Server 2005 R2)
Linux
A number of Red Hat and SUSE Linux distributions are supported as guests
for Virtual Server 2005 R2. The complete list is available from:
http://www.microsoft.com/windowsserversystem/virtualserver/evaluation/linuxguestsupport
The Microsoft Virtual Server Migration Toolkit (VSMT) is a free toolkit for Virtual
Server that allows businesses to migrate physical platforms to a virtual server
(Diagram: a physical Windows NT 4.0 server running legacy applications is migrated using VSMT into a virtual machine, NT 4.0 and legacy applications, running on virtual hardware alongside other VMs; the lower part of the diagram shows the movement of VMs between cluster nodes, each node hosting several VMs on virtual hardware.)
Virtual Server host clustering uses Windows Server 2003 Enterprise Edition or
Datacenter Edition clustering and about 250 lines of Visual Basic® scripts. It
supports up to eight cluster nodes and can utilize SAN, iSCSI, or direct attached
storage.
The Virtual Server host clustering code is based on the clustering service that
is part of Windows Server 2003 Enterprise or Datacenter Edition. See section 4.7.2,
“Clustered installation” on page 184 for more information.
The use of the SRAT table ensures that, wherever possible, processes are run
in the same node where their memory is allocated. The SRAT describes both
installed and hot-pluggable memory, that is, memory that can be added or
removed while the system is running, without requiring a reboot.
If memory is not local at any time during normal operation, the Microsoft Virtual
Server logs an error in the event log indicating that it is not using local memory.
The x3950 and Microsoft Virtual Server together are capable of adding nodes and
processors as you grow. For example, a two-node, 8-way solution could grow to
a 16-way configuration by adding two nodes each with four processors.
Currently, Microsoft Virtual Server Standard Edition supports up to four
processors and Enterprise Edition can support up to 32 processors. Both
editions support up to 64 GB of RAM.
Important: Virtual Server provides its own resource manager for allocating
system resources to virtual machines. You should not use any other resource
manager, such as Windows System Resource Manager, with Microsoft Virtual
Server.
The x3950 offers up to 64 GB SDRAM per chassis. Memory can be added while
the server is up and running, with certain limitations. Hot-add memory is
supported with Microsoft Windows Server 2003, enabling the dynamic addition of
main memory to increase performance. However, neither Microsoft Virtual Server
nor the virtual machines running under it support hot-add memory. By restarting
the Virtual Server service, though, Virtual Server should be able to access the
added memory.
Consult the Microsoft Article 903748 for Microsoft Virtual Server performance
tips, available at:
http://support.microsoft.com/default.aspx?scid=kb;en-us;903748
Consult the Virtual Server 2005 R2 pricing and licensing information from
Microsoft for additional information, available from:
http://www.microsoft.com/windowsserversystem/virtualserver/howtobuy
As shown in Figure 4-4, a 4-way dual-core x3950 system with Hyper-Threading
enabled presents 16 processors to the operating system.
Figure 4-4 x3950 with four dual-core processors and Hyper-Threading enabled
If you look at the Device Manager, you will see that the operating system
recognizes 16 processors. If you disable Hyper-Threading, you will see eight
processors instead of 16.
4.6 Sizing
The rule of thumb is that each VM requires the same amount of hardware
resources as a physical machine. Microsoft Virtual Server allows up to 64 VMs to
run on a single host.
To install the host operating system, follow one of the following sets of
instructions:
Installing Windows Server 2003 x64 Edition on the x3950
http://www.pc.ibm.com/support?page=MIGR-60676
Installing Windows Server 2003 (32-bit) on the x3950
http://www.pc.ibm.com/support?page=MIGR-61178
You also need to install the Internet Information Services (IIS) component of the
operating system:
1. Open the Control Panel.
2. Open Add or Remove Programs.
3. Select Add/Remove Windows Components.
4. Select Application Server and select Internet Information Services.
Virtual Server host clustering is a way of combining Microsoft Virtual Server 2005
R2 with the server cluster feature in Microsoft Windows Server 2003. The setup
in this example is shown in Figure 4-5.
(Figure 4-5 diagram: two cluster nodes running virtual machines on virtual hardware, both attached to a shared DS4300 storage server.)
As you can see in Figure 4-5, each cluster node consists of two x3950 servers
cabled together to form a single 8-way server. Both of the cluster nodes share an
IBM TotalStorage® DS4300 for storage. In this example, we are using the
clustering feature in Microsoft Windows Server 2003 x64 Enterprise Edition and
Microsoft Virtual Server 2005 R2 Enterprise Edition.
Note: These instructions are based on the Virtual Server Host Clustering
Step-by-Step Guide for Virtual Server 2005 R2, available from:
http://www.microsoft.com/downloads/details.aspx?FamilyID=09cc042b-154f-4eba-a548-89282d6eb1b3&displaylang=en
1. Configure the x3950 systems as two 8-way complexes. See 2.5, “Multi-node
configurations” on page 46 for instructions.
2. Install Windows Server 2003 x64 Enterprise Edition using these instructions:
http://www.pc.ibm.com/support?page=MIGR-60676
3. On each cluster node, install Internet Information Services (IIS). To install IIS,
open the Control Panel. Select Add or Remove Programs and then select
Add/Remove Windows Components. Select the check box for IIS under
Application Server.
g. Select whether you want to allow Virtual Server to enable the firewall
locally on the server for remote administration, as shown in Figure 4-8. In
this example, we are using the defaults. Select Next.
h. Click Install to begin the installation. Click Finish when prompted that the
installation has completed.
i. Using the Cluster administrator, start the Cluster service on the first node
where you just installed Microsoft Virtual Server, as shown in Figure 4-9
on page 188.
6. Repeat all of step 5 on page 186 to install Microsoft Virtual Server on the
second node in the cluster and then restart the cluster service on that node.
7. Verify that both of the nodes are online and that the cluster resources are
online, as shown in Figure 4-10.
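You can also confirm this from a command prompt with the standard cluster.exe commands (a hedged example; the Cluster Administrator view in Figure 4-10 gives the same information):
cluster node
cluster group
The first command lists the state of each cluster node, and the second lists the resource groups, their current owner, and their status.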
Tip: You may specify additional physical disk resources for each virtual
machine. If you do specify additional disks, you need to configure the
physical disk resource for the data disk to be dependent on the physical
disk resource that holds the guest operating system. This ensures that all
of the resources that are associated with the data disk are online before
the guest operating system attempts to access the data on them.
c. From the second node, create a folder on disk X: called Guest1, as shown
in Example 4-1. This folder will be used in a future step when installing
and configuring the guest VM.
10.Copy the script Havm.vbs to the systemroot\Cluster folder on each node’s
local disk, as shown in Figure 4-12 on page 191. The script is available in
Appendix B of Virtual Server Host Clustering Step-by-Step Guide for Virtual
Server 2005 R2, available from:
http://www.microsoft.com/downloads/details.aspx?FamilyID=09cc042b-154f-4eba-a548-89282d6eb1b3&displaylang=en
Note: The script must be copied to the correct folder on each node's local
hard disk, not to a disk in the cluster storage.
HAVM.VBS is a Visual Basic script that ensures that the guest VM
functions correctly when a failover or other cluster-related process occurs.
The script also triggers a restart of the guest VM if the guest VM stops running.
The script is configured as a Generic Script resource in the cluster.
11.Now you need to install and configure the guest VM. All the configuration files
of the guest VM must reside on the shared cluster disk resource, in this case
drive X.
a. On the computer that contains the management tool for Virtual Server
2005 R2, click Start → Programs → Microsoft Virtual Server → Virtual
Server Administration Web site. Highlight the cluster node that currently
owns the shared cluster disk resource X: (in Guest1Group).
b. In the navigation pane, under Virtual Networks, click Create, as shown in
Figure 4-13 on page 192.
c. Create the virtual network, as shown in Figure 4-14 on page 193. Ensure
that for Network adapter on the physical computer, you select the network
adapter associated with the public network (not the private network) and
then click Create.
e. In the line labeled .vnc file in Figure 4-16 on page 195, select the path,
then copy and paste it into a text editor such as Notepad for later use.
Tip: The purpose of this step is not to undo the creation of the virtual
network, but to clear Virtual Server of information that will prevent you
from moving the configuration file for the virtual network (the .vnc file) to
the cluster storage.
h. On the cluster node on which you created the .vnc file, open the command
prompt, and then navigate to the path that you copied into a text file in the
previous step. Move the *.vnc file to x:, as shown in Example 4-2.
X:\>cd Guest1
X:\Guest1>dir
Volume in drive X is DATA1
Volume Serial Number is B853-D7BC
Directory of X:\Guest1
05/01/2006 01:13 PM <DIR> .
05/01/2006 01:13 PM <DIR> ..
03/20/2006 04:09 PM 3,242 ClusterNetwork.vnc
1 File(s) 3,242 bytes
2 Dir(s) 697,157,910,528 bytes free
Note: You must move the file using Move, not copy it using Copy.
Also, the destination must be on the clustered disk volume, in this case disk X:.
12.The last step is to configure the Guest1 VM for failover. Perform the following
steps.
a. In Cluster Administrator, move Guest1Group to the other node (not the
node on which you were working in the previous procedure).
b. For the cluster node on which Guest1Group is currently located, open the
Virtual Server Administration Web site.
c. In the navigation pane, under Virtual Machines, click Add.
d. In Fully qualified path to file, type:
\\IPofMSVS\SharedFolderName\Guest1\ClusterNetwork.vnc, as shown in
Figure 4-20 on page 200.
e. Click Add.
f. On either cluster node, in Cluster Administrator, create a new script
resource with the properties in the following list.
Note: Do not bring this new resource online until you have completed
step h.
g. Apply the following for the script, as shown in the Figure 4-21 on
page 201:
i. Call it Guest1Script.
ii. Make it a Generic Script resource.
iii. Assign the resource to Guest1Group.
iv. For Possible Owners, make sure both cluster nodes are listed.
v. Add DiskResourceX to the list of resource dependencies.
vi. For the Script filepath, specify the following:
%windir%\Cluster\Havm.vbs
h. With Guest1Script in the Offline state, on the same node as in the previous
step, click Start → Run and enter the following command:
cluster res "Guest1Script" /priv VirtualMachineName=Guest1
This command associates the Guest1Script resource with the guest
named Guest1.
i. In Cluster Administrator, bring Guest1Group online. If you use the Virtual
Server Administration Web site to view the node that owns Guest1Group,
Guest1 will now show a status of Running under Master Status.
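If you prefer the command line, steps f through i can also be performed with cluster.exe. The following is only a sketch; it assumes the Windows Server 2003 cluster.exe options and the Generic Script private property name ScriptFilepath, so verify the syntax with cluster res /? before relying on it:
REM Create the Generic Script resource in Guest1Group
cluster res "Guest1Script" /create /group:"Guest1Group" /type:"Generic Script"
REM Point the resource at the HAVM script and make it depend on the shared disk
cluster res "Guest1Script" /priv ScriptFilepath="%windir%\Cluster\Havm.vbs"
cluster res "Guest1Script" /adddep:"DiskResourceX"
REM Associate the resource with the guest VM, then bring the group online
cluster res "Guest1Script" /priv VirtualMachineName=Guest1
cluster group "Guest1Group" /online
As in the graphical procedure, the resource is brought online only after the VirtualMachineName property has been set.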
13.Now it is time to install an Operating System on the Guest1 VM.
14.After the OS is installed on the Guest1 VM, you must install Virtual Machine
Additions on the guest. Virtual Machine Additions is included in Virtual Server
2005 R2 and improves the integration and performance of a virtual machine
running certain Windows operating systems.
Virtual Machine Additions will also improve many aspects of your experience
when using Virtual Server. For example, if Virtual Machine Additions is
installed on the virtual machine, you can move the pointer freely between the
virtual machine window and the host operating system when using the Virtual
Machine Remote Control (VMRC) client.
d. Once the virtual machine has started, point to the virtual machine name,
and then click Remote Control, as shown in Figure 4-23 on page 203.
i. Proceed through the wizard. When the wizard is complete, you are
prompted to restart the virtual machine to complete the installation.
When Virtual Server is restarted, it detects any new hardware and, without any
further configuration, is able to use the new resources. You can then add and
configure extra VMs as needed. You might also want to configure the new
onboard Gigabit Ethernet adapters and any extra PCI adapters that you wish to
add to virtual machines.
(Figure content: the existing configuration supports 150 VMs, with SAN, Console OS, and Production network connections.)
By adding x3950 (or x3950 E) nodes to the existing x3950 servers, as shown in
Figure 4-26, we are able to support twice the number of virtual machines without
changing the architecture of the farms or clusters.
Figure 4-26 Adding nodes to the existing x3950 servers doubles the supported VMs
The configuration can be extended further by simply adding more nodes to the
existing servers, upgrading some systems from 8-way (two nodes) to 16-way
(four nodes), as shown in Figure 4-27 on page 207.
Figure 4-27 Upgrading some Virtual Server systems to 16-way (four nodes)
You might have noticed that in the above examples we have not considered
upgrading the Ethernet and Fibre Channel connections. Depending on the
pattern of the workloads being deployed in the virtual machines, you should
consider adding physical Ethernet connections to the virtual switches, as well as
Fibre Channel adapters, to increase the I/O throughput.
Table 4-1 shows examples of situations in which many systems administrators
might find themselves. Here, we assume that the current configuration is
based on single-node, 4-way x3950 servers.
Table 4-1 Common situations and suggested solutions for virtual servers

Situation: I have 24 virtual machines running on six 4-way x3950 systems. I need to grow my infrastructure by adding new virtual machines, but I am short on CPU capacity and RAM across my farms and clusters. Network and storage throughput are not a problem at the moment. What do I do?
Suggested solution: Upgrade your six servers by adding an extra x3950 (or x3950 E) node to each. This gives you the ability to host more virtual machines without changing or adjusting anything in the way you operate your infrastructure today.

Situation: I have 24 virtual machines running on six 4-way x3950 systems. I need to grow my infrastructure by adding new virtual machines, but I am short on CPU capacity, RAM, and I/O throughput across my farms and clusters. What do I do?
Suggested solution: Upgrade your six servers by adding an extra x3950 (or x3950 E) node to each. In each new node, connect the onboard Gigabit Ethernet controllers and install and connect a Fibre Channel adapter. This gives you the ability to host more virtual machines.

Situation: I have 24 virtual machines running on six 4-way x3950 systems. I need to grow my infrastructure by adding new virtual machines, but I am short on I/O throughput across my farms and clusters. CPU and RAM are not a problem at the moment. What do I do?
Suggested solution: Add new I/O cards to your existing 4-way systems.

Situation: I have 24 virtual machines running on six 4-way x3950 systems. I need to increase high availability and protect against system failures.
Suggested solution: Add a number of 4-way x3950 systems and cluster all of the Virtual Server installations using Windows Server clustering.
Figure 4-28 on page 209 shows the CPU and memory of the existing 4-way
configuration.
Figure 4-28 x3950 4-way current dual-core CPU and memory count and
Hyper-Threading enabled
Note that for the purpose of the test we are only using the two integrated
Broadcom network adapters.
There is nothing that needs to be configured within MSVS to recognize the new
hardware. All of the hardware changes are managed by the host Operating
System.
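As a quick sanity check after the nodes have merged, you can confirm from the host operating system that the additional processors and memory are visible; the values shown below are illustrative only:
C:\>echo %NUMBER_OF_PROCESSORS%
16

C:\>systeminfo | findstr /C:"Total Physical Memory"
Total Physical Memory:     65,536 MB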
The hardware in an IBM Director environment can be divided into the following
groups:
Management servers: one or more servers on which IBM Director Server is
installed
Managed systems: servers (physical systems and virtual machines),
workstations, desktop computers, and notebook computers that are managed
by IBM Director
Management consoles: servers, workstations, desktop computers, and
notebook computers from which you communicate with one or more IBM
Director Servers.
SNMP devices: network devices, printers, or computers that have SNMP
agents installed or embedded
Each system in your IBM Director environment will have one or more of these
components installed:
IBM Director Server is installed on the system that is to become the
management server. Ideally, this is a single system in the environment, but
this is not always possible.
IBM Director Agent is installed on each managed system or virtual machine
(including the ESX Server Console OS).
IBM Director Console is installed on any system from which a system
administrator will remotely access the management server (called a
management console).
In the default installation under Windows, Linux, and AIX, IBM Director Server
stores management information in an embedded Apache Derby database. You
can access information that is stored in this integrated, centralized, relational
database even when the managed systems are not available. For large-scale
IBM Director solutions, you can use a stand-alone database application, such as
IBM DB2® Universal Database™, Oracle, or Microsoft SQL Server.
See the IBM Director 5.10 Hardware and Software Compatibility Guide for a
complete list of supported operating systems by platform, available from:
http://www.pc.ibm.com/support?page=MIGR-61788
A full list of the supported operating systems and virtualization products can be
found in the IBM Director 5.10 Hardware and Software Compatibility Guide:
http://www.pc.ibm.com/support?page=SERV-DIRECT
When you install IBM Director Console on a system, IBM Director Agent is not
installed automatically. If you want to manage the system on which you have
installed IBM Director Console, you must also install IBM Director Agent on that
system.
You can install IBM Director Console on as many systems as needed. The
license is available at no charge.
For a complete discussion of other plug-ins, see the redbook IBM Director 5.10,
SG24-6188.
The VMM agent uses the standard APIs provided by the VMware and Microsoft
products. With VMM installed, you can perform the following tasks from IBM
Director Console:
Correlate relationships between physical platforms and virtual components.
With VMM, IBM Director can recognize systems that contain virtual components
and create the following managed objects:
Coordinators (VirtualCenter)
Farm (VirtualCenter farms)
Hosts (ESX Server, GSX Server, Virtual Server)
Guest operating systems
In addition, VMM can be used for the creation of an event action plan to initiate
the transfer of virtual machines. More information about VMM can be found at:
http://www.ibm.com/servers/eserver/xseries/systems_management/ibm_director/extensions/vmm.html
Note: We were able to use VMM 2.1 successfully with ESX Server 3;
however, at the time of writing, the level of support for ESX Server 3 was
still being determined (because ESX Server 3 had not yet been released).
For Microsoft Virtual Server users, VMM 2.1 adds these new features:
Support for Microsoft Virtual Server 2005 R2
The ability to create virtual machines with 3.6 GB of RAM and 12 GB of disk
Note: VMM runs only on Windows systems or the ESX Server Console OS.
Other Linux systems are not supported.
As shown in Figure 5-2 on page 219, VMM can manage ESX Server systems
directly or through VirtualCenter. It also directly manages systems running GSX
Server and Microsoft Virtual Server.
(Figure content: IBM Director Server with the VMM server extension and IBM Director Console with the VMM console extension manage four xSeries 460 hosts: two ESX Server systems managed through VirtualCenter, one ESX Server system managed directly through the VMM agent in its service console, and one Microsoft Virtual Server system with the VMM agent, each running guest operating systems in virtual machines.)
Figure 5-2 IBM Virtual Machine Manager architecture
With these components installed, you will be able to receive hardware alerts from
your x3950 and have those sent to the IBM Director management server for
processing.
ESX Server 3.0 now supports the use of the RSA II daemon for in-band alerting
for hardware events to IBM Director Agent.
At the time of writing, however, there was no compiled RSA II daemon for use in
ESX Server 3.0. This section describes how to compile the daemon and install it.
You must install the daemon before installing IBM Director Agent and the VMM
agent on the ESX Server console OS.
Note: Compiling the daemon requires files from a Red Hat distribution.
3. Log in to the ESX Server service console, change to the temporary directory
to which you copied all the RPMs, and install them using the following
command:
rpm -ivh *.rpm
The output is shown in Example 5-1.
4. Download the source RPM for the latest USB daemon for Linux and copy it to
the ESX console. The latest code is available from:
http://www.pc.ibm.com/support?page=MIGR-59454
At the time of writing, the downloaded file was named
ibmusbasm-1.18-2.src.rpm.
6. Install the RSA II daemon using the rpm command from the
/usr/src/redhat/RPMS/i386 directory, as shown in Example 5-3.
rpm -ivh ibmusbasm-1.18-2.i386.rpm
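Step 5 (building the binary RPM from the source RPM) is not reproduced above. The following sketch shows the complete build-and-install flow from the service console; it assumes the file name given in step 4 and that the rpm-build package was installed along with the Red Hat RPMs in step 3:
cd /tmp
# Rebuild the source RPM; the binary RPM is written to /usr/src/redhat/RPMS/i386
rpmbuild --rebuild ibmusbasm-1.18-2.src.rpm
cd /usr/src/redhat/RPMS/i386
# Install the freshly built RSA II daemon package
rpm -ivh ibmusbasm-1.18-2.i386.rpm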
Tip: If you also want to manage the operating system and applications
installed in a virtual machine, you need to install IBM Director Agent in that
virtual machine as well. See the IBM Director Installation and Configuration
Guide for installation procedures.
To install IBM Director Agent in the Console OS of ESX Server, follow these
steps:
1. Download the Linux version of IBM Director Agent to the ESX console. It is
available from:
http://www.pc.ibm.com/support?page=SERV-DIRECT
2. Unpack the tar file and install the agent using the dir5.10_agent_linux.sh
script, as shown in Example 5-4.
Tip: You can confirm the status of IBM Director Agent by issuing the following
command:
/opt/IBM/director/bin/twgstat
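A typical check might look like the following; the companion twgstart command (assumed to be installed in the same directory) starts the agent if it is not running, and the output shown is illustrative:
# Start the agent if it is not already running (path assumed from the default install)
/opt/IBM/director/bin/twgstart
# Query the agent status; "Active" indicates that it is running
/opt/IBM/director/bin/twgstat
Active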
The document Managing VMware ESX Server using IBM Director provides
additional information, although it is based on IBM Director 4.21. It is available
from:
http://www.pc.ibm.com/support?page=MIGR-59701
IBM Director ports:
IBM Director Server to IBM Director Agent: 14247 UDP and TCP; 14248 UDP (Linux only); 4490 (hex) read, 4491 (hex) write
IBM Director Agent to IBM Director Server: 14247 UDP and TCP; 4490 (hex) read, 4491 (hex) write
Service processor ports:
Web-based access: 80
NOTES:
1. A random TCP session is used for some tasks (such as file transfer) between the
IBM Director Agent and IBM Director Server. The operating system returns a port in
the range 1024-65535.
2. IBM Director Console opens a port in the 1024-65535 range then connects through
TCP to IBM Director Server using port 2033. When IBM Director Server responds
to IBM Director Console, it communicates to the random port in the 1024-65535
range that IBM Director Console opened.
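On ESX Server 3.0, the service console firewall blocks nonstandard ports by default, so the agent ports listed above must be opened. The following is only a sketch using esxcfg-firewall; the exact syntax and any predefined service names can vary by build, so check esxcfg-firewall --help on your system before using it:
# Open the IBM Director agent ports in the ESX Server 3.0 service console firewall
esxcfg-firewall --openPort 14247,tcp,in,IBMDirectorTCP
esxcfg-firewall --openPort 14247,udp,in,IBMDirectorUDP
esxcfg-firewall --openPort 14248,udp,in,IBMDirectorLinux
# List the current firewall configuration to confirm the ports are open
esxcfg-firewall --query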
In addition, you must install VMM on both the IBM Director management server
and any systems with IBM Director Console installed.
Note: Install IBM Director Agent on the system before installing VMM.
To install VMM on your server running VirtualCenter, download the executable
file from the URL above and run it.
During the installation, accept the defaults. VMM will detect that VirtualCenter is
installed, as shown in Figure 5-3.
Note: Install IBM Director Agent on the Console OS before installing VMM.
See 5.3.2, “Installing IBM Director Agent on ESX Server” on page 224.
If you are not running VirtualCenter, then install the VMM agent on the ESX
Server Console OS.
You will need to have IBM Director installed before installing VMM.
Download and run the VMM installer from the link on the previous page. During
the installation, the installer detects that either the IBM Director Server and
Console are installed (Figure 5-4 on page 231) or just the Console is installed.
Simply download the installer and follow the instructions. Note that in
Windows, the RSA II daemon is implemented as a service; when it is installed,
you will see it listed in Services in the Control Panel.
To install IBM Director Agent, download and run the installer from this URL and
follow the onscreen prompts. Details of the installation procedure and the
dialog boxes you are presented with are fully documented in the IBM Director
Installation and Configuration Guide, available in PDF format or online in the IBM
InfoCenter:
http://www.ibm.com/servers/eserver/xseries/systems_management/ibm_director/resources
Note: Install IBM Director Agent on the system before installing VMM.
To install VMM on your server, download the executable file from the URL on
page 229 and run it.
During the installation, accept the defaults. VMM detects that Microsoft Virtual
Server is installed, as shown in Figure 5-5 on page 233.
You will need to have IBM Director installed before installing VMM.
Download and run the VMM installer from the link on page 229. During the
installation, the installer detects that either the IBM Director Server and Console
are installed (Figure 5-6 on page 234) or just the Console is installed.
Tasks are associated with each type of object, such as Virtual Machine Manager,
coordinators, VMM farms, hosts, and virtual machines. These tasks are described
in more detail here:
Virtual Machine Manager tasks
– Create VMM Farm: This task is used to create a VMM farm.
– Help: Provides online help for VMM.
– Migrate All Virtual Machines: This task is used to create an IBM Director
schedulable task for migrating all virtual machines from a single host to a
different host.
– Migrate Single Virtual Machine: This task is used to create the IBM
Director schedulable tasks for migrating a single virtual machine from one
host to a different host.
– Start Vendor Software: This task is used to start the virtualization vendor
application for the targeted VMM object.
Note: If you do this task, the VMM farm cannot be rediscovered and
instead must be recreated.
– Start (Microsoft Virtual Server only) starts all hosts that are associated with
the targeted VMM farm.
– Stop (Microsoft Virtual Server only) stops all hosts that are associated with
the targeted VMM farm.
Host Management Tasks
– Create Virtual Machine creates virtual machines that are associated with the host.
– Discover Virtual Machines discovers all virtual machines that are
associated with a host.
– Force Power Off All Running Virtual Machines powers off all running
virtual machines that are associated with a host, without an orderly
shutdown of any guest operating systems.
– Power On All Stopped Virtual Machines powers on all stopped virtual
machines that are associated with a host.
– Register Virtual Machine
– Remove Host From VMM Farm removes the managed object for the host
from the VMM farm object in IBM Director Console.
– Resume All Suspended Virtual Machines resumes all suspended virtual
machines that are associated with a host.
– Start (Hosts that are running Microsoft Virtual Server only) starts the host
that is represented by the managed object. You can create scheduled jobs
that use this task only for hosts that are currently stopped.
– Stop (Hosts that are running Microsoft Virtual Server only) stops the host
that is represented by the managed object. You can create scheduled jobs
that use this task only for hosts that are currently started.
In the IBM Director Console, as shown in Figure 5-7, the group VMM Systems
contains all systems that have the VMM agent installed. Tasks available are a
combination of those tasks in the Tasks pane (Figure 5-7) and those available in
the right-click context menu, Figure 5-8 on page 237.
2. Enter the name of the virtual machine, assign the processor, disk, memory,
and network adapter, and then click OK. When the VM is created, the IBM
Director Console is updated with the new virtual machine (Figure 5-10).
Figure 5-10 The new VM created now appears in the list of VMM systems
In the next example, we show how to perform the same task on an ESX Server system.
1. Right-click the ESX Server host within the VMM Systems area and
click Create Virtual Machine, as shown in Figure 5-11 on page 239.
3. As shown in Figure 5-13, we have now created the ESX1 virtual machine
on ESX Server using VMM.
The user interface is the same whether you are using virtualization
technology from Microsoft, VMware, or both at the same time.
Note: The VMM Agent does not enable VirtualCenter VMotion for a newly
added ESX Server host. If you want to migrate virtual machines
dynamically to or from this host, you must use VirtualCenter to enable
VMotion for the host. For information about VirtualCenter VMotion
requirements, see the documentation that comes with VirtualCenter.
5.5.1 Migration
VMM supports two types of migration based on the VMM Agent associated with
the virtual machines that are being migrated:
When using the VMM Agent for VirtualCenter with ESX Server hosts, VMM
uses dynamic migration.
Note: The VMM Agent for VirtualCenter does not support dynamic
migration for GSX hosts.
Dynamic migration
VMM supports dynamic migration (sometimes referred to as live migration or
simply migration in the VMotion documentation) of virtual machines when using
the VMM Agent for VirtualCenter. Dynamic migration is supported only for hosts
that are running ESX Server in a VirtualCenter environment. It is not supported
for hosts that are running GSX Server in a VirtualCenter environment.
The guest operating systems on migrated virtual machines remain available for
use; they are not shut down. VirtualCenter VMotion must be enabled on both the
source host and destination host between which you want to migrate virtual machines.
Static migration
VMM supports static migration of virtual machines when using the VMM Agent
for VirtualCenter with GSX Server hosts or the VMM Agents for ESX Server,
GSX Server, or Virtual Server.
One of the first steps towards a true self-managing system is establishing what
actions are to be taken at the first indications of trouble. You are probably already
familiar with IBM Director Event Action Plans, which enable you to automate
responses to alerts with notifications or actions that minimize outages.
2. In the first window of the Wizard, enter a name for the plan and click Next. We
entered PFA.
4. In the Event Filters page of the Wizard, select the check boxes adjacent to the
types of events you want to monitor. In this example, we are going to trigger
on PFA events, shown in Figure 5-18 on page 247. The following event filters
are available. The one we chose is in bold:
– Hardware Predictive Failure Analysis (PFA) events
– Environmental sensor events
– Storage events
– Security events
– IBM Director Agent offline
– CPU Utilization
– Memory use
5. Select the action to be taken after the event has occurred and been received
by the IBM Director Server. The available options in the Wizard are:
– Send an e-mail or a page
– Start a program on the server, the event system, or any other system.
There are many other actions available in the regular Event Action Plan
Builder, but the Wizard only offers these two for the sake of simplicity. In our
example we chose E-mail, as shown in Figure 5-19.
7. Review the summary and click Finish when you are done, as shown in
Figure 5-21.
In this example, we have now configured IBM Director to send an e-mail if there
is a PFA event on the Microsoft Virtual Server.
4. Next, you must configure the threshold. In this example, we gave the threshold
an appropriate name and configured it to generate an event if CPU
utilization is sustained at 80% for a minimum duration of 5 minutes, as
shown in Figure 5-26 on page 252.
5. Click OK to save the changes, then click File → Save As to save the
resource monitor, as shown in Figure 5-27 on page 253.
6. Give the Resource Monitor an appropriate name and click OK, as shown in
Figure 5-28. Click File → Close to exit.
7. In the next step, we verify that the resource monitor was created
correctly. To do this, right-click the Virtual Server system again and click All
Available Thresholds.
The thresholds defined on this system will now display, as shown in
Figure 5-29 on page 254.
8. Use the Event Action Plan Wizard to create a simple EAP. In this example, we
create a simple EAP with the characteristics shown in Figure 5-30.
9. Associate this event action plan to the server running Microsoft Virtual Server
either in the Wizard or by dragging and dropping it in the IBM Director
Console.
10.Now we need to create a new event action to suspend the Development VM.
Launch the Event Action Plan Builder.
11.In the Actions pane, right-click Manage a Virtual Machine and click
Customize as shown in Figure 5-31 on page 255.
12.Figure 5-32 opens. Select the Development virtual machine from the
drop-down list and select Suspend in the Action drop-down list.
13.Select File → Save As, type an appropriate name, and click OK, as shown in
Figure 5-33 on page 256.
14.Associate the new customized action with the CPU Utilization event action plan
by dragging and dropping it, as shown in Figure 5-34.
You have now successfully configured IBM Director to suspend the Development
VM if CPU utilization rises above 80% on the Microsoft Virtual Server system.
Other examples of Event Action Plans include triggering migrations based on
hardware events.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see “How to get IBM Redbooks”
on page 265. Note that some of the documents referenced here may be available
in softcopy only.
VMware ESX Server: Scale Up or Scale Out?, REDP-3953
Introducing Windows Server x64 on IBM Eserver xSeries Servers,
REDP-3982
IBM Director 5.10, SG24-6188
Tuning IBM Eserver xSeries Servers for Performance, SG24-5287
Planning and Installing the IBM Eserver X3 Architecture Servers,
SG24-6797
Server Consolidation with VMware ESX Server, REDP-3939
Implementing VMware ESX Server 2.1 with IBM TotalStorage FAStT,
SG24-6434
Introducing Microsoft Virtual Server 2005 on IBM Eserver xSeries Servers,
REDP-3912
IBM Eserver xSeries Server Consolidation: an Introduction, REDP-3785
Other publications
These publications are also relevant as further information sources:
Whitepaper: Managing VMware ESX Server using IBM Director
http://www.pc.ibm.com/support?page=MIGR-59701
IBM Director 5.10 publications
http://www.pc.ibm.com/support?page=MIGR-61788
Online resources
These Web sites and URLs are also relevant as further information sources:
Virtualization on the IBM System x3950 Server

Covers VMware ESX Server 2.5.2 & 3.0 and Microsoft Virtual Server R2

Describes how the x3950 is an ideal solution for virtualization

Includes IBM Director's Virtual Machine Manager

Virtualization is becoming more and more a key technology enabler to streamline
and better operate data centers. In its simplest form, virtualization refers to the
capability of being able to run multiple OS instances, such as Linux® and
Windows, on a physical server.

Usually the concept of virtualization is associated with high-end servers, such as
the IBM System x3950, that are able to support and consolidate multiple
heterogeneous software environments. The System x3950 is a highly scalable
x86 platform capable of supporting up to 32 processors and 512 GB of memory
and is aimed at customers that want to consolidate data centers.

Between the server hardware and the operating systems that will run the
applications is a virtualization layer of software that manages the entire system.
The two main products in this field are VMware ESX Server and Microsoft Virtual
Server.

This IBM Redbook discusses the technology behind virtualization, x3950
technology, and the two virtualization software products. We also discuss how to
manage the solution properly, as though they all were a pool of resources, with
Virtual Machine Manager, a unique and consistent management interface.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support
Organization. Experts from IBM, Customers and Partners from around the world
create timely technical information based on realistic scenarios. Specific
recommendations are provided to help you implement IT solutions more
effectively in your environment.