Exchange Server 2003 Design and Architecture at Microsoft: Technical White Paper
Contents
Introduction
Overview of Current Network Infrastructure
Availability/Reliability/Manageability Enhancements
Improved Security
Mobility Features/Enhancements
Conclusion
Solution
Microsoft IT upgraded its messaging infrastructure worldwide to use Exchange Server 2003 on clustered Windows Server 2003 servers attached to Storage Area Network (SAN) systems.

Benefits
• Consolidation. The use of Windows Server 2003's improved clustering technology enabled Microsoft IT to implement a major mailbox server consolidation.
• Mobility Improvements. Exchange 2003 integrates Outlook Mobile Access and Exchange ActiveSync with Outlook Web Access to improve mobile messaging.
• Improved SLA Performance. The use of SANs enabled Microsoft IT to increase the number of mailboxes per server and enhance Microsoft IT's ability to back up and restore mailbox data in a timely manner.

Products & Technologies
• Microsoft® Windows Server® 2003
• Microsoft Exchange Server 2003
• Microsoft Office 2003
• Microsoft Office Outlook® 2003
• Microsoft Operations Manager
• Storage Area Networks

The migration from Microsoft Exchange 2000 Server to Microsoft Exchange Server 2003 led to some significant changes in the messaging architecture at Microsoft. Microsoft IT has moved toward a fully clustered mailbox server environment. Each of these server clusters is connected to one or more Storage Area Network (SAN) enclosures for its data storage. The use of clustering technology has improved reliability, increased availability, and streamlined the process of performing rolling upgrades.

The benefits of deploying Exchange 2003, especially when combined with the benefits derived from the deployments of both Microsoft Windows Server™ 2003 and Microsoft Office 2003, have enabled Microsoft to consolidate its messaging infrastructure. Microsoft IT has begun implementing its plans to consolidate 113 mailbox servers in 75 locations worldwide to just 38 mailbox servers in seven locations. Exchange 2003 also supports all mobility messaging services, such as Outlook Web Access (OWA), Outlook Mobile Access (OMA), and Exchange ActiveSync® (EAS), on the same server, enabling Microsoft IT to further consolidate its worldwide front-end server infrastructure.

The messaging data storage infrastructure has also been updated. Data storage, once a combination of direct attached Small Computer System Interface (SCSI) storage arrays at remote locations and SAN solutions in the Redmond, Washington headquarters data center, has been replaced by SANs at all locations. These changes have enabled Microsoft IT to increase the number of mailboxes per server and to substantially enhance the performance and capability of its backup and recovery solutions as well.

As of this writing, Microsoft IT has significantly reduced administrative overhead for Exchange, improved system performance and service availability, and improved its own ability to meet its Service Level Agreement (SLA) obligations. Those benefits should become even more dramatic as the company moves closer to its consolidation goal.

Note: For security reasons, the sample names of forests, domains, internal resources, and organizations used in this paper do not represent real resource names used within Microsoft and are for illustration purposes only.
The purpose of this document is to provide an overview of the architecture and design decisions made during the upgrade to Exchange Server 2003 at Microsoft. The paper
focuses on the hardware selection and configuration aspects of the project. It also includes
discussions on the key technology wins and best practices that emerged from the upgrade.
Since Microsoft IT is a leading-edge implementer of Microsoft technologies and products, the
organization brings a unique set of requirements as well as innovative approaches to meeting
the needs of its customers. This paper describes these requirements and approaches, as
well as the way they affected design decisions for the deployment. The intended audience for
this white paper includes technical decision makers, system architects, IT implementers, and
messaging system managers.
Microsoft IT based its mission for migrating from Exchange 2000 to Exchange 2003 on
achieving several objectives:
• To test and improve the product before Microsoft offered it to its customers.
• To consolidate Exchange server sites worldwide to reduce server maintenance and
administration costs and workload.
• To simplify the messaging infrastructure based on standardized server and storage
hardware for all deployment locations.
• To improve the ability of Microsoft IT to meet its SLA obligations for data backup and
restore.
• To significantly improve the end-user experience with messaging services at Microsoft.
Microsoft IT met all these objectives when it deployed Exchange 2003.
The network is architected following a multi-domain routing model. It is divided into four regional networks, with each network functioning as a single Open Shortest Path First (OSPF) routing and addressing domain. The four regions cover the following areas: the Puget Sound metropolitan area in western Washington State; Europe, Africa, and the Middle East; Japan, the Pacific Rim, and the South Pacific; and the remainder of North America and South America.
Each regional network consists of a backbone area (Area 0) and multiple areas to ensure
scalability of each regional network. External Border Gateway Protocol (EBGP) is used to
exchange routes between the regional networks to ensure the scalability of the network as a
whole.
This network relies on Gigabit Ethernet and Packet over Synchronous Optical Network
(SONET), using privately owned or leased Dark Fiber as the transport medium. In the metro
area, efficient use of limited fiber resources is realized by leveraging Wave Division
Multiplexing (WDM) technologies to provision multiple circuits across a single physical link.
The available network bandwidth is significant for applications like Exchange Server 2003 and for site-to-site connectivity, and as of June 2003 the network was continuing to grow.
In the “dog food” messaging environment of Microsoft IT, servers regularly receive software
patches, operating system test releases and upgrades, Exchange server test releases and
upgrades, and more. Each Exchange server is "touched" by Microsoft IT for these software upgrades an average of twice each month. The changes to software are implemented
to test new scenarios, meet specific requirements, and continually run the latest application
concepts through real world, enterprise-level testing. The rate of change is very high in
Microsoft IT.
At the time of this writing, the messaging environment at Microsoft consists of more than 200
servers, including 190 Exchange 2003 servers (113 of which are mailbox servers) in 75
locations worldwide, including servers in additional cross-forest test environments. This
environment supports:
• Global mail flow of 6,000,000 messages per day, with 2,500,000 average Internet e-mail
messages per day, 70 percent of which are filtered out as unwanted spam, virus-infected, or addressed to invalid e-mail addresses. Comparing bytes over the wire, the size
ratio of blocked message content versus accepted message content received at
Microsoft is 40:1. The average size of a typical e-mail message is 44 KB.
• Approximately 85,000 mailboxes, each being increased from a 100 MB limit to a 200 MB limit. The average mailbox under the 100 MB limit was only 44 MB in size.
• More than 85,500 distribution groups.
• More than 230,000 unique public folders managed on public folder servers.
The Microsoft IT messaging infrastructure operates under the following service level commitments:
• The global service availability Service Level Agreement (SLA) goal in the Main corporate
forest, calculated as the availability of mailbox databases per minute (including both
planned and unplanned outages), was 99.9 percent for stand-alone server designs. This
was increased to 99.99 percent for the new clustered server designs used with
Exchange 2003.
• Worldwide e-mail delivery in less than 90 seconds, 95 percent of the time.
• Backup and restore operation SLA of less than one hour per database.
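To put these availability targets in perspective, the following Python sketch (an illustration only, not a Microsoft IT tool) converts an availability percentage into an annual downtime budget; the one-year measurement period is an assumption made for the example.

# Illustrative arithmetic only: annual downtime budget implied by an availability goal.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_budget(availability_pct: float) -> float:
    """Return the maximum allowed downtime, in minutes per year."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for goal in (99.9, 99.99):
    print(f"{goal}% availability -> {downtime_budget(goal):,.0f} minutes "
          f"({downtime_budget(goal) / 60:.1f} hours) of downtime per year")

# 99.9%  -> about 526 minutes (roughly 8.8 hours) per year
# 99.99% -> about 53 minutes per year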
The following table summarizes the server consolidation by role.

Server role          Before consolidation    After consolidation *
Mailbox              113                     38
Public Folder        20                      11
Messaging Hub        12                      7 **
Internet Gateway     22                      18
Front-End *****      14                      12
Antivirus            9                       7

* The mailbox server consolidation project is slated to be completed as of the end of the calendar year 2003.
** Microsoft IT will set up seven messaging hubs and four additional dual-purpose servers that will provide messaging hub services.
*** Exchange Instant Messaging servers will be eliminated as the messaging service is migrated to Windows Real Time Communications (WinRTC) servers.
**** All of the Free/Busy server services will be provided by existing Public Folder servers. Microsoft IT will not set up any dedicated Free/Busy servers at Microsoft.
***** Front-End servers were consolidated with the deployment of Exchange 2003 since the technology formerly included in the Mobile Information Server (MIS) 2002 product was added into Exchange 2003. To increase system availability, each Exchange 2003 front-end server deployment site was configured with a pair of load-balanced servers.
Since the release of Exchange 2000 on Windows 2000, the limits and boundaries imposed
by the Exchange 5.5 model were no longer a concern. The ability to place servers in routing
groups independent of their administration group membership allowed Microsoft IT to
optimize the routing topology without losing the advantages of large administrative groups.
[Table fragment: restore time per database of ~12 hours, ~12 hours, ~8 hours, ~1 hour, and ~25 minutes.]
The storage design varied depending upon the requirements of each server configuration. All
Exchange 2000 mailbox servers supported 100 MB mailboxes. The regional server
configurations used direct attached SCSI storage disk arrays that were backed up over the
100 Mbps LAN. The data center configuration servers used three SAN arrays, each one
comprising one SG. They were backed up over the Gigabit LAN.
Microsoft IT used best practice guidelines when designing its original Exchange servers, with consideration toward maximizing system performance and availability for both the server and storage hardware. To optimize disk input/output (I/O), Microsoft IT configured each SG to maintain three separate LUNs. The mailbox data LUN
using 24 18-GB disks and the Log LUN using six 18-GB disks were both configured using a
striped mirror configuration, known as Redundant Array of Independent Disks (RAID)-10. The
SAN also maintained a dedicated backup LUN utilizing 12 36-GB disks in a RAID-5
configuration. This LUN was used to support two days of online, disk-to-disk backup
retention.
Each SG supported five databases, and each database supported 200 mailboxes, meaning
that they could support up to 1,000 mailboxes per SG and 3,000 mailboxes per server.
Since Exchange 2000 does not offer support for new recovery options such as Recovery
Storage Group (RSG) functionality or Volume Shadow Copy Service (VSS), a database
outage due to corruption on an Exchange 2000 Server meant that the process of database
restoration would result in an extended outage. In many sites, backups were managed
across multiple computers in a datacenter, which resulted in backups and restores occurring
over the 100 Mbps LAN, for which restore times averaged, at best, 16 GB per hour. The
original restore SLA was full database restore in one hour, a goal that was quickly becoming
unattainable.
Data restoration required the creation of a temporary restoration server to serve as a staging
server for retrieving data from tape. Microsoft IT learned that, in addition to the time it took to restore the data, the tape drive first had to read and seek to the starting point of that particular database on the tape before the restore could begin. This process often entailed a wait of 90
minutes or more before any data actually transferred to disk. The typical throughput for data
restoration (once data began to flow) on the Microsoft IT 100 Mbps network was
approximately 300-350 MB per minute. With a selective restoration of a sample 15 GB
database, the total time needed to complete the job was often more than two hours – far in
excess of the SLA.
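The restore-time arithmetic behind this conclusion can be sketched as follows. The 90-minute tape seek and the 300-350 MB per minute throughput come from the figures above; the helper function and the 325 MB per minute midpoint are illustrative assumptions, not a Microsoft IT tool.

# Rough restore-time estimate for the Exchange 2000-era process described above.
# Seek time and throughput figures come from the text; the rest is illustrative.

def restore_time_minutes(db_size_gb: float,
                         tape_seek_min: float = 90.0,
                         throughput_mb_per_min: float = 325.0) -> float:
    """Estimate total restore time: tape seek plus data transfer."""
    transfer_min = (db_size_gb * 1024) / throughput_mb_per_min
    return tape_seek_min + transfer_min

total = restore_time_minutes(15)  # the sample 15 GB database
print(f"Estimated restore time: {total:.0f} minutes ({total / 60:.1f} hours)")
# ~90 min seek + ~47 min transfer, roughly 2.3 hours, well over the one-hour SLA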
In the end, Microsoft IT based its entire architecture of Exchange 2000 on the technical
requirements for meeting backup and restore efforts within the allotted SLA time window.
In addition, Exchange 2003 resolved the Exchange 2000 challenges for Microsoft IT as
described earlier. The deployment of Exchange 2003 enabled Microsoft IT to improve service
to its customers and to reduce operations requirements. Microsoft realized a number of business benefits as a result.
Normally an increased number of mailboxes per server and a greater amount of data per SG
would present an increased risk in the event of failure. Indeed, Microsoft IT measures
database service availability as a factor of downtime multiplied by the number of databases
affected. For example, a one-minute outage affecting a single SG of five databases on a
server containing three SGs (containing 15 databases) is measured as five minutes of
downtime. In addition, Microsoft IT studied its downtime incidents and learned that its
planned downtime exceeded its unplanned downtime by a factor of 6:1.
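The database-weighted downtime metric can be expressed as a simple calculation, shown in the following sketch; the function name is hypothetical and the example numbers restate the scenario above.

# Sketch of the database-weighted downtime metric described above (illustrative only).

def downtime_database_minutes(outage_minutes: float, databases_affected: int) -> float:
    """Downtime is counted once per affected database, per minute of outage."""
    return outage_minutes * databases_affected

# A one-minute outage of one storage group (5 databases) on a server hosting
# three storage groups (15 databases) counts as 5 database-minutes of downtime.
print(downtime_database_minutes(outage_minutes=1, databases_affected=5))  # -> 5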
Despite the fact that the number of mailboxes per server is growing, and that mailboxes are
doubling in size, the site and server consolidation project is expected to improve Microsoft
IT’s overall availability as well as its backup and restore performance SLAs. It is also
expected to reduce the Microsoft IT server management workload significantly, thereby
reducing costs.
For more information about Microsoft IT's Exchange Server 2003 site consolidation plan, see the IT Showcase technical white paper titled, "Exchange 2003 Site Consolidation" at
http://www.microsoft.com/technet/itshowcase.
Availability/Reliability/Manageability Enhancements
Exchange 2003 offers a variety of enhancements that make it a compelling upgrade.
• Improved method for moving mailboxes. The Exchange Task Wizard now allows you
to select as many mailboxes as you want and then, using the task scheduler, to
schedule the move to occur at some point in the future. You can also use the scheduler
to cancel any unfinished moves at a selected time. Using the wizard’s multi-threading
capabilities, you can move up to four mailboxes simultaneously.
• Improved Public Folder interfaces. To make public folders easier to manage,
Exchange 2003 includes several new public folder interfaces in the form of tabs.
• The Content tab displays the contents of a public folder in Exchange System Manager.
• The Find tab enables searches for public folders within the selected public folder or
public folder hierarchy. A variety of search criteria can be specified, such as the folder
name or age. This tab is available at the top-level hierarchy level as well as the folder
level.
• The Status tab displays the status of a public folder, including information about servers
that have a replica of the folder and the number of items in the folder.
• The Replication tab displays replication information about the folder.
• New Mailbox Recovery Center. Using the new Mailbox Recovery Center, you can
simultaneously perform recovery or export operations on multiple disconnected
mailboxes.
• Enhanced Queue Viewer. The Queue Viewer improves the monitoring of message
queues. Enhancements include:
• The X.400 and SMTP queues are displayed in Queue Viewer, rather than under their respective protocol nodes.
• The Disable Outbound Mail option allows you to disable outbound mail from all SMTP
queues.
• The refresh rate of the queues can be set using the Settings option.
• Messages are searchable based on the sender, recipient, and message state using Find
Messages.
• Queues are clickable for displaying additional information about that queue.
• Previously hidden queues (DSN messages pending submission, Failed message retry queue, and Messages queued for deferred delivery) have been exposed.
A volume mount point is a feature of the NTFS file system that allows linking of multiple disk
volumes into a single tree, similar to the way the Distributed File System (DFS) of a server
links remote network shares. Administrators can link many disk volumes together with only a
single drive letter pointing to the root volume. The combination of an NTFS junction and a
volume mount point can be used to graft multiple volumes into the namespace of a host
NTFS volume.
Improved Security
When Microsoft prioritized security as its first order of business, Exchange 2003 realized
many benefits:
Kerberos
Exchange 2003 uses Kerberos delegation when sending user credentials between an Exchange front-end server and Exchange back-end servers. Exchange 2003 also uses Kerberos when authenticating users of Microsoft Office Outlook 2003.
Permission improvements mean the Windows Cluster Service no longer requires Exchange
Full Administrator rights to create, delete, or modify an Exchange virtual server.
Mobility Features/Enhancements
Significant enhancements were made in Exchange 2003 for the mobile, client-side experience. All of the mobility features previously found in Mobile Information Server (MIS) 2002 are now built into Exchange 2003.
For OWA users connecting over dial-up or low-bandwidth wireless networks, or by using Secure Sockets Layer (SSL), Exchange 2003's new use of data compression technology provides substantial overall performance improvements compared to those realized with previous versions of Exchange Server. Additional performance improvements were attained by eliminating the ActiveX controls required to use OWA on client computers connecting to Exchange 2003. With earlier versions of Exchange Server, these controls had to be downloaded each time OWA was run whenever they were not available in the client computer's Internet Explorer cache.
MIS had to be installed in every network domain where these services were needed. Since
Exchange 2003 comes with built-in mobile services, installation on network domains is no
longer necessary.
Furthermore, Exchange 2000 users were limited to using only the MIS servers located in their
home domains. Users from a domain within the Microsoft corporate network in which the MIS
server was off-line could not use the MIS servers from other sub-domains to access these
services.
Exchange 2003 has eliminated the domain boundary limitations for OMA. Any user enabled
for OMA use can use mobile services on any of the front-end servers, regardless of their
network domain. As an added benefit for Microsoft IT, if one region’s Exchange front-end
servers had to be taken offline for service, the user could still access those services from the
remaining servers on the network, thereby all but eliminating downtime for this service.
To reduce the amount of traffic a device might receive for a user who regularly receives large quantities of e-mail, Windows Mobile 2003 devices let the user either specify time ranges during the day, called Peak Time, during which synchronization occurs only at specified intervals, or synchronize continuously at all times. During Off Peak Time, however, the mobile device is synchronized by up-to-date notifications every time a message arrives.
Support for up-to-date notifications requires the use of Windows Mobile 2003 devices such
as Pocket PC Phone Edition devices or Smartphones.
Exchange cached mode, a feature of Outlook 2003, is supported under both Exchange 2000
and Exchange 2003, but several performance improvements have been implemented
specifically to enhance the performance of Outlook 2003 clients when used in conjunction
with Exchange 2003.
Exchange cached mode is considered a key requirement toward the Exchange Server
consolidation effort. Exchange cached mode will prevent regionally located users from
suffering from the effects of system latency when working with Outlook over WAN links
connected to remote mailbox servers.
This significant level of data compression between client and server helped Microsoft IT
mitigate the effect of additional WAN usage generated when local servers were consolidated
onto regional servers. What was formerly all SMTP network traffic locally has now become all
Messaging Application Programming Interface (MAPI) Remote Procedure Call (RPC)
network traffic across the WAN, but the quantity of that traffic was significantly reduced when
compared to traffic generated by previous versions of Exchange and Outlook.
Note: The feature named RPC over HTTP actually carries the RPC traffic over HTTP secured with an SSL connection (HTTPS).
Users who use notebooks as their primary Outlook computer will find this feature to be
especially useful. Users who travel to customer sites and often end up waiting for the
opportunity to make presentations can use RPC over HTTP to keep in touch with their
corporate Exchange server without the need for a VPN connection. RPC over HTTP enables
a user to make a connection through firewalls at customer sites (which typically block VPN
connections) to the corporate Exchange Server, thereby improving their accessibility and
productivity.
Unlike OWA, the contents of locally stored personal folder files are available in Outlook on a
remote connection in exactly the same way they would be while connected to the corporate
network in the office.
Microsoft IT is optimistic that the use of RPC over HTTP will reduce the number of VPN
servers required to meet the needs of the company. Most employees use VPN to connect to
the corporate network primarily to use Outlook. To quantify the level of VPN usage, Microsoft
IT is analyzing the matter to better understand employee needs in an effort to reduce the
number of VPN servers deployed without reducing needed connectivity services.
Topology
Microsoft IT used the topology from its Exchange 2000 deployment on Windows 2000 Server as its basis
for designing the topology in the Exchange 2003 deployment. Active Directory was a key
element in the organizational structure and administrative requirements for Exchange 2000.
Microsoft IT was able to use the existing Active Directory structure for the Exchange 2003
deployment.
Microsoft IT was already deeply involved in the deployment of Windows Server 2003 in its
worldwide network infrastructure when the initial deployments of Exchange 2003 began. This
development was critical, for while Exchange 2003 can run on Windows 2000 Server,
Exchange 2000 cannot run on Windows Server 2003. Running Exchange 2003 on Windows
Server 2003 presents many additional benefits to Exchange, which are discussed in detail
later in this paper. Those benefits enabled Microsoft IT to begin implementing plans for
consolidating the number of servers in the messaging infrastructure worldwide, which drove
the design for the Exchange 2003 topology.
For more information about Microsoft IT's Exchange Server 2003 topology, see the IT Showcase technical white paper titled, "Exchange 2003 Site Consolidation" at
http://www.microsoft.com/technet/itshowcase.
The mobility enhancements in Exchange 2003 enabled Microsoft IT to modify the design of
its mobile messaging infrastructure with additional server consolidations and improved
security. The mobility infrastructure in Microsoft IT includes such services as OWA, OMA,
EAS, RPC over HTTP, and up-to-date notifications.
Microsoft IT reduced its server population from seven OWA servers and seven MIS servers
(one set for each domain in the Microsoft corporate network) to seven Exchange front-end
sites hosting OWA, OMA, EAS, and RPC over HTTP services. Each Exchange front-end site
worldwide hosts a pair of non-clustered, network load balanced Exchange front-end servers.
While Microsoft IT theoretically could have consolidated to a single set of Exchange front-end
servers, the project team decided to retain the larger number due to the network latency that
is caused by the great geographic distances between Exchange front-end servers and
regional Exchange mailbox servers. If Microsoft IT had consolidated to a single set, user
performance would have suffered. Network latency would have been particularly evident
among those users with slow Internet connections or mobile devices.
The OWA logon page asks the user two questions:
1. Is the user logging on from a public kiosk/shared computer or from a private home computer?
2. Does the user want to use the basic or the premium OWA user interface (UI) feature set? (The answer typically depends on whether the connection is a fast or a slow data link.)
All of the UI elements displayed in the OWA logon page are customizable, enabling the
inclusion of company logos, specific URLs to regional front-end servers, custom usage
instruction text, and more. Microsoft IT created its customized OWA page using these
features.
Once the form has been filled out and the user clicks Log On, the data is encapsulated and
sent by means of an SSL connection to the front-end server specified by the user when they
navigated to the specific server to bring up the authentication form. Once the logon
credentials have been sent over the Web, a special time-out cookie is created on the local
client computer. Depending upon whether the user indicated the client is a public or private
computer, the time-out cookie starts counting up toward a threshold of inactivity. Once that threshold is reached with no activity having taken place, the session connection is automatically closed and reauthentication is required if the user wants to regain access to the Exchange mailbox. Microsoft IT configured the time-out cookie to close out inactive
sessions on public or shared computers after 15 minutes, whereas inactive sessions on a
user’s private home computer were configured to last for two hours of inactivity before
closing. The session time-out periods are enterprise customizable to meet any security
requirements.
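A minimal sketch of this inactivity time-out behavior follows, assuming the 15-minute public and two-hour private thresholds described above. The class and method names are hypothetical and do not represent the actual OWA implementation, which enforces the time-out by means of a cookie on the client.

from datetime import datetime, timedelta

# Hypothetical illustration of the OWA inactivity time-out described above.
# Thresholds follow the Microsoft IT configuration: 15 minutes for public/shared
# computers, 2 hours for private computers. Names are illustrative only.

TIMEOUTS = {
    "public": timedelta(minutes=15),
    "private": timedelta(hours=2),
}

class OwaSession:
    def __init__(self, client_type):
        self.client_type = client_type           # "public" or "private"
        self.last_activity = datetime.utcnow()

    def touch(self):
        """Record user activity, resetting the inactivity counter."""
        self.last_activity = datetime.utcnow()

    def is_expired(self, now=None):
        """True once the inactivity threshold has been exceeded; the user
        must then re-authenticate to regain access to the mailbox."""
        now = now or datetime.utcnow()
        return now - self.last_activity > TIMEOUTS[self.client_type]

session = OwaSession("public")
print(session.is_expired(datetime.utcnow() + timedelta(minutes=20)))  # True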
Processors
Processor technology continues to advance, improving performance in processing speeds,
increasing the number and enlarging the size of on-board caches, and increasing the number
of tasks that can be processed in parallel. Most of the servers of the Exchange 2000
infrastructure were based on Intel Pentium II and Pentium III processors running in the 500 to
700 MHz range, with a 100 or 133 MHz front-side bus (FSB).
Through a combination of Exchange Server 2003, Windows Server 2003, third party SAN
technology, and faster servers, Microsoft IT decided to create a clustered server design that
offers greater operational reliability and a reduction in administrative overhead. Their design
choice allowed them to achieve the following specific benefits:
• Reduced service outages by having active node mailbox servers automatically fail over to passive node servers.
• Clustered Exchange Virtual Server (EVS) failover performance of just two minutes was
achieved, regardless of the amount of the mailbox data contained within the SAN
attached to the failed node.
• Increased the number of EVSs as well as the number of supported SGs per EVS within
the cluster. Each SG was configured to use three LUNs. Volume Mount Points were
used with these LUNs to minimize the number of drive letters used.
• Enabled server consolidation by hosting many more mailboxes per server.
• Reduced administration and maintenance overhead by consolidating more than 113 mailbox servers in 75 locations into 38 servers in seven locations.
Microsoft IT implemented two separate types of passive nodes: primary passive nodes and
alternative passive nodes. A primary passive node is a server using equivalently equipped
hardware to the active node servers. This allows for full functionality upon an active node
failover. The alternative passive node is a server equipped with lower-scaled hardware that is
used primarily for tasks such as streaming backup data from disk to tape. It also serves as a
reduced performance failover server. Both types of passive nodes are leveraged for rolling
software upgrades.
Microsoft IT’s multi-node cluster design employs both primary and alternate passive nodes.
Unlike primary passive nodes, alternative passive nodes are smaller servers primarily
designed to carry out disk-to-tape backup tasks. Microsoft IT uses all of the passive nodes in
the cluster when rolling upgrades of the operating system and/or Exchange are required.
Instead of failing an active node to the primary passive node, upgrading the offline active
node, then restoring the upgraded node to active status again and rolling through this cycle
for every active node in the cluster, Microsoft IT’s deployment of alternative passive nodes in
conjunction with primary passive nodes speeds up the process. Microsoft IT first patches all
the offline passive nodes, then fails over the number of active nodes equivalent to the
number of available passive nodes. These offline nodes are then upgraded in parallel and
restored to service when ready. This process is repeated once to upgrade the one remaining
active node server.
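The sequencing logic of this rolling-upgrade approach can be modeled as in the following sketch, using the headquarters node counts described later in this section. The node names and the upgrade step are placeholders; this is not cluster automation code.

# Simplified model of the rolling-upgrade sequencing described above.
# Node names and the upgrade step are hypothetical; this is not cluster automation.

active = ["active-1", "active-2", "active-3", "active-4"]
passive = ["primary-passive-1", "alt-passive-1", "alt-passive-2"]

def upgrade(nodes):
    for node in nodes:
        print(f"upgrading {node}")

# 1. Patch all offline passive nodes first.
upgrade(passive)

# 2. Fail over as many active nodes as there are upgraded passive nodes,
#    upgrade that batch in parallel, then return the nodes to service.
batch, remaining = active[:len(passive)], active[len(passive):]
print(f"failing over {batch} to {passive}")
upgrade(batch)

# 3. Repeat once for the remaining active node(s).
print(f"failing over {remaining}")
upgrade(remaining)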
• Regional Design. The server specification for the regional cluster implementation
consists of one SAN enclosure per cluster, with three active nodes, one primary passive
node, and one alternate passive node (designated as AAAPp).
• Headquarters Design. The headquarters clustered implementation is similar in design.
It consists of two SAN enclosures, four active nodes, one primary passive node, and two
alternate passive nodes (designated as AAAAPpp).
• Level B Test Forest Design. The Level B Test server specification is similar to the
regional cluster in design but with greater mailbox capacity. It consists of one SAN
enclosure, one active node, and one primary passive node (designated as AP).
To get the best performance at the best price point, Microsoft IT standardized on the four-
processor, 1.9 GHz Intel Xeon Processor MP server for its active and primary passive cluster
nodes for both regional and headquarters data center deployments. For alternative passive
cluster nodes, Microsoft IT uses two-processor 2.4 GHz Intel Xeon Processor MP servers.
Because of this new processing platform, Microsoft IT has seen substantial performance
improvements in its Exchange 2003 infrastructure.
Microsoft IT’s cluster design supports a significant increase in both the number and size of
mailboxes per Exchange server. It helps eliminate performance impact to users during the
second stage backup process because it offloads that stage of the backup process to non-
active servers within the cluster, thereby maintaining the SLA.
In Microsoft IT’s design for meeting this demand, each SAN enclosure selected by Microsoft
IT can support up to 12,000 I/Os per second, affording a margin of headroom for unusual
spikes in activity but expected to perform adequately in normal peak periods of I/O activity.
Any significant load beyond this would likely result in disk read and write latencies, which
would adversely affect the performance of all the mailboxes attached to that SAN. Microsoft
IT system architects deemed this an acceptable risk, given anticipated conditions, the cost of
additional hardware, and monitoring and alerting improvements in Microsoft Operations
Manager.
To determine the messaging storage requirements for any enterprise, one must measure the average peak-time I/O per mailbox user per second, the maximum size of mailboxes, the length of time items are retained in deleted item retention, and the typical e-mail usage patterns and turnover rate in the organization. These are the factors Microsoft IT considered when designing its Exchange 2003 SAN solution.
Fluff factor is what Microsoft IT calls the average capacity allocation needed to support a given mailbox on disk, based on deleted item retention, database overhead, non-limited mailboxes, and so forth. For example, creating 100 MB mailboxes for users on Exchange 2000 actually required
them to reserve 140 MB of space per user. The value of 1.4 was trended over the years on
production Exchange servers supporting 100 MB mailboxes and was maintained as a basis
for designing the new solution with support for 200 MB mailboxes.
Microsoft IT’s 100 MB mailbox size limit was a hard and fast disk quota limitation set and
enforced at the Exchange level by means of policy, but if the user consumed the entire 100
MB of available space, it was often because they had exceeded the amount on the back end.
This usually happened when a user deleted e-mail from a mailbox. The e-mail was actually
not immediately deleted from the mailbox database on the server. Rather, it was temporarily
retained in the database, held in a space known as deleted item retention. Only after three
days was the deleted e-mail actually purged from a mailbox database. Microsoft IT needed to
account for that level of usage overhead when planning its storage needs for Exchange
2003.
Additionally, Microsoft IT sized each data LUN to support six and a half databases even
though they would only support five in production. This allowed them to duplicate a single
corrupted database on the same LUN and then run an integrity check on it. This ability to use
the same LUN enabled Microsoft IT to provide the fastest possible response to database
corruption.
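The capacity arithmetic implied by these figures can be illustrated with the following sketch. The fluff factor, mailbox limit, and 6.5-database LUN sizing come from the text above; the figure of 200 mailboxes per database is an assumption inferred from the server numbers elsewhere in this paper.

# Illustrative capacity arithmetic based on the figures above. The 200 mailboxes
# per database is an assumption inferred from server figures in this paper,
# not a stated design rule.

FLUFF_FACTOR = 1.4            # observed allocation overhead per mailbox
MAILBOX_LIMIT_MB = 200        # new mailbox quota
MAILBOXES_PER_DB = 200        # assumed, as noted above
DATABASES_PER_DATA_LUN = 6.5  # sized for 6.5 databases, though only 5 run in production

per_mailbox_mb = MAILBOX_LIMIT_MB * FLUFF_FACTOR            # 280 MB allocated on disk
per_database_gb = MAILBOXES_PER_DB * per_mailbox_mb / 1024  # roughly 55 GB
data_lun_gb = DATABASES_PER_DATA_LUN * per_database_gb      # roughly 355 GB

print(f"Allocated per mailbox:  {per_mailbox_mb:.0f} MB")
print(f"Allocated per database: {per_database_gb:.0f} GB")
print(f"Data LUN sized for:     {data_lun_gb:.0f} GB")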
The deployment of Exchange 2003 gave Microsoft IT the opportunity to assess how SAN
technology had matured since it had last been studied. Microsoft IT embarked on a project to
qualify and test technology and products from SAN vendors. Microsoft IT required that any
new SAN technology standard implemented at Microsoft needed to be easily supported in
remote locations. Microsoft IT required that a storage solution be easy to deploy, modular in
design, and remotely manageable.
Within each HP StorageWorks Enterprise Virtual Array 5000 (eva5000) SAN used by
Microsoft IT are 168 disks. Each SAN enclosure supports approximately 8,000 200 MB
mailboxes. Each SAN enclosure has the ability to process about 12,000 I/Os per second
before disk latency becomes evident. Each mailbox server in the headquarters data center
will support 4,000 mailboxes and is expected to process a peak-time load of between 5,000
and 6,000 I/Os. As a result, one SAN enclosure supports two mailbox servers in the
headquarters data center. Regional mailbox servers will support just under 2,700 mailboxes,
so the resultant peak-time load of three regional servers is supported by one SAN enclosure.
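These servers-per-enclosure figures can be checked with the rough arithmetic in the following sketch. The 12,000 I/Os per second ceiling comes from the text; the per-mailbox peak I/O rate is an assumption derived from the 5,000 to 6,000 I/Os quoted for a 4,000-mailbox server.

# Illustrative check of the servers-per-enclosure figures above (not a sizing tool).

ENCLOSURE_IOPS_LIMIT = 12_000  # approximate ceiling before disk latency appears
PEAK_IOPS_PER_MAILBOX = 1.4    # assumption: consistent with 5,000-6,000 IOPS
                               # quoted for a 4,000-mailbox server

def servers_per_enclosure(mailboxes_per_server):
    peak_iops = mailboxes_per_server * PEAK_IOPS_PER_MAILBOX
    return int(ENCLOSURE_IOPS_LIMIT // peak_iops)

print(servers_per_enclosure(4_000))  # headquarters design -> 2 servers per enclosure
print(servers_per_enclosure(2_700))  # regional design     -> 3 servers per enclosure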
The use of volume mount points allowed Microsoft IT to configure an optimized disk layout
using four drive letters to maintain nine physical LUNs. This design allowed for the creation of
four Exchange instances that mapped across thirty-six physical LUNs utilizing only sixteen
drive letters.
Subsequent LUNs were maintained to support online backup-to-disk, with a single disk allocated per SG per node. The disk assigned for SG1 on each node supports three additional volume mount point LUNs as backup targets for SG2, SG3, and SG4. The backup resources
were configured across sixteen physical LUNs addressable by four drive letters.
[Figure: example per-node volume layout, for instance a 50 GB SMTP volume mounted at M:\Exsrvr03 by means of a volume mount point (VMP), with a corresponding backup volume for Node 1.]
In all, a total of 53 physical LUNs are addressable using 21 drive letters within the clustered
design. This allows for easy disk subsystem optimization with LUNs distributed across
controllers and Fibre Channel Adapters (FCAs) to ensure peak disk transfer requirements are
met as required within the Microsoft production environment.
Microsoft IT uses HP StorageWorks Secure Path for Windows to provide many benefits
within its SAN infrastructure. Secure Path provides three key benefits:
1. Eliminates the risk of a single point of failure supporting the server and SAN
interconnect.
2. Allows for LUN distribution to maintain optimized I/O required on a busy Exchange host,
reducing peak read/write disk latency and substantially improving online backup
throughput to disk.
3. Ensures single LUN presentation independent of the number of paths to the host.
Microsoft IT’s implementation of Secure Path uses two FCAs per host, two fibre channel data
switches, and two storage controllers. Each FCA, switch, and controller group makes up what is known as a fabric. Secure Path allows the use of two separate fabrics per SAN.
Secure Path also assists with eliminating many single points of failure between the nodes
and the connected SAN storage. Microsoft IT can maintain service in the event of a
component failure affecting a single FCA per host, multiple fiber cables, fiber channel
switches, or a single storage controller that makes up the SAN fabric. The component
failure is detected by Secure Path, which ensures that I/O is maintained by moving LUNs
from the failed path to an available path. This process, called failover, requires no resource
downtime while maintaining LUN availability. Failed-over LUNs can be failed-back using HP's
Secure Path Manager to restore optimized I/O, once failed components have been replaced.
The headquarters data center cluster implementation using Secure Path to connect to a
16,000 mailbox SAN is shown in Figure 2.
Recovering a mailbox store affected by corruption in Exchange 2000 meant that 1,000
mailboxes were out of service for six or more hours during the restore operation. This
represented a cost in lost productivity of $60-$80 per hour per user. Single mailbox restore
operations required dedicated restore servers. This configuration is shown in Figure 3.
[Figure 3: regional configuration with restore operations over the 100 Mbps LAN.]
Microsoft IT leveraged the fact that resource groups within a cluster can move between nodes independently of other resource groups. For example, an active node can hand off control of the backup resource group to a passive node while its production data resource groups remain in place.
In the first stage, backup runs on all active nodes within the cluster to complete an online,
disk-to-disk backup from the LUNs in the production data resource groups to the LUNs in the
backup resource group over a direct attached fibre channel. The backup resource group has
the capacity to support two-day online retention. Once that process has completed, the
control of the LUNs in the backup resource group is transferred to an alternative passive
node. At this point, the passive node initiates the second stage, a disk-to-tape backup from the
backup resource group to the tape library over a direct attached fibre channel. This process
frees up the active nodes from the time consuming disk-to-tape data transfer, thereby
minimizing the amount of time required of the active nodes for processing data backup
operations. This process is shown in Figure 4.
Microsoft IT elected to use this two-stage process rather than using a single stage, disk-to-
tape backup over a direct fibre attachment to a tape library. While the single-stage process
would eliminate the need for backup LUNs in the SAN, which would free up additional
storage capacity in the SAN for more mailboxes, Microsoft IT realized that it could not take
the risk of losing valuable production time in the event that the node in the cluster might
become disconnected from the tape library. If that happened, the node server would be
required to reboot to reattach the server to the library. If the active node were the server
performing this work, Microsoft IT would be required to failover the node so it could reboot
and reconnect to the library. Microsoft IT considered that an unacceptable risk to system
availability. Instead, by placing the burden of backing up to tape on a passive node that does
not support users, no loss of production service occurs when the passive node needs to be
rebooted to restore the server-to-library connection.
Per-database online backups are scheduled at regular intervals that let Microsoft IT back up
each entire server between 8:00 P.M. and 1:30 A.M. The databases are backed up
concurrently per SG. An important feature here is that Exchange 2003 allows parallel backup
and restore operations on a per-SG basis. Therefore, backup operations for each database
can be interleaved.
For example, if a single database was lost, up to 200 people could have been affected.
Because up to two days of backup data was available on disk and could be restored online in
less than an hour (restore rates of up to 2 GB per minute were achieved), regular Exchange
restore procedures were used to get user mailboxes quickly back online with their data.
Note: Each Exchange database consists of two files: the Exchange Database (EDB) file and the streaming (STM) file.
With Exchange 2000, if an entire SG was lost, the time of day of the failure was often the
deciding factor on how to proceed. If the failure was during the business day, restoration of
service usually took precedence over restoration of data, which could be restored later. In
that scenario, the damaged databases are deleted and recreated (a process known as
“stubbing a database”).
If the failure occurred in late, non-business hours, Microsoft IT chose to sacrifice the
immediate return of service in favor of a faster restoration of all lost data. In that situation,
they elected to perform the restoration without stubbing the affected databases.
[Figure: database restore decision flow]
1. Is the problem known and resolvable? If no, move mailboxes and investigate further. If yes, continue.
2. Is a database restore required? If no, fix the problem and the process is complete. If yes, continue.
4. Can the database restore be completed by 8 A.M. the next business day? If no, the process is complete without an immediate restore. If yes, start the database restore; the process is then complete.
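The decision points in this flow can also be expressed as the following sketch; the function and parameter names are hypothetical, and the sketch covers only the decision points listed above.

# Hypothetical sketch of the restore decision flow above; not a Microsoft IT tool.

def restore_decision(problem_known, restore_required, can_finish_by_8am):
    if not problem_known:
        return "move mailboxes and investigate further"
    if not restore_required:
        return "fix problem; complete"
    if not can_finish_by_8am:
        return "complete without an immediate restore"
    return "start database restore; complete"

print(restore_decision(True, True, True))  # -> start database restore; complete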
Microsoft IT creates a temporary RSG and restores the corrupted databases from the backup
source into it. Once the restore from backup is complete, the data generated between the
time of the backup and the point of failure is restored through replaying the transaction logs.
This process immensely speeds up the recovery of both the users’ messaging service and
their data from corrupted databases. When the replaying of transaction logs is complete, the restored database in the RSG is swapped with the newly stubbed database in the production SG.
Then any new data generated between the time e-mail service was restored and the data
recovery was completed is exported from the stubbed database and imported into the
restored database using the Microsoft Exchange Mailbox Merge Wizard (also called
ExMerge). The RSG is then deleted afterward. Because database restore speed is restricted
to LAN-based tape, this method is also used for the legacy non-clustered servers that are
currently in the process of being consolidated. In a large storage failure, a large amount of
data must be recovered, and many mailboxes might wait for an extended period before the
data is restored.
For more information about Microsoft IT's Exchange Server 2003 backup and restore, see the IT Showcase technical case study titled, "Messaging Backup and Restore at Microsoft" at
http://www.microsoft.com/technet/itshowcase.
VSS offers the ability to clone disk data, creating an image of that data at a single point in
time. Microsoft IT’s goal is to end its reliance on the current two-stage online backup process
and instead use VSS to clone its servers at midnight and then use VSS differential snapshots
again at 12 noon and at 6 P.M. to a new set of clone LUNs. In an incident, the scope of data
loss incurred and time of day the incident occurs would determine whether Microsoft IT would
use the last known good VSS clone or snapshot to restore data. For example, if after 2 P.M.,
a database goes offline due to corruption, the easiest and fastest method of restoring the
data and service of that database would be to restore that data from the noon snapshot. If the
corruption is detected late in the evening hours, due to the reduced traffic load, restoring from
the last clone would be the preferred method. The recovery of large amounts of data using VSS should occur in a matter of minutes, as opposed to the hours it takes today.
VSS as a backup solution has many third-party dependencies to make it work efficiently. A
requestor, a provider, and a writer are needed. Microsoft IT is testing the operational benefits
of VSS as a possible solution for “snap and clone” integration. As of this writing, VSS is not
used for production backups in Microsoft IT and is still in a testing phase.
Just before Microsoft IT migrated to Exchange 2003, Microsoft IT decided to migrate from
Prospector to MOM 2000 with the MOM Exchange Management Pack for the management of
its Exchange servers. MOM is an enterprise systems management application that uses a
client agent to collect predefined events into a central database from event logs on monitored
servers. It also creates, in response to the predefined events, alerts that are routed to central
consoles monitored by the Data Center Operations staff.
In addition to many other capabilities, MOM provides specific instrumentation for Exchange
Server. The key Exchange 2003 management data monitored include server state,
performance metrics, and messaging queue status. MOM also provides customizable
“Knowledge Scripts” (KS) that enable system managers to create specific management
objects for the operating system or applications. Microsoft uses the MOM KS functionality
extensively for managing the Exchange 2003 environment. Table 5 provides an overview of
some of the key MOM Knowledge Scripts that Microsoft uses for Exchange 2003.
Table 5. Key MOM Knowledge Scripts for the Microsoft Exchange 2003
Deployment
Backup Monitor. This script looks at backup operations and databases to verify that regular backup operations are occurring. The script enumerates SGs and verifies log files and database headers to ensure they have been backed up.
Disk Space Monitor. This script verifies that sufficient disk space is available for the transaction log, database, and backup volumes. The script verifies that at least 20 percent free space is available.
Event Log Monitor. This script checks for critical Exchange 2003 event log errors. It also looks for databases that have been dismounted.
Availability Monitor. This script verifies that Exchange services are available to users by performing test logons on each information store.
Active Directory Monitor. This script looks at the Exchange 2003 server to discover problems with access to Active Directory. Global Catalog and DSAccess errors are of key concern to this KS.
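As an illustration of the kind of check the Disk Space Monitor script performs, the following Python sketch applies the 20 percent free-space threshold from the table. The knowledge script itself is part of the management pack; this is only an analogy, and the volume paths are placeholders.

import shutil

# Analogy only: a free-space check similar in spirit to the Disk Space Monitor
# knowledge script described above. The 20 percent threshold comes from the table;
# the volume paths are placeholders.

FREE_SPACE_THRESHOLD = 0.20

def check_volume(path):
    """Return True if the volume still has at least 20 percent free space."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < FREE_SPACE_THRESHOLD:
        print(f"ALERT: {path} has only {free_fraction:.0%} free space")
        return False
    return True

for volume in ("E:\\", "F:\\", "G:\\"):  # placeholder log, database, and backup volumes
    check_volume(volume)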
MOM uses a store-and-forward technique to collect events so that events are reliably
delivered, even if temporary network outages occur during normal operation of the servers.
MOM uses an organizational structure, called the configuration group, to manage monitored
servers. A configuration group typically consists of one database, one or more DCAM (Data
Access Server + Consolidator and Agent Manager) servers, and one or more agents that run
on all monitored computers.
Once the system was up and running, especially after the MOM Exchange Management
Pack was applied and finely tuned to the Microsoft IT group’s needs, there was very little
network traffic overhead in using MOM to monitor servers over the WAN. Because of this
efficiency, early plans for using five MOM configuration groups to better manage the MOM
traffic over the WAN were deemed unnecessary and were scrapped. The process was so
efficient that Microsoft IT needed only one MOM configuration group to monitor all Exchange
servers worldwide, holding the cost for the single MOM configuration group server
deployment to $50,000.
In tuning the MOM Exchange Management Pack, Microsoft IT chose not to modify the default
management pack but instead to create a custom Microsoft IT management pack that
maintained the new and modified rules. This included collecting data not specified by default,
changing default data collection parameters and thresholds, and so forth. Microsoft IT still
uses its custom management pack to manage specific backup events that are unique to its
processing environment. Microsoft IT brought all this tuning and consolidation feedback to
the product development group for inclusion in the released product.
For more information about MOM, see the IT Showcase Technical Solutions Brief titled "Monitoring Messaging at Microsoft" and the IT Showcase technical white paper titled,
“Monitoring Enterprise Servers at Microsoft” at http://www.microsoft.com/technet/itshowcase.
Application Management
Once MOM detects an alert from a remote server, Microsoft IT can access that server to
further investigate and diagnose the problem by using the remote administration tools built
into Windows Server 2003.
Server Management
Microsoft IT uses MOM to create long-term trending data about server performance.
However, the most aggressive trending cycle MOM can manage is recording a data
checkpoint every five minutes or so. For more real-time performance monitoring, Microsoft IT
uses Performance Monitor (PerfMon), a tool provided in Windows Server 2003.
HP Insight Manager
HP Insight Manager was the first server element manager available for PC servers. It was
released in 1992. Since that time, Insight Manager has established itself as the leading
management application for server platforms. Microsoft IT uses Insight Manager extensively
to monitor HP hardware-specific information. While Insight Manager has no specific
Exchange management data, system managers use this tool to correlate events from other
management applications with hardware-specific conditions on Microsoft IT’s Exchange 2003
servers. HP Insight Manager also integrates closely with MOM to provide a unified
management platform for system managers. Table 6 shows a few key objects for which
Insight Manager provides management data.
Disk Subsystem. Insight Manager provides extensive disk monitoring and diagnostic information that can be correlated with application events such as I/O errors.
Version Control. Insight Manager's version control feature provides detailed version information on firmware, software, and drivers, useful for configuration management purposes.
Utilization. Insight Manager provides hardware-based statistics on processor and I/O bus utilization.
Storage Management
Events that occur on the SAN enclosure are not recorded into a server’s Event Log, which is
where MOM picks up many of the alerts. Instead, SAN enclosure events are stored in the HP Storage Management Appliance (SMA). Microsoft IT also configured MOM to monitor events on
the SMA to keep on top of SAN enclosure events. In headquarters, one SMA was installed
per pair of SAN enclosures. In the regions, one SMA was installed per SAN enclosure. Using
SMAs with MOM ensures that Microsoft IT’s SAN enclosures are as effectively monitored as
its Exchange servers.
Exchange 2003 can run on either Windows 2000 Server or Windows Server 2003 computers
and it is supported in all Active Directory environments, including Windows 2000 mixed,
Windows 2000 native, and Windows Server 2003 domain and forest functional levels. When running
in an environment with Windows 2000 domain controllers and global catalog servers, the
domain controllers and global catalog servers that Exchange 2003 uses must all be running
Windows 2000 SP3 or later. This requirement affects both Exchange 2003 servers and the
Exchange 2003 version of Active Directory Connector (ADC). ADC does not work with
domain controllers or with global catalog servers that are running a version of Windows 2000
Server earlier than SP3.
Mailbox Moves
The new Exchange cached mode feature of Outlook 2003 made the mailbox move process
during consolidation easier to manage. From a client perspective, Exchange cached mode
mitigated any significant performance impact that might have occurred as a result of
migrating from the use of many small Exchange servers toward fewer, larger Exchange
servers.
Microsoft IT took a performance baseline both before and after moving mailboxes from local
to regional services during its mailbox server consolidation effort. Microsoft IT did this to
ensure that post-migration client performance was equal to or better than pre-migration
performance.
This performance data also served a public relations role with customers. Many people are
hesitant to change, and once a change takes place, they often feel as if the change has had
an adverse effect upon client performance. By taking baseline performance data both before
and after the moves, Microsoft IT not only demonstrated that it was concerned about
maintaining good service, but it also showed empirical measurements that proved there had
been no performance degradation.
By associating OABs with regional servers, Microsoft IT was able to eliminate the repetitive
full downloads of the OAB on client computers. Additionally, Exchange 2003 filtered
certificate data from the OAB to reduce its size from 100 MB (300 MB uncompressed) to
approximately 43 MB compressed (about 150 MB uncompressed). The differential OAB
updates, used to update the OAB once a full download is complete, were also reduced to
about 50 percent of their original size.
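The download savings implied by these numbers are simple to compute, as the following sketch (illustrative arithmetic only) shows.

# Illustrative arithmetic from the OAB figures above (not a measurement tool).

old_full_mb = 100  # full OAB download before certificate filtering (compressed)
new_full_mb = 43   # full OAB download after certificate filtering (compressed)

savings = 1 - new_full_mb / old_full_mb
print(f"Full OAB download reduced by about {savings:.0%}")  # roughly 57 percent smaller
# Differential updates were also reduced to about 50 percent of their original size.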
Server Optimizations
Microsoft IT’s server configurations came equipped with four GB of RAM. These servers ran
Windows Server 2003, Enterprise Edition and Exchange Server 2003 with the following
modifications:
Use the Front-End Server Closest to the Mailbox Server for Best Performance
Microsoft IT learned that, when using any of the mobile features over the Internet, the best
performance was achieved when users selected the Exchange front-end servers physically
located closest to their mailbox servers, not the front-end servers closest to their present
locations. For example, when an employee from Australia traveled to the United States, her
online OWA experience was optimized when using the front-end server closest to her
mailbox server. After discovering this, Microsoft IT modified the OWA logon Web page to
include links to all available front-end servers and included instructions on which one to use.
After extensive testing of various processing platforms, Microsoft IT concluded that it could
gain substantially greater overall system performance by addressing memory bus limitations
rather than processor limitations. Microsoft IT tested a beta version of a four-processor Xeon
Processor MP 1.6 GHz Hyper-Threading-enabled server running on a 400 MHz FSB. The
performance test on this system confirmed Microsoft IT’s assumptions that the processor
utilization never peaked beyond 40 percent. Based on these tests, and to optimize server
performance, Microsoft IT planned the Exchange 2003 server migration around Xeon
processor systems that employ the new, faster FSB technology.
Mount points were used to eliminate drive letter limitations for supporting the log, SMTP, and backup drives. Volume mount points were introduced with Windows 2000; however, Windows 2000 did not support volume mount points on NTFS volumes within a cluster. Windows Server 2003 introduced that feature, so the lack of available drive letters was no longer a problem for Exchange 2003 running on a Windows Server 2003 cluster.
Note: Windows clusters organize resources into functional units, called resource groups,
which are assigned to individual nodes. If a node fails, the Cluster service transfers the
groups that were being hosted by the node to other nodes in the cluster. This transfer
process is called failover.
Additional production validation was done for the Level B Test forest project, where Microsoft IT completed performance analysis on a cluster hosting 5,000 200-MB mailboxes, which trended toward 1.0 to 1.2 disk transfers per second per mailbox at peak periods. The increase in
disk transfers resulted in poor performance in the form of unacceptable read and write disk
latencies when the server was scaled to support over 2,500 mailboxes. The default SCSI
miniport FCA driver parameter for queue depth was identified as the bottleneck and was
adjusted from a default of 32 to 128. The parameter change allowed Microsoft IT to reach the
5,000-mailbox target with better than expected levels of read/write latency, enabling Microsoft
IT to move forward with the decision to make 200 MB mailboxes a standard on all new server
designs.
Microsoft IT’s experience with Gigabit Ethernet showed a gradual trend of network adapter
performance degradation. The administration effort required to manage and resolve the
degradation was quite time and resource consuming. Once the use of clusters with fiber-
attached libraries eliminated Microsoft IT’s dependency upon extremely fast network
throughput, Microsoft IT simplified the server maintenance effort by replacing the Gigabit
Ethernet network adapters with 100 Mbps Ethernet network adapters. These adapters
provide more than enough network performance capacity to meet the Exchange server
requirements (normal network utilization typically peaks at around 20 percent of capacity)
since the network itself was no longer a bottleneck for backup throughput. Moreover, the
100 Mbps Ethernet adapters required much less maintenance overhead.
Client-Side Monitoring
The use of Outlook 2003 and Exchange 2003 together enables the gathering of valuable
client-side performance monitoring data. Outlook 2003 collects client messaging performance
data, including messaging system successes, failures, and latencies, and reports it to the
Exchange 2003 mailbox server. The Exchange 2003 server pools the client performance
information for its mailboxes and makes that data available to the Performance Monitor tool
as well as storing it in the server's event logs. Using MOM with the Exchange 2003
Management Pack, Microsoft IT accesses that information from the server event logs to
provide reports and, if necessary, generate alerts when problems arise. Microsoft IT uses the
data gathered by MOM to investigate client-side outages and report performance metrics on
client performance and availability. While MOM reports are based on consolidated client
data, Microsoft IT also uses WMI scripts to get more detail about the messaging client
performance of smaller groups such as those in remote offices on the WAN that have been
consolidated from a local server onto a regional server.
Custom Rules
With the default MOM Management Packs, the level of granularity for a threshold breach of
any particular monitored event did not correspond to all the various server configurations
used by Microsoft IT. For example, a small configuration regional server in Bombay, India,
supporting 100-200 mailboxes may not trigger an alert for a problem indicator configured for
the threshold of a data center configuration server in headquarters supporting over 4,000
mailboxes. When creating a custom rule, Microsoft IT disables the rule in the default
Exchange Management Pack, copies those rules into its own custom management pack, and
creates multiple child processing rule groups. Those rule groups define differing threshold
levels to meet the specific needs of each server configuration in the Microsoft IT messaging
infrastructure. This practice preserves the original rules for easier upgrade.
Microsoft IT runs two concurrent backup jobs per active Exchange instance, providing an
aggregate data throughput rate of approximately 2.4 GB per minute per server, with two to
three servers per SAN enclosure (depending on headquarters data center or regional
design). Microsoft IT has monitored maximum throughput without excessive read and write
disk latencies at approximately 6.3 GB per minute per SAN enclosure. Throughputs are
dependent on LUN distribution across controllers with Data, Log, and Backup LUNs per SG
assigned per controller.
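The following sketch (illustrative arithmetic only, not a planning tool) combines these throughput figures with the backup window described earlier to estimate how much data can be moved to the backup LUNs per enclosure each night; the regional three-server-per-enclosure case is assumed for the example.

# Illustrative throughput arithmetic based on the figures above (not a planning tool).

PER_SERVER_GB_PER_MIN = 2.4     # aggregate of two concurrent backup jobs per server
ENCLOSURE_MAX_GB_PER_MIN = 6.3  # observed ceiling before excessive disk latency
WINDOW_MINUTES = 5.5 * 60       # 8:00 P.M. to 1:30 A.M. backup window

servers = 3                                          # assumed regional design
combined = servers * PER_SERVER_GB_PER_MIN           # 7.2 GB/min requested
effective = min(combined, ENCLOSURE_MAX_GB_PER_MIN)  # capped by the enclosure ceiling
print(f"Data moved in the window: ~{effective * WINDOW_MINUTES / 1024:.1f} TB")
# Roughly 2 TB of disk-to-disk backup per enclosure per night under these assumptions.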
Note: RAID-1 targets would provide better throughputs, and Microsoft IT is currently
considering them as an option for use with the 146 GB disks for the first stage backup (disk-
to-disk).
Backup Synchronization
Now that clusters are in place, before Microsoft IT activates the daily backup scripts (at
8 P.M. local time), it verifies that each virtual instance of Exchange is active on its predefined
node. If any instances have been moved, Microsoft IT must either move them back to the proper node or configure the automated scripts that run the backup process to run on the passive
node; otherwise, the scheduled backup processes, set for each physical active node server,
will fail for the server that was moved.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Microsoft grants you the right to
reproduce this White Paper, in whole or in part, specifically and solely for the purpose of personal education.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights
covering subject matter in this document. Except as expressly provided in any written license agreement from
Microsoft, the furnishing of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses,
logos, people, places and events depicted herein are fictitious, and no association with any real company,
organization, product, domain name, email address, logo, person, place or event is intended or should be
inferred.