
Exchange Server 2003 Design

and Architecture at Microsoft

Technical White Paper


Published: August 2003
CONTENTS

Executive Summary .......................................................... 5
Introduction ............................................................... 6
  Overview of Current Network Infrastructure ............................... 6
  Overview of Current Messaging Infrastructure ............................. 7
Exchange 2000 Legacy Architecture ......................................... 11
  Overview of Exchange 2000 Infrastructure ................................ 11
Reasons for Microsoft IT to Upgrade ....................................... 14
  Site and Server Consolidation ........................................... 14
  Availability/Reliability/Manageability Enhancements ..................... 14
  Improved Cluster Support ................................................ 16
  Improved Security ....................................................... 16
  Improved Recoverability Technologies to Better Meet SLA Requirements ... 18
  Mobility Features/Enhancements .......................................... 18
  Office 2003 Integration ................................................. 20
Exchange 2003 Architecture Design Decisions ............................... 23
  Topology ................................................................ 23
  Mobility Design and Configuration ....................................... 23
  Server Design and Configuration ......................................... 25
  Storage Design and Configuration ........................................ 27
  Backup and Recovery ..................................................... 31
  Management and Monitoring using Microsoft Operations Manager (MOM) 2000  37
Best Practices and Lessons Learned ........................................ 40
  Topology Best Practices ................................................. 40
  Server Configuration Best Practices ..................................... 41
  Storage Design Best Practices ........................................... 42
  Management and Monitoring Best Practices ................................ 45
  Operational Best Practices .............................................. 46
Conclusion ................................................................ 47
For More Information ...................................................... 48


EXECUTIVE SUMMARY
Situation
The messaging infrastructure at Microsoft was quite varied. There were over 100 mailbox servers running in 75 locations worldwide, using a variety of hardware configurations that were not scalable.

Solution
Microsoft IT upgraded its messaging infrastructure worldwide to use Exchange Server 2003 on clustered Windows Server 2003 servers attached to Storage Area Network (SAN) systems.

Benefits
• Consolidation. The use of Windows Server 2003's improved clustering technology enabled Microsoft IT to implement a major mailbox server consolidation.
• Mobility Improvements. Exchange 2003 integrates Outlook Mobile Access and Exchange ActiveSync with Outlook Web Access to improve mobile messaging.
• Improved SLA Performance. The use of SANs enabled Microsoft IT to increase the number of mailboxes per server and enhance its ability to back up and restore mailbox data in a timely manner.

Products & Technologies
• Microsoft® Windows Server® 2003
• Microsoft Exchange Server 2003
• Microsoft Office 2003
• Microsoft Office Outlook® 2003
• Microsoft Operations Manager
• Storage Area Networks

The Microsoft IT group recently deployed Microsoft® Exchange Server 2003, the latest edition of the company's industry-leading enterprise messaging application. Microsoft IT not only serves the company by running the IT utility for its myriad employees and locations, but also serves as the first and best customer for the various enterprise product development groups at Microsoft, deploying Microsoft software within the company before it is available to outside customers.

The migration from Microsoft Exchange 2000 Server to Microsoft Exchange Server 2003 led to significant changes in the messaging architecture at Microsoft. Microsoft IT has moved toward a fully clustered mailbox server environment. Each of these server clusters is connected to one or more Storage Area Network (SAN) enclosures for its data storage. The use of clustering technology has improved reliability, increased availability, and streamlined the process of performing rolling upgrades.

The benefits of deploying Exchange 2003, especially when combined with the benefits derived from the deployments of both Microsoft Windows Server™ 2003 and Microsoft Office 2003, have enabled Microsoft to consolidate its messaging infrastructure. Microsoft IT has begun implementing its plans to consolidate 113 mailbox servers in 75 locations worldwide to just 38 mailbox servers in seven locations. Exchange 2003 also supports all mobility messaging services, such as Outlook Web Access (OWA), Outlook Mobile Access (OMA), and Exchange ActiveSync® (EAS), on the same server, enabling Microsoft IT to further consolidate its worldwide front-end server infrastructure.

The messaging data storage infrastructure has also been updated. Data storage, once a combination of direct attached Small Computer System Interface (SCSI) storage arrays at remote locations and SAN solutions in the Redmond, Washington headquarters data center, has been replaced by SANs at all locations. These changes have enabled Microsoft IT to increase the number of mailboxes per server and have substantially enhanced the performance and capability of its backup and recovery solutions.

As of this writing, Microsoft IT has significantly reduced administrative overhead for Exchange, improved system performance and service availability, and improved its ability to meet its Service Level Agreement (SLA) obligations. Those benefits should become even more dramatic as the company moves closer to its consolidation goal.

Note: For security reasons, the sample names of forests, domains, internal resources, and organizations used in this paper do not represent real resource names used within Microsoft and are for illustration purposes only.

Exchange 2003 Deployment and Architecture Page 4


INTRODUCTION
Microsoft Exchange Server 2003 represents an important, continuing investment in
enterprise technology for Microsoft. Exchange 2003 offers improvements required by
enterprise messaging and collaboration customers. Many of the largest companies in the
world run their messaging systems on Microsoft Exchange, including Microsoft.

The purpose of this document is to provide an overview of the architecture and design
decisions made during the upgrade to Exchange Server 2003 at Microsoft. The paper
focuses on the hardware selection and configuration aspects of the project. It also includes
discussions of the key technology wins and best practices that emerged from the upgrade.
Since Microsoft IT is a leading edge implementer of Microsoft technologies and products, the
organization brings a unique set of requirements as well as innovative approaches to meeting
the needs of its customers. This paper describes these requirements and approaches, as
well as the way they affected design decisions for the deployment. The intended audience for
this white paper includes technical decision makers, system architects, IT implementers, and
messaging system managers.

Microsoft IT based its mission for migrating from Exchange 2000 to Exchange 2003 on
achieving several objectives:

• To test and improve the product before Microsoft offered it to its customers.
• To consolidate Exchange server sites worldwide to reduce server maintenance and
administration costs and workload.
• To simplify the messaging infrastructure based on standardized server and storage
hardware for all deployment locations.
• To improve the ability of Microsoft IT to meet its SLA obligations for data backup and
restore.
• To significantly improve the end-user experience with messaging services at Microsoft.
Microsoft IT met all these objectives when it deployed Exchange 2003.

Overview of Current Network Infrastructure


With all of the beta-level and test version software used in its production environment, the
Microsoft corporate network is the world’s largest experimental computer network. The
network is a confederation of functional backbones, spanning the globe. Each backbone is
defined on regional boundaries with connectivity focused on the Main corporate campus
located in the Puget Sound Metropolitan Area.

The network is architected following a multi-domain routing model. It is divided into four
regional networks, with each network functioning as a single Open Shortest Path First
(OSPF) routing and addressing domain. The four regions cover the following areas: 1) the
Puget Sound metropolitan area in western Washington State; 2) Europe, Africa, and the
Middle East; 3) Japan, the Pacific Rim, and the South Pacific; and 4) the remainder of North
America and South America.

Each regional network consists of a backbone area (Area 0) and multiple areas to ensure
scalability of each regional network. External Border Gateway Protocol (EBGP) is used to
exchange routes between the regional networks to ensure the scalability of the network as a
whole.



The Puget Sound Metropolitan Area Network (MAN) supports the bulk of data traffic on the
global enterprise network, providing gigabit-rate connectivity between buildings and the main
datacenters located in the area. The current campus comprises 70 separate buildings
and two datacenters, with a network infrastructure providing access to corporate resources,
developer lab networks, and Internet connectivity to any location within the campus.

This network relies on Gigabit Ethernet and Packet over Synchronous Optical Network
(SONET), using privately owned or leased Dark Fiber as the transport medium. In the metro
area, efficient use of limited fiber resources is realized by leveraging Wave Division
Multiplexing (WDM) technologies to provision multiple circuits across a single physical link.

The available network bandwidth is significant for applications like Exchange Server 2003
and site-to-site connectivity. As of June 2003, the network had grown to encompass:

• Three enterprise data centers and 19 regional data centers worldwide


• 310 sites in approximately 230 cities in 77 countries
• The largest wireless LAN (802.1x EAP-TLS) in the world
• More than 24,000 wireless devices
• More than 4,000 wireless access points
• More than 250 wide area network (WAN) circuits
• More than 200 WAN sites in more than 70 countries
• More than 3,300 IP subnets
• More than 2,000 routers
• More than 2,600 network layer 2 switches
• More than 275 ATM switches
• More than 10,000 servers worldwide
• More than 350,000 LAN ports

Overview of Current Messaging Infrastructure


Managing the complex messaging infrastructure at Microsoft is a team effort that involves
many different groups within Microsoft IT. Organizationally, Microsoft IT comprises more
than 2,500 staff members who are responsible for operations spanning more than 400 IT
locations worldwide. In addition to providing the IT utility for the company, Microsoft IT plays
a key role in helping Microsoft meet its main business objective of software development and
marketing. As the first and best customer of Microsoft, Microsoft IT serves as an early
adopter of new Microsoft software, such as Windows Server 2003, Microsoft Office 2003, and
Exchange Server 2003. The result of this process is known in the industry as "eating your
own dog food."

In the “dog food” messaging environment of Microsoft IT, servers regularly receive software
patches, operating system test releases and upgrades, Exchange server test releases and
upgrades, and more. Each Exchange server is “touched” by Microsoft IT for these software
upgrades on an average of two times each month. The changes to software are implemented
to test new scenarios, meet specific requirements, and continually run the latest application
concepts through real world, enterprise-level testing. The rate of change is very high in
Microsoft IT.



Microsoft employees place a significant load on the messaging infrastructure. The average
employee at Microsoft possesses three computers, all of which typically synchronize with
Exchange. In addition, a significant portion of that population also carries Pocket PC and
Smartphone devices that synchronize with Exchange. The average Remote Procedure Call
(RPC) operations per second (a measurement of work) at Microsoft is significantly higher
than at any other company known to Microsoft IT. Microsoft often works with customers and
partners to benchmark their messaging infrastructures. The workload managed by the
Exchange servers at Microsoft is typically more than double the load measured at these
companies.

At the time of this writing, the messaging environment at Microsoft consists of more than 200
servers, including 190 Exchange 2003 servers (113 of which are mailbox servers) in 75
locations worldwide, including servers in additional cross-forest test environments. This
environment supports:

• Global mail flow of 6,000,000 messages per day, including an average of 2,500,000
Internet e-mail messages per day, 70 percent of which are filtered out as unwanted spam,
virus-infected, or addressed to invalid e-mail addresses. Comparing bytes over the wire, the
ratio of blocked message content to accepted message content received at Microsoft is
40:1. The average size of a typical e-mail message is 44 KB.
• Approximately 85,000 mailboxes, each being increased from a 100 MB limit to a 200 MB
limit. The average mailbox under the 100 MB limit contained only 44 MB of data.
• More than 85,500 distribution groups.
• More than 230,000 unique public folders managed on public folder servers.
The Microsoft IT server infrastructure includes:

• A corporate standard client configuration comprised of Windows® XP Professional and
Microsoft Office Outlook® 2003.
• Legacy, stand-alone mailbox server configurations of 500, 1,000, or 1,500 mailboxes.
Stand-alone servers are being replaced by clustered SAN solutions worldwide, scaled to
support 2,700 user mailboxes per server in regional locations and 4,000 user mailboxes per
server in the headquarters data center.
• One centrally located support organization in headquarters that supports all Exchange
servers worldwide.
• In addition to the Main corporate Exchange Active Directory® forest, three additional
forests that host Exchange mailbox servers at Microsoft:
  • A Level A Test forest that runs development and test code for Exchange, operating in a
  frequently changing server software environment.
  • A specialized Level B Test forest, serving as a limited-use production environment for
  one product division that hosts a limited number of user mailboxes. Specialized hardware
  configurations and test scenarios can be run in this environment. Level B Test uses a
  two-node server cluster connected to a SAN scaled to support 5,000 user mailboxes.
  • A legacy test environment forest used for testing Windows server operating system
  versions one version back from the currently released version (specifically Windows 2000
  Service Pack-specific testing) with Exchange.



Note: Microsoft IT uses both Level A Test and Level B Test forests to test cross-forest
behavior and support with the Main Microsoft corporate production forest.

The Microsoft IT service levels include:

• The global service availability Service Level Agreement (SLA) goal in the Main corporate
forest, calculated as the availability of mailbox databases per minute (including both
planned and unplanned outages), was 99.9 percent for stand-alone server designs. This
was increased to 99.99 percent for the new clustered server designs used with
Exchange 2003.
• Worldwide e-mail delivery in less than 90 seconds, 95 percent of the time.
• Backup and restore operation SLA of less than one hour per database.
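The downtime budgets implied by the two availability targets above can be sketched with a short calculation. This is an illustrative aside, not from the paper; it assumes the standard definition of availability as uptime divided by total minutes in a 365-day year.

```python
# Illustrative sketch: yearly downtime budget implied by an availability SLA.
# Assumes availability = uptime / total time over a 365-day year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_budget_minutes(availability: float) -> float:
    """Minutes of outage per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)

standalone = downtime_budget_minutes(0.999)    # old stand-alone SLA
clustered = downtime_budget_minutes(0.9999)    # new clustered SLA

print(f"99.9%  allows ~{standalone:.0f} minutes/year (~{standalone / 60:.1f} hours)")
print(f"99.99% allows ~{clustered:.0f} minutes/year (~{clustered / 60:.2f} hours)")
```

In other words, moving from 99.9 to 99.99 percent shrinks the yearly outage budget from roughly 8.8 hours to under one hour.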

Note: For security reasons, the sample names of forests, domains, internal resources,
and organizations used in this paper are fictitious. They do not represent real resource
names used within Microsoft and they are in this document for illustration purposes only.

Sites and Locations


Following the lead of the Exchange 2000 deployment, Microsoft IT continued the strategy of
deploying Exchange servers in dedicated roles. Table 1 shows the distribution of
Exchange 2003 servers by server role. Microsoft IT grouped the Exchange 2003 servers into
37 Exchange routing groups that were interconnected with 79 site connectors.

Table 1. Exchange 2003 Server Distribution by Server Role at Microsoft

Server Role            Exchange 2000    Exchange 2003 (post-consolidation goal *)
Mailbox                113              38
Public Folder          20               11
Messaging Hub          12               7 **
Instant Messaging      4                0 ***
Internet Gateway       22               18
Dedicated Free/Busy    6                0 ****
Front-End *****        14               12
Antivirus              9                7

* The mailbox server consolidation project is slated for completion by the end of calendar
year 2003.
** Microsoft IT will set up seven messaging hubs and four additional dual-purpose servers that
will provide messaging hub services.
*** Exchange Instant Messaging servers will be eliminated as the messaging service is migrated
to Windows Real Time Communications (WinRTC) servers.
**** All of the Free/Busy server services will be provided by existing Public Folder servers.
Microsoft IT will not set up any dedicated Free/Busy servers at Microsoft.
***** Front-End servers were consolidated with the deployment of Exchange 2003 because the
technology formerly included in the Mobile Information Server (MIS) 2002 product was
incorporated into Exchange 2003. To increase system availability, each Exchange 2003
front-end server deployment site was configured with a pair of load-balanced servers.



Routing Group and Administrative Group Structure
In all Exchange deployments prior to Exchange 2000 (including versions 4.0, 5.0, and 5.5),
Microsoft IT grouped Exchange servers into sites based on the network topology. For
Exchange 5.5, Microsoft IT designed the environment to strike a balance between the need
for large sites and the limitations of network bandwidth within those sites because of directory
and public folder replication and message routing traffic.

Since the release of Exchange 2000 on Windows 2000, the limits and boundaries imposed
by the Exchange 5.5 model were no longer a concern. The ability to place servers in routing
groups independent of their administration group membership allowed Microsoft IT to
optimize the routing topology without losing the advantages of large administrative groups.

Directory replication is now a function of Active Directory and is an operating system-level
issue that is no longer a key concern of the Exchange deployment. Since routing groups and
administrative groups need not be the same (as was the case in Exchange 5.5 and earlier
versions), the Microsoft IT Messaging operations staff is free to place Exchange 2003 servers
into administrative groups that match their administrative and operational structure, and into
routing groups that match the WAN topology. This leaves directory replication concerns to
another Microsoft IT team specifically focused on that area. As of this writing, Microsoft IT
maintains 31 Exchange Server 2003 routing groups and 11 administrative groups.



EXCHANGE 2000 LEGACY ARCHITECTURE
Microsoft IT began its deployment of Exchange 2003 when the product was still in an early
beta version. To fully grasp the scope of this project, let us review the previous messaging
infrastructure under Exchange 2000, the compelling reasons why Microsoft IT had to upgrade
to Exchange 2003, and what Microsoft IT did to make the upgrade a success. Various
challenges and discoveries made by Microsoft IT during this experience are included to
provide some guidance and considerations as you plan your Exchange 2003 deployment.

Overview of Exchange 2000 Infrastructure


The Microsoft Exchange Server platform is the fastest selling Microsoft server product in
history. Since 1996, when Exchange 4.0 was released, Exchange Server has sold more than
50 million seats. Table 2 provides an overview of the evolution of the internal deployment of
Exchange Server at Microsoft since 1996 when Microsoft first released Exchange Server.

Table 2. The Evolution of Exchange Server Deployment at Microsoft

                           Exchange 4.0   Exchange 5.0   Exchange 5.5   Exchange 2000   Exchange 2003
Mailboxes/Server           305            305            1,024          3,000           4,000
Mailbox Size/User          50 MB          50 MB          50 MB          100 MB          200 MB
Restore Time/Database      ~12 hours      ~12 hours      ~8 hours       ~1 hour         ~25 minutes *
Total Number of Mailboxes  ~32,000        ~40,000        ~50,000        ~71,000         ~85,000

* It takes 25 minutes to restore a database from backup disks.

Legacy Server and Storage Design


Microsoft IT used stand-alone servers in both the headquarters data center and in all regional
deployments. The servers were categorized into four basic mailbox server configurations as
shown in Table 3.

Table 3. Microsoft IT Exchange 2000 Server Configurations

Exchange 2000 Server Configuration             Mailboxes
Small Configuration Regional Mailbox Server    500
Medium Configuration Regional Mailbox Server   1,000
Large Configuration Regional Mailbox Server    1,500
Data Center Configuration Mailbox Server       3,000

The storage design varied depending upon the requirements of each server configuration. All
Exchange 2000 mailbox servers supported 100 MB mailboxes. The regional server
configurations used direct attached SCSI storage disk arrays that were backed up over the
100 Mbps LAN. The data center configuration servers used three SAN arrays, each one
comprising one storage group (SG). They were backed up over the Gigabit LAN.

Microsoft IT used best practice guidelines when designing its original Exchange servers,
with consideration toward maximizing system performance and availability with both the
server and storage hardware. To optimize disk input/output (I/O), each volume of an SG
was designated as a Logical Unit Number (LUN). Since each LUN was assigned a drive
letter, each server, hosting three SGs comprised of three LUNs each, used nine drive letters.

Microsoft IT configured each SG to maintain three separate LUNs. The mailbox data LUN,
using 24 18-GB disks, and the log LUN, using six 18-GB disks, were both configured in a
striped mirror configuration, known as Redundant Array of Independent Disks (RAID)-10. The
SAN also maintained a dedicated backup LUN using 12 36-GB disks in a RAID-5
configuration. This LUN supported two days of online, disk-to-disk backup retention.

Each SG supported five databases, and each database supported 200 mailboxes, meaning
that a server could host up to 1,000 mailboxes per SG and 3,000 mailboxes per server.
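The capacity and raw-disk arithmetic described above can be sketched as follows. The mailbox counts and disk layouts are from the paper; the usable-capacity formulas (RAID-10 mirroring halves the raw capacity, RAID-5 loses one disk's worth to parity) are standard RAID arithmetic supplied here as an assumption, not figures stated in the text.

```python
# Sketch of the Exchange 2000 storage group (SG) arithmetic described above.

DATABASES_PER_SG = 5
MAILBOXES_PER_DATABASE = 200
SGS_PER_SERVER = 3

mailboxes_per_sg = DATABASES_PER_SG * MAILBOXES_PER_DATABASE   # 1,000
mailboxes_per_server = mailboxes_per_sg * SGS_PER_SERVER       # 3,000

# Per-SG LUN layout from the text: 24 x 18 GB data disks (RAID-10),
# 6 x 18 GB log disks (RAID-10), 12 x 36 GB backup disks (RAID-5).
data_usable_gb = 24 * 18 / 2        # RAID-10: half the raw capacity (assumed)
log_usable_gb = 6 * 18 / 2
backup_usable_gb = (12 - 1) * 36    # RAID-5: one disk lost to parity (assumed)

print(mailboxes_per_server, data_usable_gb, log_usable_gb, backup_usable_gb)
```

With 1,000 mailboxes of 100 MB each per SG, the roughly 216 GB usable data LUN leaves headroom beyond the nominal 100 GB of mailbox quota.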

Performance, Scalability, and Supportability Challenges


Exchange 2000 was a major upgrade from previous versions of Exchange. However, as
powerful as Exchange 2000 was, Microsoft IT still had to work around some limitations.

Number of Servers to Manage Too High


Due to an inability to consolidate servers and sites effectively, the large number of sites with
servers drove support costs significantly higher and added complexity to the messaging
environment. Some of the more common cost factors associated with the distributed
environment included:

• More systems to back up
• Additional maintenance of backup systems at a larger number of sites
• More personnel added to administer backup processes
• Greater power and cooling resources required at additional sites
• More onsite support staff added for hardware maintenance at multiple sites
From a complexity perspective, the larger number of systems meant more moving parts in a
complex machine; for example, more backup jobs, even with the same success rate, mean a
higher absolute number of failures to troubleshoot and resolve. The planned 90 percent
reduction in the number of sites with servers dramatically reduces the number of moving
parts in the messaging machine, thereby reducing the exposure to failure on a number of
fronts.

Recoverability of Databases within Service Level Agreement (SLA) Time Difficult


Even small efforts to consolidate resulted in higher scaling on servers in a number of sites.
As the number of mailboxes on a server continued to increase with scalability improvements
in the product, database sizes grew as well. More significantly, the initiative to increase the
maximum mailbox size from 100 MB mailboxes to 200 MB mailboxes promised an immediate
doubling in the size of databases.

Since Exchange 2000 does not support newer recovery options such as Recovery Storage
Group (RSG) functionality or the Volume Shadow Copy Service (VSS), a database outage
due to corruption on an Exchange 2000 server meant that the process of database
restoration would result in an extended outage. In many sites, backups were managed
across multiple computers in a datacenter, which resulted in backups and restores occurring
over the 100 Mbps LAN, for which restore times averaged, at best, 16 GB per hour. The
original restore SLA was a full database restore in one hour, a goal that was quickly becoming
unattainable.



Cluster Scalability Limitations
Windows 2000 Advanced Server supported two-node clusters and Windows 2000 Datacenter
Server supported four-node clusters. With Exchange 2000 running on Windows 2000
Advanced Server, for an optimized configuration, Microsoft IT needed to have multiple drive
letter volumes associated with each SG. There were also additional drive letters used in the
server configuration, such as the Simple Mail Transfer Protocol (SMTP) drive (a dedicated
inbound/outbound queue device). As a result, each virtual Exchange server within the cluster,
after accounting for the collective SGs and the SMTP drive, used ten drive letters.
This does not account for the required, reserved drive letters used by the server node itself,
such as for the floppy disk, operating system volumes, and a CD drive. Microsoft IT could
only use two servers in a cluster before it exhausted the supply of available letters assignable
to disk volumes. The lack of available drive letters prevented Microsoft IT from adding
additional instances of Exchange servers into a clustered environment.
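The drive-letter arithmetic behind this limitation can be sketched briefly. The ten-letters-per-virtual-server figure comes from the text (three SGs of three LUNs each, plus the SMTP queue volume); the exact number of letters reserved on each node for the floppy drive, operating system volumes, and CD drive is an illustrative assumption.

```python
# Sketch of the Windows 2000 drive-letter exhaustion described above.

TOTAL_LETTERS = 26                       # A: through Z:
LETTERS_PER_VIRTUAL_SERVER = 3 * 3 + 1   # 3 SGs x 3 LUNs + 1 SMTP drive = 10
RESERVED_PER_NODE = 3                    # e.g. floppy, OS volume, CD (assumed)

available = TOTAL_LETTERS - RESERVED_PER_NODE
max_virtual_servers = available // LETTERS_PER_VIRTUAL_SERVER

print(max_virtual_servers)  # only 2 Exchange virtual servers fit
```

Whatever the exact reserved count, the alphabet runs out after two virtual servers, which is the limit the text describes.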

Backup Infrastructure Inflexible


Microsoft IT performed a single-stage backup for regional servers: the regional servers used
the 100 Mbps LAN to perform a direct, disk-to-tape backup. In Redmond, servers performed
a two-stage backup process: first disk-to-disk within the SAN, and then disk-to-tape. To
ensure that the backup process completed during non-business hours, Microsoft IT needed
to deploy Gigabit Ethernet network adapters in each Exchange server so that they could
achieve the throughput necessary to push the data across the LAN and onto tape.

Data restoration required the creation of a temporary restoration server to serve as a staging
server for retrieving data from tape. Microsoft IT learned that, in addition to the time it took to
restore the data, before that process could even start, a tape drive had to seek the
starting point of that particular database on a tape. This process often entailed a wait of 90
minutes or more before any data actually transferred to disk. The typical throughput for data
restoration (once data began to flow) on the Microsoft IT 100 Mbps network was
approximately 300-350 MB per minute. With a selective restoration of a sample 15 GB
database, the total time needed to complete the job was often more than two hours – far in
excess of the SLA.
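The restore timings quoted above combine into a simple estimate. The 90-minute tape seek and the 300-350 MB per minute throughput are from the paper; using the midpoint of that throughput range is an illustrative choice.

```python
# Worked estimate of the tape-restore timing described above.

db_size_mb = 15 * 1024            # the sample 15 GB database
seek_minutes = 90                 # locating the database on tape
throughput_mb_per_min = 325       # midpoint of the 300-350 MB/min range (assumed)

transfer_minutes = db_size_mb / throughput_mb_per_min
total_minutes = seek_minutes + transfer_minutes

print(f"transfer ~{transfer_minutes:.0f} min, total ~{total_minutes / 60:.1f} hours")
```

At roughly 47 minutes of transfer on top of a 90-minute seek, the total lands well past two hours, consistent with the text's conclusion that the one-hour SLA was out of reach.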

In the end, Microsoft IT based its entire architecture of Exchange 2000 on the technical
requirements for meeting backup and restore efforts within the allotted SLA time window.



REASONS FOR MICROSOFT IT TO UPGRADE
Microsoft IT had many compelling reasons to upgrade to Exchange 2003. Of course, in its
special role as a group running Microsoft product group dog food software, Microsoft IT was
committed to deploying Exchange 2003. This deployment was an effort to improve the
product with real world, enterprise experience and feedback, long before any customers
would receive the product.

In addition, Exchange 2003 resolved the Exchange 2000 challenges for Microsoft IT as
described earlier. The deployment of Exchange 2003 enabled Microsoft IT to improve service
to its customers and to reduce operations requirements. Microsoft realized the following
business benefits:

• Reduced number of servers


• Improved server availability, reliability, and manageability
• Improved clustering support
• Improved security
• Improved data backup and recovery
• Improved support for mobile users
• Improved integration with Office 2003

Site and Server Consolidation


As of this writing, with the deployment of Exchange 2003 completed, Microsoft IT is in the
process of implementing a long-planned consolidation of regional mailbox servers and
locations. Microsoft IT had 113 mailbox servers in 75 locations around the world. The end
goal of the consolidation plan is to reduce the number of locations by 90 percent, down to
seven worldwide, using 38 clustered Exchange virtual mailbox servers. This level of server
reduction will significantly reduce the administrative workload required of the messaging
infrastructure in the Microsoft IT group.

Normally, an increased number of mailboxes per server and a greater amount of data per SG
would present an increased risk in the event of failure. Indeed, Microsoft IT measures
database service availability as downtime multiplied by the number of databases affected.
For example, a one-minute outage affecting a single SG of five databases on a server
containing three SGs (15 databases in total) is measured as five minutes of downtime. In
addition, Microsoft IT studied its downtime incidents and learned that its planned downtime
exceeded its unplanned downtime by a factor of 6:1.
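The database-minutes accounting described above can be expressed as a one-line weighting. This is a sketch of the metric as the paper describes it, with a hypothetical whole-server outage added for illustration.

```python
# Sketch of the database-availability metric described above: downtime is
# weighted by the number of databases affected.

def downtime_db_minutes(outage_minutes: float, databases_affected: int) -> float:
    """Database-minutes of downtime recorded for a single outage."""
    return outage_minutes * databases_affected

# The paper's example: a one-minute outage of one SG (5 databases) counts
# as 5 minutes of downtime.
print(downtime_db_minutes(1, 5))    # 5

# By the same rule, a one-minute outage of a whole server hosting 15
# databases (hypothetical) would count as 15 minutes.
print(downtime_db_minutes(1, 15))   # 15
```

Weighting by databases affected means a large consolidated server is penalized more per minute of outage, which is why the availability SLA was tightened alongside consolidation.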

Despite the fact that the number of mailboxes per server is growing, and that mailboxes are
doubling in size, the site and server consolidation project is expected to improve Microsoft
IT’s overall availability as well as its backup and restore performance SLAs. It is also
expected to reduce the Microsoft IT server management workload significantly, thereby
reducing costs.

For more information about Microsoft IT's Exchange Server 2003 site consolidation plan, see
the IT Showcase technical white paper titled "Exchange 2003 Site Consolidation" at
http://www.microsoft.com/technet/itshowcase.

Availability/Reliability/Manageability Enhancements
Exchange 2003 offers a variety of enhancements that make it a compelling upgrade.

Exchange 2003 Deployment and Architecture Page 13


Virtual Memory Management
The virtual memory improvements to Exchange 2003 reduce memory fragmentation and
increase server availability. Specifically, Exchange is much more efficient in the way it reuses
blocks of virtual memory. These design improvements reduce fragmentation and increase
availability for higher-end servers that have a large number of mailboxes.

Virtual memory management for clustered Exchange servers is also improved. In
Exchange 2003, when an Exchange virtual server is moved manually or fails over to
another node, the MSExchangeIS service on that node is stopped. When an Exchange
virtual server is later moved or failed back to that node, a new MSExchangeIS service is
started and, consequently, a fresh block of virtual memory is allocated to the service.

Exchange System Manager (ESM)


Administrator functionality using ESM has been enhanced in Exchange 2003 with these key
updates:

• Improved method for moving mailboxes. The Exchange Task Wizard now allows you
to select as many mailboxes as you want and then use the task scheduler to schedule
the move to occur at some point in the future. You can also use the scheduler to cancel
any unfinished moves at a selected time. Using the wizard's multi-threading capabilities,
you can move up to four mailboxes simultaneously.
• Improved Public Folder interfaces. To make public folders easier to manage,
Exchange 2003 includes several new public folder interfaces in the form of tabs:
  • The Content tab displays the contents of a public folder in Exchange System Manager.
  • The Find tab enables searches for public folders within the selected public folder or
    public folder hierarchy. A variety of search criteria can be specified, such as the folder
    name or age. This tab is available at the top-level hierarchy as well as the folder level.
  • The Status tab displays the status of a public folder, including information about servers
    that have a replica of the folder and the number of items in the folder.
  • The Replication tab displays replication information about the folder.
• New Mailbox Recovery Center. Using the new Mailbox Recovery Center, you can
simultaneously perform recovery or export operations on multiple disconnected
mailboxes.
• Enhanced Queue Viewer. The Queue Viewer improves the monitoring of message
queues. Enhancements include:
  • The X.400 and SMTP queues are displayed in Queue Viewer rather than under their
    respective protocol nodes.
  • The Disable Outbound Mail option allows you to disable outbound mail from all SMTP
    queues.
  • The refresh rate of the queues can be set using the Settings option.
  • Messages are searchable based on the sender, recipient, and message state using Find
    Messages.
  • Queues can be clicked to display additional information about that queue.
  • Previously hidden queues (DSN messages pending submission, Failed message
    retry queue, and Messages queued for deferred delivery) are now exposed.

• Enhanced control of message tracking log files. When using Exchange System
Manager, you have greater control over your message tracking log files. Exchange 2003
automatically creates a shared directory for the message tracking logs and allows you to
change their location.
• Improved error reporting. Error reporting allows server administrators to easily report
errors to Microsoft. Although error reporting was included in Exchange 2000 SP2 and
SP3, its implementation is improved in Exchange 2003. For example, if users do not
want to view the standard error reporting dialog box, they can configure Exchange to
send service-related error reports to Microsoft automatically.

Improved Cluster Support


Clustering in Windows Server 2003 provides a number of improvements that allow Microsoft
IT to take full advantage of this technology and build a solid clustered server standard to
support its global Exchange mailbox server consolidation initiative. The new standard
provides a better level of scalability and availability than any previous deployment
methodology used for corporate Exchange deployment at Microsoft.

Support for Up to Eight Nodes


Exchange has added support for up to eight-node active/passive clusters when using
Windows Server 2003 Enterprise Edition or Windows Server 2003 Datacenter Edition. This
enabled Microsoft IT to increase the number of nodes in its Exchange Server 2003 clusters,
substantially improving server availability and reliability while reducing the number of
Exchange deployments necessary to manage the Microsoft corporate messaging
environment.

Support for Volume Mount Points


Exchange now supports the use of volume mount points when using Windows Server 2003
Enterprise Edition or Windows Server 2003 Datacenter Edition.

A volume mount point is a feature of the NTFS file system that allows linking of multiple disk
volumes into a single tree, similar to the way the Distributed File System (DFS) of a server
links remote network shares. Administrators can link many disk volumes together with only a
single drive letter pointing to the root volume. The combination of an NTFS junction and a
volume mount point can be used to graft multiple volumes into the namespace of a host
NTFS volume.
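As an illustration of how a volume is grafted into a host volume's namespace, the built-in Windows mountvol utility can link a volume to an empty NTFS folder. The paths and the volume GUID below are placeholders, and this is an administrative sketch rather than the actual configuration commands Microsoft IT used:

```shell
:: List available volumes and their \\?\Volume{GUID}\ names.
mountvol

:: Create the folder on the host volume (E:) that will serve as the mount point.
mkdir E:\ExchData\SG1-Logs

:: Graft a volume into the host NTFS namespace without consuming a drive letter.
:: The GUID below is a placeholder; use a name reported by "mountvol".
mountvol E:\ExchData\SG1-Logs \\?\Volume{00000000-0000-0000-0000-000000000000}\
```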

Improved Failover Performance


Exchange has improved clustering performance by reducing the amount of time it takes a
server to fail over to a new node. Exchange 2003 specifically optimizes the process of
shutting down services on the running active node, expediting the failover and the startup of
services on an alternative node, thereby improving overall system performance.

Improved Security
When Microsoft made security its first order of business, Exchange 2003 realized many
benefits:

Kerberos
Exchange 2003 uses Kerberos delegation when sending user credentials between an
Exchange front-end server and Exchange back-end servers. In previous versions of

Exchange, when users opened applications such as Outlook Web Access (OWA), Exchange
used Basic authentication to send the user’s credentials between an Exchange front-end
server and Exchange back-end servers. As a result, companies had to use a security
mechanism such as IPSec to encrypt the information.

Exchange 2003 also uses Kerberos when authenticating users of Microsoft Office
Outlook 2003.

Forms-Based Authentication in OWA


Exchange 2003 enables a new logon page for OWA that stores the user's name and
password in a cookie instead of in the browser. When a user closes the browser, the cookie
is cleared. Additionally, after a predefined period of inactivity, the cookie is cleared
automatically. The new logon page requires users to enter either their domain, network user
name, and password, or their full user principal name (UPN, in e-mail address form) and
password, to access their e-mail. This feature is also known as cookie authentication.

User Selectable Security Options in OWA


The OWA logon page allows users to select the security options that best fit their needs.
Based on the cookie authentication technology, the Public or shared computer option
(selected by default) provides a short default timeout of 15 minutes. Alternatively, OWA
users who are using computers in their offices or homes, where they are the sole operators,
can select the Private computer option. When selected, the Private computer option allows
a much longer period of inactivity before automatically ending the session; its internal
default value is 24 hours. To match enterprise security needs, an Exchange 2003
administrator can customize the inactivity timeout values for both option settings.
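As a sketch of how such a customization might be applied, the inactivity timeout values can be set in the registry of an Exchange 2003 front-end server. The registry path and value names below follow Microsoft's published guidance for forms-based authentication timeouts, but treat them as an illustration and verify them against current documentation before applying:

```shell
:: Sketch: customizing OWA forms-based authentication inactivity timeouts
:: (values in minutes) on an Exchange 2003 front-end server.
:: Verify these value names in your environment before applying.
reg add HKLM\SYSTEM\CurrentControlSet\Services\MSExchangeWeb\OWA /v PublicClientTimeout /t REG_DWORD /d 15 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\MSExchangeWeb\OWA /v TrustedClientTimeout /t REG_DWORD /d 120 /f
```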

Blocking Attachments in OWA


Similar to existing functionality found in Microsoft Outlook 2002 and later, the OWA feature of
Exchange 2003 can be configured to block users from accessing certain file type
attachments. This feature is useful in stopping untrustworthy attachments from potentially
compromising corporate security.

Secure/Multipurpose Internet Mail Extensions (S/MIME) Support in OWA
S/MIME increases the security of Internet e-mail by enabling digital signing of messages as
well as message encryption. Digital signatures provide authentication, non-repudiation, and
data integrity. Message encryption provides confidentiality and data integrity. Within Microsoft
IT's configuration, when S/MIME is used, private keys are stored in a roaming profile, which
is made available when the user logs on to a computer connected to the corporate network.
All S/MIME encryption, decryption, and message signing operations are performed on the
local computer using the private key. All public keys, which are necessary for signature
verification and message encryption, are stored in Active Directory. User private keys are
never passed, in any form, between the user's computer and the Exchange server.

Restricted Distribution Lists


In Exchange 2003, you can place restrictions on those who can send e-mail messages to an
individual user or a distribution list. Submissions can be restricted to specific users, groups,
or all authenticated users. Restricting submissions on a distribution list prevents non-trusted
senders, such as unauthorized Internet users, from sending mail to an internal-only
distribution list.

Improved Security with Clustering
Exchange 2003 clustering, when run on Windows Server 2003, includes the following
security features:

• Permission improvements mean the Windows Cluster Service no longer requires
Exchange Full Administrator rights to create, delete, or modify an Exchange virtual server.
• The Kerberos authentication protocol is enabled by default.
• Internet Protocol security (IPSec) is supported for front-end and back-end servers.
• Internet Message Access Protocol 4 (IMAP4) and Post Office Protocol 3 (POP3)
services are no longer included by default when creating virtual servers.

Improved Recoverability Technologies to Better Meet SLA Requirements
Backing up and restoring large databases or SGs takes a long time, even over the fastest
network connections. However, the coupling of Exchange 2003 with Windows Server 2003
offers an alternative solution that takes a small fraction of the time needed by tape media
methodologies for backup and restore.

Volume Shadow Copy Service (VSS) Integration Framework


VSS, a feature of Windows Server 2003, provides Microsoft IT with the ability to perform
online snap and clone functions on the databases. This allows Microsoft IT to maintain a
mirror copy of the data at a single point in time. VSS enables Microsoft IT to get either a
mirror copy or a snap copy of the production data. Depending upon the type of failure,
whether a mailbox store, an SG, or multiple SGs affected by corrupted data, or a massive
spindle failure where the entire data structure is lost, Microsoft IT can recover upwards of
800 GB of data in minutes, as opposed to standard restoration methodologies that would
take many hours to recover that amount of data.

Recovery Storage Group (RSG)


The new RSG is a specialized, offline SG that can be created alongside the standard SGs on
a production server in Exchange. The RSG provides added flexibility in quickly restoring
mailboxes and databases. With this new feature, a damaged Exchange database can be
quickly restored, in an offline state, to a production server. Once the database has been
restored to the RSG, the Exchange ExMerge tool can be used to export the contents of one
or more mailboxes back into production. The RSG eliminates the need for dedicated restore
servers for single-mailbox restore operations, thereby reducing server downtime.

Mailbox Recovery Center


The new Mailbox Recovery Center makes it easy to perform simultaneous recovery or export
operations on multiple disconnected mailboxes. This is a significant improvement over
Exchange 2000, where such operations had to be performed individually on each
disconnected mailbox. With this new feature, you can quickly restore Exchange mailboxes,
and thereby reduce downtime.

Mobility Features/Enhancements
Significant enhancements were made in Exchange 2003 for the mobile, client-side
experience. All of the mobility features previously found in Mobile Information Server 2002
(MIS), a separate, adjunct solution to Exchange 2000, were incorporated into Exchange
2003.

Outlook Web Access (OWA)


The new version of OWA in Exchange Server 2003 represents a significant upgrade from
OWA in Exchange 2000. The new version is a full-featured e-mail client, with support for
rules, spelling checker, signed and encrypted e-mail, and many other improvements. A
redesigned interface provides an enhanced user experience similar to that of Outlook 2003,
including a new Reading Pane (previously called the Preview Pane in Outlook) and an
improved navigation pane.

For OWA users connecting over dial-up or low-bandwidth wireless networks, or by using
Secure Sockets Layer (SSL), Exchange 2003's new use of data compression technology
provides substantial overall performance improvements compared to those realized with
previous versions of Exchange Server. Additional performance improvements were attained
by eliminating all ActiveX controls required to use OWA on client computers connecting to
Exchange 2003. When using earlier versions of Exchange Server, these controls, when not
available in the client computer's Internet Explorer cache, had to be downloaded each time
OWA was run.

Outlook Mobile Access (OMA)


Exchange 2003 now includes the OMA application previously offered in MIS. OMA allows
users with browser-equipped mobile devices to access their e-mail, Contacts, Calendar, and
Tasks, and to search the global address list.

MIS had to be installed in every network domain where these services were needed. Since
Exchange 2003 comes with built-in mobile services, installation on network domains is no
longer necessary.

Furthermore, Exchange 2000 users were limited to using only the MIS servers located in their
home domains. Users in a domain of the Microsoft corporate network whose MIS server was
offline could not use the MIS servers of other sub-domains to access these services.

Exchange 2003 has eliminated the domain boundary limitations for OMA. Any user enabled
for OMA use can use mobile services on any of the front-end servers, regardless of their
network domain. As an added benefit for Microsoft IT, if one region’s Exchange front-end
servers had to be taken offline for service, the user could still access those services from the
remaining servers on the network, thereby all but eliminating downtime for this service.

Exchange ActiveSync (EAS)


The Exchange ActiveSync feature previously offered in MIS server, which enabled users to
securely and remotely synchronize their mobile devices directly with the Exchange server,
has also been incorporated into Exchange 2003 and enabled by default. By synchronizing a
mobile device to an Exchange server, users can access their Exchange information without
having to be constantly connected to a mobile network. In addition, users are no longer
subject to the same EAS domain boundary limitations that affected OMA in MIS.

Up-To-Date Notifications
Exchange 2003 introduces a new feature within EAS called up-to-date notifications. In the
past, the push notification feature in MIS used the Short Message Service (SMS) of a
wireless carrier to send text messages consisting of the first 160 characters of a redirected
e-mail. Because SMS transmitted its messages in unencrypted text, the security of message
content was a major concern. Instead of transmitting the first 160 characters of the actual
message, up-to-date notifications transmit only a binary command to the mobile device that
causes it to begin securely synchronizing e-mail over the SSL-protected EAS link. This way,
the binary command never contains any portion of the message body, yet the user still
receives the latest e-mail.

To reduce the amount of traffic a device might receive for a user who regularly receives large
quantities of e-mail, Windows Mobile 2003 devices offer the user the option either to specify
time ranges during the day, called Peak Time, in which synchronization occurs only at
specified intervals, or to synchronize continuously at all times. During Off Peak Time,
however, the mobile device is synchronized by up-to-date notifications every time a message
arrives. Support for up-to-date notifications requires Windows Mobile 2003 devices such as
Pocket PC Phone Edition devices or Smartphones.

Office 2003 Integration


Exchange 2003 is more tightly integrated than ever with its primary client application,
Outlook 2003. The combination of the two offers users many enhancements.

Exchange Cached Mode


The use of Exchange cached mode, a feature of Microsoft Office Outlook 2003, enables the
user to work in a messaging environment with a perceived constant connection between the
Outlook 2003 client and the Exchange server. Exchange cached mode isolates the client
from most network and server latencies that, in the past, caused Outlook to appear as if it
had stopped responding. Outlook, using Exchange cached mode, connects to the Exchange
server and automatically downloads all incoming content, such as e-mail, meeting requests,
and tasks, to a dedicated .OST file, which serves as a local cache on the client computer.
Once the download has completed, the user can read, reply to, create, and delete e-mail, as
well as send tasks and meeting requests. Outlook, working continuously in the background,
synchronizes the local cache file with the Exchange server to upload new outgoing content
and download any additional new incoming content. Users typically do not notice any
difference in messaging performance when using Exchange cached mode, other than the
clear benefit of being insulated from slow network connections or poor server performance.

Exchange cached mode, a feature of Outlook 2003, is supported under both Exchange 2000
and Exchange 2003, but several performance improvements have been implemented
specifically to enhance the performance of Outlook 2003 clients when used in conjunction
with Exchange 2003.

Exchange cached mode is considered a key requirement toward the Exchange Server
consolidation effort. Exchange cached mode will prevent regionally located users from
suffering from the effects of system latency when working with Outlook over WAN links
connected to remote mailbox servers.

Data Compression
To reduce the amount of information sent between the Outlook 2003 client and
Exchange 2003 servers, Exchange 2003 and Outlook 2003, when working in tandem,
perform data compression that significantly reduces network traffic. Microsoft IT found that
this reduced the total Exchange 2003-Outlook 2003-related network traffic by an average of
40 percent. Exchange 2003 also reduces the total requests for information between the client
and server, thereby optimizing the communication between the client and the server.

This significant level of data compression between client and server helped Microsoft IT
mitigate the effect of additional WAN usage generated when local servers were consolidated
onto regional servers. What was formerly all SMTP network traffic locally has now become all
Messaging Application Programming Interface (MAPI) Remote Procedure Call (RPC)
network traffic across the WAN, but the quantity of that traffic was significantly reduced when
compared to traffic generated by previous versions of Exchange and Outlook.

Remote Procedure Call (RPC) over Hypertext Transfer Protocol (HTTP)
Exchange 2003 and Outlook 2003, combined with Windows Server 2003, support the use of
RPC over HTTP to access Exchange. Using the Microsoft Windows RPC over HTTP feature
enables the secure use of Outlook 2003 over the Internet without setting up a virtual private
network (VPN) tunnel with remote access or using OWA. Outlook always communicates with
the Exchange server using RPC. When Outlook is configured to use this new feature, it will,
by default, first attempt to connect to its corporate Exchange mailbox server by means of
RPC over Transmission Control Protocol/Internet Protocol (TCP/IP) as it would in a corporate
network setting. If the server cannot be located this way, then Outlook attempts to connect to
its corporate Exchange mailbox server by means of RPC over a secure HTTP link on the
Internet using SSL. RPC over HTTP comes through the same Exchange front-end servers
that serve users of OWA, OMA, and EAS. Effectively, this service is identical to OWA from
the perspective of the Exchange back-end servers, except that the e-mail client is
Outlook 2003 rather than Internet Explorer. As with OWA, if the RPC connection is made
through the Internet, users are prompted to enter their network logon credentials before
access to the Exchange Server data is granted.
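The connection order described above, RPC over TCP/IP first and RPC over a secure HTTP link as the fallback, can be sketched conceptually as follows. This is an illustration of the described behavior only, not Outlook's actual implementation:

```python
# Conceptual sketch of the connection order described above (not Outlook's
# actual implementation): try RPC over TCP/IP first, as on the corporate
# network; if the mailbox server cannot be reached that way, fall back to
# RPC over an SSL-protected HTTP connection through the front-end servers.

def choose_transport(tcp_reachable):
    """Return the transport a client following this order would settle on."""
    if tcp_reachable:
        return "RPC over TCP/IP"
    return "RPC over HTTPS"

print(choose_transport(True))   # prints: RPC over TCP/IP
print(choose_transport(False))  # prints: RPC over HTTPS
```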

Note: The feature named RPC over HTTP actually transmits RPC traffic over an
SSL-protected HTTP (HTTPS) connection.

Users who use notebooks as their primary Outlook computer will find this feature to be
especially useful. Users who travel to customer sites and often end up waiting for the
opportunity to make presentations can use RPC over HTTP to keep in touch with their
corporate Exchange server without the need for a VPN connection. RPC over HTTP enables
a user to make a connection through firewalls at customer sites (which typically block VPN
connections) to the corporate Exchange Server, thereby improving their accessibility and
productivity.

Unlike OWA, the contents of locally stored personal folder files are available in Outlook on a
remote connection in exactly the same way they would be while connected to the corporate
network in the office.

Note: Unlike OWA, RPC over HTTP downloads e-mail information when the user connects
to the Exchange Server (assuming the use of Outlook cache mode). Therefore, RPC over
HTTP should only be used on computers the user controls, such as corporate notebooks,
instead of on shared computers or public kiosks.

Microsoft IT is optimistic that the use of RPC over HTTP will reduce the number of VPN
servers required to meet the needs of the company. Most employees use VPN to connect to
the corporate network primarily to use Outlook. Microsoft IT is analyzing VPN usage to better
understand employee needs, in an effort to reduce the number of VPN servers deployed
without reducing needed connectivity services.

EXCHANGE 2003 ARCHITECTURE DESIGN DECISIONS
The successful Microsoft IT deployment of Exchange 2003 required the integration of many
disparate elements. Not only was the Exchange server software new, but it also required the
addition of other new technologies, such as server and storage hardware from third-party
sources and Microsoft Windows Server 2003 and Microsoft Office 2003 software, for
Microsoft IT to gain the maximum benefit from the deployment. Design considerations for the
network, including bandwidth requirements and SLAs for backup and restore, were also
considered. The design decisions made also led to operational changes in Microsoft IT.

Topology
Microsoft IT used the topology from its Exchange 2000 on Windows 2000 Server deployment
as the basis for designing the topology of the Exchange 2003 deployment. Active Directory was a key
element in the organizational structure and administrative requirements for Exchange 2000.
Microsoft IT was able to use the existing Active Directory structure for the Exchange 2003
deployment.

Microsoft IT was already deeply involved in the deployment of Windows Server 2003 in its
worldwide network infrastructure when the initial deployments of Exchange 2003 began. This
development was critical, for while Exchange 2003 can run on Windows 2000 Server,
Exchange 2000 cannot run on Windows Server 2003. Running Exchange 2003 on Windows
Server 2003 presents many additional benefits to Exchange, which are discussed in detail
later in this paper. Those benefits enabled Microsoft IT to begin implementing plans for
consolidating the number of servers in the messaging infrastructure worldwide, which drove
the design for the Exchange 2003 topology.

For more information about Microsoft IT's Exchange Server 2003 topology, see the IT
Showcase technical white paper titled "Exchange 2003 Site Consolidation" at
http://www.microsoft.com/technet/itshowcase.

Mobility Design and Configuration


The definition of mobility at Microsoft has grown to include systems not typically associated
with mobile technologies. Devices using Microsoft IT’s mobile infrastructure include more
than just Pocket PCs and Smartphones. Microsoft employees using notebook computers or
Tablet PCs running Outlook 2003 can use RPC over HTTP to access the Microsoft corporate
Exchange servers with just an Internet connection. Any remote, Internet-accessible computer
can serve as an OWA client for Microsoft employees. All of these technologies go through
the same mobile infrastructure to access Exchange 2003.

The mobility enhancements in Exchange 2003 enabled Microsoft IT to modify the design of
its mobile messaging infrastructure with additional server consolidations and improved
security. The mobility infrastructure in Microsoft IT includes such services as OWA, OMA,
EAS, RPC over HTTP, and up-to-date notifications.

Consolidation of Front-End Servers


In addition to the mailbox server site and server consolidation project, Exchange 2003 has
also enabled Microsoft IT to consolidate its mobility server infrastructure (also known as
Exchange front-end servers). Microsoft IT no longer has to deploy a multiple-server
infrastructure within each domain to provide mobility services. Deploying OWA and MIS with
Exchange 2000, on the other hand, required an Exchange front-end server dedicated to
OWA and separate servers for MIS. By using Exchange 2003, all the mobile messaging
features reside on one physical front-end server, enabling Microsoft IT to consolidate the
number of front-end servers dedicated to hosting mobility features.

Microsoft IT reduced its server population from seven OWA servers and seven MIS servers
(one set for each domain in the Microsoft corporate network) to seven Exchange front-end
sites hosting OWA, OMA, EAS, and RPC over HTTP services. Each Exchange front-end site
worldwide hosts a pair of non-clustered, network load balanced Exchange front-end servers.
While Microsoft IT theoretically could have consolidated to a single set of Exchange front-end
servers, the project team decided to retain the larger number due to the network latency that
is caused by the great geographic distances between Exchange front-end servers and
regional Exchange mailbox servers. If Microsoft IT had consolidated to a single set, user
performance would have suffered. Network latency would have been particularly evident
among those users with slow Internet connections or mobile devices.

Mobile Security Enhancements


Microsoft IT also used the enhanced security features for OWA offered in Exchange 2003 for
its front-end server deployment, such as time-based logoff and forms-based authentication.
Unlike OWA under Exchange 2000, a secure, HTML forms-based authentication screen
appears when a user navigates to a front-end server, instead of an NTLM-based dialog box.
In addition to logon credentials, the form asks two additional questions:

1. Is the user logging on from a public kiosk/shared computer or from a private home
computer?

2. Does the user want to use basic or premium OWA user interface (UI) feature sets? (The
answer typically depends on whether the connection is a fast or a slow data link.)

All of the UI elements displayed in the OWA logon page are customizable, enabling the
inclusion of company logos, specific URLs to regional front-end servers, custom usage
instruction text, and more. Microsoft IT created its customized OWA page using these
features.

Once the form has been filled out and the user clicks Log On, the data is encapsulated and
sent over an SSL connection to the front-end server the user navigated to in order to bring up
the authentication form. Once the logon credentials have been sent over the Web, a special
time-out cookie is created on the local client computer. Depending upon whether the user
indicated the client is a public or private computer, the time-out cookie counts up toward an
inactivity threshold. Once that threshold is reached with no activity having taken place, the
session is automatically closed, and reauthentication is required if the user wants to regain
access to the Exchange mailbox. Microsoft IT configured the time-out cookie to close inactive
sessions on public or shared computers after 15 minutes, whereas inactive sessions on a
user's private home computer were configured to last for two hours of inactivity before
closing. The session time-out periods are customizable by the enterprise to meet any
security requirements.

In order to provide an additional level of security, Microsoft IT chose to deploy Internet
Security and Acceleration (ISA) servers to act as the reverse proxy for all Exchange front-end
servers. This allowed the front-end servers for Exchange 2003 to be placed behind the
firewall, safely within the corporate network, no longer directly connected to the Internet.

Server Design and Configuration


In designing the server platform for its Exchange 2003 deployment, Microsoft IT considered a
variety of factors. Aside from the normal hardware issues of system reliability and vendor
support, the key technical issues considered included new processor technology, cluster
implementations, server designs, and mobility issues. As a result, Microsoft IT has moved all
its Exchange mailbox servers to running in a clustered environment.

Processors
Processor technology continues to advance, improving performance in processing speeds,
increasing the number and enlarging the size of on-board caches, and increasing the number
of tasks that can be processed in parallel. Most of the servers of the Exchange 2000
infrastructure were based on Intel Pentium II and Pentium III processors running in the 500 to
700 MHz range, with a 100 or 133 MHz front-side bus (FSB).

Given the advances in processor technologies since Microsoft IT's deployment of
Exchange 2000, Microsoft IT chose to deploy Exchange 2003 on new systems based on the
Intel Xeon Processor MP with Hyper-Threading, employing a 400 MHz FSB.

Hyper-Threading enables a single processor to process information as if it were two separate
processors sharing the same memory bus and cache. In effect, the four-processor,
Hyper-Threading servers implemented by Microsoft IT function as virtual eight-processor
servers. However, a processor equipped with Hyper-Threading technology does not offer the
same performance benefits as a genuine dual-processor system. Because Hyper-Threading
shares the same on-chip memory cache and main memory bus, Microsoft IT measured an
actual Exchange performance gain of approximately 25 percent over a comparable
non-Hyper-Threading processor of the same clock speed.

Clustered Server Design


All the new servers Microsoft IT purchased to host Exchange 2003 mailbox servers were set
up as clusters and equipped with Xeon Processor MP microprocessors.

Through a combination of Exchange Server 2003, Windows Server 2003, third-party SAN
technology, and faster servers, Microsoft IT created a clustered server design that
offers greater operational reliability and reduced administrative overhead. This design
choice allowed Microsoft IT to achieve the following specific benefits:

• Reduced service outages by having active node mailbox servers automatically fail over
to passive node servers.
• Achieved clustered Exchange Virtual Server (EVS) failover in just two minutes,
regardless of the amount of mailbox data contained within the SAN attached to the
failed node.
• Increased the number of EVSs as well as the number of supported SGs per EVS within
the cluster. Each SG was configured to use three LUNs. Volume Mount Points were
used with these LUNs to minimize the number of drive letters used.
• Enabled server consolidation by hosting many more mailboxes per server.
• Reduced administration and maintenance overhead by consolidating more than 113
mailbox servers in 75 locations into 38 servers in seven locations.

Exchange 2003 Deployment and Architecture Page 24


• Reduced the potential impact of a database restoration on users (previously six hours
or more per user).
• Improved backup and restore times to less than one hour.
• Achieved server availability of 99.9 percent with a fiscal year 2004 SLA goal of achieving
99.99 percent.
• Enabled the implementation of rolling upgrades to minimize the impact of service
outages while speeding up server operating system and application upgrades and
patching.
• Doubled the user mailbox limit (to 200 MB).

Microsoft IT’s design goal was to support 8,000 mailboxes per SAN, with 200 MB mailbox
limits, 99.99 percent cluster server availability, and less than one hour per database backup
and restore time. The scaling of the data center EVSs in the Main corporate forest was
designed to reach 4,000 mailboxes.

Multi-Node Cluster Design


Microsoft IT chose a multi-node cluster design using multiple active (online) and
passive (offline) nodes. This design enables a failed active node to be immediately replaced
by an identically configured passive node, with the resources of the failed active node,
such as storage, immediately transferred to the passive node, thereby ensuring that the
impact of the failover on the end user experience is minimized.

Microsoft IT implemented two separate types of passive nodes: primary passive nodes and
alternative passive nodes. A primary passive node is a server with hardware equivalent to
that of the active node servers, which allows for full functionality upon an active node
failover. The alternative passive node is a server equipped with lower-scaled hardware that is
used primarily for tasks such as streaming backup data from disk to tape; it also serves as a
reduced-performance failover server. Both types of passive nodes are leveraged for rolling
software upgrades.

Both types of passive nodes come into play when rolling upgrades of the operating system
and/or Exchange are required. Rather than cycling through the cluster one node at a time
(failing over an active node to the primary passive node, upgrading the offline node, and
restoring it to active status, repeated for every active node), Microsoft IT first patches all
the offline passive nodes, then fails over a number of active nodes equal to the number of
available passive nodes. The failed-over nodes are upgraded in parallel and restored to
service when ready. This process is repeated once to upgrade the one remaining active node.
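
The rolling upgrade sequence described above can be sketched as follows (illustrative pseudocode, not Microsoft IT's actual tooling; node names are hypothetical):

```python
# Sketch of the rolling-upgrade sequence: patch all passive nodes first, then
# fail over as many active nodes as there are patched passive nodes, patch
# that batch in parallel, and repeat until every node is upgraded.

def rolling_upgrade(active, passive):
    """Return the upgrade rounds for a cluster.

    active, passive: lists of node names. Each round is a list of nodes
    patched in parallel while their workload runs on already-patched nodes.
    """
    rounds = [list(passive)]      # round 0: patch all offline passive nodes
    remaining = list(active)
    slots = len(passive)          # how many active nodes can fail over at once
    while remaining:
        batch, remaining = remaining[:slots], remaining[slots:]
        rounds.append(batch)      # failed over, patched in parallel, restored
    return rounds

# Headquarters design: four active nodes, one primary passive ("P1") and two
# alternative passive nodes ("p1", "p2") -- three parallel slots, so one
# final round for the single remaining active node, as described above.
print(rolling_upgrade(["A1", "A2", "A3", "A4"], ["P1", "p1", "p2"]))
# -> [['P1', 'p1', 'p2'], ['A1', 'A2', 'A3'], ['A4']]
```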

Microsoft IT Cluster Designs


Microsoft IT implemented two primary cluster designs for the Exchange 2003 deployment in
the Main corporate forest: a regional design and a headquarters data center design. A
separate, scaled validation design was also deployed in the Level B Test limited-use
production forest. All used the multi-node, Active/Passive cluster design. Table 4 shows the
Microsoft IT cluster configurations.

Table 4. Cluster design specifications per deployment

                                                Regional   Headquarters   Level B Test

Number of four-processor Active Nodes               3              4              1
Number of four-processor Primary Passive Nodes      1              1              1
Number of two-processor Alternate Passive Nodes     1              2              0
Number of SGs per Active Node                       4              4              4
Number of mailboxes per Active Node             2,700          4,000          5,000
Number of databases per Active Node                20             20             20
Number of mailboxes per database                  135            200            250
Maximum size of database                        27 GB          40 GB          50 GB
Number of mailboxes per cluster                 8,000         16,000          5,000

• Regional Design. The server specification for the regional cluster implementation
consists of one SAN enclosure per cluster, with three active nodes, one primary passive
node, and one alternate passive node (designated as AAAPp).
• Headquarters Design. The headquarters clustered implementation is similar in design.
It consists of two SAN enclosures, four active nodes, one primary passive node, and two
alternate passive nodes (designated as AAAAPpp).
• Level B Test Forest Design. The Level B Test server specification is similar to the
regional cluster in design but with greater mailbox capacity. It consists of one SAN
enclosure, one active node, and one primary passive node (designated as AP).
To get the best performance at the best price point, Microsoft IT standardized on the four-
processor, 1.9 GHz Intel Xeon Processor MP server for its active and primary passive cluster
nodes for both regional and headquarters data center deployments. For alternative passive
cluster nodes, Microsoft IT uses two-processor 2.4 GHz Intel Xeon Processor MP servers.
Because of this new processing platform, Microsoft IT has seen substantial performance
improvements in its Exchange 2003 infrastructure.

Microsoft IT’s cluster design supports a significant increase in both the number and size of
mailboxes per Exchange server. It helps eliminate performance impact to users during the
second stage backup process because it offloads that stage of the backup process to non-
active servers within the cluster, thereby maintaining the SLA.

Storage Design and Configuration


The entire design of Microsoft IT's storage configuration was based on effectively managing
peak time disk I/O. Microsoft IT studied the usage trends of its Exchange 2000 messaging
storage infrastructure and learned that the peak period of usage is typically Monday
mornings. Microsoft IT took that usage data and made it the baseline for designing the
Exchange 2003 SAN solution. Microsoft IT calculated the average amount of peak time disk
I/O per second attributed to each mailbox, and then computed the total I/O rate for a server
by multiplying the number of mailboxes by that per-mailbox I/O rate.

For example, on a server supporting 4,000 mailboxes with a peak time I/O rate of 1.2 per
mailbox per second, the total I/O rate for that server equates to 4,800 I/Os per second.
Each I/O transfer in Exchange moves 4 KB of data, which at that rate equates to nearly
20 MB of I/O per second. Because each SAN enclosure serves two hosts in the
headquarters data center configuration, the I/O rate per enclosure doubles to nearly
10,000 I/Os per second.
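
The arithmetic in this example can be captured in a small sizing helper (the 4 KB transfer size and the quoted rates come from the text above; the helper itself is illustrative):

```python
# Peak-load sizing sketch using the figures quoted in the example above.
# Exchange moves 4 KB of data per I/O transfer.

KB_PER_IO = 4

def peak_load(mailboxes: int, io_per_mailbox_sec: float, hosts_per_san: int = 1):
    """Return (server I/Os per second, server MB per second, SAN I/Os per second)."""
    ios_per_sec = mailboxes * io_per_mailbox_sec
    mb_per_sec = ios_per_sec * KB_PER_IO / 1024
    return ios_per_sec, mb_per_sec, ios_per_sec * hosts_per_san

# Headquarters case: 4,000 mailboxes at 1.2 I/Os per mailbox per second,
# two hosts sharing one SAN enclosure.
ios, mb, san_ios = peak_load(4000, 1.2, hosts_per_san=2)
print(ios, round(mb, 1), san_ios)   # 4800.0 18.8 9600.0
```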

In Microsoft IT's design for meeting this demand, each SAN enclosure can support up to
12,000 I/Os per second, affording a margin of headroom for unusual spikes in activity while
performing adequately during normal peak periods of I/O activity.
Any significant load beyond this would likely result in disk read and write latencies, which
would adversely affect the performance of all the mailboxes attached to that SAN. Microsoft
IT system architects deemed this an acceptable risk, given anticipated conditions, the cost of
additional hardware, and the monitoring and alerting improvements in Microsoft Operations
Manager.

To determine the messaging storage requirements for any enterprise, one must measure the
average peak time I/O per mailbox user per second, the maximum size of mailboxes, the
length of time items are retained in deleted item retention, and the typical e-mail usage
patterns and turnover rate in the organization. These are the factors Microsoft IT considered
when designing its Exchange 2003 SAN solution.

Microsoft IT allocated additional capacity to each LUN supporting mailbox stores to
mitigate any requirement for future resizing based on unexpected growth. Each data LUN
was sized to support six and a half databases with a "fluff factor" of 1.4.

Fluff factor is Microsoft IT's term for the average capacity allocated to support a given
mailbox on disk, based on deleted item retention, database overhead, mailboxes without
limits, and so on. For example, creating 100 MB mailboxes for users on Exchange 2000
actually required reserving 140 MB of space per user. The value of 1.4 was trended over the
years on production Exchange servers supporting 100 MB mailboxes and was retained as a
basis for designing the new solution with support for 200 MB mailboxes.

Microsoft IT's 100 MB mailbox size limit was a hard and fast disk quota set and enforced at
the Exchange level by means of policy, but a mailbox that consumed its full 100 MB quota
often occupied more than that amount on the back end. This usually happened when a user
deleted e-mail from a mailbox. The e-mail was not immediately deleted from the mailbox
database on the server. Rather, it was temporarily retained in the database, held in a space
known as deleted item retention, and only after three days was the deleted e-mail actually
purged from the mailbox database. Microsoft IT needed to account for that level of usage
overhead when planning its storage needs for Exchange 2003.

Additionally, Microsoft IT sized each data LUN to support six and a half databases even
though each LUN would hold only five databases in production. This allowed Microsoft IT to
duplicate a single corrupted database on the same LUN and then run an integrity check on it.
This ability to use the same LUN enabled Microsoft IT to provide the fastest possible
response to database corruption.
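
As a back-of-the-envelope check (the arithmetic is ours, not from the white paper), the 1.4 fluff factor, 200 mailboxes per database, and six-and-a-half-database allocation together land close to a 350 GB data LUN:

```python
# Sanity check of the LUN sizing described above. The 1.4 fluff factor,
# 200 MB quota, 200 mailboxes per database, and 6.5 databases per LUN are
# from the text; combining them is our own arithmetic.

FLUFF = 1.4   # average on-disk allocation per MB of mailbox quota

def lun_size_gb(mailbox_limit_mb, mailboxes_per_db, databases_per_lun):
    per_mailbox_mb = mailbox_limit_mb * FLUFF            # 200 MB -> 280 MB on disk
    per_db_gb = per_mailbox_mb * mailboxes_per_db / 1024
    return per_db_gb * databases_per_lun

# Headquarters case: 200 MB mailboxes, 200 per database, 6.5 databases/LUN.
print(round(lun_size_gb(200, 200, 6.5)))   # 355
```

The result (~355 GB) is within a few percent of the 350 GB data LUNs shown in the drive-letter layout, which suggests the fluff factor is indeed what drives the LUN size.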

Selecting a SAN
Like many organizations, Microsoft IT decided to make a clean break from the paradigm of
local (host-based) direct attached SCSI storage to SAN-connected storage. In the past,
server storage was treated as a key server component that was closely married to the server
hardware. SAN technology has made storage become more like a utility service; it is no
longer as closely tied to the server. While this arrangement has both pros and cons, Microsoft
IT opted for SAN storage because it meets Microsoft IT’s requirements for future
performance, scalability, and capacity. Those requirements could not be satisfied by locally
attached storage arrays.

The deployment of Exchange 2003 gave Microsoft IT the opportunity to assess how SAN
technology had matured since it had last been studied. Microsoft IT embarked on a project to
qualify and test technology and products from SAN vendors. Microsoft IT required that any
new SAN technology standard implemented at Microsoft be easily supported in remote
locations, and that the storage solution be easy to deploy, modular in design, and remotely
manageable.

Within each HP StorageWorks Enterprise Virtual Array 5000 (eva5000) SAN used by
Microsoft IT are 168 disks. Each SAN enclosure supports approximately 8,000 mailboxes
with 200 MB limits and can process about 12,000 I/Os per second before disk latency
becomes evident. Each mailbox server in the headquarters data center supports 4,000
mailboxes and is expected to process a peak-time load of between 5,000 and 6,000 I/Os per
second. As a result, one SAN enclosure supports two mailbox servers in the headquarters
data center. Regional mailbox servers support just under 2,700 mailboxes, so the resultant
peak-time load of three regional servers is supported by one SAN enclosure.

Storage Allocation Using Volume Mount Points


Microsoft IT used the new cluster support for volume mount points in Windows Server 2003
to eliminate the drive letter as a scalability blocker to hosting multiple Exchange
instances within a single cluster. The chosen design used a drive letter assignment per data
LUN (one per SG) with four data LUNs per cluster node (Microsoft IT configures each node
to support one Exchange Virtual Server). The corresponding log LUNs were configured as
volume mount point clustered resources, each of which was dependent upon its parent data
LUN. The design also includes a dedicated queue LUN that is maintained as a volume
mount point clustered resource, dependent on the data LUN assigned to SG1.

The use of volume mount points allowed Microsoft IT to configure an optimized disk layout
using four drive letters to maintain nine physical LUNs. This design allowed for the creation of
four Exchange instances that mapped across thirty-six physical LUNs utilizing only sixteen
drive letters.

Subsequent LUNs were maintained to support online backup-to-disk, with a single disk
allocated per SG per node. The disk assigned for SG1 on each node supports three additional
volume mount point LUNs as backup targets for SG2, SG3, and SG4. The backup resources
were configured across sixteen physical LUNs addressable by four drive letters.

A representation of the drive letter allocation for the first node and corresponding allocation to
support the online backup devices is given in Figure 1.

Node 1                                              Node 1 Backup

SMTP   M:\Exsrvr03         50 GB (VMP)
SG1    Data: M:\           350 GB
       Logs: M:\SG1_Logs    40 GB (VMP)             Z:\            350 GB
SG2    Data: N:\           350 GB
       Logs: N:\SG2_Logs    40 GB (VMP)             Z:\Backup_2    350 GB (VMP)
SG3    Data: O:\           350 GB
       Logs: O:\SG3_Logs    40 GB (VMP)             Z:\Backup_3    350 GB (VMP)
SG4    Data: P:\           350 GB
       Logs: P:\SG4_Logs    40 GB (VMP)             Z:\Backup_4    350 GB (VMP)

Figure 1. Drive letter allocation per node.

Note: In the context of Figure 1, VMP represents a volume mount point.

In all, a total of 53 physical LUNs are addressable using 21 drive letters within the clustered
design. This allows for easy disk subsystem optimization with LUNs distributed across
controllers and Fibre Channel Adapters (FCAs) to ensure peak disk transfer requirements are
met as required within the Microsoft production environment.
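
The LUN and drive-letter totals can be verified with a quick count (our reconstruction; it assumes one lettered cluster quorum LUN, which the text does not state explicitly):

```python
# Counting check of the "53 physical LUNs, 21 drive letters" figure above.
# Assumes: 4 nodes, 4 SGs per node, one queue VMP per node on its SG1 data
# LUN, one lettered backup LUN plus 3 backup VMPs per node, and one lettered
# quorum LUN (the quorum is our assumption).

NODES, SGS = 4, 4

data_luns   = NODES * SGS   # lettered data LUNs (4 letters per node)
log_vmps    = NODES * SGS   # log LUNs mounted under their parent data LUN
queue_vmps  = NODES * 1     # SMTP queue LUN, a VMP on each node's SG1 data LUN
backup_luns = NODES * SGS   # per node: 1 lettered backup LUN + 3 backup VMPs
quorum      = 1             # cluster quorum LUN, assumed to hold a letter

total_luns    = data_luns + log_vmps + queue_vmps + backup_luns + quorum
total_letters = data_luns + NODES + quorum   # data + backup + quorum letters

print(total_luns, total_letters)   # 53 21
```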

Redundant Storage System Paths Using Secure Path


Microsoft IT’s deployment of SAN technology includes an I/O design that not only provides
redundancy but also uses that redundancy for optimal data flow.

Microsoft IT uses HP StorageWorks Secure Path for Windows to provide many benefits
within its SAN infrastructure. Secure Path provides three key benefits:

1. Eliminates the risk of a single point of failure supporting the server and SAN
interconnect.

2. Allows for LUN distribution to maintain optimized I/O required on a busy Exchange host,
reducing peak read/write disk latency and substantially improving online backup
throughput to disk.

3. Ensures a single LUN presentation independent of the number of paths to the host.

Microsoft IT’s implementation of Secure Path uses two FCAs per host, two fibre channel data
switches, and two storage controllers. Each FCA, switch, and controller group makes up what
is known as a fabric. Secure Path allows the use of two separate fabrics per SAN, and each

Exchange 2003 Deployment and Architecture Page 29


element of the fabric is interconnected with subordinate elements from both fabrics. More
precisely, each active node host in a cluster connects to each switch by means of the two
FCAs installed in each host (one FCA per switch). Each switch takes inbound data from each
host and has two outbound data connections, one to each controller. Each controller has two
inbound data connections, one from each switch, and has one outbound data connection to
the SAN enclosure. Secure Path enables Microsoft IT to tolerate a single component
failure in an FCA, a connecting cable, a switch, or a controller. Service performance
might be reduced in the event of a component failure, but service would continue to
operate without interruption.

Secure Path also helps eliminate many single points of failure between the nodes
and the connected SAN storage. Microsoft IT can maintain service in the event of a
component failure affecting a single FCA per host, multiple fibre cables, fibre channel
switches, or a single storage controller that makes up the SAN fabric. The component
failure is detected by Secure Path, which ensures that I/O is maintained by moving LUNs
from the failed path to an available path. This process, called failover, requires no resource
downtime while maintaining LUN availability. Failed-over LUNs can be failed back using HP's
Secure Path Manager to restore optimized I/O once failed components have been replaced.
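
The dual-fabric behavior can be illustrated with a toy model (this is not Secure Path code; the LUN names and preferred-path assignment are hypothetical):

```python
# Toy model of the dual-fabric idea described above: each LUN has a
# preferred path through fabric A or fabric B; when any component of one
# fabric fails, its LUNs move to the surviving fabric so I/O continues.

def surviving_paths(lun_paths, failed_fabric):
    """Reassign every LUN on the failed fabric to the surviving one."""
    fabrics = {"A", "B"}
    backup = (fabrics - {failed_fabric}).pop()
    return {lun: (backup if fabric == failed_fabric else fabric)
            for lun, fabric in lun_paths.items()}

# LUNs balanced across both fabrics for optimized I/O (hypothetical layout):
paths = {"SG1_Data": "A", "SG2_Data": "B", "SG3_Data": "A", "SG4_Data": "B"}
print(surviving_paths(paths, "A"))
# -> {'SG1_Data': 'B', 'SG2_Data': 'B', 'SG3_Data': 'B', 'SG4_Data': 'B'}
```

After the failed component is replaced, failing the LUNs back restores the balanced layout, which is what Secure Path Manager is used for in the text.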

The headquarters data center cluster implementation using Secure Path to connect to a
16,000 mailbox SAN is shown in Figure 2.

[Figure 2 depicts the headquarters data center cluster: four active nodes, one primary
passive node, and two alternative passive nodes, each equipped with dual FCAs, connect
through fabric path A and fabric path B (Switch A and Switch B, one fibre channel switch
per fabric) to two SAN controller pairs. The two SANs, built from shelves of fourteen
72 GB disks, together host 16,000 mailboxes with 200 MB limits. A Storage Management
Appliance manages the fabric, and the nodes connect to the public network at 100 MB
full duplex.]

Figure 2. Secure Path Connecting a Data Center Cluster to a Pair of SANs

Backup and Recovery


With the implementation of Exchange 2003 in a clustered server environment, Microsoft IT
designed a two-stage backup process (disk-to-disk and disk-to-tape) to meet its SLAs better.

This process prevents the tape backup process from affecting the production server
performance, and provides greater flexibility in managing the data restoration process. The
solution is based on a combination of:

• Exchange Server 2003


• Microsoft Windows Server 2003, Enterprise Edition
• Windows NT® Backup for disk-to-disk backup
• Veritas Storage Management solution for disk-to-tape backup
In the past, it was challenging to maintain the one-hour backup restore SLA on direct
attached SCSI storage server implementations. These server designs used a one-step
backup process (disk-to-tape), where backups were performed to tape libraries over the
LAN. Microsoft IT's experience showed that it could move data at a rate of approximately
9 to 10 MB per second, or about 33 GB per hour. Backups were limited to non-business
hours to minimize any impact to clients with mailboxes hosted on these servers. However,
if a backup failed to complete by 7 A.M., it had to be canceled. Otherwise, the continuing
backup process would significantly degrade the system performance of the messaging
infrastructure for clients.

Recovering a mailbox store affected by corruption in Exchange 2000 meant that 1,000
mailboxes were out of service for six or more hours during the restore operation. This
represented a cost in lost productivity of $60-$80 per hour per user. Single mailbox restore
operations required dedicated restore servers. This configuration is shown in Figure 3.

[Figure 3 depicts the previous regional configuration: three 1,000-mailbox Exchange 2000
servers backed up to a tape library over a 100 MB LAN, with a dedicated Exchange 2000
restore server used for single mailbox restore operations.]
Figure 3. Previous Regional Messaging Backup Environment

Two-Stage Backup Solution


To solve these problems and support server consolidation, Microsoft IT designed a flexible,
two-stage process to back up data within a multi-node clustered configuration: disk-to-disk
(stage 1) and disk-to-tape (stage 2).

Microsoft IT leveraged the fact that a cluster resource group can move between nodes
independent of other resource groups. For example, an active node of a clustered Exchange
server is attached to a separate cluster resource group of dedicated backup LUNs in addition
to the resource groups used for storing production data.

In the first stage, backup runs on all active nodes within the cluster to complete an online,
disk-to-disk backup from the LUNs in the production data resource groups to the LUNs in the
backup resource group over a direct attached fibre channel. The backup resource group has
the capacity to support two-day online retention. Once that process has completed, control
of the LUNs in the backup resource group is transferred to an alternative passive node. At
this point, the passive node initiates the second stage, a disk-to-tape backup from the
backup resource group to the tape library over a direct attached fibre channel. This process
frees the active nodes from the time-consuming disk-to-tape data transfer, thereby
minimizing the amount of time the active nodes spend processing data backup operations.
This process is shown in Figure 4.

Figure 4. Two-stage Backup Process

Microsoft IT elected to use this two-stage process rather than using a single stage, disk-to-
tape backup over a direct fibre attachment to a tape library. While the single-stage process
would eliminate the need for backup LUNs in the SAN, which would free up additional
storage capacity in the SAN for more mailboxes, Microsoft IT realized that it could not take
the risk of losing valuable production time in the event that the node in the cluster might
become disconnected from the tape library. If that happened, the node server would be
required to reboot to reattach the server to the library. If the active node were the server
performing this work, Microsoft IT would be required to failover the node so it could reboot
and reconnect to the library. Microsoft IT considered that an unacceptable risk to system
availability. Instead, by placing the burden of backing up to tape on a passive node that does
not support users, no loss of production service occurs when the passive node needs to be
rebooted to restore the server-to-library connection.

Per-database online backups are scheduled at regular intervals that let Microsoft IT back up
each entire server between 8:00 P.M. and 1:30 A.M. The databases are backed up
concurrently per SG. An important feature here is that Exchange 2003 allows parallel backup
and restore operations on a per-SG basis. Therefore, backup operations for each database
can be interleaved.
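
A rough feasibility check of that backup window (our arithmetic, assuming the worst case of every database at the 40 GB headquarters cap):

```python
# Back-of-the-envelope check of the 8:00 P.M. to 1:30 A.M. backup window for
# a headquarters active node: 4 SGs x 5 databases, each up to 40 GB. The
# required sustained rate is our own calculation, not a figure from the paper.

dbs = 4 * 5
gb_total = dbs * 40          # worst case: every database at its 40 GB cap
window_h = 5.5               # 8:00 P.M. to 1:30 A.M.

gb_per_hour = gb_total / window_h
mb_per_sec = gb_total * 1024 / (window_h * 3600)
print(gb_total, round(gb_per_hour, 1), round(mb_per_sec, 1))
# 800 GB needs ~145.5 GB/h, i.e. ~41.4 MB/s sustained disk-to-disk
```

That sustained rate is comfortably within the SAN's capability, which is why the window holds even with databases backed up concurrently per SG.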

Recovery Solution
With Microsoft IT’s new clustering solution, a server hardware failure is simply a matter of an
automatic cluster node failover; service is negligibly affected. If there is a disk failure, different
recovery scenarios are implemented, depending upon the scope of the failure and the time of
day at which it occurs.

Methodology is No Longer Scenario-Dependent


The method of recovery employed used to be based on the type and scope of failure incurred
and business priorities. With Exchange 2000, organizations had a choice between restoring
their messaging service quickly while giving up immediate access to old mailbox data, or
restoring full access to their service but taking more time to do it.

For example, if a single database was lost, up to 200 people could have been affected.
Because up to two days of backup data was available on disk and could be restored online in
less than an hour (restore rates of up to 2 GB per minute were achieved), regular Exchange
restore procedures were used to get user mailboxes quickly back online with their data.

Note: Each Exchange database consists of two files: the Exchange Database (EDB) file and
the Streaming Media (STM) file.

With Exchange 2000, if an entire SG was lost, the time of day of the failure was often the
deciding factor on how to proceed. If the failure was during the business day, restoration of
service usually took precedence over restoration of data, which could be restored later. In
that scenario, the damaged databases were deleted and recreated (a process known as
“stubbing” a database).

If the failure occurred in late, non-business hours, Microsoft IT chose to sacrifice the
immediate return of service in favor of a faster restoration of all lost data. In that situation,
they elected to perform the restoration without stubbing the affected databases.

The decision tree used by Microsoft IT to determine whether to restore service first and data
later or restore data and service simultaneously is illustrated in Figure 5.

[Figure 5 shows the following decision tree:

1. Is the problem known and resolvable? If no, move mailboxes and investigate further. If
   yes, fix the problem.
2. Is a database restore required? If no, the process is complete.
3. Is it between 8 A.M. and 4 P.M. on a business day? If yes, bring up stubbed database(s)
   to allow users to work on e-mail and begin the restore to the RSG; after business hours,
   swap the stubbed databases with the RSG databases and run ExMerge on the small amount
   of content from the stubbed databases now located in the RSG.
4. If no, can the database restore be completed by 8 A.M. on the next business day? If
   yes, start the database restore; if no, the process is complete.]
Figure 5. Microsoft IT Production Restoration Decision Tree

Employing the Recovery Storage Group (RSG)


With Exchange 2003, service is typically restored very quickly regardless of the time of day a
database failure occurs. Instead of waiting until off-business hours, the process of restoring
the data from the prior evening’s disk-to-disk backups is immediately started.

To restore that data as quickly as possible, Microsoft IT can use a new Exchange 2003
feature called RSG, a special, offline SG built solely for the task of rebuilding a lost SG from
backup. Even though Exchange 2003 only supports four SGs in production for users, it now
supports the RSG as an additional offline SG – one that does not support production user
access.

Microsoft IT creates a temporary RSG and restores the corrupted databases from the backup
source into it. Once the restore from backup is complete, the data generated between the
time of the backup and the point of failure is recovered by replaying the transaction logs.
This process immensely speeds up the recovery of both the users' messaging service and
their data from corrupted databases. When the replaying of transaction logs is complete, the
restored database in the RSG is swapped with the stubbed database in the production SG.
Then any new data generated between the time e-mail service was restored and the time
data recovery was completed is exported from the stubbed database and imported into the
restored database using the Microsoft Exchange Mailbox Merge Wizard (also called
ExMerge). The RSG is then deleted. Because database restore speed is restricted to
LAN-based tape, this method is also used for the legacy non-clustered servers that are
currently being consolidated. In a large storage failure, a large amount of data must be
recovered, and many mailboxes might wait for an extended period before the data is
restored.
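
The recovery sequence can be summarized as an ordered checklist (a restatement of the process above, not Microsoft IT operational documentation):

```python
# The RSG recovery sequence described above, as an ordered list of steps.

RSG_RECOVERY_STEPS = [
    "stub the damaged database(s) so users get service immediately",
    "create a temporary Recovery Storage Group (RSG)",
    "restore the prior evening's disk-to-disk backup into the RSG",
    "replay transaction logs to roll the RSG copy forward to the failure point",
    "swap the restored database in the RSG with the stubbed production database",
    "ExMerge new mail from the stubbed copy into the restored database",
    "delete the RSG",
]

for i, step in enumerate(RSG_RECOVERY_STEPS, 1):
    print(f"{i}. {step}")
```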

For more information about Microsoft IT's Exchange Server 2003 backup and restore, see
the IT Showcase technical case study titled "Messaging Backup and Restore at Microsoft" at
http://www.microsoft.com/technet/itshowcase.

Future Backup Technology


Microsoft IT is currently testing a new feature of Windows Server 2003 called the Volume
Shadow Copy service (VSS) for use as a one-step Exchange backup. This service provides
local file-system or vendor-specific storage-based data snapshot functionality.

VSS offers the ability to clone disk data, creating an image of that data at a single point in
time. Microsoft IT’s goal is to end its reliance on the current two-stage online backup process
and instead use VSS to clone its servers at midnight and then use VSS differential snapshots
again at 12 noon and at 6 P.M. to a new set of clone LUNs. In the event of an incident, the
scope of data loss incurred and the time of day the incident occurs would determine whether
Microsoft IT would use the last known good VSS clone or snapshot to restore data. For
example, if a database goes offline due to corruption after 2 P.M., the easiest and fastest
method of restoring the data and service of that database would be to restore from the noon
snapshot. If the corruption is detected late in the evening, the reduced traffic load makes
restoring from the last clone the preferred method. The recovery of large amounts of data
using VSS should occur in a matter of minutes as opposed to the hours it takes today.
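
The restore-point selection being tested can be sketched as follows (hypothetical policy code; the schedule hours come from the text above):

```python
# Sketch of picking a restore point from the planned VSS schedule: a full
# clone at midnight plus differential snapshots at noon and 6 P.M.

RESTORE_POINTS = [0, 12, 18]   # hours: midnight clone, noon and 6 P.M. snapshots

def last_known_good(incident_hour: int) -> int:
    """Return the hour of the newest restore point at or before the incident."""
    return max(h for h in RESTORE_POINTS if h <= incident_hour)

print(last_known_good(14))   # 12 -> restore from the noon snapshot
print(last_known_good(9))    # 0  -> restore from the midnight clone
```

As the text notes, the time of day and scope of loss also matter; a late-evening incident might favor the full clone even when a newer snapshot exists.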

VSS as a backup solution has many third-party dependencies to make it work efficiently. A
requestor, a provider, and a writer are needed. Microsoft IT is testing the operational benefits
of VSS as a possible solution for “snap and clone” integration. As of this writing, VSS is not
used for production backups in Microsoft IT and is still in a testing phase.

Management and Monitoring using Microsoft
Operations Manager (MOM) 2000
With Exchange 2000, Microsoft IT used an internally developed tool called Prospector for
monitoring Exchange servers. Prospector monitored key indicators such as services running,
databases mounted, and disk usage. Prospector was very efficient, but it was limited in what
it could do.

Just before Microsoft IT migrated to Exchange 2003, Microsoft IT decided to migrate from
Prospector to MOM 2000 with the MOM Exchange Management Pack for the management of
its Exchange servers. MOM is an enterprise systems management application that uses a
client agent to collect predefined events into a central database from event logs on monitored
servers. It also creates, in response to the predefined events, alerts that are routed to central
consoles monitored by the Data Center Operations staff.

In addition to many other capabilities, MOM provides specific instrumentation for Exchange
Server. The key Exchange 2003 management data monitored include server state,
performance metrics, and messaging queue status. MOM also provides customizable
“Knowledge Scripts” (KS) that enable system managers to create specific management
objects for the operating system or applications. Microsoft uses the MOM KS functionality
extensively for managing the Exchange 2003 environment. Table 5 provides an overview of
some of the key MOM Knowledge Scripts that Microsoft uses for Exchange 2003.

Table 5. Key MOM Knowledge Scripts for the Microsoft Exchange 2003
Deployment

Knowledge Script Purpose

Service Monitor Polls important Exchange services such as STORE.EXE and generates
alerts when these services are down.

Backup Monitor This script looks at backup operations and databases to verify that
regular backup operations are occurring. The script enumerates SGs and
verifies log files and database headers to ensure they have been
backed up.

Disk Space Monitor This script verifies that there is sufficient disk space available for
transaction log, database, and backup volumes. The script verifies that
at least 20 percent free space is available.

Event Log Monitor This script checks for critical Exchange 2003 event log errors. It also
looks for databases that have been dismounted.

Availability Monitor This script verifies that Exchange services are available to users by
performing test logins on each information store.

Discovery This script performs version discovery for configuration management


purposes on items such as software versions, service packs, drivers,
etc.

Active Directory Monitor This script looks at the Exchange 2003 server to discover problems
with access to the AD. Global Catalog and DS_Access errors are of
key concern to this KS.

MOM uses a store-and-forward technique to collect events so that events are reliably
delivered, even if temporary network outages occur during normal operation of the servers.
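The store-and-forward behavior can be sketched as a local queue that never drops events during an outage and forwards them once the central database is reachable again. This is a simplified illustration in Python; the class and method names are hypothetical, not the MOM agent's actual interfaces:

```python
from collections import deque

class StoreAndForwardAgent:
    """Buffers collected events locally and forwards them to the central
    database when it is reachable (a sketch of the pattern, not MOM code)."""

    def __init__(self, send_func):
        self._queue = deque()      # a real agent persists this store to disk
        self._send = send_func     # delivers one event to the central database

    def collect(self, event):
        self._queue.append(event)  # always queue first; never drop on outage

    def flush(self, server_reachable):
        """Forward queued events; on failure they stay queued for retry."""
        delivered = 0
        while self._queue and server_reachable:
            event = self._queue[0]
            try:
                self._send(event)
            except ConnectionError:
                break              # temporary outage: keep events queued
            self._queue.popleft()  # remove only after successful delivery
            delivered += 1
        return delivered
```

Because events are removed from the queue only after a successful send, a network outage during normal operation delays delivery rather than losing data, which is the property the text describes.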

The MOM Application Management Packs are a series of predefined events and thresholds
designed to capture the most relevant data for a particular server application.

MOM uses an organizational structure, called the configuration group, to manage monitored
servers. A configuration group typically consists of one database, one or more DCAM (Data
Access Server + Consolidator and Agent Manager) servers, and one or more agents that run
on all monitored computers.

Once the system was up and running, especially after the MOM Exchange Management
Pack was applied and finely tuned to the Microsoft IT group’s needs, there was very little
network traffic overhead in using MOM to monitor servers over the WAN. Because of this
efficiency, early plans for using five MOM configuration groups to better manage the MOM
traffic over the WAN were deemed unnecessary and were scrapped. The process was so
efficient that Microsoft IT needed only one MOM configuration group to monitor all Exchange
servers worldwide, holding the cost for the single MOM configuration group server
deployment to $50,000.

In tuning the MOM Exchange Management Pack, Microsoft IT chose not to modify the default
management pack but instead to create a custom Microsoft IT management pack that
maintained the new and modified rules. This included collecting data not specified by default,
changing default data collection parameters and thresholds, and so forth. Microsoft IT still
uses its custom management pack to manage specific backup events that are unique to its
processing environment. Microsoft IT brought all this tuning and consolidation feedback to
the product development group for inclusion in the released product.

For more information about MOM, see the IT Showcase Technical Solutions Brief titled
“Monitoring Messaging at Microsoft” and the IT Showcase technical white paper titled
“Monitoring Enterprise Servers at Microsoft” at http://www.microsoft.com/technet/itshowcase.

Application Management
Once MOM detects an alert from a remote server, Microsoft IT can access that server to
further investigate and diagnose the problem by using the remote administration tools built
into Windows Server 2003.

Remote Desktop for Administration & Remote Desktop Protocol (RDP)


Microsoft IT uses the Remote Desktop for Administration and RDP features of Windows
Server 2003 and Windows XP Professional to maintain remote Exchange 2003 servers.
Enabled by Terminal Services technology, Remote Desktop for Administration is specifically
designed for server management. As a result, Remote Desktop for Administration can be
used on an already busy server without noticeably affecting processor performance. This
makes it a convenient and efficient service for remote management. In essence, Remote
Desktop for Administration is used to log on to the server remotely as though it were a local
logon.

Server Management
Microsoft IT uses MOM to create long-term trending data about server performance.
However, the most aggressive trending cycle MOM can manage is recording a data
checkpoint every five minutes or so. For more real-time performance monitoring, Microsoft IT
uses Performance Monitor (PerfMon), a tool provided in Windows Server 2003.

MOM performance data is maintained on an eight-day schedule (the current day and the
seven previous days). Microsoft IT uses the trending data captured in MOM to track the
performance implications of adding a software patch or hardware driver to an Exchange
server. By noting when a trend in performance data changed and comparing it to the server
change records maintained at end-of-shift staff changes for the Exchange server
environment, Microsoft IT can more quickly attribute performance problems and benefits to
specific changes made at a specific time. Given the extremely high rate of change seen in
the Microsoft IT environment, this is a vital tool in Microsoft IT’s diagnostic process.

HP Insight Manager
HP Insight Manager was the first server element manager available for PC servers. It was
released in 1992. Since that time, Insight Manager has established itself as the leading
management application for server platforms. Microsoft IT uses Insight Manager extensively
to monitor HP hardware-specific information. While Insight Manager has no specific
Exchange management data, system managers use this tool to correlate events from other
management applications with hardware-specific conditions on Microsoft IT’s Exchange 2003
servers. HP Insight Manager also integrates closely with MOM to provide a unified
management platform for system managers. Table 6 shows a few key objects for which
Insight Manager provides management data.

Table 6. HP Insight Manager

Object           Data Provided by Insight Manager

Disk Subsystem   Extensive disk monitoring and diagnostic information that can be
                 correlated with application events such as I/O errors.

Environment      Information on server environment characteristics such as
                 temperature, fan status, and critical BIOS errors.

Version Control  Detailed version information on firmware, software, and drivers,
                 useful for configuration management purposes.

Utilization      Hardware-based statistics on processor and I/O bus utilization.

Storage Management
Events that occur on the SAN enclosure are not recorded into a server’s Event Log, which is
where MOM picks up many of the alerts. Instead, SAN enclosure events are stored in the HP
Storage Management Appliance (SMA). Microsoft IT also configured MOM to monitor events on
the SMA to keep on top of SAN enclosure events. In headquarters, one SMA was installed
per pair of SAN enclosures. In the regions, one SMA was installed per SAN enclosure. Using
SMAs with MOM ensures that Microsoft IT’s SAN enclosures are as effectively monitored as
its Exchange servers.

BEST PRACTICES AND LESSONS LEARNED
As part of its early adopter deployment of Exchange 2003, Microsoft IT learned a number of
lessons and also discovered and established a number of best practices for enhancing and
optimizing the service provided by Exchange.

Topology Best Practices


Microsoft IT made many discoveries and overcame many obstacles during the deployment of
Exchange 2003. Several were related to the topology of the network.

Windows Server 2003 Requirements


When upgrading an Exchange 2000 cluster topology to Exchange 2003, Microsoft IT learned
that it was required to upgrade each of the Exchange virtual servers and cluster nodes in its
cluster group, one at a time, for the server cluster to come online successfully. Additionally,
the servers to be upgraded to Exchange 2003 had to be running Exchange 2000 SP3 first.

Exchange 2003 can run on either Windows 2000 Server or Windows Server 2003 computers
and it is supported in all Active Directory environments, including Windows 2000 mixed,
Windows 2000 native, and Windows 2003 domain and forest functional levels. When running
in an environment with Windows 2000 domain controllers and global catalog servers, the
domain controllers and global catalog servers that Exchange 2003 uses must all be running
Windows 2000 SP3 or later. This requirement affects both Exchange 2003 servers and the
Exchange 2003 version of Active Directory Connector (ADC). ADC does not work with
domain controllers or with global catalog servers that are running a version of Windows 2000
Server earlier than SP3.

Mailbox Moves
The new Exchange cached mode feature of Outlook 2003 made the mailbox move process
during consolidation easier to manage. From a client perspective, Exchange cached mode
mitigated any significant performance impact that might have occurred as a result of
migrating from the use of many small Exchange servers toward fewer, larger Exchange
servers.

Microsoft IT took a performance baseline both before and after moving mailboxes from local
to regional services during its mailbox server consolidation effort. Microsoft IT did this to
ensure that post-migration client performance was equal to or better than pre-migration
performance.

This performance data also served a public relations role with customers. Many people are
hesitant to change, and once a change takes place, they often feel as if the change has had
an adverse effect on client performance. By taking baseline performance data both before
and after the moves, Microsoft IT not only demonstrated that it was concerned about
maintaining good service, but it also showed empirical measurements that proved there had
been no performance degradation.

Offline Address Book (OAB)


Once Exchange cached mode was used on a widespread basis, Microsoft IT encountered a
performance challenge related to the OAB, an offline version of the Global Address List
provided by Exchange. In the past, each individual Exchange Server created its own version
of the OAB. While these versions were all identical in primary content, the OAB identified the
server that created it. This became a problem when mailboxes were moved, even
temporarily, because the new server failed to recognize the previous server’s version
of the OAB and forced another full download of the OAB, consuming network bandwidth
unnecessarily.
Microsoft IT learned from that experience. In Exchange 2003, rather than identifying OABs
with specific sites and servers, Microsoft IT associated them on a regional basis. The OABs
were created by one primary server in each region and then replicated to the other servers in
the region by means of public folder replication.

By associating OABs with regional servers, Microsoft IT was able to eliminate the repetitive
full downloads of the OAB on client computers. Additionally, Exchange 2003 filtered
certificate data from the OAB to reduce its size from 100 MB (300 MB uncompressed) to
approximately 43 MB compressed (about 150 MB uncompressed). The differential OAB
updates, used to update the OAB once a full download is complete, were also reduced to
about 50 percent of their original size.

Public Folder Access


In Exchange cached mode, 90 percent of normal Outlook user tasks, such as creating
messages and performing calendar look-ups, take place in the background. However, some
specific tasks still require real-time access. These include:

• Accessing a public folder.


• Delegating mailbox access (used by people who have permissions to work in another
person’s mailbox, such as an administrative assistant who schedules appointments for a
manager).
• Checking free/busy status of other users (used to check the schedule availability of a
prospective meeting request recipient).
The consolidation of Exchange servers into major regional sites required that Microsoft IT
employ higher capacity public folder servers to provide the same consistent level of
performance to all users. Microsoft IT expected use of these servers to increase significantly,
as more people per server used free/busy publishing, retrieved Outlook 2003 security
settings, and accessed general public folders. Each consolidated site employed two non-
clustered, public folder servers for redundancy and load sharing.

Server Configuration Best Practices


Microsoft IT’s primary discoveries and obstacles with regard to server configuration consisted
of the following:

Server Optimizations
Microsoft IT’s server configurations came equipped with four GB of RAM. These servers ran
Windows Server 2003, Enterprise Edition and Exchange Server 2003 with the following
modifications:

• /3GB switch set in the Boot.ini file


• /USERVA=3030 parameter set in the Boot.ini file.
Windows Server 2003, Enterprise Edition, supports up to 32 GB of RAM. However, a four GB
RAM installation is divided, by default, into two GB for applications and two GB for operating
system use. Because Windows Server 2003 enables RAM tuning, Microsoft IT employed
the /3GB switch to make available to Exchange one GB of the RAM that is normally reserved
for operating system use. However, Microsoft IT soon encountered problems with the
operating system because it was starved for memory. Microsoft IT then used the /USERVA
switch to further specify how much of the total supply of RAM would be allocated to the
application portion of RAM. Microsoft IT found that setting the USERVA switch to 3030,
allocating 42 MB of RAM back to the operating system, resolved the memory starvation
problem while providing the maximum amount of memory for use by Exchange.
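For reference, the two switches described above appear together on the operating system’s ARC line in Boot.ini. A representative entry is shown below; the disk/partition path and description text are illustrative, and only the /3GB and /USERVA=3030 switches are the point:

```ini
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /3GB /USERVA=3030
```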

Use the Front-End Server Closest to the Mailbox Server for Best
Performance
Microsoft IT learned that, when using any of the mobile features over the Internet, the best
performance was achieved when users selected the Exchange front-end servers physically
located closest to their mailbox servers, not the front-end servers closest to their present
locations. For example, when an employee from Australia traveled to the United States, her
online OWA experience was optimized when using the front-end server closest to her
mailbox server. After discovering this, Microsoft IT modified the OWA logon Web page to
include links to all available front-end servers and included instructions on which one to use.

Consider Exchange Uniform Resource Locator (URL) Names Carefully
OMA clients use the same Exchange front-end servers used by OWA. Keying a normal URL
into a Smartphone device can be time-consuming. A long and complex OMA URL is likely to
prevent most users from regularly using the service.

Resolving Processor Utilization Bottleneck


Microsoft IT conducted extensive testing in its effort to identify its next server platform. The
eight-processor Xeon 550 MHz server platform used with Exchange 2000 servers had been
running at approximately 80 percent processor utilization during peak load. That figure was
used as the baseline for new system testing.

After extensive testing of various processing platforms, Microsoft IT concluded that it could
gain substantially greater overall system performance by addressing memory bus limitations
rather than processor limitations. Microsoft IT tested a beta version of a four-processor Xeon
Processor MP 1.6 GHz Hyper-Threading-enabled server running on a 400 MHz FSB. The
performance test on this system confirmed Microsoft IT’s assumptions that the processor
utilization never peaked beyond 40 percent. Based on these tests, and to optimize server
performance, Microsoft IT planned the Exchange 2003 server migration around Xeon
processor systems that employ the new, faster FSB technology.

Storage Design Best Practices


Microsoft IT encountered and resolved several issues during the deployment of the new SAN
solutions.

Using Volume Mount Points in Clusters


Microsoft IT’s clustered server configurations used SANs to maximize storage capacity and
improve backup and restore performance. The following points were instrumental in the
successful deployment of SANs with Exchange 2003:

Mount Points were used to eliminate drive letter limitations for supporting the log, SMTP and
backup drives. Volume mount points were introduced with Windows 2000. However,
Windows 2000 did not support volume mount points on NTFS volumes within a cluster.
Windows Server 2003 introduced that feature. Because the lack of available drive letters was
no longer a problem, using Exchange 2003 running on a Windows Server 2003 cluster
enabled Microsoft IT to associate four SGs with an Exchange server and retain optimum I/O
throughput.

Microsoft IT implemented the same type of infrastructure design as it had with
Exchange 2000. However, each server now consumes only four drive letters instead of 10,
enabling the association of the full four SGs per server and allowing many more servers
within a cluster. The use of volume mount points on Microsoft IT group’s Exchange servers
means that, effectively, four drive letters can support 20 databases instead of 10 drive letters
supporting 15 databases.

Putting Backup Disk LUNs in a Separate Cluster Resource Group


Backup disks are maintained in a separate cluster resource group to allow independent LUN
movement between cluster nodes between the first stage, disk-to-disk backup and the
second stage, disk-to-tape backup.

Note: Windows clusters organize resources into functional units, called resource groups,
which are assigned to individual nodes. If a node fails, the Cluster service transfers the
groups that were being hosted by the node to other nodes in the cluster. This transfer
process is called failover.

Baselining the Peak Period Traffic Rate


Before it could begin designing the new SAN implementation, Microsoft IT needed to
understand what the peak I/O requirements were for the existing Exchange implementations.
The best way for Microsoft IT to gather this data was to log the messaging infrastructure
activity on a series of Monday mornings, the peak period for user messaging activity at
Microsoft. Microsoft IT looked for trends and collected information about what peak period
traffic was like, then added 20 percent, and used this figure as a baseline to plan for future
growth.

Microsoft IT used real-time Performance Monitor (PerfMon), the performance-monitoring
tool built into Windows Server, and trended MOM data to validate key performance counters
during peak periods on some of its busiest production Exchange servers. Specific counters,
as shown in Table 7, were selected to get an understanding of the total number of disk
transfers (disk operations per second) relative to read and write activity against latency. The
key objective was to understand the average disk transfer per mailbox per server, which
trended between 0.6 and 0.8 transfers per second based on 100 MB mailbox limits with an
acceptable level of disk latency.
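As a worked illustration of how such per-mailbox figures translate into server-level sizing, the arithmetic below combines the observed per-mailbox transfer rate with the 20 percent growth headroom described earlier. The 4,000-mailbox count is an example for illustration, not one of Microsoft IT’s planning numbers:

```python
def required_disk_transfers(mailboxes, per_mailbox_rate, headroom=0.20):
    """Estimate peak disk transfers/sec a mailbox server must sustain,
    adding growth headroom (20 percent, per the baselining practice)."""
    return mailboxes * per_mailbox_rate * (1 + headroom)

# 4,000 mailboxes at the observed 0.6-0.8 transfers/sec/mailbox range:
low = required_disk_transfers(4000, 0.6)   # about 2,880 transfers/sec
high = required_disk_transfers(4000, 0.8)  # about 3,840 transfers/sec
```

The resulting range is what the storage design (LUN layout, spindle count, controller distribution) must be able to deliver at acceptable latency.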

Table 7. PerfMon Counters Used by Microsoft IT to Monitor Disk Performance with
Exchange

Counter object   Physical Disk             MSExchangeIS

Counter names    Average Disk Reads/Sec    RPC Average Latency
                 Average Disk Writes/Sec   RPC Requests
                 Disk Transfers/Sec        RPC Operations/Sec
                 Disk Reads/Sec
                 Disk Writes/Sec

Other counters used for validation were associated with MSExchangeIS RPC operations.
Microsoft IT was concerned that increasing the number of mailboxes per server could
adversely affect the user experience. These counters were monitored closely to ensure that
RPC latency and outstanding requests were maintained within Exchange product group
recommendations. RPC latency and outstanding requests can be adversely affected because
of slow disk read and write performance.

Additional production validation was done for the Level B Test forest project, where
Microsoft IT completed performance analysis on a cluster hosting 5,000 mailboxes with
200 MB limits, which trended toward 1.0 to 1.2 disk transfers per second per mailbox at
peak periods. The increase in
disk transfers resulted in poor performance in the form of unacceptable read and write disk
latencies when the server was scaled to support over 2,500 mailboxes. The default SCSI
miniport FCA driver parameter for queue depth was identified as the bottleneck and was
adjusted from a default of 32 to 128. The parameter change allowed Microsoft IT to reach the
5,000-mailbox target with better than expected levels of read/write latency, enabling Microsoft
IT to move forward with the decision to make 200 MB mailboxes a standard on all new server
designs.

Evaluating SAN Performance


In the testing phase of evaluating prospective SAN solutions, Microsoft IT increased its
mailbox size limit from 100 MB to 200 MB and increased the number of mailboxes from 3,000
to 5,000 on a single Exchange server on new server hardware and new storage hardware.
Microsoft IT was looking to find the level of performance that was possible using these new
hardware platforms. Microsoft IT first encountered very large read/write latencies (40 to 50
milliseconds) on the data LUNs when the number of mailboxes was scaled above 2,000 on a
single server. Microsoft IT testing revealed that the default Host Bus Adapter (HBA)
parameter setting on the SAN constrained its performance. Microsoft IT reset the default
Queue Depth parameter setting from 32 to 128, which improved the read latency to
12 milliseconds and write latency to 2 milliseconds at peak load, thereby resolving the SAN
performance problem.

Migrating from Gigabit to 100 Mbps Ethernet


On the old SANs used with Exchange 2000, Microsoft IT used Gigabit Ethernet to maximize
network throughput on stand-alone Exchange servers during the backup process. Each of
these servers typically contained 200 to 300 GB of data. Once Microsoft IT started using
clusters, it was no longer solely dependent upon network throughput capabilities to process
the disk-to-tape backup. Instead, Microsoft IT now uses the alternate passive nodes in each
cluster to push the backup data through to the tape libraries by means of direct fibre
connections.

Microsoft IT’s experience with Gigabit Ethernet showed a gradual trend of network adapter
performance degradation. The administration effort required to manage and resolve the
degradation was quite time- and resource-consuming. Once the use of clusters with
fiber-attached libraries eliminated Microsoft IT’s dependency upon extremely fast network
throughput, Microsoft IT simplified the server maintenance effort by replacing the Gigabit
Ethernet network adapters with 100 Mbps Ethernet network adapters. These adapters
provide more than enough network performance capacity to meet the Exchange server
requirements (normal network utilization typically peaks at around 20 percent of capacity)
since the network itself was no longer a bottleneck for backup throughput. Moreover, the
100 Mbps Ethernet adapters required much less maintenance overhead.

Management and Monitoring Best Practices
Microsoft IT’s experience in learning to manage Exchange with MOM produced some
valuable experience that is applicable to other organizations.

Client-Side Monitoring
The use of Outlook 2003 and Exchange 2003 together enables the gathering of valuable
client-side performance monitoring data. Outlook 2003 collects client messaging performance
data, including messaging system successes, failures, and latencies, and reports it to the
Exchange 2003 mailbox server. The Exchange 2003 server pools the client performance
information for its mailboxes and makes that data available to the Performance Monitor tool
as well as storing it in the server’s event logs. Using MOM with the Exchange 2003
Management Pack, Microsoft IT accesses that information from the server event logs to
provide reports and, if necessary, generate alerts when problems arise. Microsoft IT uses the
data gathered by MOM to investigate client-side outages and report performance metrics on
client performance and availability. While MOM reports are based on consolidated client
data, Microsoft IT also uses WMI scripts to get more detail about the messaging client
performance of smaller groups such as those in remote offices on the WAN that have been
consolidated from a local server onto a regional server.

Disable Event Log Replication with Clusters


When Microsoft IT began monitoring Exchange in a clustered environment, it discovered
that, for each event collected, it received as many notifications as there were nodes in
the cluster. This was the result of event log replication. As a best practice, Microsoft IT
disabled event log replication on its Exchange cluster nodes.
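On Windows Server 2003 server clusters, event log replication is controlled by the cluster common property EnableEventLogReplication, which can be set with Cluster.exe. The cluster name below is a placeholder, and the syntax should be verified against the Cluster.exe documentation for your environment before use:

```
cluster.exe /cluster:EXCLUSTER01 /prop EnableEventLogReplication=0
```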

Monitoring Backups on Remote Servers


With regard to monitoring backups on remote, regional Exchange servers, Microsoft IT used
the MOM Exchange Management Pack script that checks the date stamp of the transaction
log. If the date is older than 24 hours, it is an indication that the previous night’s backup did
not complete successfully.

Mail Flow Analysis


Microsoft IT used the MOM Exchange Management Pack script that performs mail flow
analysis by using a hub and spoke model to monitor the time taken for a test e-mail sent from
headquarters to be received by all of the regional data centers. The delta between the time
sent and the time received determines how quickly e-mail is delivered. Microsoft IT
configured MOM to generate an alert notification if that delta exceeds five minutes.
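The hub-and-spoke measurement reduces to a timestamp delta per spoke, with an alert for any region whose delivery exceeds the five-minute threshold. This is a simplified sketch; the real management pack script handles sending and detecting the test messages, and the region names are examples:

```python
from datetime import datetime, timedelta

ALERT_THRESHOLD = timedelta(minutes=5)

def mail_flow_alerts(sent_at, received_at_by_region):
    """Return the regions whose test message from headquarters took
    longer than five minutes to arrive."""
    slow = []
    for region, received_at in received_at_by_region.items():
        if received_at - sent_at > ALERT_THRESHOLD:
            slow.append(region)
    return sorted(slow)
```

Measuring end-to-end delivery time this way exercises the whole routing path (connectors, queues, WAN links), which is why a single delayed spoke is a useful early warning of a transport problem.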

Custom Rules
With the default MOM Management Packs, the level of granularity for a threshold breach of
any particular monitored event did not correspond to all the various server configurations
used by Microsoft IT. For example, a small regional server in Bombay, India, supporting
100-200 mailboxes should not trigger an alert at a threshold configured for a headquarters
data center server supporting over 4,000 mailboxes. When creating a custom rule, Microsoft
IT disables the rule in the default
Exchange Management Pack, copies those rules into its own custom management pack, and
creates multiple child processing rule groups. Those rule groups define differing threshold
levels to meet the specific needs of each server configuration in the Microsoft IT messaging
infrastructure. This practice preserves the original rules for easier upgrade.
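The effect of those child processing rule groups can be sketched as threshold selection keyed by server configuration class, so the same indicator is evaluated against different limits on a small regional server and a headquarters data center server. The class names, indicator, and numbers below are illustrative, not Microsoft IT’s actual rule values:

```python
# Hypothetical per-configuration thresholds for one monitored indicator,
# mirroring the child-rule-group idea: same rule, different limits.
QUEUE_LENGTH_THRESHOLDS = {
    "regional_small": 50,    # e.g. 100-200 mailbox regional servers
    "regional_large": 250,
    "datacenter": 1000,      # e.g. 4,000+ mailbox headquarters servers
}

def should_alert(server_class, observed_queue_length):
    """Apply the threshold appropriate to this server's configuration
    class, rather than one global limit for every server."""
    limit = QUEUE_LENGTH_THRESHOLDS[server_class]
    return observed_queue_length > limit
```

Keeping the default rules disabled but intact, and carrying the class-specific limits only in the custom pack, is what preserves a clean upgrade path for the vendor-supplied management pack.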

Operational Best Practices
The deployment of Exchange 2003 into the Microsoft IT messaging infrastructure was a
relatively simple transition and generated a few notable, operational best practices.

Backup Throughput Adjustment


Microsoft IT discovered a way to more than double disk-to-disk backup rates using the
Windows Backup utility through a registry adjustment. The adjustment increased average
throughput from 600 MB per minute to 1,200 MB per minute per SG. The adjustment is set
on the user profile that Microsoft IT uses to execute the backup script
(HKEY_CURRENT_USER).

Microsoft IT runs two concurrent backup jobs per active Exchange instance, providing an
aggregate data throughput rate of approximately 2.4 GB per minute per server, with two to
three servers per SAN enclosure (depending on headquarters data center or regional
design). Microsoft IT has monitored maximum throughput without excessive read and write
disk latencies at approximately 6.3 GB per minute per SAN enclosure. Throughputs are
dependent on LUN distribution across controllers with Data, Log, and Backup LUNs per SG
assigned per controller.
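Those throughput figures make backup-window estimates easy to sanity-check. The sketch below uses the document’s post-adjustment rates; the 600 GB data size is an example, not a Microsoft IT quota:

```python
def backup_window_minutes(data_gb, per_job_gb_per_min=1.2, concurrent_jobs=2):
    """Estimate the disk-to-disk backup window given per-job throughput
    (1.2 GB/min per SG after the registry adjustment) and the number of
    concurrent backup jobs per server."""
    aggregate_rate = per_job_gb_per_min * concurrent_jobs  # 2.4 GB/min/server
    return data_gb / aggregate_rate

# Example: 600 GB of databases on one server -> about 250 minutes,
# versus roughly twice as long at the pre-adjustment 0.6 GB/min rate.
```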

The mode used for optimized throughput is:

• SG 1 and 2 – Data, Log, and Backup on Controller one
• SG 3 and 4 – Data, Log, and Backup on Controller two
• Job concurrency limited to two per server, with SG 1 and SG 3 run concurrently followed
  by SG 2 and SG 4.
• RAID:
  • Target LUNs for backup were RAID-5.
  • All RAID-5 LUNs with write back cache disabled.

Note: RAID-1 targets would provide better throughputs, and Microsoft IT is currently
considering them as an option for use with the 146 GB disks for the first stage backup (disk-
to-disk).

Managing Transaction Logs


Microsoft IT found that increasing the number of mailboxes per server also increases the
number of transaction logs per server. The time it takes to replay transaction logs significantly
impacts the time it takes to restore a server. As a best practice, calculate the time to replay
the logs, monitor the average number of logs per day, and then adjust recovery plans
accordingly.
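That best practice amounts to a simple recovery-time model: total restore time is the raw database restore plus the number of logs to replay times the per-log replay time. A hedged sketch, where the per-log replay rate is an assumed placeholder that should be measured in your own environment:

```python
def estimated_recovery_minutes(restore_minutes, logs_to_replay,
                               seconds_per_log=2.0):
    """Estimate total recovery time: raw database restore plus transaction
    log replay. seconds_per_log is an assumed, environment-specific value
    that must be measured, not taken from this sketch."""
    replay_minutes = (logs_to_replay * seconds_per_log) / 60.0
    return restore_minutes + replay_minutes

# Example: a day's worth of logs (say 3,000) at an assumed 2 s each adds
# 100 minutes of replay on top of the restore itself.
```

Trending the average daily log count per server is what keeps this estimate, and therefore the recovery plan, current as mailbox counts grow.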

Backup Synchronization
Now that clusters are in place, before Microsoft IT activates the daily backup scripts (at
8 P.M. local time), it verifies that each virtual instance of Exchange is active on its predefined
node. If any instance has been moved, Microsoft IT must either move it back to the proper
node or configure the automated scripts that run the backup process to run on the passive
node; otherwise, the scheduled backup processes, set for each physical active node server,
will fail for the server that was moved.
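That pre-backup check can be sketched as a comparison of each Exchange virtual server’s current owner node against its predefined home node. The instance and node names are hypothetical; a production script would query the cluster service for the actual owners:

```python
PREDEFINED_NODES = {          # Exchange virtual server -> predefined home node
    "EVS1": "NodeA",
    "EVS2": "NodeB",
}

def misplaced_instances(current_owners):
    """Return the virtual servers not running on their predefined node;
    their scheduled backups would fail unless the instance is moved back
    or the backup scripts are reconfigured for the passive node."""
    return sorted(evs for evs, home in PREDEFINED_NODES.items()
                  if current_owners.get(evs) != home)
```

Running this kind of check just before the 8 P.M. backup window turns a silent backup failure into an actionable pre-flight warning.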

CONCLUSION
Enhancements to the Exchange Server 2003 platform, especially when combined with
enhancements to Windows Server 2003 and Office 2003, enabled Microsoft IT to redeploy
Exchange worldwide in a consolidated, fully clustered environment, using advanced SAN
technology in all locations, and providing Microsoft employees with improved service. More
user mailboxes were added to each server as part of the ongoing server and site
consolidation efforts. Advanced SAN technology enabled Microsoft IT to double the space
allocation for all user mailboxes and to double the size of allowed attachments without
compromising service availability or backup/restore SLAs. By clustering servers attached to
SANs, Microsoft IT significantly improved server availability and streamlined its backup and
restore methodology. The use of MOM improved Microsoft IT’s ability to monitor and maintain
the messaging infrastructure. Exchange Server 2003 has significantly improved the
messaging services Microsoft IT provides to its customers, the employees of Microsoft.

FOR MORE INFORMATION
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your
local Microsoft subsidiary.

To access information via the World Wide Web, go to: http://www.microsoft.com.

A series of related papers and case studies is available at:
http://www.microsoft.com/technet/itshowcase.
The information contained in this document represents the current view of Microsoft Corporation on the issues
discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it
should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the
accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Microsoft grants you the right to
reproduce this White Paper, in whole or in part, specifically and solely for the purpose of personal education.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights
covering subject matter in this document. Except as expressly provided in any written license agreement from
Microsoft, the furnishing of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses,
logos, people, places and events depicted herein are fictitious, and no association with any real company,
organization, product, domain name, email address, logo, person, place or event is intended or should be
inferred.

© 2003 Microsoft Corporation. All rights reserved.
