SQL Server 2008 R2 High Availability Architecture White Paper
Disaster Recovery
SQL Server Technical Article
Technical Reviewers: Darmadi Komo, Vineet Rao, Gopal Ashok, Kimberly L. Tripp
(SQLskills.com)
Published: May 2010
Applies to: SQL Server 2005, SQL Server 2008, SQL Server 2008 R2
Summary:
This whitepaper describes five commonly deployed architectures using SQL Server 2005 and
SQL Server 2008 that are designed to meet the high-availability and disaster-recovery
requirements of enterprise applications. The whitepaper will describe the architectures and also
present case studies that illustrate how real-life customers have deployed these architectures to
meet their business requirements.
This whitepaper is targeted at architects, IT Pros, and senior database administrators tasked
with architecting a high-availability and disaster-recovery strategy for their mission-critical
applications. It assumes the reader has a good understanding of Windows and SQL Server
technologies and has sufficient knowledge of transaction processing. These basic features and
topics are not covered.
Proven SQL Server Architectures for High Availability and Disaster Recovery
Copyright
The information contained in this document represents the current view of Microsoft Corporation
on the issues discussed as of the date of publication. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the
date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the
rights under copyright, no part of this document may be reproduced, stored in, or introduced into
a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written
permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual property.
Microsoft, <plus, in alphabetical order, all Microsoft trademarks used in your white paper> are
trademarks of the Microsoft group of companies.
Contents
Introduction and Overview
Failover Clustering for High Availability with Database Mirroring for Disaster Recovery
Deployment Example: CareGroup Healthcare System
Database Mirroring for High Availability and Disaster Recovery
Deployment Example: bwin Corporation
Geo-Clustering for High Availability and Disaster Recovery
Deployment Example: QR Limited
Failover Clustering for High Availability Combined with SAN-Based Replication for Disaster Recovery
Deployment Example: Progressive Insurance
Peer-to-Peer Replication for High Availability and Disaster Recovery
Deployment Example: An International Travel Industry Company
Conclusion
It is imperative that the high-availability and disaster-recovery requirements of the business are
the drivers when evaluating which technologies are suitable as part of the architecture. The two
major business needs to consider are:
The ability to accept potential data loss from an outage (i.e. the defined Recovery Point
Objective—RPO).
However, there is a lack of information regarding proven architectures and real-life customer
deployments, where the high-availability and disaster-recovery architecture was chosen after
careful requirements analysis and technology evaluation.
This whitepaper provides a consolidated description of five proven and commonly deployed
high-availability and disaster-recovery architectures, in terms of the technologies used and the
business requirements they are able to meet.
Together these two whitepapers will provide the information necessary to allow the design of an
appropriate and successful high-availability and disaster-recovery architecture.
Database mirroring is one way to provide a redundant copy of a single database on a separate
physical server, where the server can be in the same data center or geographically separated.
This architecture is widely adopted by customers who are familiar and comfortable with the
installation, configuration, and maintenance of failover clusters.
A typical implementation of this architecture involves a failover cluster in the primary data center
with database mirroring to a secondary data center or disaster-recovery site, as shown in Figure
1 below.
There are a number of variations and configuration options for this architecture depending on
the business requirements, including the following:
1. Each data center has a failover cluster with database mirroring between them. If the
business requirements state that the workload performance should not be impacted after
a failover to the secondary data center, the mirror server needs to have the same
hardware configuration (and hence workload servicing capability) as the failover cluster
in the primary data center. The alternative, of course, is to have a less capable
stand-alone server as the mirror server; however, this is not a recommended best practice.
2. Synchronous vs. asynchronous database mirroring. Synchronous database mirroring
can allow a zero data-loss requirement to be met, potentially with some workload
performance impact depending on the type of workload and the network bandwidth
between the two data centers. Asynchronous database mirroring does not guarantee
zero data loss in the case of a disaster, but has no impact on workload performance.
3. Automatic client connection to the secondary data center. If explicit client redirection is
used, the client specifies the FAILOVER_PARTNER in the connection string. After a
database mirroring failover has occurred, the client simply has to reconnect and the
connection will automatically be made to the secondary data center. Alternatively, some
form of external routing can be used (some installations have used DNS routing, for
instance).
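As a minimal sketch of explicit client redirection, the following Python helper assembles an ADO.NET-style connection string that names both mirroring partners. The server and database names are hypothetical, and the keyword spelling varies by client library (ADO.NET uses `Failover Partner`, while ODBC-based drivers use `Failover_Partner`), so the documentation for the provider in use should be checked.

```python
def build_connection_string(principal, mirror, database):
    """Build an ADO.NET-style connection string naming both mirroring partners.

    All values are supplied by the caller; nothing here contacts a server.
    """
    parts = [
        ("Data Source", principal),      # initial principal server
        ("Failover Partner", mirror),    # tried if the principal is unreachable
        ("Initial Catalog", database),   # the mirrored database
        ("Integrated Security", "True"),
    ]
    return ";".join(f"{key}={value}" for key, value in parts) + ";"


# Hypothetical server and database names, for illustration only.
conn_str = build_connection_string("SiteA-SQL", "SiteB-SQL", "Orders")
print(conn_str)
```

The database name is included because redirection is tied to a specific mirrored database rather than to the whole instance.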
The RPO and RTO requirements for their databases depend on the importance of the data
contained within the database. CareGroup defined three tiers to classify this:
CareGroup also wanted to remove the need to hard-code the database mirroring partner server
names in the application connection string to redirect client connections during a disaster
recovery failover.
Using these requirements, they were able to determine that a combination of SQL Server
failover clusters in two data centers with database mirroring between the data centers was the
appropriate solution. For the ‘AAA’ databases, database mirroring is configured synchronously
to avoid data loss, and for the lower-classed databases it is configured asynchronously. In the
event of a failure, DNS routing is used to redirect traffic to the secondary data center.
The Global Site Selector (GSS) enables the various applications at CareGroup to seamlessly
connect to the appropriate database mirroring principal server, without having to specify partner
server names in the connection string for the client redirection. This is necessary as some of the
applications that CareGroup uses are from 3rd-party vendors that do not permit (or require too
much work for) the client connection string to be altered to use explicit client redirection.
Instead, the applications specify one SQL Server instance name in the connection string, of the
form “Green\SQL1”. In this connection string, the server name “Green” is a DNS alias that
resolves to the GSS device, which in turn translates the alias “Green” into the appropriate IP
address of the current database mirroring principal server.
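The redirection idea can be sketched in a few lines of Python. This is a toy model only: the `AliasRouter` class, the server names, and the IP addresses are hypothetical, and a real GSS device performs this translation at the DNS layer rather than in application code.

```python
class AliasRouter:
    """Toy stand-in for a DNS-level device that maps an alias to the current principal."""

    def __init__(self, alias, servers, principal):
        self.alias = alias
        self.servers = dict(servers)   # server name -> IP address
        self.principal = principal     # name of the current principal server

    def fail_over(self, new_principal):
        # After a mirroring failover, point the alias at the new principal.
        if new_principal not in self.servers:
            raise KeyError(f"unknown server: {new_principal}")
        self.principal = new_principal

    def resolve(self, name):
        # Clients always ask for the alias; they never see partner names.
        if name != self.alias:
            raise KeyError(f"unknown alias: {name}")
        return self.servers[self.principal]


# Hypothetical addresses taken from the documentation-only 192.0.2.0/24 range.
router = AliasRouter(
    "Green",
    {"primary-dc": "192.0.2.10", "secondary-dc": "192.0.2.20"},
    "primary-dc",
)
before = router.resolve("Green")
router.fail_over("secondary-dc")
after = router.resolve("Green")
print(before, after)
```

Because clients only ever use the alias, a failover changes what the alias resolves to without any change to the application's connection string.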
Using this architecture, CareGroup was able to meet their availability requirements, including
performing an upgrade to SQL Server 2008 using database mirroring that only involved a few
minutes of downtime.
As an aside, by upgrading to SQL Server 2008, CareGroup can also take advantage of some of
the other features in the product:
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001003
Another example of this architecture is described in the case study of the deployment by
ServiceU Corporation, available at:
http://sqlcat.com/whitepapers/archive/2009/08/04/high-availability-and-disaster-recovery-at-serviceu-a-sql-server-2008-technical-case-study.aspx
If a failure occurs, the mirror database can be brought online as the new principal database and
client connections can be failed over. As long as the mirror database remains synchronized with
the principal database, zero data loss results when a failover is necessary.
There are a number of variations and configuration options for this architecture depending on
the business requirements, including the following:
1. Configuring a third server, the witness. When a witness server is included as part of a
synchronous database mirroring architecture, a failover can be performed automatically
when a failure is detected, providing the highest availability of the data. If database
mirroring is used between two data centers, it is recommended to place the witness in a
third data center, for the highest availability.
2. Configuring asynchronous database mirroring. When the network link between the
principal and mirror servers is not sufficient to synchronously send the transaction log
records without leading to workload performance degradation, database mirroring can
be configured to send the transaction log records asynchronously. While this removes
the performance degradation, it also removes the assurance of zero data-loss if a
failover is necessary. This may be perfectly acceptable depending on the desired RPO.
3. Configuring database mirroring and log shipping. Database mirroring allows a single
mirror of the principal database, so for added redundancy, one or more log shipping
secondary servers can also be configured as warm-standby databases.
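The interaction between the first two options can be summarized as a small decision function. This is a simplified sketch, not SQL Server's actual logic: it assumes automatic failover needs synchronous mirroring, a witness that is connected to the mirror, and a mirror that is currently synchronized. The type and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class MirroringSession:
    synchronous: bool          # True for synchronous, False for asynchronous
    witness_connected: bool    # a witness exists and can reach the mirror
    synchronized: bool         # the mirror has caught up with the principal


def failover_options(session):
    """Return (automatic_failover_possible, zero_data_loss_expected)."""
    zero_loss = session.synchronous and session.synchronized
    automatic = zero_loss and session.witness_connected
    return automatic, zero_loss


print(failover_options(MirroringSession(True, True, True)))     # synchronous with witness
print(failover_options(MirroringSession(True, False, True)))    # synchronous, no witness
print(failover_options(MirroringSession(False, False, False)))  # asynchronous
```

Without a witness, the synchronous configuration still protects against data loss, but the failover has to be initiated manually.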
This architecture is typically lower cost than one involving failover clustering, as the principal
and mirror servers can be standalone servers with direct-attached storage, rather than each part
of a multi-server failover cluster with SAN storage. It is most commonly used when the business
requirements call for databases to be protected for disaster recovery purposes and for some
businesses, when there is some technical or operational reason for not using failover clustering.
A typical implementation of this architecture involves a principal server in the primary data
center with a mirror server in a secondary data center or disaster-recovery site. There is often a
third server, the witness, included in the architecture as shown in Figure 3 below.
They wanted to be able to cope with complete loss of their primary data center, and their budget
allowed them to implement a solution that met their business requirements. They also wanted
zero data loss and 99.99% availability 24x7. The solution they chose involved synchronous
database mirroring over dark fiber between two data centers that are 11 kilometers apart. They
also maintain two log shipping secondaries, one in each data center. The log shipping
secondary in the main data center is configured with a 1-hour restore delay to allow recovery from
accidental user errors (such as an unintended delete or update).
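The effect of a restore delay can be illustrated with a short sketch. It assumes a simplified model in which a log backup becomes eligible for restore on the secondary only once it is older than the configured delay; the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

RESTORE_DELAY = timedelta(hours=1)  # the delay quoted in the text


def restorable_backups(backup_times, now, delay=RESTORE_DELAY):
    """Return the log backups old enough to be restored on the delayed secondary."""
    return [t for t in backup_times if now - t >= delay]


now = datetime(2010, 5, 1, 12, 0)
backups = [now - timedelta(minutes=m) for m in (90, 75, 45, 15)]
eligible = restorable_backups(backups, now)

# Backups from the last hour are still held back, so an accidental delete made
# 30 minutes ago has not yet been replayed on this secondary.
print(len(eligible), "of", len(backups), "backups are eligible for restore")
```

The delay therefore trades freshness of the secondary for a window in which a mistake can still be recovered from an unaffected copy.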
This architecture was deployed on SQL Server 2005 and enabled bwin to meet all their
business requirements around high availability and disaster recovery, while also being able to
service their peak workload. Bwin plans to upgrade this architecture in future to add a database
mirroring witness server to allow automatic failovers.
After moving to SQL Server 2008, bwin is planning to take advantage of some of the new
features in the product:
More information on bwin’s testing and migration to SQL Server 2008 can be found at:
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001470
If the servers fail in the main data center, the SQL Server instances are started in the secondary
data center in a manner similar to when the servers are collocated and the clients reconnect in
the same way as for a failover of a regular failover cluster (and vice-versa). To achieve this it is
often necessary to use a very fast network link (like dark fiber) and a network configuration that
abstracts the physical location of the cluster nodes from the clients.
The cluster nodes themselves are unaware that they are part of a geo-cluster, so all replication
must be handled at the storage level. If the data disks are synchronously mirrored between sites,
no data is lost if a failover is necessary, but this requires sufficient network
bandwidth.
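To give a feel for why distance matters for synchronous replication, the sketch below estimates the round-trip propagation delay added to each synchronously replicated write. It assumes light travels through optical fiber at roughly 200,000 km/s and ignores switching, protocol, and storage overhead, so real figures will be higher. The 5 km and 11 km distances are those quoted in the deployment examples in this paper; the 500 km case is a hypothetical comparison.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumed approximate speed of light in fiber


def round_trip_delay_us(distance_km):
    """Propagation delay in microseconds for one round trip over the link."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1_000_000


for km in (5, 11, 500):
    print(f"{km} km: about {round_trip_delay_us(km):.0f} microseconds per round trip")
```

At metropolitan distances the propagation delay is small compared with typical disk write times, which is one reason synchronous replication is practical between nearby data centers.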
This architecture is deployed when seamless failover of an entire SQL Server instance is
required between multiple data centers, avoiding the potential downtime of having to perform a
disaster recovery operation.
A typical implementation of this architecture involves the main failover cluster nodes in the
primary data center with the other failover cluster nodes in the secondary data center or
disaster-recovery site, as shown in Figure 5 below.
QR Limited migrated their SAP databases from a legacy mainframe onto SQL Server 2005
and wanted to provide high availability and disaster recovery capabilities for the various SAP
databases and the one terabyte ERP database, but with the ability to seamlessly protect against
loss of a data center without having to perform protracted disaster recovery.
They chose to implement a geo-cluster between two data centers 5 kilometers apart, with a fiber
link between them to accommodate the SAN replication network traffic and all client
communications to the active cluster nodes. The data disks are synchronously replicated from the
production data center to the disaster recovery data center.
By switching from mainframe-based DB2 to SQL Server 2005, they realized the following
additional benefits to their enhanced high availability and disaster recovery:
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000003421
If the main data center is lost, there is no automatic failover of a SQL Server instance to the
server in the secondary data center, but there is a redundant copy of the databases that can be
mounted and attached to Windows and to a SQL Server instance.
This architecture is often used when a business requires that databases from different vendors,
used by related but distinct applications, be logically consistent to maintain data integrity in the
case of a disaster.
A typical implementation of this architecture involves a failover cluster in the primary data center
with SAN-based replication of the storage used by the various SQL Server instances to a SAN
in the secondary data center or disaster-recovery site, as shown in Figure 7 below.
There are a number of variations and configuration options for this architecture depending on
the business requirements, including the following:
independent insurance agencies. When fully deployed, the total data size will be 10 terabytes
and the largest table will have almost 2 billion rows.
As well as replacing the legacy application, Progressive required no more than 1 hour of data
loss and a maximum allowable downtime of 24 hours.
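Requirements like these can be written down as explicit targets and checked against what a candidate design is expected to deliver. In the sketch below, the targets are Progressive's stated figures; the worst-case numbers for the candidate designs are hypothetical placeholders, not measurements from the case study.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Targets:
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime


def meets_targets(targets, worst_case_data_loss, worst_case_downtime):
    """True if a design's worst-case figures fit within both targets."""
    return worst_case_data_loss <= targets.rpo and worst_case_downtime <= targets.rto


progressive = Targets(rpo=timedelta(hours=1), rto=timedelta(hours=24))

# Hypothetical estimates for two candidate designs.
design_a_ok = meets_targets(progressive, timedelta(minutes=5), timedelta(hours=4))
design_b_ok = meets_targets(progressive, timedelta(hours=6), timedelta(hours=4))
print(design_a_ok, design_b_ok)
```

A one-hour tolerance for data loss is what makes an asynchronous replication technology a candidate at all; a zero data-loss target would rule it out.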
Progressive chose to use failover clusters in two active data centers for local high availability,
with asynchronous SAN replication between them to provide data redundancy in the event of a
disaster.
The SQL Server 2005 architecture that Progressive deployed is illustrated in Figure 8 below.
The OC-48 links provide 2.5 gigabits per second and are shared with other Windows servers
and mainframe to provide asynchronous replication between the EMC Symmetrix DMX 3 and 4
series SANs.
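As a rough, back-of-the-envelope illustration of what a 2.5 gigabit-per-second link implies, the sketch below computes its byte throughput and the time needed to copy the 10-terabyte data set quoted above. It assumes decimal units, a link dedicated to replication, and no protocol overhead; since the links are in fact shared, real transfers would take longer.

```python
LINK_BITS_PER_S = 2.5e9   # OC-48 rate as quoted in the text
DATA_BYTES = 10e12        # 10 terabytes, assuming decimal units


def throughput_mb_per_s(bits_per_s):
    """Convert a link rate in bits per second to megabytes per second."""
    return bits_per_s / 8 / 1e6


def transfer_hours(num_bytes, bits_per_s):
    """Hours needed to move num_bytes over an otherwise idle link."""
    return num_bytes * 8 / bits_per_s / 3600


print(f"{throughput_mb_per_s(LINK_BITS_PER_S):.1f} MB/s")
print(f"{transfer_hours(DATA_BYTES, LINK_BITS_PER_S):.1f} hours for a full copy")
```

The same calculation applied to the expected rate of data change, rather than the full data set, indicates whether asynchronous replication can keep its lag within the data-loss target.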
Progressive is also making use of the following SQL Server 2005 features to enhance
availability:
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000002133
Peer-to-peer replication involves some latency between a transaction committing on one node
and the change being replayed on all other nodes in the replication topology, so it is not suitable
for satisfying zero data-loss requirements. It also does not provide automatic detection of
failures or automatic failover. It does, however, allow multiple copies of the protected data to be
made, and furthermore, those copies are available for read and (with a lot of planning and care)
write activity.
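The data-loss window can be shown with a toy model. The sketch below is not how SQL Server replication is implemented; it simply assumes each node applies its own writes immediately and queues them for later delivery to its peers, so anything still queued when a node is lost never reaches the others. The class and key names are hypothetical.

```python
class PeerNode:
    """Toy peer: applies local writes at once and queues them for other peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = []  # committed locally, not yet delivered

    def write(self, key, value):
        self.data[key] = value
        self.outbox.append((key, value))

    def deliver_to(self, peers):
        # Replay queued changes on every peer, then clear the queue.
        for key, value in self.outbox:
            for peer in peers:
                peer.data[key] = value
        self.outbox.clear()


node_a, node_b = PeerNode("node_a"), PeerNode("node_b")
node_a.write("booking-1", "confirmed")
node_a.deliver_to([node_b])             # replication catches up
node_a.write("booking-2", "confirmed")  # committed, still in the outbox

# If node_a is lost now, node_b never sees booking-2.
print(sorted(node_b.data), "undelivered:", len(node_a.outbox))
```

The model also leaves out conflict handling, which is where the planning and care mentioned above are needed when more than one node accepts writes.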
This architecture is used when the secondary data copy is required to be available for reading
or writing, and/or when multiple copies of the data must be maintained.
A typical implementation of this architecture involves a peer-to-peer node in each data center,
with updates occurring and being received by all other nodes in the other data centers, as
shown in Figure 9 below.
They wanted to remove the single point of failure—the data center in Asia—by having all data
available at both data centers, and either data center able to handle write requests. They chose
to implement a combination of peer-to-peer replication as well as traditional transactional
replication to use the disaster-recovery hardware to process the read-only workload.
Database mirroring and log shipping were not options as both data centers had to be able to
handle write requests, which neither technology permits. Failover clustering was discounted
for the same reason, and also because of a desire to limit the capital expenditure on hardware.
The architecture that the travel company deployed is illustrated in Figure 10 below.
The SQL Server Customer Advisory team worked closely with this customer to produce a very
detailed whitepaper describing the requirements analysis, technology analysis, replication
solution design, and testing strategy. It is available at
http://sqlcat.com/whitepapers/archive/2009/09/23/using-replication-for-high-availability-and-disaster-recovery.aspx.
Conclusion
This whitepaper has highlighted five commonly deployed high-availability and disaster-recovery
architectures using SQL Server technologies, along with examples of real-life customer
deployments of these architectures.
The high-availability and disaster-recovery technologies provided in SQL Server 2005 have been
further enhanced in SQL Server 2008. It is very important to select architectures after carefully
considering business requirements, and then deploy the technology to meet those requirements.
It can be tempting to select a new and interesting (or possibly incumbent) technology,
regardless of the business requirements, but that can be counterproductive in the long run.
It can be very useful to review published reference implementations from SQL Server
customers, both to see what technology choices worked for the customers’ requirements, and
also to potentially learn from their experiences.
Finally, while SQL Server 2005 provides all the technologies needed to implement a successful
high-availability and disaster-recovery architecture, SQL Server 2008 has many enhancements
to these technologies, and includes many others that can aid with security, maintainability, and
performance.
The information presented in this whitepaper, and in those to which it links, should provide a
basis for anyone tasked with evaluating and choosing SQL Server 2008 technologies, with the
goal of protecting and increasing the availability of critical business data.
http://www.microsoft.com/sqlserver/2008/en/us/high-availability.aspx
http://sqlcat.com/tags/Availability/default.aspx
http://blogs.msdn.com/psssql/
http://blogs.technet.com/dataplatforminsider/default.aspx
http://www.sqlskills.com/blogs/paul
http://www.sqlskills.com/blogs/kimberly/
http://www.sqlha.com/blog/default.aspx