Backup in a SAN Environment
Applications that require the transfer or movement of large amounts of data are prime candidates for
SAN.
They may be horizontal applications (e.g., backup, archiving, data replication,
disaster protection, and data warehousing) or vertical applications (e.g., online transaction
processing (OLTP), enterprise resource planning (ERP) business applications, electronic
commerce, broadcasting, prepress, medical, and geophysics). SAN is also well suited to making
performance and high availability more scalable and more affordable in applications such as
clustering and data sharing. This article discusses two major horizontal applications, backup and
data sharing, and how they interact with SAN.
Backup in a SAN Environment
One of the first applications that users want when implementing SAN is to be able to back up
and protect their data through the SAN. They want to offload heavy backup traffic from the
LAN, free system bandwidth for production operations, and gain the speed and security
advantages of centralized management that SAN offers.
Effectively protecting data on a SAN requires a number of elements. Many of them are currently
in the early stages of implementation. These items include:
Centralized management
Support for sharing removable-media libraries
LAN-less and server-less backup
Heterogeneous platform support
Remote vaulting and mirroring
Realtime backup
Centralized management: Ideally, a central console would manage all the logical and physical
storage resources of an enterprise network. The console would automatically collect, correlate,
and analyze capacity, configuration, use, and performance information on all storage resources.
The logical resources monitored would include file systems, directories, files, and application-
specific storage repositories. The physical resources tracked would include disks, RAID systems,
tape libraries, optical jukeboxes, Fibre Channel components, Network Attached Storage (NAS),
and SAN switches and hubs. Nearly every vendor offers some degree of centralized
management. The leaders in this area are Veritas, Legato, Computer Associates (CA), and IBM.
Support for sharing removable-media libraries: Performing backups often involves backing
up many different servers to locally attached tape drives. One benefit of SAN and NAS
connectivity is the ability to share resources (e.g., a large tape library) among multiple backup
servers. Shared resources enable administrators to consolidate backups into one tape library.
However, the support must extend beyond simple connectivity to a library and into management.
Managing a library means managing access to the media stored within it and requires dynamic
drive allocation among servers, so the server that needs a drive most at a given time can get it
(e.g., when recovering a large database). Managing a library involves managing not just backup
but any application that might need access to tape or optical storage.
In many cases, the ability to connect a library to multiple backup servers via the SAN will justify
the expense of automation. In this environment, Hierarchical Storage Management (HSM)
becomes economically desirable. Legato, Veritas, CA, and Seagate Software are the leaders in
developing shared tape-library support.
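The dynamic drive allocation described above can be sketched as a small priority queue. This is only an illustration, not any vendor's implementation: the SharedLibrary class, the server names, and the numeric priorities are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class DriveRequest:
    # Lower number = more urgent; heapq always pops the smallest item first.
    priority: int
    server: str = field(compare=False)


class SharedLibrary:
    """Hypothetical model of a tape library shared by several backup servers."""

    def __init__(self, drives):
        self.free_drives = list(drives)  # drives nobody is using
        self.assigned = {}               # drive -> server currently holding it
        self.waiting = []                # heap of pending DriveRequest objects

    def request(self, server, priority):
        """Queue a request, then hand free drives to the most urgent waiters."""
        heapq.heappush(self.waiting, DriveRequest(priority, server))
        self._dispatch()

    def release(self, drive):
        """Return a drive to the pool and re-run allocation."""
        self.assigned.pop(drive)
        self.free_drives.append(drive)
        self._dispatch()

    def _dispatch(self):
        while self.free_drives and self.waiting:
            req = heapq.heappop(self.waiting)
            drive = self.free_drives.pop(0)
            self.assigned[drive] = req.server


library = SharedLibrary(["drive0"])
library.request("file-server", priority=5)  # routine backup takes the only drive
library.request("mail-server", priority=5)  # has to wait
library.request("db-server", priority=1)    # urgent database recovery, also waits
library.release("drive0")                   # the freed drive goes to db-server
```

When the drive is released, the database recovery gets it ahead of the earlier but less urgent mail-server request, which is the behavior the text asks for.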
LAN-less and server-less backup: Backup is evolving in three phases when it comes to data
movement. In the first (current) phase, data moves from the disk to the server it directly
connects to, through the LAN, to another server that, in turn, transfers the data to tape. In the
second phase, SAN lets you perform backup outside the LAN. Data moves from the disk to the
server, which retransmits it through the SAN to a SAN-connected library. This setup is sometimes
called LAN-less backup. In the third phase, the server only initiates the backup command. Data
then moves directly from disk to tape through the SAN fabric without further involving the server
or the LAN. This configuration is called server-less backup. Intelliguard, which Legato recently
acquired, has led the development of server-less backup.
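The three phases differ only in which components the backup data stream passes through. The sketch below records each path as a list of hops so they can be compared. The hop names and the "LAN-based" label for the first phase are assumptions chosen for illustration.

```python
# Each phase is the ordered list of hops the backup DATA passes through.
# Control traffic (e.g., the server issuing the backup command in the
# server-less case) is deliberately not modeled here.
BACKUP_PATHS = {
    "LAN-based": ["disk", "application server", "LAN", "backup server", "tape"],
    "LAN-less": ["disk", "application server", "SAN", "tape library"],
    "server-less": ["disk", "SAN", "tape library"],
}


def crosses(phase, component):
    """Return True if the data stream of the given phase touches the component."""
    return component in BACKUP_PATHS[phase]


def summarize():
    """Return one readable line per phase showing its data path."""
    return [f"{phase}: {' -> '.join(hops)}" for phase, hops in BACKUP_PATHS.items()]


for line in summarize():
    print(line)
```

Only the first path crosses the LAN, and only the last keeps the data stream off every server.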
Heterogeneous platform support: Early SAN implementations are generally homogeneous. As
SAN environments mature, they will become more heterogeneous. Effective SAN management
software will need to be able to manage any vendor's server communicating with any vendor's
storage, hosting any database, application, or file system, backing up to any tape drive or library,
through any switch, hub, router, or bridge. EMC and Veritas are examples of vendors supporting
heterogeneous platforms.
Remote vaulting and mirroring: The connectivity distances that Fibre Channel allows (10 to 20
km, depending on usage) make it easier to deploy remote sites for business continuance and
disaster recovery purposes. Use of remote backup, remote vaulting, and remote mirroring
techniques is likely to increase due to this capability. SANs can also connect to WANs to
achieve additional levels of connectivity and protection. CommVault is one of the vendors
offering remote vaulting capability. CNT offers a SAN-to-WAN solution in SCSI connectivity
and Enterprise Systems Connectivity (ESCON), and is also developing support for remote Fibre
Channel.
Realtime (or window-less) backup: The importance of window-less backup (also called hot
backup) becomes obvious given the large volume of data that a centralized SAN backup library
must handle. Realtime backup essentially lets you back up a volume or file periodically and
automatically without affecting normal system operations. The technique commonly used is
called a snapshot: you make a copy of the volume needing backup, and then back up the
copy while normal operations continue to access and modify the original volume. Network
Integrity leads in development, and EMC and HDS have implemented solutions in currently
available products.
Major providers of total backup solutions include ADIC, ATL, StorageTek, Hewlett-Packard (HP),
Exabyte, and Overland.
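The snapshot technique described above can be illustrated with a toy in-memory volume. This is a minimal sketch, assuming a full point-in-time copy as the text describes. The Volume class and the back_up function are hypothetical names, not part of any product.

```python
class Volume:
    """Toy volume: a mapping from block number to block contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def snapshot(self):
        # Full point-in-time copy, kept simple for illustration.
        return Volume(self.blocks)


def back_up(volume):
    """Stand-in for writing a volume to tape: returns the image that was saved."""
    return dict(volume.blocks)


production = Volume({0: "orders-v1", 1: "customers-v1"})
frozen = production.snapshot()     # point-in-time copy
production.write(0, "orders-v2")   # normal operations continue on the original
tape_image = back_up(frozen)       # the backup reads only the frozen copy
```

The tape image reflects the volume as it was when the snapshot was taken, while the production volume already holds the newer data.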
Resource and Data Sharing
In a heterogeneous environment where platforms are by definition different, the distinction
between resource sharing, data copy sharing, and true data sharing must be made.
Resource sharing: A storage subsystem attached to multiple computer platforms is divided into
partitions, each partition being accessible only to its owning platform or to a certain number of
homogeneous platforms. The administrator can reassign storage capacity to different platforms
as needs change. One of the benefits of SAN connectivity is its ability to share resources (e.g., a
large tape library) among multiple backup servers. Such sharing enables administrators to
consolidate backups that would otherwise go from many different servers to locally attached tape
drives into one tape library.
Dynamic resource sharing: All storage is available to any connected host; hosts are allocated
storage as they need it. If one host needs the storage, it can use any or all the available space. If a
host deletes a file, that space is available to any other host. This dynamic storage sharing
operates automatically and transparently. Dynamic resource sharing means that the systems
administrator doesn't have to partition the storage before storing the data.
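A minimal sketch of dynamic resource sharing, assuming a single pool with no per-host partitions. The StoragePool class, the host names, and the sizes are hypothetical and chosen only to show space freed by one host becoming available to another.

```python
class StoragePool:
    """Toy shared pool: any host may take any free space, no pre-partitioning."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_by_host = {}  # host name -> GB currently held

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.used_by_host.values())

    def allocate(self, host, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in the shared pool")
        self.used_by_host[host] = self.used_by_host.get(host, 0) + size_gb

    def free(self, host, size_gb):
        held = self.used_by_host.get(host, 0)
        if size_gb > held:
            raise ValueError("host cannot free more space than it holds")
        self.used_by_host[host] = held - size_gb


pool = StoragePool(capacity_gb=100)
pool.allocate("host-a", 70)
pool.allocate("host-b", 30)   # the pool is now full
pool.free("host-a", 20)       # host-a deletes files
pool.allocate("host-b", 20)   # host-b immediately reuses the freed space
```

Under static partitioning, the last allocation would fail unless an administrator first reassigned capacity from one partition to the other.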
Data copy sharing: This process involves replication of the data. Data is the same across copies
at the time of copy creation, but the copies can change independently afterward. There is no
assurance that they will remain identical. Data access is usually prevented during replication so
the copy accurately reflects all the data at a particular time. For large amounts of data, the time
needed to copy it may be significant, and the amount of storage necessary to store the copy
could be very large. SAN facilitates data-copy sharing by allowing high-bandwidth connections
to transfer large volumes of data.
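To see why copy time matters, a back-of-the-envelope estimate divides the data size by the sustained throughput. The figures below are assumptions for illustration only: a 500 GB data set, roughly 100 MB/s for a 1 Gbit/s Fibre Channel link, and a hypothetical 10 MB/s for a slower network path. Real throughput depends on the devices at both ends.

```python
def copy_time_hours(size_gb, throughput_mb_per_s):
    """Estimate copy time in hours, using decimal units (1 GB = 1000 MB)."""
    size_mb = size_gb * 1000
    return size_mb / throughput_mb_per_s / 3600


# Assumed values, not measurements.
san_hours = copy_time_hours(500, 100)   # about 1.39 hours
slow_hours = copy_time_hours(500, 10)   # about 13.89 hours
print(f"SAN link: {san_hours:.2f} h, slower path: {slow_hours:.2f} h")
```

A tenfold difference in bandwidth translates directly into a tenfold difference in how long data access must be held off during replication.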
True data sharing: If you are sharing data without making a copy, multiple computer platforms
can access the same physical instance of the recorded data on a storage subsystem. This type of
sharing is called true data sharing. Different levels of performance and complexity exist in
implementing true data sharing: The first level is when heterogeneous platforms can access data,
but only the original data owner can modify it. The second level is when multiple heterogeneous
platforms can update and rewrite a data item, but only one at a time. In this case, you must use a
locking mechanism to momentarily prevent a platform from updating the data. The third level is
called concurrent data sharing and exists when all platforms can either read or update the data at
the same time. The advantages of true data sharing are numerous. With only one copy of data,
you never need to replicate the data for use elsewhere, you simplify data maintenance, and you
eliminate problems due to out-of-sync conditions. True data sharing among platforms running
heterogeneous operating systems requires translating to one common operating system (see File
management discussion under SAN Management Software on page XX). Examples of vendors
offering implementations of true data sharing in a SAN architecture are Sequent, Mercury
Computer Systems, DataDirect, Transoft, Retrieve, and Network Disk. In a NAS architecture,
NetApp, EMC, Sun, IBM, and Procom offer true data sharing solutions. ...Peripheral Concepts
profile, SAN