OpenText Archive Server WHITEPAPER (Why Archiving Matters)
January 2010
Abstract
In today’s climate of strict compliance and demands for maximizing ROI, archiving
matters to your organization more than ever. Using a purpose-built solution
such as OpenText Archive Server, part of the OpenText ECM Suite, allows for quick and
easy retrieval, sharing, forwarding, and reuse of content. With OpenText Archive Server,
your organization can electronically archive its content in ways that meet regulatory
requirements. Large volumes of fixed content can be stored and retrieved efficiently, and
users have access to the content they need in the correct business context.
This whitepaper explains why, with the need for greater efficiency and compliance,
archiving really does matter to your business. It introduces OpenText Archive Server, and
it describes the features that make Archive Server a world-class archiving solution.
OpenText Archive Server, which is included as part of the OpenText ECM Suite, allows for
quick and easy retrieval, sharing, forwarding, and reuse of content. With OpenText
Archive Server, your organization can electronically archive its collective memory
permanently and in auditable form. And your content is put into context—linking the
documents with your unique business processes. Large volumes of fixed content for
international applications such as call centers, and for external customer-facing
applications such as Web-enabled bill presentment, are stored and retrieved using
OpenText Archive Server.
Figure 1: ECM Services Architecture
• Archiving forms the basis for records management (automating the management of
record archiving and retention policies).
• Archiving saves costs in the IT department, such as for archiving transactional data
from business applications, legacy system decommissioning, consolidation of
archiving landscape, and file system archiving.
• Archiving preserves your company’s knowledge and makes accessible such vital
items as construction drawings and drug development documents. Archiving also
enables review of previous work when starting new projects.
• Backup: The function of backup is to create a duplicate copy of primary data in order
to protect that data against loss due to hardware failures, user errors, or data
corruption. With backup, a copy of production data is stored in a low-cost format such
as tape and often warehoused offsite. Retrieval of historical data from tape backup is
a slow, tedious process. Typically backup covers only a limited period of time, for
example, one to three years.
Records management legislation governs the content of information in question but not
how it is stored, communicated, or conveyed. That means all forms of electronic
communication, such as email and instant messaging, are covered. However, due to the
Failing to archive email information can lead to major losses. For example, a major
tobacco manufacturer was fined $2.75 million U.S. in August 2004 because key
executives there did not comply with a court order to retain emails relevant to pending
litigation.
Customer examples
Here are just a few of the high-volume document management customers of OpenText
who have benefited from using Archive Server:
Customer Highlights
Typical scenarios for such standard business processes include the following:
• Accounts Payable processing means dealing with an inevitable volume of related
paperwork. Because manual processes are resource- and time-intensive, they
increase costs, introduce errors, and decrease efficiency.
• Human Resources departments need to store employee records for many years for
active and retired employees. Paper-based records require huge efforts for manual
processing, and many employee records are incomplete or faulty. Paper-based
storage, access, and manual routing result in high costs, long processing times, and
decisions based on incomplete or faulty information. The average annual cost of
manually handling one employee record is $15 to $30 US; retrieving a misfiled
document can cost as much as $120 US.
• During the sales process, failing to have the complete customer folder available leads
to fewer opportunities to leverage cross-selling.
• Self-service scenarios could save the company much of the cost of sales if customers
were provided access to such information as complete delivery and invoice history. In
the case of a pharmaceutical wholesale company, the use of self-service reduced
customer inquiries substantially.
• Other standard business processes that inherently deal with manual paper
processing include Contract Management, Quality Management, Customer
Complaint Management, and Product Lifecycle Management.
The Solution
Archive Server, a core component of the OpenText ECM Suite, provides services for
effectively capturing content and integrating it into leading applications, as well as
functionality for securing content and auditing access to it. All these basic services
are essential for making business processes faster. Handling paper documents
electronically speeds up processes by giving users a way to instantly access any
business document—no matter how, where, or when!
For example, Archive Server integrates with leading applications such as SAP and
groupware (Microsoft Exchange®, Lotus Notes), as well as with stand-alone and other
applications, which makes it ideal for back-office processes. For larger companies,
Archive Server also allows geographically independent access. And distributed companies
benefit because Archive Server enables online, parallel access 24 hours a day, 7 days a week.
• Standard business processes with a high throughput of documents profit the most
from Archive Server. Faster Accounts Payable processing ensures vendor discounts
and good vendor relations. Automatic invoice capturing reduces transaction costs by
half (typically $4.50 US per transaction).
• In Accounts Receivable, invoice disputes can be clarified faster, which speeds up
payment collection. This reduces Days Sales Outstanding and bad debts, so less
bank credit is needed.
• Employee records in HR are stored securely, completely, and without the need for a
physical storage place. Administrative processes such as applicant processing run faster.
• Incoming order processing works quickly and follows a standardized workflow. This
ensures a fast, standardized response to incoming orders, for example enabling
same-day delivery.
Functionalities
Integrated into standard processes
Many standard processes are covered by Enterprise Applications like SAP. OpenText
ECM solutions integrate into these applications and speed up processing through instant
access to all relevant documents in the context of the business transaction in the leading
ERP/CRM system. Details on how Archive Server integrates with leading business
applications are described in the last section of this white paper.
Enterprise-wide deployments
Standard processes do not stop at the border of a company’s site; they run across the
whole enterprise globally or involve partners. Many companies set up shared service
centers, such as for Accounts Payable processing. Such centers require that the
underlying business document technology can be accessed across many business sites
and even countries. 24-hour access is also mandatory when business takes place
worldwide.
Details on how Archive Server can be deployed globally are described in the last section
of this white paper.
High-volume management
The fastest ROI results from using ECM in mission-critical business processes, which is
also where masses of transactions take place. Archive Server is prepared for this by
supporting high document volumes.
For example, the core processes of a retail company are buying and selling. Some
customers receive as many as 60,000 invoices per day and generate over 8 TB of data
every quarter. In the insurance industry, millions of documents need to be stored for many
years and still be instantly accessible, such as during claims processing.
Details on how Archive Server supports high-volume management are described in the
last section of this white paper.
• Industry standards
As a result of these rules, companies across all industries face an increasing need to
make compliance an integrated part of their document management processes.
• Are you appropriately storing electronic documents that may contain evidence for a
future dispute?
• Can you be certain that electronic documents will meet admissibility standards and
requirements in the event of litigation or a regulatory audit?
• Are you following established and proven best practices for electronic document
retention?
Accepted guidelines and best practices on electronic document handling define how
electronic documents should be handled to ensure evidential weight both in court and
when under regulatory audit.
The Solution
Archive Server provides a host of functionalities to address compliance requirements
associated with electronic document retention for many regulations, including SEC,
Sarbanes-Oxley, FDA, GOB, and GDPdU. Most regulations place strict requirements on
corporations to manage content not only through its active lifecycle but also to retain it for
Functionalities
Long-term readability and accessibility
Archive Server stores content on a medium that will be accessible throughout the
required retention period. Archive Server can store content in any format. (Nevertheless,
we recommend archiving standard formats, such as TIFF or PDF, rather than proprietary
formats.) Archive Server also supports transferring content to alternate storage media if required.
Details on how Archive Server ensures long-term readability and accessibility are
described in the last part of this white paper.
Archive Server mainly stores content on a medium that is unalterable. Not only is tamper-
proof storage ensured, but several functions prove that content has not been changed.
Details on how OpenText Archive Server ensures that archived content cannot be altered
are described in the last part of this white paper.
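One common way to prove that stored content has not changed is to record a cryptographic digest when a document is archived and recompute it on access. The sketch below assumes SHA-256; the whitepaper does not state which checks Archive Server actually performs, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(content: bytes) -> str:
    """Digest recorded alongside the document at archiving time (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded_digest: str) -> bool:
    """Recompute the digest on access and compare it with the recorded one."""
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(fingerprint(content), recorded_digest)


original = b"Invoice 4711, total 100.00 EUR"
digest = fingerprint(original)
print(is_unchanged(original, digest))                           # True
print(is_unchanged(b"Invoice 4711, total 900.00 EUR", digest))  # False
```

A digest only detects changes; it is the unalterable storage medium that prevents them.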
Retention handling
Retention handling defines and handles retention of documents and data on the basis of
a corporate-wide policy. Archive Server provides retention handling functionality that
allows a leading application, such as SAP Applications or OpenText Email Archiving for
Microsoft® Exchange, to define and manage the lifecycle of archived documents and
data.
Retention handling must address this complex environment to ensure that business
values and business risks are managed concurrently. Once a document’s active
processing phase is complete, its classification determines the rest of its lifecycle. At this
point, the technologies that automate records retention and destruction come into play.
How long should the records be saved, and when can they be safely destroyed?
Companies often have written policies on document retention. They should define and
document policies for records management and ensure that the policies are implemented
and maintained at all levels in the organization.
One would think that since almost all documents are now electronic, control and access
would be a snap. Sadly, that is not the case. Electronic records exist in many different
locations, both on-site and remotely. Employees are accessing and storing records
electronically at home and even on handheld computers. Document retention policies are
difficult to create and even more difficult to enforce.
For example, when one of the largest US software vendors was fighting its anti-trust case
with the US Department of Justice, the prosecution was able to bring forth emails that had
been circulated between employees as evidence of anti-competitive business methods.
Had the company been more diligent in enforcing its records retention policies, those
emails might have been legally destroyed.
Details on how Archive Server handles retentions are described in the last section of this
white paper.
Retention management
Controlled deletion
Each type of document has its own retention period. After the retention period expires,
content must be deleted. Combining Archive Server with the optional Records
Management module ensures that content is deleted. In addition, associated content,
such as metadata and indexes, is securely deleted. Details about how Archive Server
integrates with Records Management are described in the last section of this white paper.
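Controlled deletion can be sketched as follows, assuming each document carries its archiving date and a retention period in days, and that content, metadata, and index entries live in three separate stores. The class, the stores, and the legal-hold flag are hypothetical illustrations, not the API of the Records Management module.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedDocument:
    doc_id: str
    archived_on: date
    retention_days: int    # hypothetical: retention period expressed in days
    on_hold: bool = False  # hypothetical legal hold that blocks deletion

    def expires_on(self) -> date:
        return self.archived_on + timedelta(days=self.retention_days)


def controlled_delete(docs, content_store, metadata_store, index, today):
    """Delete expired documents together with their metadata and index entries."""
    deleted = []
    for doc in docs:
        if doc.on_hold or today < doc.expires_on():
            continue  # still within retention, or frozen by a hold
        content_store.pop(doc.doc_id, None)
        metadata_store.pop(doc.doc_id, None)
        index.discard(doc.doc_id)
        deleted.append(doc.doc_id)
    return deleted


docs = [ArchivedDocument("A", date(2000, 1, 1), 3650),
        ArchivedDocument("B", date(2009, 1, 1), 3650)]
content = {"A": b"old invoice", "B": b"new invoice"}
metadata = {"A": {"type": "invoice"}, "B": {"type": "invoice"}}
index = {"A", "B"}
print(controlled_delete(docs, content, metadata, index, date(2010, 1, 1)))  # ['A']
```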
Another aspect of security is protection from unauthorized access. For example, when
Archive Server is used to offer hosting services, neither the hosting company nor its
customers may access content that does not belong to them. Data encryption helps
protect privacy and content.
Other security issues involve protecting content from unauthorized access while it is
transmitted over networks and protecting against the reuse of URLs. The secKey
and SSL technologies help protect against these risks.
Details on how Archive Server prevents unauthorized access are described in the last
section of this white paper.
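The idea behind protecting URLs against reuse can be illustrated with a time-limited, signed URL. This is a generic HMAC-based sketch, not secKey's actual format or algorithm; the shared key, parameter names, and host name are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"hypothetical-shared-key"  # known only to the leading application and the archive


def _signature(doc_id: str, expires: int) -> str:
    return hmac.new(SECRET, f"{doc_id}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(base: str, doc_id: str, ttl_seconds: int, now: int) -> str:
    """Issue a URL that is only valid until now + ttl_seconds."""
    expires = now + ttl_seconds
    query = urlencode({"docId": doc_id, "expires": expires, "sig": _signature(doc_id, expires)})
    return f"{base}?{query}"


def check_url(url: str, now: int) -> bool:
    """Reject URLs whose parameters were altered or whose lifetime has passed."""
    params = parse_qs(urlparse(url).query)
    try:
        doc_id = params["docId"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    return hmac.compare_digest(sig, _signature(doc_id, expires)) and now <= expires


url = sign_url("https://archive.example.com/get", "doc42", ttl_seconds=300, now=1_000_000)
print(check_url(url, now=1_000_100))                            # True: signed and still valid
print(check_url(url, now=1_000_400))                            # False: expired, cannot be reused
print(check_url(url.replace("doc42", "doc43"), now=1_000_100))  # False: tampered
```

The signature prevents a URL from being altered or reused after it expires; SSL additionally protects the URL and the content in transit.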
Archive Server can retain detailed records of all activities performed on content stored in
the archive, including the date and time. Details on how Archive Server facilitates audits
are described in the last section of this white paper.
Long-term storage is certainly the basis for compliance of ECM systems. However,
companies also benefit from long-term storage from the standpoint of process efficiency.
Insurance contracts, technical drawings, or outgoing customer correspondence (such as
utility invoices) often need to be accessed during customer complaint management or
repairs. Fast access to archived documents helps resolve business problems and
complete repairs faster.
Details on how Archive Server provides long-term storage are described in the last
section of this white paper.
Digital signatures serve two purposes: fulfilling compliance requirements (authenticity of content,
securing evidence) and speeding up standard business processes by emulating personal
signatures.
Details on how Archive Server addresses the compliance issue of digital signatures are
described in the last section of this white paper.
• Accounting: Who uses the system, and to what extent, especially when the ECM
services are run centrally by an IT service provider?
The Solution
Archive Server is a part of the OpenText ECM Suite, which gives companies flexibility in
how they use this infrastructure: either as a point solution for a specific department or
as the basis for many solutions running on the ECM backbone. Existing point solutions
from other vendors can easily be migrated to Archive Server.
This reduces TCO because of reduced administration, know-how, and hardware costs.
All OpenText solutions use Archive Server as their repository for archiving content.
Customers profit from an integrated solution suite that covers all ECM-relevant needs.
Scalability
Archive Server provides archiving and storage management capabilities for all
applications that plug into the OpenText ECM Suite framework. This large-scale
integration enables you to save costs by using the same archiving framework and
capabilities for all of your enterprise content. Even if you originally deploy an OpenText
ECM Suite-based solution for the purpose of archiving email and attachments, you can
seamlessly extend that solution to quickly and cost effectively archive SAP content and
any other type of enterprise content.
True enterprise scalability means that organizations can extend a system in any
dimension—whether by geographic distribution, number of users, or volume of content.
Archive Server scales in each of these dimensions.
Details on how Archive Server provides true enterprise scalability are described in the last
part of this white paper.
Accounting is required to reflect the usage of documents and scenarios.
Application service providers, as well as outsourced IT departments of large companies,
need statistics about accessed content for billing. This allows document storage and
document retrieval to be charged.
In addition to the accounting data, usage statistics related to performance monitoring can
be gathered and used to optimize the system performance.
Details on how Archive Server provides powerful accounting functionalities are described
in the last part of this white paper.
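Charging for document storage and retrieval can be sketched as a simple aggregation of access events per customer. The event format and the per-request prices are hypothetical; the whitepaper does not describe how Archive Server records or exports accounting data.

```python
from collections import Counter


def usage_report(events, prices_cents):
    """Turn (tenant, operation) events into per-tenant request counts and charges."""
    counts = Counter(events)  # (tenant, operation) -> number of requests
    report = {}
    for (tenant, operation), n in counts.items():
        entry = report.setdefault(tenant, {"charge_cents": 0})
        entry[operation] = entry.get(operation, 0) + n
        # integer cents avoid floating-point rounding in billing totals
        entry["charge_cents"] += n * prices_cents[operation]
    return report


events = ([("dept-a", "store")] * 3 + [("dept-a", "retrieve")] * 10
          + [("dept-b", "retrieve")] * 2)
print(usage_report(events, {"store": 5, "retrieve": 1}))
```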
Archive Server virtualizes storage and accessibility, which increases flexibility in storage
management by using your choice and combination of storage hardware. Typically, the
lifetime of business documents exceeds the lifetime of storage hardware. Compatibility
with all major storage providers ensures that companies can seamlessly migrate content
to alternate storage media in the future.
Details on how Archive Server optimizes storage management are described in the last
part of this white paper.
If highly critical or worldwide processes rely on content provided by the Archive Server, it
is essential to provide access to the content 24 hours a day, 7 days a week. For these
requirements, the Archive Server supports high-availability deployments.
Archive Server supports replication and distribution scenarios, so that data sets can be
kept redundantly for additional safekeeping. For example, to help safeguard against the
risks of physical disasters and environmental instability, redundant data sets can be
stored in multiple physical locations.
Archive Server doesn’t just archive content: it also preserves how that content is organized.
By retaining information about the hierarchy of data, Archive Server can rebuild not just
the content itself but also the structure of the information store. The administration
interface facilitates disaster recovery processes in which administrators reconstruct
Archive Server from storage media.
Details on how Archive Server provides content protection and availability are described
in the last part of this white paper.
Operating systems:
• Microsoft Windows Server, Sun Solaris, HP HP-UX, IBM AIX, Novell SUSE Linux,
Red Hat Linux
Database systems:
Storage hardware:
• HSM Systems
• Cloud Storage
Central administration and monitoring of the server and storage functionalities simplifies
the lives of administrators. The Administration Server of Archive Server is used to
manage and configure the system components. The entire archiving system can be
managed locally or remotely via the user-friendly administration client.
The Server Monitor checks the availability of system resources and monitors the activity
of the individual archive components. It is used proactively to quickly detect problems and
pinpoint the source of any errors. The Server Monitor can also be used remotely via
a Web-based client.
Details on how Archive Server simplifies the work of administrators are described in the
last part of this white paper.
If we look back 10 years at how archive systems have been structured, and compare that
with an up-to-date ECM system, we realize that the complexity has increased, especially
in terms of applications using archive systems and available storage systems. Some
storage technologies are not used anymore for long-term archiving, such as Microfiche
and CD.
Archive Server provides extensive functionality for data migration and hardware
abstraction in order to give corporations the required flexibility in their hardware strategy
for decades.
A leading application is one that generates archived documents (such as print lists in
SAP) or with whose business objects the archived documents are linked (e.g., inbound
documents in SAP). SAP, OpenText Content Server, Microsoft Exchange, Lotus Notes,
and Microsoft SharePoint can be linked as leading applications.
Companies are increasingly aware that leading applications change. However, the
documents referenced by those leading applications may have to be kept over long
periods of time—sometimes even 20 years. In order to stay independent from leading
applications, customers choose to archive sufficient metadata with the documents. If a
leading application is discontinued, the metadata ensures an easy migration path.
Archive Server comprises multiple services and processes, amongst which the Storage
Manager, the Document Service, and the Administration Server are the most important
ones. The Storage Manager is responsible for storing documents and data, whereas the
Document Service handles the document management functionality, the storage of
metadata and other properties, and all communication. Client applications “talk” to
the Document Service. (In the following, these two sub-services are referred to as Archive
Server.)
Depending on the business process, the document type and the storage media, Archive
Server uses different techniques to store and access documents. This guarantees optimal
data and storage resource management. Mass data that no longer changes can
be stored as ISO images. The Storage Manager provides access to ISO images within a
physical or virtual jukebox. Content that is prone to change and has an individual lifecycle
is stored as single files.
More complex OpenText ECM Suite implementations may consist of several Archive
Servers, for example, to reduce access time in large—possibly worldwide—networks or to
improve reliability by mirroring an entire Archive Server. If an Archive Server acts as a
mirroring system of another server, it is called a Replication Server. Additional Cache
Servers complement these servers to form a complete, worldwide storage landscape.
• Document Service - Controls the storage and retrieval of the individual components.
• Document Pipeline - Used to transport and process the data and documents to be
archived. (The Document Pipeline is optional.)
• Cache Server - Speeds up access to archived documents. The Cache Server
is optional and is used mostly in ECM environments with worldwide, distributed
departments and low network bandwidth. The Document Service itself contains a
service to cache content from slow media like WORMs.
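The effect of a cache in front of slow media can be sketched with a small least-recently-used (LRU) cache. The whitepaper does not describe the Cache Server's eviction policy, so LRU and the class below are assumptions for illustration.

```python
from collections import OrderedDict


class ReadCache:
    """Hypothetical read cache in front of slow storage, evicting least recently used entries."""

    def __init__(self, backend_read, capacity):
        self._read = backend_read  # slow path, e.g. a WORM, jukebox, or remote Archive Server
        self._capacity = capacity
        self._entries = OrderedDict()
        self.misses = 0

    def get(self, doc_id):
        if doc_id in self._entries:
            self._entries.move_to_end(doc_id)  # mark as most recently used
            return self._entries[doc_id]
        self.misses += 1
        content = self._read(doc_id)
        self._entries[doc_id] = content
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return content


slow_storage = {"a": b"A", "b": b"B", "c": b"C"}
cache = ReadCache(slow_storage.__getitem__, capacity=2)
for doc_id in ["a", "a", "b", "c", "a"]:
    cache.get(doc_id)
print(cache.misses)  # 4: the second "a" is a hit; later "a" was evicted by "c"
```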
In addition, Archive Server supports COLD (Computer Output to Laser Disk) and
archives COLD and spool data from host systems. The Document Pipeline controls data
processing and archiving.
Individual tools (called DocTools) retrieve the documents from the conveyor belt, process
them one by one, and then return them to be processed by the next tool. The last tool in
the pipeline generally removes the document from the conveyor belt. Depending on the
configuration, Document Pipelines can contain various DocTools to implement
different kinds of document processing, and further tools can be added as required.
An application called “Document Pipeline Info” displays the status of all document
pipelines and their DocTools. In the picture below, you see the status of the document
pipeline “Import content and attributes into DocuLink”, which adds documents to the
Content Server application known as OpenText DocuLink for Content Server. Currently
no documents are being processed and none are in the Error queue, indicated by the
zero in the columns on the right-hand side.
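The conveyor-belt principle of a Document Pipeline, including an error queue like the one shown in Document Pipeline Info, can be sketched as a chain of functions. The DocTool names and the document representation are hypothetical; real DocTools are configured components, not Python functions.

```python
def run_pipeline(documents, doctools):
    """Pass each document through every DocTool in order; failures go to an error queue."""
    processed, error_queue = [], []
    for doc in documents:
        try:
            for tool in doctools:
                doc = tool(doc)  # each tool takes the document off the belt and returns it
        except Exception as exc:  # a failing DocTool parks the document for inspection
            error_queue.append((doc, str(exc)))
        else:
            processed.append(doc)
    return processed, error_queue


def extract_attributes(doc):  # hypothetical DocTool
    if not doc["file"].endswith(".pdf"):
        raise ValueError("unsupported format")
    return {**doc, "attributes": {"type": "invoice"}}


def store(doc):  # hypothetical DocTool
    return {**doc, "archived": True}


done, errors = run_pipeline([{"file": "inv1.pdf"}, {"file": "scan.xyz"}],
                            [extract_attributes, store])
print(len(done), len(errors))  # 1 1
```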
Document Pipelines are available for all major target systems: SAP, TCP, Content Server,
Enterprise Library, and File System Archiving.
Figure 7: Document Pipeline for batch import with attribute extraction
High-volume filing
An important principle for all Document Pipelines is that processing is always
transactional. That means the processing status of the document is always defined: either
it has been processed by a specific DocTool or not, and no documents can get lost. If for
any reason the Document Pipeline is aborted, or processing is cancelled at any time, the
document is considered to be unprocessed by the last active DocTool. The current status
is retained at all times. Therefore, when the Document Pipeline is started again,
processing can continue at precisely the same step the document was at when the
program was aborted. This re-entrance provides the security required for high-volume
filing.
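The re-entrance described above can be sketched by persisting, after each DocTool, how many steps a document has completed, so that a restart resumes at the first unfinished step. The status file and the function are hypothetical; the sketch illustrates the principle only, not how Archive Server stores pipeline status.

```python
import json
import tempfile
from pathlib import Path


def run_resumable(doc_id, doctools, status_file: Path) -> int:
    """Run DocTools in order, recording each finished step so a restart resumes there."""
    status = json.loads(status_file.read_text()) if status_file.exists() else {}
    completed = status.get(doc_id, 0)  # number of DocTools already finished for this document
    for step in range(completed, len(doctools)):
        doctools[step](doc_id)  # if this raises, the step still counts as unprocessed
        status[doc_id] = step + 1
        # write a temporary file and rename it, so the status file is never half-written
        tmp = status_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(status))
        tmp.replace(status_file)
    return status.get(doc_id, completed)


calls = []
aborted = {"once": False}

def ocr(doc_id):    # hypothetical DocTool
    calls.append("ocr")

def index(doc_id):  # hypothetical DocTool, interrupted on its first run
    if not aborted["once"]:
        aborted["once"] = True
        raise RuntimeError("pipeline aborted")
    calls.append("index")

def store(doc_id):  # hypothetical DocTool
    calls.append("store")

with tempfile.TemporaryDirectory() as tmp_dir:
    status_path = Path(tmp_dir) / "status.json"
    try:
        run_resumable("doc1", [ocr, index, store], status_path)
    except RuntimeError:
        pass  # simulated abort after "ocr" completed
    steps_done = run_resumable("doc1", [ocr, index, store], status_path)

print(calls)  # ['ocr', 'index', 'store']: "ocr" is not repeated after the restart
```

If a crash occurs after a DocTool finishes but before its status is written, that tool runs again after the restart, so DocTools in such a design should be safe to repeat.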
Some customers archive as many as one million documents within 10 hours. The high-
volume filing capability of Archive Server allows such large migration projects to be
conducted. At a large German bank, the migration of a decommissioned archive system
included 160 million documents and 1,000 online and 1,800 offline partitions, amounting
to a total of 2.8 TB of online data and 1.8 TB of offline data.
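The figures above translate into the following rates. The calculation assumes decimal terabytes (10^12 bytes) and treats the two examples separately; it only restates the stated numbers.

```python
# Example 1: one million documents filed within 10 hours
DOCS = 1_000_000
SECONDS = 10 * 60 * 60
throughput = DOCS / SECONDS

# Example 2: bank migration, assuming decimal terabytes
ONLINE_BYTES = 28 * 10**11   # 2.8 TB
OFFLINE_BYTES = 18 * 10**11  # 1.8 TB
MIGRATED_DOCS = 160_000_000
avg_doc_bytes = (ONLINE_BYTES + OFFLINE_BYTES) // MIGRATED_DOCS

print(f"Filing rate: {throughput:.1f} documents/second")          # 27.8
print(f"Average size: {avg_doc_bytes / 1000:.2f} kB per document")  # 28.75
```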
Archive Server stores any content, regardless of its format. Storage of some forms of
content is tuned to optimize the use of storage space or document access, for
example for outgoing invoices, which may be numerous but very small. OpenText Content
Server based applications come with a set of clients for imaging and displaying
documents.
These clients support existing imaging standards such as TIFF, JPEG, and PDF, as well
as SAP formats such as OTF, ALF, and ADK. All the desktop applications and the different
Windows clients use the Open Document Management API (ODMA) to communicate with
the archive system. The ODMA interface allows for seamlessly integrating most
applications with the business document system.
One vendor, OpenText, offers a complete ECM suite that is integrated in all components.
This reduces TCO and enhances stability and security of the system.
• Strong capabilities for distributing the system to all business regions.
For instance, two logical archives are created, one to store contracts and another one for
personal signatures. Personal signatures need to be accessed very fast, and therefore
their logical archive should be attached to a hard disk. Contracts need to be stored on a safe
medium, which ensures they cannot be manipulated. Thus, the logical archive for
contracts may be attached to a device with WORM support. Furthermore, retention
periods may be different for individual document types. By assigning and explicitly
naming logical archives or pools after individual fiscal years, the administrator gets an
immediate overview of retention periods.
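The example above can be expressed as a small configuration sketch that picks a logical archive from the document type and fiscal year. The archive names, media labels, and retention periods are hypothetical values chosen for illustration, not Archive Server configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalArchive:
    name: str
    media: str
    keep_until_year: int  # last fiscal year the content must be retained


# hypothetical settings: (name prefix, storage media, retention in years)
POLICIES = {
    "signature": ("SIG", "hard disk", 10),  # fast access matters most
    "contract": ("CONTR", "WORM", 30),      # must not be manipulated
}


def archive_for(doc_type: str, fiscal_year: int) -> LogicalArchive:
    prefix, media, years = POLICIES[doc_type]
    # One archive per fiscal year: its name alone tells the administrator when it expires
    return LogicalArchive(f"{prefix}_{fiscal_year}", media, fiscal_year + years)


print(archive_for("contract", 2009))
# LogicalArchive(name='CONTR_2009', media='WORM', keep_until_year=2039)
```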
Archive Server can adapt to changing business needs flexibly and cost effectively.
Archive Server scales both vertically by adding additional worker processes and
horizontally by load balancing. As the number of users grows, it is possible to connect
new clients to Archive Server or to install additional Archive Servers or Cache Servers.
• Hard disk Write Once media with WORM feature and retention handling
• HSM systems
• Cloud storage
For details, please see the Archive Server Storage Platform Release Notes.
OpenText Content Server is the leading collaboration and content management software
for global organizations that brings together people, processes, and content. The
information managed within the Content Server can be safely stored with Archive Server.
For this purpose, a software option called OpenText Archiving for Content Server has
been developed.
OpenText Archiving for Content Server adds the full capabilities of Archive Server—
including compliance—to the Content Server. Organizations can deploy a robust solution
for managing content throughout the entire ECM lifecycle—from creation through
publication to archival and eventual deletion.
The process is completely transparent to end users. The user creates document versions
within the Content Server. The document itself is stored on Archive Server, from where it
is quickly and reliably accessible. No user interaction is required to store documents on
Archive Server. Based on rules configured by an administrator, the system decides which
storage provider is used upon document creation. Multiple logical storage providers can
be configured, each related to a logical archive. The logical archives in turn refer to
specific storage locations. Documents are automatically full-text indexed when they are
stored on Archive Server.
• Capture (high-volume imaging, indexing, reports and print lists, faxes, office
documents)
Transactional Content Processing provides the tools to quickly and easily build
customer-specific business applications such as customer folders in insurance, banking,
or utilities, or patient folders in healthcare. Production Document Management solutions quite often work
closely together with leading applications such as SAP, CRM, or host systems. The
various interfaces allow customers to build the integrations and thereby support the end
user’s daily business. The tight integration with Transactional Content Processing allows
building document-centric process solutions to improve business efficiency and achieve
faster response times. It leverages Archive Server as a highly scalable and secure
repository for business critical data and documents and is designed for the complete
range of business content and its lifecycle management.
Records management is the practice of both retaining and destroying records, enabling
organizations to:
• Ensure that all information is retained for at least as long as it must be retained,
• Ensure that discovery requests and audits can be performed in an efficient and cost-
effective manner (i.e., information can be reliably retrieved), and
In managing the lifecycle of email, hard-copy documents, file boxes, and more, a
company can provide litigation support, identify vital records, and automate and administer its
corporate retention program efficiently. The organization can also apply descriptive
metadata, ensuring integrity of business-critical knowledge and reducing risk due to audit,
regulatory compliance, and litigation. By integrating Archive Server, you can ensure
compliance and implement your corporate retention program through all layers of the
application down to the hardware components.
The Imaging DesktopLink module, which is part of the Imaging package, can archive
documents from any ODMA-compatible desktop application and link them with SAP
business transactions, for example.
The integration between Archive Server and SAP is based on and certified for various
standard SAP interfaces:
• SAP ILM WebDAV Interface (together with other components of Enterprise Library)
The SAP HTTP Content Server Interface is the successor of the ArchiveLink interface
and allows for connection to the SAP Knowledge Provider, which is used for SAP PLM
and SAP DMS, for example.
The SAP ILM WebDAV interface is the successor of the SAP WebDAV XML Data
Archiving Interface. The ILM WebDAV interface is used to manage the complete lifecycle
of archived SAP data. Together with the Archive Server, Enterprise Library enforces the
retention periods and holds, which are transmitted by SAP for the data archiving files and
also for the attached documents.
The Archive Server is certified to be Solution Manager Ready. Moreover, it integrates
into the customer’s SAP support infrastructure, which is based on Solution Manager.
All these integrations into standard SAP interfaces allow customers to leverage the
document functionality of SAP in each and every SAP module. Also, through the usage of
these standard interfaces, Archive Server can be rapidly connected to SAP.
The add-on product, OpenText ECM Suite for SAP® Solutions, uses these interfaces to
manage and archive all kinds of SAP documents, both ArchiveLink and SAP
Knowledge Provider (KPro) documents, including the following:
• Outgoing SAP documents (documents that were created by the SAP system, such as
purchase orders, invoices, reminder letters, delivery notes)
• Other SAP ArchiveLink and SAP KPro documents created by the SAP system or
users in different SAP modules and applications.
OpenText also provides a comprehensive product portfolio for all document archiving and
document management needs in an SAP environment:
Email traffic in daily business is growing ever more complex, which has led to a
high volume of documents that must be stored in the email systems. Some of these
documents even need to be stored for several years because of legal requirements, so
deleting documents is not always an option for saving disk space. Furthermore, deleting
emails can be time-consuming and tedious.
An archiving solution helps save disk space on email systems, and integration into
Microsoft Exchange speeds up operations. Integrating Archive Server into Microsoft
Exchange considerably reduces the amount of data stored on the MS Exchange servers,
enabling them to perform better. Significantly less hard disk capacity is required, resulting
in additional savings. Archiving emails, attachments, PST files, and public folders also
reduces backup and data storage activities as well as administration time.
Your archiving environment can be customized: you may archive your emails
automatically or interactively. Interactive archiving and the display of archived objects are
based on MS Exchange custom forms. No extra software is necessary on MS Outlook®
clients, whether for automatic or manual archiving.
To save even more disk space, we provide Single Instance Archiving. An attachment that
several users want to archive is archived only once and is referenced from each of the
individual emails.
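The single-instance idea can be sketched as a content-addressed store. This is a minimal illustration that assumes identical attachments are recognized by a content hash (SHA-256 here); the class and method names are hypothetical and are not part of any Archive Server or Exchange API.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical sketch: each attachment is keyed by its SHA-256 digest,
    so identical content is stored once and referenced by many emails."""

    def __init__(self):
        self.blobs = {}       # digest -> attachment bytes (stored once)
        self.references = {}  # email id -> list of attachment digests

    def archive_email(self, email_id, attachments):
        digests = []
        for data in attachments:
            digest = hashlib.sha256(data).hexdigest()
            # Store the content only if this digest has not been seen before.
            self.blobs.setdefault(digest, data)
            digests.append(digest)
        self.references[email_id] = digests
        return digests

    def attachments_of(self, email_id):
        # Resolve the references of one email back to attachment content.
        return [self.blobs[d] for d in self.references[email_id]]
```

If two emails carry the same report, `archive_email` is called twice but `blobs` grows by only one entry; each email keeps its own list of references.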
With the add-on product, OpenText Email Archiving for Lotus Notes, it's easy to archive
Lotus Notes emails simply by selecting them and using the respective menu option or the
Archiving Toolbar options.
Via a Lotus Notes client or Domino Web Access, the emails can later be retrieved for
viewing, restored to their original condition, or copied back to the Lotus Notes database. It
is also possible to delete archived documents from the archive and to retrieve archived
documents into a local replica.
High-volume management
The value of ECM is strategic: it is an important factor in an organization's overall
financial performance and a competitive advantage. An ECM system must be planned
very carefully to meet performance requirements, especially when the volume of
documents created daily or the number of users is very large.
Archive Server provides long-term storage for high volumes of data. Since storage
technology has a lifecycle, Archive Server takes responsibility for the reliable, seamless,
and transparent migration of content from outdated storage to recent storage technology.
Archive Server helps you to adapt your content and storage strategy to changing
requirements and new technology in a cost-effective way.
OpenText customers already manage high volumes of data, as these figures show:
• Current document stock of up to 15 TB, expected to reach up to 150 TB within the next
three years
• Current document volume of up to 200 million documents, expected to reach up to two
billion within the next three years
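A quick calculation puts these figures in perspective. It assumes decimal units (1 TB = 10^12 bytes), constant annual growth, and that the storage and document figures describe the same installation; none of this is stated in the figures themselves.

```python
# Figures quoted above; decimal units (1 TB = 10**12 bytes) are an assumption.
current_bytes = 15 * 10**12
future_bytes = 150 * 10**12
current_docs = 200 * 10**6
future_docs = 2 * 10**9
years = 3

# Constant annual growth factor g such that current * g**years == future.
annual_growth = (future_bytes / current_bytes) ** (1 / years)

# Average document size, assuming both figures describe the same installation.
avg_now = current_bytes / current_docs
avg_later = future_bytes / future_docs

print(f"annual growth factor: {annual_growth:.2f}")
print(f"average document size: {avg_now / 1000:.0f} KB now, {avg_later / 1000:.0f} KB later")
```

Under these assumptions, the volume grows by a factor of about 2.15 per year, and the average document stays at roughly 75 KB. Many small documents, rather than a few large ones, are therefore the typical load.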
Fast access is only one of the Archive Server tasks that require high performance.
Archive Server also meets performance requirements for filing (storing), backup,
replication, migration, deletion, and administration.
The size of business documents may vary from a few kilobytes up to several gigabytes,
and both extremes challenge storage systems. Very small documents can waste a lot of
space because of the large block size of storage media, and they reduce filing
performance. Very large documents may exceed physical partition limits.
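The space wasted by small documents follows from whole-block allocation. The sketch below assumes a hypothetical 64 KB allocation block; real block sizes depend on the storage medium.

```python
def allocated_bytes(doc_size: int, block_size: int) -> int:
    """Space a document occupies when storage is allocated in whole blocks."""
    blocks = max(1, (doc_size + block_size - 1) // block_size)  # integer ceiling
    return blocks * block_size


def wasted_bytes(doc_size: int, block_size: int) -> int:
    """Unused space in the last allocated block."""
    return allocated_bytes(doc_size, block_size) - doc_size


BLOCK = 64 * 1024  # hypothetical block size
print(wasted_bytes(2048, BLOCK))               # a 2 KB document wastes 63488 bytes
print(1000 * allocated_bytes(2048, BLOCK))     # 1,000 such documents stored individually
print(allocated_bytes(1000 * 2048, BLOCK))     # the same documents packed into one file
```

Stored individually, 1,000 documents of 2 KB occupy 65,536,000 bytes. Packed into a single file, they occupy 2,097,152 bytes. This is the motivation for the container files described next.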
Furthermore, even high-end storage systems and modern file systems cannot handle an
unlimited number of files. Limits apply from the number of files within one directory up to
the total number of entries in a storage system's index. Archive Server addresses these
limitations with a special container file technology. Depending on the document type, the
business scenario, and the storage media, Archive Server supports several types of
container files:
• Archive Server uses ISO 9660 images as container files. A container file may contain
several thousand documents but occupies just one index entry. This technique
dramatically relieves the index of a storage system and increases write performance.
ISO images are best suited for mass data that will not change after it has been
archived.
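The container principle can be illustrated with a simple append-and-index scheme: documents are written one after another into one file, and a small index records the offset and length of each. This is a hypothetical sketch of the idea only; it does not produce the ISO 9660 format that Archive Server actually uses.

```python
import io


class ContainerWriter:
    """Hypothetical sketch: many documents share one container file, and a
    small index records where each document starts and how long it is.
    Illustrates the principle only; this is not the ISO 9660 format."""

    def __init__(self):
        self._buffer = io.BytesIO()
        self.index = {}  # doc id -> (offset, length)

    def add(self, doc_id: str, data: bytes) -> None:
        offset = self._buffer.tell()
        self._buffer.write(data)
        self.index[doc_id] = (offset, len(data))

    def getvalue(self) -> bytes:
        return self._buffer.getvalue()


def read_document(container: bytes, index: dict, doc_id: str) -> bytes:
    """Retrieve one document from the container using the index entry."""
    offset, length = index[doc_id]
    return container[offset:offset + length]
```

The storage system sees a single file, no matter how many documents the container holds. The per-document index is kept by the archive rather than by the storage system.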
Compression
To save storage space, you can activate data compression for each individual logical
archive or content type. All important formats, including email and office formats, are
compressed by default. Compression rates depend on file format and content and
correspond roughly to gzip level 6.
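The following sketch shows per-content-type compression at gzip level 6 using Python's standard `gzip` module. The set of compressed content types is a hypothetical stand-in for the settings that are actually made per logical archive in the administration client.

```python
import gzip

# Hypothetical per-content-type setting; in Archive Server, compression is
# configured per logical archive or content type in the administration client.
COMPRESSED_TYPES = {"message/rfc822", "text/plain", "application/msword"}


def store(data: bytes, content_type: str) -> tuple[bytes, bool]:
    """Compress at gzip level 6 if the content type is configured for it."""
    if content_type in COMPRESSED_TYPES:
        return gzip.compress(data, compresslevel=6), True
    return data, False


def load(stored: bytes, compressed: bool) -> bytes:
    """Return the original document content."""
    return gzip.decompress(stored) if compressed else stored


text = b"Invoice line item\n" * 500
stored, flag = store(text, "text/plain")
print(len(text), len(stored))  # the repetitive text shrinks considerably
```

Formats that are already compressed, such as JPEG or PNG images, gain little from a second compression pass. This is one reason to make compression configurable per content type.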
Retention handling
As physical storage may not allow immediate physical deletion or even physical
destruction of documents, Archive Server provides policies (depending on capabilities of
After the retention period has expired, the document can be deleted. There are two
scenarios. In the first, an administrator wants to delete documents to get rid of old
volumes; here, it is sufficient to delete them at some later point. In the second, the
content of a document could compromise someone; in this case, the document must be
deleted immediately after the retention period has expired.
A leading application may specify a retention period (and a retention behavior) during the
creation and migration of a document. If nothing is specified, a default period and
behavior are used, which are configured per logical archive in the administration client.
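The rules above (an explicit value from the leading application, otherwise the logical archive's default, and two deletion behaviors) can be expressed as a small resolution function. The field names and the behavior labels "immediate" and "deferred" are hypothetical and were chosen to mirror the two scenarios described earlier.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class LogicalArchive:
    name: str
    default_retention_days: int
    default_behavior: str  # hypothetical labels: "immediate" or "deferred"


@dataclass
class Document:
    doc_id: str
    archived_on: date
    retention_days: Optional[int] = None  # set by the leading application, if at all
    behavior: Optional[str] = None


def effective_retention(doc: Document, archive: LogicalArchive):
    """Explicit values win; otherwise fall back to the logical archive defaults."""
    days = doc.retention_days if doc.retention_days is not None else archive.default_retention_days
    behavior = doc.behavior or archive.default_behavior
    return doc.archived_on + timedelta(days=days), behavior


def deletion_action(doc: Document, archive: LogicalArchive, today: date) -> str:
    expires_on, behavior = effective_retention(doc, archive)
    if today < expires_on:
        return "keep"
    # "immediate": content could compromise someone, so delete right away.
    # "deferred": e.g. clearing old volumes, so deleting later is sufficient.
    return "delete now" if behavior == "immediate" else "delete eventually"
```

A document archived without explicit settings inherits the archive's defaults. A document for which the leading application sent a period and the "immediate" behavior is flagged for deletion as soon as that period ends.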
Retention management
Retention management is performed by the leading application, which accesses Archive
Server's Retention Handling functionality. For instance, Records Management requires
classification, retention management, audit trails, and deletion of documents. Although
most of these requirements must be met by a records management application, Archive
Server handles retention periods and keeps track of all changes to document content.
Furthermore, Archive Server provides logical archives and monitoring functionality for
retention management. For example, all invoices from the current year are grouped
together into a logical archive so they can be deleted after the retention period has
expired.
For this purpose, Archive Server administration compiles a list of all volumes containing
mostly expired documents. Automatic migration can consolidate numerous volumes with
mainly expired documents into a handful. When the migration is completed, the expired
volumes can be removed or purged, saving jukebox slots or storage space, depending on
the media.
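The selection and consolidation steps can be sketched as follows. The 80 % threshold, the volume capacity, and the data structures are hypothetical; the whitepaper only says that volumes with "mostly expired" documents are listed and migrated.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    docs: dict = field(default_factory=dict)  # doc id -> expired? (bool)

    def expired_ratio(self) -> float:
        # An empty volume counts as fully expired.
        return sum(self.docs.values()) / len(self.docs) if self.docs else 1.0


def candidates(volumes, threshold=0.8):
    """Volumes whose share of expired documents is at least `threshold` (hypothetical)."""
    return [v for v in volumes if v.expired_ratio() >= threshold]


def consolidate(volumes, capacity):
    """Copy the still-valid documents into as few new volumes as possible;
    the old volumes can then be removed or purged."""
    survivors = [doc for v in volumes for doc, expired in v.docs.items() if not expired]
    return [survivors[i:i + capacity] for i in range(0, len(survivors), capacity)]
```

For example, five volumes with ten documents each, nine of them expired, consolidate into one new volume holding the five remaining documents, which frees the five original volumes.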
Volume Migration also provides the flexibility to adjust the storage strategy or to move
from outdated storage media/devices to recent technology with more capacity (e.g.,
WORM to NetApp SnapLock).
Business documents are always accessed through business applications (and are mostly
worthless without their business context), so Archive Server does not authenticate
individual users. Instead, the business application itself (SAP, Content Server,
Transactional Content Management) authenticates at Archive Server using signed URLs
(secKeys) and certificates. Archive Server expects the business application to have
authorized the user behind each request and then grants access to the documents.
When a client sends a request to the application server, the application server, acting as
the trusted source, checks the access rights and, if they exist, signs the URL and sends it
to the client. The client can then access Archive Server with this URL. The signed URL
contains an expiry time (e.g., two hours), after which it is no longer valid.
Within Archive Server, the URL signature is called secKey. It is part of the Server API and
is used by all leading applications, such as Exchange Archive and Production Document
Management. Archive Server can be configured to reject unsigned requests, so that only
requests from the explicitly authorized SAP application server are accepted. Thus, even if
an attacker obtains a document ID, access to the document is denied.
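The signed-URL flow can be sketched with an HMAC over the document ID and expiry time. This is a swapped-in technique for illustration: the whitepaper indicates that secKeys rely on certificates, which suggests public-key signatures rather than a shared secret. The key, the parameter names `docId`, `expires`, and `sig`, and the URL layout are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"shared-secret"  # hypothetical key; real secKeys are certificate-based


def sign_url(base: str, doc_id: str, ttl_seconds: int, now=None) -> str:
    """Application server side: sign document ID plus expiry time."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    payload = f"{doc_id}|{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{base}?{urlencode({'docId': doc_id, 'expires': expires, 'sig': sig})}"


def verify_url(url: str, now=None) -> bool:
    """Archive side: accept only correctly signed, unexpired requests."""
    params = parse_qs(urlparse(url).query)
    try:
        doc_id = params["docId"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # unsigned or malformed requests are rejected
    payload = f"{doc_id}|{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    return hmac.compare_digest(sig, expected) and current < expires
```

A URL signed with a two-hour lifetime (`ttl_seconds=7200`) verifies until it expires. Changing the document ID in the URL invalidates the signature, which is why knowing a document ID alone does not grant access.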
OpenText Imaging Enterprise Scan generates checksums for all scanned documents and
passes them on to the Document Service. The Document Service verifies the checksums.
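The checksum handshake between capture and storage can be sketched like this. SHA-256 is an assumption, since the whitepaper does not name the algorithm, and the function names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the whitepaper does not name the algorithm.
    return hashlib.sha256(data).hexdigest()


def receive(data: bytes, sender_checksum: str) -> bytes:
    """Accept a transmitted document only if its checksum still matches."""
    if checksum(data) != sender_checksum:
        raise ValueError("checksum mismatch: document corrupted in transit")
    return data


scanned = b"scanned page 1"
sent_checksum = checksum(scanned)        # computed at the scanning station
stored = receive(scanned, sent_checksum)  # verified on arrival
```

Any change to the bytes in transit, even a single flipped character, produces a different checksum and causes the document to be rejected rather than silently stored.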
Digital signatures
We distinguish two types of digital signatures: personal signatures to handle
authentication and timestamp signatures to ensure data integrity. Although personal
signatures are stored with Archive Server, their handling is controlled by the leading
application. Timestamp signatures provided by Archive Server are described below.
Timestamps
To prevent any unnoticed data loss, a document is secured with checksums even while it
is being transmitted. Once it has been archived, its integrity is secured with
timestamps. Timestamps ensure that document components cannot be modified
unnoticed after they have been archived, and they guarantee the authenticity of archived
business documents. When tax auditors examine a document several years later, the
company can prove that it was saved at a certain time and has not been changed since.
A timestamp is a signed datagram containing the document's hash value, the current time
and date, and additional information. The Archive Server supports interfaces to external,
certified timestamp service providers like timeproof and Authentidate.
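The structure of such a datagram (hash, time, additional information, and a signature over all of these) can be sketched as follows. For brevity, the sketch signs with an HMAC key. This is a stand-in only: certified timestamp services sign with a private key, typically following the RFC 3161 Time-Stamp Protocol, and the field names here are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

TSA_KEY = b"tsa-secret"  # hypothetical; real providers sign with a private key


def issue_timestamp(document: bytes, when: datetime) -> dict:
    """Build a datagram from the document hash, time, and extra info, then sign it."""
    datagram = {
        "hash": hashlib.sha256(document).hexdigest(),
        "time": when.isoformat(),
        "info": "example-tsa",  # stands in for "additional information"
    }
    body = json.dumps(datagram, sort_keys=True).encode()
    datagram["signature"] = hmac.new(TSA_KEY, body, hashlib.sha256).hexdigest()
    return datagram


def verify_timestamp(document: bytes, stamp: dict) -> bool:
    """Valid only if the datagram is unaltered and matches the document."""
    unsigned = {k: v for k, v in stamp.items() if k != "signature"}
    body = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(TSA_KEY, body, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, stamp["signature"])
            and unsigned["hash"] == hashlib.sha256(document).hexdigest())
```

Changing the document breaks the hash comparison, and back-dating the time field breaks the signature. Either way, the modification is noticed.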
A timestamp is valid for about eight years. After that, it loses its security because the
hash algorithm on which it is based may eventually be broken by attackers. Therefore,
signatures must be renewed after a certain period of time.
The ArchiSig concept addresses this shortcoming, and Archive Server supports it. An
ArchiSig-generated timestamp is valid for an unlimited period of time.
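ArchiSig is generally described as building on hash trees. Many document hashes are combined into a tree, and only the root needs a timestamp. Renewal then means re-timestamping (and, if necessary, re-hashing) the tree rather than every single document. The following minimal sketch of such a hash tree is an illustration under that assumption and is not Archive Server's implementation.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Combine all document hashes pairwise until one root hash remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def proof(leaves: list[bytes], index: int) -> list:
    """Sibling hashes needed to recompute the root from one document."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling is left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path


def verify(document: bytes, path: list, root: bytes) -> bool:
    """Recompute the root from one document and its proof path."""
    node = h(document)
    for sibling, sibling_left in path:
        node = h(sibling + node) if sibling_left else h(node + sibling)
    return node == root
```

One timestamp on `merkle_root(...)` covers every document in the tree. Each document can later prove its membership with a short path of sibling hashes.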
An example scenario for ArchiSig can be found in the public sector. Masses of historical
and new documents have to be handled and stored, and some municipalities are switching
to electronic processing of these documents. This also means a huge capturing effort for
historical documents. The digital signature applied during capture preserves the legal
integrity of these documents, and ArchiSig preserves the integrity of the digital signature.
The paper-to-electronic transformation is a secure process, and the electronic documents
have the same legal force as their corresponding paper documents. Electronic
documents can then be integrated into processes such as those in SAP, giving SAP users
a full overview of the information for every transaction.
The Archive Server backup concept provides maximum reliability. This includes backing up all
the hard disk partitions that contain archived documents before they are stored in the
optical archive, as well as the operating system and the application software. The system
can also generate backups of all the entries in the archive database and duplicate the
optical media, largely as automated functions. Furthermore, Archive Server can create
copies of volumes as backups.
To avoid losing data in the event of a hard disk failure and to continue using Archive
Server immediately, we recommend using Redundant Array of Independent Disks (RAID)
technology as an additional data protection mechanism.
High availability
To eliminate long downtimes, Archive Server offers high availability via a hot standby
server.
The hot standby server is a cluster solution in which a fully equipped secondary Archive
Server monitors the current production system. If a server fails, the secondary server
automatically assumes all activities, fully transparently for end users. Archive Server
clusters communicate over a fast LAN and respond to end users in the same way as a
single, high-availability Archive Server.
If the production system fails, users can continue to work normally on the secondary
archive system. In contrast to the remote standby server scenario, both read (retrieval)
and write (archiving) access to documents is possible in this configuration.
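Failure detection in a hot standby pair is commonly based on heartbeats. The following sketch assumes that the secondary expects a heartbeat at a fixed interval and takes over after a given number of missed beats. The interval, the threshold, and the class itself are hypothetical and do not describe the actual Archive Server cluster mechanism.

```python
class StandbyMonitor:
    """Hypothetical sketch: the secondary server expects a heartbeat from the
    primary at a fixed interval and takes over after several are missed."""

    def __init__(self, interval: float, max_missed: int = 3):
        self.interval = interval
        self.max_missed = max_missed
        self.last_heartbeat = 0.0
        self.active = "primary"

    def heartbeat(self, now: float) -> None:
        """Called whenever the primary reports that it is alive."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Return which server is active, failing over if needed."""
        missed = int((now - self.last_heartbeat) // self.interval)
        if self.active == "primary" and missed >= self.max_missed:
            # The secondary assumes all activities, both read and write.
            self.active = "secondary"
        return self.active
```

Requiring several missed heartbeats, rather than one, avoids failing over because of a single delayed message on the LAN.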
Remote standby
With a remote standby server, all the documents in an archive are duplicated on a second
Archive Server—the backup server—via a WAN connection for geographic separation.
The remote standby server’s configuration is identical to that of the original Archive
Server. The archives and hard disk buffers of the original server are replicated
asynchronously.
The remote archive system generates backups of the original optical media. If the
production Archive Server fails, the backup server continues to provide read-access to all
the documents. Physically separating the two servers also provides optimal protection
against fire and other catastrophic loss.
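Asynchronous replication means that the original server does not wait for the remote copy: new documents are queued and transferred later over the WAN. The sketch below models this with a queue and a read-only standby. All class and method names are hypothetical.

```python
from collections import deque


class PrimaryArchive:
    def __init__(self):
        self.docs = {}
        self.pending = deque()  # replication queue, drained asynchronously

    def store(self, doc_id, data):
        self.docs[doc_id] = data
        # The write completes without waiting for the remote copy.
        self.pending.append((doc_id, data))


class RemoteStandby:
    def __init__(self):
        self.docs = {}

    def replicate_from(self, primary, batch=100):
        """Run periodically, e.g. by a scheduled job over the WAN."""
        for _ in range(min(batch, len(primary.pending))):
            doc_id, data = primary.pending.popleft()
            self.docs[doc_id] = data

    def read(self, doc_id):
        return self.docs[doc_id]

    def store(self, doc_id, data):
        raise PermissionError("remote standby provides read access only")
```

Between replication runs, the most recent documents exist only on the original server. This is the trade-off that asynchronous replication makes in exchange for not slowing down archiving over a WAN link.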
Disaster recovery
Archive Server stores the available metadata together with the content on the storage
media (e.g., DocId, aid, timestamp). This allows Archive Server to completely restore
access to archived documents if the Archive Server hardware suffers a major
breakdown or has been destroyed. Technically, the entire database can be restored from
Documents are related to a business process that is handled by a leading application. For
example:
• All invoices from the current year are grouped together so that they can be easily
deleted after the retention period has expired.
Logical archives make it possible to store documents in a structured way. You can
organize archived documents in different logical archives according to the following
criteria:
Key tasks of Archive Server include hiding specific hardware characteristics from leading
applications, providing transparent access, and optimizing storage resources.
Archive Server is like the two-faced “Janus”: on one side, it handles complex hardware;
on the other side, it provides hardware abstraction by offering unified storage. If a
hardware vendor's storage API changes or new versions are released, the leading
applications that use the hardware do not need to be changed; only the storage
manager's interface needs to be adapted.
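This abstraction corresponds to programming against an interface rather than a device. In the following sketch, two in-memory stand-ins (a hard disk and a write-once medium) implement one hypothetical `StorageDevice` interface. The leading application's code is identical for both, and only the device adapter would change when a vendor API changes.

```python
from abc import ABC, abstractmethod


class StorageDevice(ABC):
    """Hypothetical unified interface that leading applications program against."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class HardDiskStorage(StorageDevice):
    def __init__(self):
        self._files = {}

    def write(self, key, data):
        self._files[key] = data

    def read(self, key):
        return self._files[key]


class WormStorage(StorageDevice):
    """Write-once media: existing content can never be overwritten."""

    def __init__(self):
        self._blocks = {}

    def write(self, key, data):
        if key in self._blocks:
            raise PermissionError(f"{key} is already written (WORM)")
        self._blocks[key] = data

    def read(self, key):
        return self._blocks[key]


def archive_document(storage: StorageDevice, doc_id: str, data: bytes) -> None:
    # The leading application never sees which device is behind `storage`.
    storage.write(doc_id, data)
```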
Storage reorganization
The content lifecycle may differ depending on the document type, which imposes different
requirements on the storage subsystem. For example, many working copies are created
before a conceptual document (such as a product specification or contractual work) is
finalized. Often, working copies do not need to be stored in a long-term archive;
sometimes they may even be deleted once the content has been finalized. The finalized
version, however, must be stored on a safe, non-alterable, long-term storage medium.
Cache Server
Archive Server supports caching via the Cache Server. It gives users fast access to
archived documents. This is especially important in distributed network environments
(such as WAN) because it greatly reduces the network load. It stores all the recently read
documents locally and displays them on the client on request. When displaying
documents, the Cache Server ensures that the document in the cache is the most current
archived original. If several Cache Servers are used, they can even be configured
individually for specific logical archives and network subnets.
The Cache Server normally operates in a write-through mode, where all documents that
are created locally are stored on the Cache Server and at the same time directly written
through to the Archive Server. The Cache Server can be switched into a write-back mode.
In this mode all the documents are cached in the local store of the Archive Cache Server
only. An administrative job will later transfer these documents to the central Archive
Figure 16: Cache Server Scenario
The cache of the Cache Server is filled when documents are read and written (e.g., when
scanning with Enterprise Scan or importing documents via the Document Pipeline). All
applications that use the Archive Server API also benefit from the Cache Server
scenarios.
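The two write modes and the fill-on-read behavior can be sketched as follows. The classes are hypothetical, and the sketch omits the check that a cached copy is still the most current archived original.

```python
class ArchiveServerStub:
    """Hypothetical stand-in for the central Archive Server."""

    def __init__(self):
        self.docs = {}


class CacheServer:
    def __init__(self, archive, mode="write-through"):
        self.archive = archive
        self.mode = mode
        self.local = {}
        self.unsent = []  # only used in write-back mode

    def write(self, doc_id, data):
        self.local[doc_id] = data
        if self.mode == "write-through":
            self.archive.docs[doc_id] = data  # forwarded immediately
        else:
            self.unsent.append(doc_id)  # forwarded later by a job

    def flush(self):
        """The administrative job that transfers write-back documents."""
        for doc_id in self.unsent:
            self.archive.docs[doc_id] = self.local[doc_id]
        self.unsent.clear()

    def read(self, doc_id):
        # Fill the cache on the first read; later reads are served locally.
        # (A freshness check against the archived original is omitted here.)
        if doc_id not in self.local:
            self.local[doc_id] = self.archive.docs[doc_id]
        return self.local[doc_id]
```

Write-through keeps the central archive current at all times. Write-back reduces WAN traffic during the working day, at the cost of documents existing only on the Cache Server until the transfer job runs.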
The Administration server of Archive Server is used to manage and configure the
following system components:
• The logical archive, which can be used to group the documents by department,
physical location, document type, etc. A retention period can be specified for each
logical archive.
• The optical media and media pools (e.g., automatic WORM finalization)
The entire archiving system can be managed either locally or remotely using the
Administration Client of the Enterprise Library.
Figure 17: Enterprise Library Administration
Server monitoring
Monitoring ongoing processes helps maintain optimal system performance. For this
reason, Archive Server includes various monitoring systems that help control the overall
system—from the resources for the storage hardware to the individual archiving
components’ processes.
The Monitor Server helps administrators locate and correct potential problems by using
remote procedure calls, SQL queries, and operating system calls to collect and monitor
data from the individual components. It continuously saves data about the components’
status and the available storage space.
The Monitor Server has a Web-based monitor client that enables the administrator to
monitor the Archive Server processes and resources. The processes of the individual
Moreover, log files offer another powerful method for diagnosing Archive Server. All the
archive components generate log files, which record the activities of the different
processes. By default, the log levels record only a minimum of information. If the
administrator suspects a problem with a certain component, however, he or she can
increase the log level for that component.
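Per-component log levels work the same way in most logging frameworks. The sketch below uses Python's standard `logging` module as a stand-in for Archive Server's own log configuration. The component names are hypothetical.

```python
import logging

# Hypothetical component names; by default, only a minimum is recorded.
COMPONENTS = ["archive.docservice", "archive.storagemanager", "archive.monitor"]

logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
for name in COMPONENTS:
    logging.getLogger(name).setLevel(logging.WARNING)


def raise_log_level(component: str) -> None:
    """Record detailed diagnostics for one suspect component only."""
    logging.getLogger(component).setLevel(logging.DEBUG)


raise_log_level("archive.storagemanager")
logging.getLogger("archive.storagemanager").debug("volume scan started")  # emitted
logging.getLogger("archive.docservice").debug("request received")  # suppressed
```

Raising the level for one component keeps the other log files small while the suspected problem is being investigated.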
Figure 18: Archive Server Web Monitor
Logging
For long-term monitoring, you can have performance data written to log files. Logging for
each component of Archive Server can be individually switched on or off within the Server
Administration.