
Commvault®

Education Services
Commvault® Engineer
Student Guide
Copyright
Information in this document, including URL and other website references, represents the current view of Commvault
Systems, Inc. as of the date of publication and is subject to change without notice to you.

Descriptions or references to third party products, services or websites are provided only as a convenience to you and
should not be considered an endorsement by Commvault. Commvault makes no representations or warranties, express
or implied, as to any third-party products, services or websites.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos,
people, places, and events depicted herein are fictitious.

Complying with all applicable copyright laws is the responsibility of the user. This document is intended for distribution to
and use only by Commvault customers. Use or distribution of this document by any other persons is prohibited without
the express written permission of Commvault. Without limiting the rights under copyright, no part of this document may
be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic,
mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Commvault Systems, Inc.

Commvault may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering
subject matter in this document. Except as expressly provided in any written license agreement from Commvault, this
document does not give you any license to Commvault’s intellectual property.

COMMVAULT MAKES NO WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, AS TO THE INFORMATION CONTAINED IN THIS DOCUMENT.

©1999-2020 Commvault Systems, Inc. All rights reserved. Commvault, Commvault and logo, the "C hexagon" logo,
Commvault Systems, Solving Forward, SIM, Singular Information Management, Commvault HyperScale, ScaleProtect,
Commvault OnePass, Commvault Galaxy, Unified Data Management, QiNetix, Quick Recovery, QR, CommNet, GridStor,
Vault Tracker, InnerVault, Quick Snap, QSnap, IntelliSnap, Recovery Director, CommServe, CommCell, APSS,
Commvault Edge, Commvault GO, Commvault Advantage, Commvault Complete, Commvault Activate, Commvault
Orchestrate, and CommValue are trademarks or registered trademarks of Commvault Systems, Inc. All other third party
brands, products, service names, trademarks, or registered service marks are the property of and used to identify the
products or services of their respective owners. All specifications are subject to change without notice.

Confidentiality
The descriptive materials and related information in the document contain information that is confidential and proprietary
to Commvault. This information is submitted with the express understanding that it will be held in strict confidence and
will not be disclosed, duplicated or used, in whole or in part, for any purpose other than evaluation purposes. All right, title and intellectual property rights in and to the document are owned by Commvault. No rights are granted to you other than a license to use the document for your personal use and information. You may not make a copy or derivative work of this document. You may not sell, resell, sublicense, rent, loan or lease the document to another party, transfer or assign your rights to use the document or otherwise exploit or use the document for any purpose other than for your personal use
and reference. The document is provided "AS IS" without a warranty of any kind and the information provided herein is
subject to change without notice.

©1999-2020 Commvault Systems, Inc. All rights reserved


V11 SP18 Commvault® Engineer February 2020

For comments, corrections, or recommendations for additional content, contact:


[email protected]


Contents

Advanced Infrastructure Design

Introduction
    Advanced Infrastructure Design Course Overview
    Education Advantage
    Class Resources
    CVLab On Demand Lab Environment
    Commvault® On-Demand Learning
    Commvault® Education Career Path
    Education Services V11 Certification
    Course Overview

CommCell® Environment Design
    CommCell® Structure Planning
    CommServe® Server Design
    CommServe® Availability
    MediaAgent Scaling

Indexing
    Indexing Overview
    V2 Indexing Overview
    Index Process for Data Protection Jobs
    Index Database Backup Operations
    Index Checkpoint and Backup Process
    Index Database Recovery Process
    Index Process Using Multiple MediaAgents
    Upgrading from V1 to V2 Indexing

Storage Design
    Storage Infrastructure Design
    Disk Library Design
    Data Server (SAN, iSCSI, IP)
    Tape Library Design
    GridStor® Technology

Cloud
    What is Cloud?
    General Commvault® Feature
    Cloud Computing and Storage
    Disaster Recovery and Cloud
    Disaster Recovery to Cloud using Live Sync

Deduplication
    Components and Terminology
    Deduplication Database Reconstruction
    Content Aware Deduplication
    Partitioned Deduplication Database
    Data Movement of Deduplicated Data
    Deduplicated Data Aging and Pruning Process
    Deduplication Database Seeding
    Deduplication Database Synchronization

Commvault HyperScale Technology
    Commvault HyperScale Technology Overview
    Commvault HyperScale Architecture – High Level
    Commvault HyperScale Architecture – Network
    Storage Architecture

Storage Policies
    Storage Policy Design Methodology
    Approaching Storage Policy Design
    Basic Planning Methodology Approach
    Guidelines for Custom Storage Policies

Retention
    Retention Overview
    Job Based Retention
    Item Based Retention

Virtualization
    Virtualization Primer
    Transport Modes
    Virtual Server Agent Backup Process
    Virtual Server Agent Proxy Roles
    Virtual Server Agent Settings
    VSA Advanced Restore Options

Virtual Application Protection
    Virtual Application Protection Overview
    Agent Based Application Protection
    Virtual Server Agent Application Aware Backup
    Additional Application Protection Methods

IntelliSnap® Technology
    IntelliSnap® Technology Overview
    IntelliSnap® for VSA
    Block Level Backups
    IntelliSnap® Configuration

Performance
    Performance Overview
    Performance Benchmarks
    Stream Management
    Meeting Protection Windows
    Meeting Media Management Requirements
    Meeting Restore Requirements

COMMVAULT® ENGINEER


INTRODUCTION

Commvault® Engineer Course Overview


Education Advantage
The Commvault® Education Advantage product training portal contains a set of powerful tools to enable Commvault
customers and partners to better educate themselves on the use of the Commvault software suite. The portal includes:

• Training Self-Assessment Tools
• Curriculum Guidance based on your Role in your Commvault Enterprise
• Management of your Commvault Certifications
• Access to Practice Exams and Certification Preparation Tools
• And more!


Class Resources
Course manuals and activity guides are available for download for Instructor-Led Training (ILT) and Virtual Instructor-Led Training (vILT) courses. Download these documents the day before class to ensure you are using the latest versions.

Self-paced eLearning courses can be launched directly from the EA page. If an eLearning course is part of an ILT or vILT
course, it is a required prerequisite and should be viewed prior to attending class.

If an ILT or vILT class uses the Commvault® Virtual Lab environment, a launch button becomes available on the first day of class.

Commvault® certification exams can be launched directly from the EA page. If you are automatically registered for an
exam as part of an ILT or vILT course, it will be available on the final day of class. There is no time limit on when the
exams need to be taken, but it is recommended to take them as soon as you feel you are ready.


CVLab On Demand Lab Environment


The Commvault Virtual Lab (CVLab) environment is available to our global customers. The CVLab provides a flexible method for gaining hands-on experience with the Commvault® software platform. You have anywhere/anytime access to a powerful lab environment to practice installations, test configurations, review current version capabilities, or work through any lab exercises. The CVLab shares a common console with our Education Advantage (EA) portal and is accessible 24 hours a day, up to the amount of connect time purchased.

CVLab time can be purchased standalone (on demand), or as an extension of lab time for training courses attended. Extension time must be purchased within 48 hours of class end time to preserve your lab progress from the training course. Whether purchasing on demand or extending, CVLab connect time is sold in four-hour blocks in any quantity. Access is available for 90 days from the point of purchase and is priced at one Training Unit per four-hour block.


Commvault® On-Demand Learning


Commvault On-Demand Learning offers an array of digital learning assets, selected virtual instructor-led events, and other learning development tools. With an annual subscription, you have continuous access to hundreds of hours of on-demand learning, over a thousand pages of content, and more than a hundred technical training videos. Content is created by seasoned Commvault experts, and updates are posted weekly, so you can take advantage of the full breadth of the Commvault data platform when you need it.

Commvault On-Demand Learning is a convenient, flexible, and cost-effective training solution that gives you the tools to
keep a step ahead of your company’s digital transformation initiatives. You and your company will benefit by:

• Learning just what you need, when you need it
• Accessing exclusive expert sessions and on-demand content
• Receiving knowledge updates from Commvault experts in near real-time
• Building skill-sets that can be applied to Commvault certification
• Applying knowledge and seeing impact immediately


Commvault® Education Career Path


The Commvault next generation platform leapfrogs legacy solutions in capabilities and functionality, fully modernizing the performance, security, compliance, and economic benefits of a holistic data management strategy. The key concepts covered in this first-step learning module highlight the core features of Commvault's new platform. To realize the full value of these features, Commvault provides multiple levels of education and certification, from introductory modules for those new to Commvault, through specialized learning sessions, to master-level training for Commvault power users.


Education Services V11 Certification


Commvault's Certification Program validates expertise and advanced knowledge across Commvault Professional, Engineer, and Master-level topics. Certification is a valuable investment for both a company and the IT professional. Certified personnel can increase a company's productivity, reduce operating costs, and increase their potential for personal career advancement.

Commvault's Certification Program offers Professional-level, Engineer-level, and Master-level certifications. The program provides certification based on a career path and enables advancement based on an individual's previous experience and desired area of focus. Higher-level certifications, such as Engineer and Master, serve as verified proof of expertise.

Key Points

• Certification is integrated with and managed through Commvault's online registration in the
Education Advantage Customer Portal.
• Cost of certification registration is included in the associated training course.
• Practice assessments are available at ea.commvault.com.
• The Commvault Certified Professional Exam Prep course is also available.
• Students may take the online certification exam(s) any time after completing the course.
• Although it is recommended to attend training prior to attempting an exam, it is not required.

Commvault Version 11 Certification Exams


Exams available for Commvault Version 11:

• Commvault® Certified Professional Foundations 2020 Exam
• Commvault® Certified Professional Advanced 2020 Exam
• Commvault® Certified Professional 2020 Update Exam
• V11 Professional Upgrade Exam
• Commvault® Engineer 2020 Exam
• Commvault® Engineer 2020 Update Exam
• Commvault® Master 2020 Exam
• Commvault® Master 2020 Update Exam
• Master Upgrade Exam

Commvault® Certified Professional 2020


A Commvault® Certified Professional certification validates the skills required to install, configure, and administer a CommCell® environment using both the CommCell® console and Commvault Command Center™. It proves a professional-level skill set in the following areas:

• CommCell Administration – user and group security, configuring administrative tasks, conducting
data protection and recovery operations, and CommCell monitoring.
• Storage Administration – deduplication configuration, disk library settings, tape library settings,
media management handling, and snapshot administration.
• CommCell Implementation – CommServe® server design, MediaAgent design and placement,
indexing settings, client and agent deployment, and CommCell maintenance.
Certification status as a Commvault Certified Professional requires passing the Commvault® Certified Professional Exam.

Commvault® Certified Engineer 2020


The Commvault Certified Engineer certification validates advanced-level skills in designing and implementing Commvault software.

• Commvault® Engineer Exam – validates expertise in deploying medium and enterprise-level CommCell® environments, with a focus on storage design, virtual environment protection, and application data protection strategies.
Certification status as a Commvault Certified Engineer requires certification as a Commvault Certified Professional and
passing the Advanced Infrastructure Design exam.

Commvault® Certified Master 2020


The Commvault Certified Master certification validates expert-level skills in specific areas of expertise. It is the highest achievable level of certification.

Certification status as a Commvault Certified Master requires certification as both a Commvault Certified Professional and
Certified Engineer, and successful completion of Master certification requirements. These Master certification
requirements include attending a Master class and passing the Master Certification exam.

Additional benefits of attaining the Master Certification include:

• Opportunity to attend free, invitation-only training events
• Opportunity to attend free beta and early release training courses
• Special benefits when attending Commvault GO conferences


Course Overview

COMMCELL® ENVIRONMENT DESIGN



CommCell® Structure Planning


Commvault® software is deployed in a cell-like structure called a CommCell® environment. One or more cells can be deployed to manage environments ranging from small sites to global enterprises. Consider the following advantages and disadvantages when planning a single-cell or multi-cell structure.

Single CommCell environment
    Advantages:
    • Provides central management.
    • Allows data to be easily restored across all sites.
    Disadvantages:
    • If the central site hosting the CommServe server goes offline, all data management activities are disrupted.

Multi-CommCell environment
    Advantages:
    • Provides full autonomy and resiliency.
    • Allows each IT group to independently manage its environment.
    Disadvantages:
    • Cross-site restore operations are more complicated when each site is its own CommCell structure.


CommServe® Server Design


The CommServe® server is the central management system within a CommCell® environment. All activity is coordinated
and managed by the CommServe server. The CommServe system runs on a Windows® platform and maintains a
Microsoft® SQL metadata database. This database contains all configuration information. It is important to note that
Commvault® software does not use a centralized catalog system like most other backup products. This means the
metadata database on the CommServe server is considerably smaller than databases that contain catalog data.

Based on the size of an environment, the CommServe server must be scaled appropriately. For current scalability
guidelines, refer to the Commvault Online Documentation section, ‘Hardware Specifications for the CommServe.’

Key points regarding the CommServe server:

• For CommServe server high availability, the following options are available:
    o The CommServe server can be clustered – recommended for larger environments where high availability is critical.
    o The CommServe server can be virtualized – suitable for small to mid-size environments.
• It is ABSOLUTELY CRITICAL that the CommServe database is properly protected. By default, a CommServe DR backup job runs every day at 10:00 AM. This operation can be completely customized and set to run multiple times a day if required.
• All activity is conducted through the CommServe server. Therefore, it is important that communication between the CommServe server and all CommCell® components is maintained.

CommServe® Server Performance Requirements


CommServe® server performance is essential for a well-performing data protection environment. Although data moves from client to MediaAgent or from MediaAgent to MediaAgent, communication and job checkpoints occur constantly between CommCell® components and the CommServe server. The CommServe server also serves other functions, such as reporting, and the user experience may be impacted during peak periods of data protection operations.


CommServe® Server Communication Services


During data protection jobs, the CommServe JobMgr process initiates job operations. The CVD process, which exists on
all CommCell components, provides communication with all resources. As each chunk of a job completes, it must be
registered in the CommServe database before the next chunk begins.

During auxiliary copy jobs, the JobMgr initiates the job and spawns the AuxCopyMgr process on the CommServe server. This process is responsible for sending chunk information to the source MediaAgent and recording chunk updates from the destination MediaAgent. In Commvault V11, much of this workload can be distributed to on-demand services on MediaAgents. This offload is enabled using the 'use scalable resource allocation' setting in the auxiliary copy configuration.

During data protection and auxiliary copy jobs, the CommServe server has a substantial responsibility. Consider this when
planning the resources for the CommServe server, especially in larger environments where hundreds of jobs will be
running in parallel.
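A minimal Python sketch of this chunk-checkpoint pattern follows, for illustration only. All names here (CommServeDB, register_chunk, write_to_media) are hypothetical stand-ins, not Commvault process or API names.

class CommServeDB:
    """Stand-in for the CommServe metadata database."""
    def __init__(self):
        self.chunks = []  # (job_id, chunk_id, media_id) records

    def register_chunk(self, job_id, chunk_id, media_id):
        # A chunk is only restorable once this record exists.
        self.chunks.append((job_id, chunk_id, media_id))

def write_to_media(media_id, payload):
    """Placeholder for the MediaAgent's actual data movement."""

def run_backup(job_id, data_chunks, media_id, db):
    # Write each chunk, then register it with the CommServe database
    # before the next chunk begins -- the checkpoint described above.
    for chunk_id, payload in enumerate(data_chunks):
        write_to_media(media_id, payload)
        db.register_chunk(job_id, chunk_id, media_id)

db = CommServeDB()
run_backup(job_id=1, data_chunks=["blockA", "blockB"], media_id="disk01", db=db)

The ordering is the point of the model: every chunk commit round-trips to the CommServe database, so hundreds of parallel jobs translate into constant database activity, which is why CommServe resources matter in large environments.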

CommServe® DR Backup
By default, every day at 10:00 AM, the CommServe DR backup process is executed. This process first dumps the CommServe SQL database to a local folder path. An export process then copies the folder contents to a user-defined drive letter or UNC path. A backup phase subsequently backs up the DR Metadata and any user-defined log files to a location based on the storage policy associated with the backup phase of the DR process. All processes, schedules, and export/backup locations are customizable in the DR Backup Settings applet in the Control Panel.

Additionally, a copy of the DR backup can be uploaded to Commvault® Cloud Services, which guarantees that an offline copy exists and is accessible during recovery if a disaster were to occur.

CommServe® DR backup process overview

Database Dump
During the dump phase, the system stores the dump files in the following location:

• V11 upgraded environment: <install path>\CommVault\Simpana\CommServeDR
• V11 new installation: <install path>\CommVault\Content Store\CommServeDR


If available space is low, the location of the dump can be modified using the 'ERStagingDirectory' setting in the CommServe Additional Settings tab.

Export
The Export process copies the contents of the \CommServeDR folder to the user-defined export location. A drive letter or UNC path can be defined. The export location should NOT be on the local CommServe® server. If a standby CommServe server is available, define the export location as a share on the standby server.

By default, five metadata backups are retained in the export location. It is recommended to have enough disk space to maintain one week's worth of DR exports, and to adjust the number of retained exports to the DR backup schedule frequency.

Backup
The Backup process is used to back up the DR Metadata to protected storage. This is accomplished by associating the
backup phase with a storage policy. A default DR storage policy is automatically created when the first library is configured
in the CommCell environment. Although the backup phase can be associated with a regular storage policy, it is
recommended to use a dedicated DR storage policy to protect the DR Metadata.
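The three phases can be summarized in a short Python sketch. This is a conceptual model under assumed helper names (dump_commserve_db, prune_oldest_exports); the real process is configured through the DR Backup Settings applet, not scripted this way.

import pathlib
import shutil

def dump_commserve_db(dump_dir):
    """Placeholder: dump the CommServe SQL database into dump_dir."""
    path = pathlib.Path(dump_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path

def prune_oldest_exports(export_path, keep):
    """Retain only the newest `keep` exports (five by default)."""
    exports = sorted(pathlib.Path(export_path).iterdir(),
                     key=lambda p: p.stat().st_mtime)
    for old in exports[:-keep]:
        shutil.rmtree(old)

def dr_backup(dump_dir, export_path, backup_to_storage_policy, retained=5):
    # Phase 1 - Dump: write the database dump to a local folder.
    dump = dump_commserve_db(dump_dir)
    # Phase 2 - Export: copy the dump to a drive letter or UNC path,
    # which should NOT be on the CommServe server itself.
    dest = pathlib.Path(export_path) / dump.name
    shutil.copytree(dump, dest, dirs_exist_ok=True)
    prune_oldest_exports(export_path, keep=retained)
    # Phase 3 - Backup: protect the metadata through the (ideally
    # dedicated) DR storage policy.
    backup_to_storage_policy(dest)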

DR Storage Policy
When the first library in a CommCell environment is configured, a CommServe Disaster Recovery storage policy is
automatically created. The Backup phase of the DR backup process is automatically associated with this storage policy. If
the first library configured is a disk library and a tape library is subsequently added, a storage policy secondary copy is
created and associated with the tape library.

There are several critical points regarding the DR storage policy and backup phase configurations:

• Although the Backup phase can be associated with any storage policy in the CommCell ®
environment, it is recommended to use a dedicated DR storage policy. Using a dedicated policy
isolates DR Metadata on its own set of media making it potentially easier to locate and catalog in a
disaster situation.
• The most common reason the Backup phase is associated with regular data protection storage
policies is to reduce the number of tapes being sent off-site. If the backup phase is associated with
a regular storage policy, consider the following key points:
o Make sure the 'Erase Data' feature is disabled in the storage policy. If this is not done, the DR
Metadata will not be recoverable using the Media Explorer utility.
o When the storage policy secondary copy is created, ensure the DR Metadata is included in
the Associations tab of the policy copy.
o Make sure you are properly running and storing media reports. This is especially important
when sending large numbers of tapes off-site. If you don't know which tape the metadata is
on, you will have to catalog every tape until you locate the correct media which is storing
the DR Metadata.

DR Backups to the Cloud


Commvault® offers a free cloud service that allows DR backups to be uploaded to the cloud. The service retains the last seven metadata backups, which can be downloaded if needed. This ensures that a recent copy of the database is off-site and cannot be accessed by a rogue process such as ransomware.

The free cloud service requires a Commvault Cloud Services account, which is created using the following URL:


http://cloud.commvault.com
To configure DR Backups to the Commvault® cloud:

1. Select the Configuration menu | DR Backup.
2. Check to enable backups to the cloud.
3. Click to define the account to use.
4. Check to provide the account.
5. Provide the cloud services account credentials.

Configure and Run DR Backups


DR backups are automatically configured and scheduled upon software installation. The default settings and schedule can
be edited to fit your needs, and manual DR backups can be executed on demand if needed.

To access CommServe® DR settings:

1. Select the Configuration menu | DR Backup.
2. Set the number of exports to retain.
3. Set the export location to a network share or drive.
4. Browse for the location.
5. Define the user account for the network share.
6. Check to upload a copy of the DR Backup to Commvault® Cloud Services.
7. Check to send a copy of the DR backup to Commvault® Cloud Services.
8. Define the Commvault® Cloud Services user account.
9. Enable VSS for log file backups.
10. Set the DR Backup Storage Policy association.

Backup Frequency
By default, the DR backup runs once a day at 10:00 AM. The time the backup runs can be modified, and the DR backup
can be scheduled to run multiple times a day or saved as a script to be executed on demand.

Consider the following key points regarding the scheduling time and frequency of DR backups:

• If tapes are being sent off-site daily prior to 10:00 AM, the default DR backup time is not adequate. Alter the default schedule so the backup can complete and DR tapes can be exported from the library before media is sent off-site.
• The DR Metadata is essential to recover protected data. If backups are conducted at night and auxiliary copies are run during the day, consider setting up a second schedule after auxiliary copies complete.
• For mission-critical jobs, consider saving a DR backup job as a script. The script can then be executed by an alert upon successful completion of the job.

Locations
Multiple copies of the DR backup can be maintained in raw (export) form using scripts. Multiple copies of the backup phase are created within the DR storage policy by creating secondary copies, or by creating a data backup storage policy and including the metadata in the secondary copy's Associations tab.

Follow these guidelines for locating the DR Metadata backups.


• On-site and off-site standby CommServe® servers should have an export copy of the metadata.
• Wherever protected data is located, a copy of the DR Metadata should also be included.
• Whenever protected data is sent off-site a copy of the DR Metadata should be included.
• Since DR Metadata does not consume a lot of space, longer retention is recommended.

Retention
By default, the export phase maintains five copies of the metadata. A general recommendation is to maintain a week's worth of metadata exports if disk space is available. This means that if the DR backup is scheduled to run two times per day, then 14 metadata backups should be maintained.

For the metadata backup phase, the default storage policy retention is 60 days and 60 cycles. A general best practice is
that the metadata should be saved based on the longest data being retained. If data is being sent off-site on tape for ten
years, a copy of the DR database should be included with the data.
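As a quick worked example of the export-retention guideline above (plain arithmetic, not a Commvault setting name):

def exports_to_retain(dr_backups_per_day, days=7):
    # Keep one week's worth of exports, scaled to schedule frequency.
    return dr_backups_per_day * days

print(exports_to_retain(1))  # once daily  -> retain 7 exports
print(exports_to_retain(2))  # twice daily -> retain 14 exports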

Metadata Security
Securing the location where the DR Metadata is copied to is critical since all security and encryption keys are maintained
in the CommServe database. If the metadata is copied to removable drives or network locations, best practices
recommend using disk-based encryption.

CommServe® Recovery Assistant Tool


The CommServe Recovery Assistant tool is used to restore the CommServe database from the DR backup. The tool can rebuild the CommServe server on the same or a different computer, change the name of the CommServe host, and update the CommCell license.


CommServe® Availability
High availability for the CommServe® server is essential to allow normal CommCell® operations to run. If the CommServe
server goes offline, data protection and recovery jobs are affected.

This is especially important when considering the following key points:

• Meeting backup windows – During data protection jobs, if the CommServe server is not reachable, the client continues backing up data to a MediaAgent, by default for 20 minutes. The 'Network Retries' setting determines the maximum time interval and number of attempts to contact the CommServe system; the default is 40 retries at 30-second intervals (see the short calculation after this list).
• Restores – The CommServe server must be available to browse and recover data within a
CommCell environment.
• Deduplication database consistency – In the event of a CommServe failure, all Deduplication
Databases (DDBs) within a CommCell environment will be in an inconsistent state. When the
CommServe metadata is restored, all DDBs must be brought back to a consistent state. This
process brings the DDBs to a state as they existed based on the point-in-time of the CommServe
database restore point. This could result in losing some backup data if the backups completed after
the most recent CommServe DR backup.
• Archive stub recalls – When using Commvault archiving, stub recalls require the CommServe
server to be present. The HSM recall service redirects all item retrieval requests to the CommServe
server which then locates which MediaAgent and media contains the data.
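The 20-minute window in the first bullet above follows directly from the retry defaults, as this trivial calculation shows:

retries = 40           # default 'Network Retries' attempts
interval_seconds = 30  # default interval between attempts

window_minutes = retries * interval_seconds / 60
print(window_minutes)  # 20.0 -> clients keep writing for ~20 minutes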

Hot / Cold Standby


A hot or cold standby CommServe® server consists of a physical or virtual machine with the CommServe software preinstalled. The DR backup Export process directs metadata exports to the standby CommServe server. If the production CommServe server becomes unavailable, the standby CommServe server can quickly be brought online. When using a hot/cold standby CommServe server, consider the following key points:

• It is critical that both the production and standby CommServe servers are patched to the same
level. After applying updates to the production CommServe server, ensure the same updates are
applied to the standby CommServe server.

• Multiple standby CommServe servers can be used. For example, an on-site standby and an off-site
DR CommServe server. Use post script processes to copy the raw DR Metadata to additional
CommServe servers.

• A standby CommServe server can be a multi-function system. The most common multi-function
system would be installing the CommServe software on a MediaAgent.

• If a virtual environment is present, consider using a virtual standby CommServe server. This avoids
problems associated with multi-function standby CommServe servers and eliminates the need to
invest in additional hardware. Ensure the virtual environment is properly scaled to handle the extra
load that may result when activating the virtual standby CommServe server.

Virtualization
Some customers with virtual environments choose to virtualize the production CommServe server. A virtualized CommServe server has the advantage of using the hypervisor's high availability functionality (when multiple hypervisors are configured in a cluster), which reduces costs since separate CommServe hardware is not required. Although this method can be beneficial, it should be properly planned and implemented.


If the virtual environment is not properly scaled, the CommServe server could become a bottleneck when conducting data
protection jobs. In larger environments where jobs run throughout the business day, the CommServe server activity may
have a negative performance impact on production servers.

When virtualizing the CommServe server, it is still critical to run the CommServe DR backup. In the event of a disaster, the
CommServe server may still have to be reconstructed on a physical server. Do not rely on the availability of a virtual
environment in the case of a disaster. Follow normal Commvault software best practices in protecting the CommServe
metadata.

Clustering
The CommServe® server can be deployed in a clustered configuration. This provides high availability for environments
where CommCell operations run 24/7. Clustering the CommServe server is a good solution in large environments where
performance and availability are critical.

Note that a clustered CommServe server is not a DR solution; a standby CommServe server must therefore be planned for at a DR site.

Another benefit of a clustered CommServe server applies when using Commvault OnePass® archiving. Archiving operations create stub files that allow end users to initiate recall operations. For an end-user recall to complete successfully, the CommServe server must be available. Clustering the CommServe server ensures that recalls can be accomplished.

CommServe Failover
CommServe failover provides methods for log shipping CommServe database data to a pre-configured standby
CommServe server.

For more information, refer to the Commvault Online Documentation sections, 'Setup a Standby CommServe Host for
Failover' and 'Testing Disaster Readiness'.


MediaAgent Scaling
MediaAgents are the multifunction workhorses of a Commvault® software environment. They facilitate the transfer of data from source to destination, host the deduplication database and metadata indexes, and run analytics engines.

For MediaAgent resource requirements and guidelines, refer to the Commvault Online Documentation.

MediaAgent responsibilities include the following functions:

• Data Mover – moves data during data protection, data recovery, auxiliary copy, and content
indexing jobs.
• Deduplication Database (DDB) – hosts one or more deduplication databases on high speed solid
state or PCI storage.
• Metadata indexes – hosts both V1 and V2 indexes on high speed dedicated disks.
• Analytics – runs various analytics engines including data analytics, log monitoring, web analytics,
and the Exchange index for the new Exchange Mailbox agent.

Data Mover Role


The MediaAgent is the high-performance data mover that transmits data from source to destination, such as from a client to a library during data protection operations, or vice-versa during data recovery. MediaAgents are also used during auxiliary copy jobs, when data is copied from a source library to a destination library. The MediaAgent software can be installed on most operating systems in physical, virtual, and clustered environments. Note that all tasks are coordinated by the CommServe® server.


MediaAgent and Data Movement


There is a basic rule that all data must travel through a MediaAgent to reach its destination. One exception to this rule is
when conducting Network Data Management Protocol (NDMP) dumps directly to tape media. In this case, the MediaAgent
is used to execute the NDMP dump and no data travels through the MediaAgent. This rule is important to note as it affects
MediaAgent placement.

Since all data moving to/from protected storage must move through a MediaAgent, resource provisioning for MediaAgent
hosts (e.g., CPU, memory, and bandwidth) must be adequate for both the volume and the concurrency of data movement
you expect it to handle.

MediaAgent Device Control


A MediaAgent provides device control over media changers and removable media devices, as well as writers to disk devices. This control defines the path along which data moves to and from protected storage. In addition to normal device integrity checks, the MediaAgent can validate the integrity of data stored on the media during a recovery operation and validate the integrity of data on the network during a data protection operation.

When the MediaAgent component is co-located on the same host as the client agent, the exchange of data is contained within the host. This is called a SAN MediaAgent configuration, sometimes referred to as LAN-free backup, and has the advantage of keeping data off potentially slower TCP/IP networks by using local, higher-performance transmission devices (e.g., Fibre Channel, SCSI). On the other hand, a MediaAgent located on a host by itself can provide dedicated resources and facilitate the exchange of data over longer distances using TCP/IP (e.g., LAN, WAN).

MediaAgent Hosting Functions


The MediaAgent component also performs additional functions beyond moving data. First, the MediaAgent hosts the index directory. Every protection job that allows granular recovery must be indexed; the MediaAgent indexes the jobs and keeps the indexing information in the index directory. If Commvault® deduplication is enabled on a disk or cloud library, the MediaAgent also hosts the deduplication database containing the deduplication information. Finally, if Data Analytics is in use, the Analytics Engine must be installed on the MediaAgent.

MediaAgent data movement overview


Deduplication Database
The Deduplication Database (DDB) maintains all signature records for a deduplication engine. During data protection operations, signatures are generated on data blocks and sent to the DDB to determine whether each block is duplicate or unique. During data aging operations, the DDB is used to decrement signature counters for blocks from aged jobs and to prune signature and block records when a counter reaches zero. For these reasons, it is critical that the DDB is located on high-performance, locally attached solid-state or PCI storage.
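The reference-counting behavior described above can be modeled in a few lines of Python. This is a toy illustration of the concept, not Commvault's DDB implementation:

class DeduplicationDB:
    """Toy signature store with reference counting (illustrative only)."""

    def __init__(self):
        self.ref_counts = {}  # signature -> number of referencing blocks

    def lookup_or_insert(self, signature):
        """Data protection path: returns True if the block is a duplicate."""
        duplicate = signature in self.ref_counts
        self.ref_counts[signature] = self.ref_counts.get(signature, 0) + 1
        return duplicate

    def age_job(self, job_signatures):
        """Data aging path: decrement counters and prune records at zero."""
        for sig in job_signatures:
            self.ref_counts[sig] -= 1
            if self.ref_counts[sig] == 0:
                del self.ref_counts[sig]  # signature/block record pruned

ddb = DeduplicationDB()
ddb.lookup_or_insert("sig-1")    # unique -> block written to the library
ddb.lookup_or_insert("sig-1")    # duplicate -> only a reference is recorded
ddb.age_job(["sig-1", "sig-1"])  # both references aged -> record pruned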

Metadata Indexes
Commvault® software uses a distributed indexing structure that provides enterprise-level scalability and automated index management. This works by using the CommServe® database to retain only job-based metadata, such as chunk information, which keeps the database relatively small. Detailed index information, such as the details of protected objects, is kept on the MediaAgent. The index location can maintain both V1 and V2 indexes. Ensure the index location is on high-speed dedicated disks.

Analytics
One or more analytics engines can be installed on a MediaAgent. The following provides a high-level overview of the commonly used analytics engines:

• Data Analytics – provides a view into unstructured data within an environment. Some capabilities include:
    o Identifying old files and emails
    o Identifying multiple copies of large files
    o Removing unauthorized file types
• Log Monitoring – identifies and monitors any logs on client systems. The monitoring process is used to identify specific log entries and set filters based on criteria defined within a monitoring policy.
• Exchange Index Engine – maintains V2 metadata indexing information for the new Exchange Mailbox Agent. When using the Exchange index server, it is recommended that no other analytics engines be installed on the MediaAgent hosting the index.

Physical vs. Virtual MediaAgent


Commvault recommends using physical MediaAgents to protect physical and virtual data. The advantages of a physical MediaAgent are better performance, more versatility as a multi-purpose data mover (protecting both VMs and physical data), and resiliency. If using a tape library, presenting it to a virtualized MediaAgent adds an additional layer of complexity for configuration and troubleshooting should an issue arise. A MediaAgent can be virtualized if all performance requirements, including CPU, RAM, index directory location, and deduplication database location, are met.

Tip: Remote Site MediaAgents

You need to protect a smaller remote site and want to keep a local copy of data for quick restore. However, you
are concerned about hardware costs for a MediaAgent.

Solution: Virtualize the remote site MediaAgent and keep a shorter retention for the local copy, producing a
smaller footprint. Then replicate the data using DASH Copy to the main data center physical MediaAgent where it
can be kept for a longer retention.


Indexing


Indexing Overview
Commvault® software uses a distributed indexing structure that provides enterprise-level scalability and automated index management. This works by using the CommServe® database to retain only job-based metadata, such as chunk information, which keeps the database relatively small. Detailed index information, such as the details of protected objects, is kept on the MediaAgent managing the job. When using Commvault deduplication, block and metadata indexes are maintained within volume folders in the disk library.

Job summary data maintained in the CommServe database keeps track of all data chunks being written to media. As each
chunk completes, it is logged in the CommServe database. This information also tracks the media used to store the
chunks.

Commvault® Version 11 introduces the new V2 indexing model, which has significant benefits over its predecessor. MediaAgents can host both V1 and V2 indexes in the index directory. The primary differences between the two indexing models, relative to index directory sizing, are as follows (a compact predicate after this list summarizes the rule):

• V1 indexes are pruned from the index directory based on the days and index cleanup percentage settings in the MediaAgent Catalog tab.
• V2 indexes are persistent and are not pruned from the index directory unless the backup set associated with the V2 index is deleted.
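Expressed as a single, purely illustrative predicate (parameter names invented for clarity; the 15-day and 90% defaults come from the self-maintaining index settings described later in this section):

def index_is_prunable(version, age_days, disk_used_pct, backupset_deleted,
                      retention_days=15, cleanup_pct=90):
    """Illustrative pruning rule for index directory contents."""
    if version == 2:
        return backupset_deleted  # V2: persists unless its backup set is deleted
    # V1: aged out past the retention window, or evicted under disk pressure.
    return age_days > retention_days or disk_used_pct > cleanup_pct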

Indexed and Non-Indexed Jobs


Commvault® software defines data protection jobs as indexed or non-indexed job types. Indexes are used when data
protection jobs require indexing information for granular level recovery. Non-indexed jobs are database jobs where
recovery is only performed at the database level. Indexed-based operations require access to the index directory for
creating or updating index files. Non-indexed based jobs do not require index directory access since the backup jobs use
the CommServe database to update job summary information.

Indexed Based Jobs:

• File system backup and archive operations
• Exchange mailbox level backup and archive operations
• SharePoint document level backup and archive operations

Non-Indexed Based Jobs:

• Database jobs protected at the database level
• Note: some database agents, including Oracle, and Exchange block level backups use indexes

Traditional Indexing (V1)


Job summary data maintained in the CommServe® database keeps track of all data chunks written to media. As each chunk completes, it is logged in the CommServe database. This information also records the media the job was written to, which can be used when recalling off-site media for restores. This data is held in the database for as long as the job exists. This means that even if the data has exceeded its defined retention rules, the summary information remains in the database until the job has been overwritten. The option to browse aged data can be used to browse and recover data on media that has exceeded retention but has not been overwritten.

Detailed index information for jobs is maintained in the MediaAgent's index directory. This information contains:

• Each protected object
• The chunk in which the data resides
• The chunk offset defining the exact location of the data within the chunk
The index files are stored in the index directory, and after the data is protected to media, an archive index operation
writes the index to media. This method automatically protects the index. The archived index is also used if the index
directory is not available, when restoring data at alternate locations, or if the indexes have been pruned from the index
directory location.

One major distinction between Commvault® software and other backup products is that Commvault uses a distributed self-
protecting index structure. The modular nature of the indexes allows the small index files to automatically be copied to
media at the conclusion of data protection jobs.

Indexing Operations
The following steps provide a high-level overview of indexing operations during data protection and recovery operations.

Data Protection Operation and Indexing Processes

1. A new data protection operation is initiated:


a. A full backup generates a new index.
b. An incremental or differential appends to an existing index.
2. The index is located (incremental / differential) or a new index file is created (full) and the job
begins.
3. After each successful chunk is written to media:
a. The chunk is logged in the CommServe SQL database.
b. The index directory is updated.
4. Once the protection phase of the job is completed:
a. The index is finalized.
b. The index file in the index directory is copied to media to automatically protect the index files.

Data Recovery Operation and Indexing Process

1. A browse or find operation is initiated. Restore by job operations do not use the index directory.
2. The index file is accessed or retrieved:
a. If the index is in the index directory, it is accessed, and the operation continues.
b. If the index is not in the index directory, it is automatically retrieved from media.


Backup and recovery process using V1 indexing

If media is not in the library, the system prompts you to place the media in the library.
During a browse operation, if it is known that the media is not in the library, use the 'List Media' button to determine which
media is required for the browse operation.

Self-Maintaining Indexing Structure


The index directory is self-maintaining based on two configurable parameters, 'Index Retention Time in Days' and 'Index
Cleanup Percent'. Index files are kept in the index directory for a default of 15 days or until the directory disk reaches 90%
capacity. A smaller index directory location may result in index files being pruned before the 15-day period expires, if the
cleanup percentage is reached first.

It is important to note that these two settings use OR logic to determine how long indexes are maintained in the index
directory. If either criterion is met, index files are pruned from the directory. Pruned files are deleted based on access
time, with the least recently accessed files deleted first. This means that older index files that have been accessed
recently may be kept in the directory, while newer index files that have not been accessed are deleted.
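
The OR logic can be illustrated with a minimal sketch in Python. This is illustrative only: 'disk_usage_percent' is a hypothetical callable standing in for a real disk usage check, and the actual Indexing Service logic is internal to Commvault.

import os
import time

RETENTION_DAYS = 15      # 'Index Retention Time in Days' default
CLEANUP_PERCENT = 90     # 'Index Cleanup Percent' default

def prune_index_directory(index_dir, disk_usage_percent):
    # OR logic: a file is pruned if it is older than the retention window
    # OR the disk is over the cleanup threshold. Least recently accessed
    # files are considered first.
    now = time.time()
    paths = sorted((os.path.join(index_dir, f) for f in os.listdir(index_dir)),
                   key=os.path.getatime)
    for path in paths:
        age_days = (now - os.path.getatime(path)) / 86400
        if age_days > RETENTION_DAYS or disk_usage_percent() > CLEANUP_PERCENT:
            os.remove(path)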

Indexing Service
The Indexing Service process on the MediaAgent is responsible for cleaning up the index directory location. This service
runs every 24 hours. Any indexes older than 15 days are pruned from the index directory. If the directory location is above
the 90% space threshold, additional index files are pruned.


V1 index cleanup process


V2 Indexing Overview
Commvault® version 11 introduces next-generation indexing, called Indexing V2. It provides improved performance and
resiliency while shrinking the size of index files, both in the index directory and in storage.

V2 indexing works by using a persistent index database maintained at the backup set level. During subclient data
protection jobs, log files are generated with all protected objects and placed into the index database.


Index Process for Data Protection Jobs


Indexing data is maintained in a persistent index database. One index database maintains records for all objects within a
backup set, so all subclients within the same backup set write to the same index database. The database is created and
maintained on the MediaAgent once the initial protection job of a subclient within a backup set completes. Index
databases are located in the index directory on the MediaAgent.

During data protection jobs, log files are generated with records of protected objects. The maximum size of a log is 10,000
objects or one complete chunk. Once a log is filled or a new chunk is started, a new log file is created, and the closed log is
written to the index database. By writing index logs to the database while the job is still running, the indexing operations
run independently of the actual job, allowing a job to complete even if log operations are still committing information to
the database.
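
The log rotation behavior can be sketched as follows. This is a simplified illustration, not Commvault's actual implementation; 'chunk_end_indices' and 'commit_log' are hypothetical stand-ins for chunk boundaries and the database commit.

MAX_OBJECTS_PER_LOG = 10000   # a log also closes when its chunk completes

def protect_objects(objects, chunk_end_indices, commit_log):
    # Accumulate index records; close the current log when it fills or the
    # current chunk ends, then commit it to the index database while the
    # backup job keeps running.
    log = []
    for i, obj in enumerate(objects):
        log.append(obj)
        if len(log) >= MAX_OBJECTS_PER_LOG or i in chunk_end_indices:
            commit_log(log)   # committed asynchronously in practice
            log = []          # a new log file is started
    if log:
        commit_log(log)       # remaining records committed at job end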

At the end of each job, the log files are written to storage along with the job. This is an important distinction from traditional
indexing, which copies the entire index to storage. By copying just the logs to storage, indexes require significantly less space
in storage, which is a benefit when protecting large file servers. Since the index database is not copied to storage at the
end of each job, a special IndexBackup subclient is used to protect index databases.


Index Database Backup Operations


During data protection jobs, logs are committed to the index database and are also kept in the index directory. If an index
database is lost or becomes corrupt, a backup copy of the index database is restored from media and the log files in the
index directory are replayed to the database. If the index directory location is lost, the database and logs are restored from
media and the logs are replayed into the database. These recovery methods provide complete resiliency for index
recovery.

The index databases are protected with system-created subclients, which are displayed under the Index Servers
computer group in the CommCell® browser. An index server instance is created for each storage policy, and index backup
operations run on a system-created schedule. During the backup operation, index databases are checked to
determine whether they qualify for protection. The two primary criteria that qualify a database for protection are one
million changes or seven days since the last backup.
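
The qualification test amounts to simple OR logic, as in this Python sketch (the argument names are hypothetical; the real check is internal to the index backup process):

from datetime import datetime, timedelta

CHANGE_THRESHOLD = 1_000_000   # one million changes
MAX_AGE = timedelta(days=7)    # seven days since the last backup

def qualifies_for_backup(changes_since_last_backup, last_backup_time):
    # OR logic: crossing either threshold qualifies the index database.
    return (changes_since_last_backup >= CHANGE_THRESHOLD
            or datetime.now() - last_backup_time >= MAX_AGE)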

To access the index backup subclient properties

1. Expand Client Computer Groups | The Storage Policy pseudo client | Big Data Apps | classicIndexInstance |
Right-click the default subclient.

2. The description field confirms that this is an index backup subclient.


To edit the index backup schedule

1. Expand Policies | Schedule Policies | Right-click the System Created for IndexBackup subclients schedule policy |
Edit.

2. The description field confirms that this is the schedule policy used for Index backups.

3. Highlight the schedule and click Edit.


4. By default, the index backups are scheduled to run three times a day, but this can be modified as needed.

5. Once modified, click OK to apply changes.


Index Checkpoint and Backup Process


If the index database qualifies, three actions occur:

• A database checkpoint is taken
• The database is compacted
• The database is backed up to the storage policy associated with the index server subclient

Database Checkpoint
Checkpoints are used to indicate the point in time at which a database was backed up. Once the database is protected to
storage, any logs that are older than the checkpoint can be deleted from the index directory location.

Database Compaction
During data aging operations, deleted jobs are marked in the database as unrecoverable, but objects associated with the
job remain in the database. The compaction operation deletes all aged objects and compacts the database.

Database Backup
Once the checkpoint and compaction occur, the database is backed up to the primary copy location of the storage policy.
Three copies of the database are kept in storage and normal storage policy retention rules are ignored.

During the index backup process, the database is frozen, and Browse or Find operations cannot be run against it. Each
database that qualifies for backup is protected sequentially, minimizing the freeze time. Data protection jobs are not
affected by the index backup.


Index Database Recovery Process


If an index database is lost or corrupt, or if the entire index directory location is lost, indexes are automatically recovered.

The index recovery process works as follows:

1. The index database is restored from storage.

2. If index logs in the index directory are more recent than the restored database's checkpoint, they
are automatically replayed into the index database.
3. If the index logs are not in the index directory, they are restored from storage and replayed
into the index database.
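
In pseudocode form, the recovery sequence looks like this (all method names here are hypothetical, used only to mirror the three steps above):

def recover_index_database(storage, index_dir, backup_set):
    # Step 1: restore the most recent database backup from storage.
    db = storage.restore_latest_db_backup(backup_set)
    # Step 2: replay any local logs newer than the restored checkpoint.
    logs = index_dir.logs_newer_than(db.checkpoint)
    if not logs:
        # Step 3: no local logs; restore them from storage instead.
        logs = storage.restore_logs_since(db.checkpoint)
    db.replay(logs)
    return db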


Index Process Using Multiple MediaAgents


When multiple MediaAgents are configured to use a shared library, the MediaAgent used for the first protection job of a
backup set is designated as the database hosting MediaAgent. During subsequent operations, if another MediaAgent is
designated as the data mover, it does not copy the database to its local index directory. Instead, the data mover
MediaAgent generates logs and ships them to the database hosting MediaAgent, where they are committed to the index
database. If the hosting MediaAgent is not available, data protection operations continue uninterrupted. Once the hosting
MediaAgent is back online, the logs are shipped and committed to the index database.


Upgrading from V1 to V2 Indexing


When Commvault® software version 11 was initially released, only the file system agents supported V2 indexing. Since
then, an increasing number of agents have been added to the support list. However, an agent uses V2 indexing only if it
was installed after support was added for that agent. Agents installed earlier and then upgraded to the latest version
continue to use V1 indexing.

If needed, file system agents can be upgraded to V2 indexing by using a Workflow. The Workflow is available for
download from the Commvault® Store.

Note that currently, if the client has completed backup jobs, only the file system agent can be upgraded.

Upgrade Requirements
• The client and its agents must have a valid license applied.
• The BackupSet must be scheduled for backups.
• The client cannot be de-configured.
• The Virtual Server Agent is upgraded only if:
o The CommServe server is V11 SP13 or above.
o The VSA client does not have any completed or running backup jobs.

Any client that does not meet these requirements is skipped during the upgrade process.

Using the Upgrade Workflow

First Steps:

It is important to prepare for the upgrade. The following steps must be completed before running the script.


• The Workflow can be executed against clients or client computer groups. Take note of the client or
client computer group names that require an upgrade.
• Note the agents installed on the clients. The Workflow can be executed only against clients with the
same agent set. If the clients have different agents installed, run the Workflow multiple times.
• Note how many clients you want to upgrade in parallel. It can be from 1 to 20.
• Download the 'Upgrade to Indexing V2' Workflow from Commvault® Store.
• Run a full or synthetic full backup for the clients.
• Ensure that no jobs are running for the client.

Once these steps are complete, execute the Workflow.

To execute the workflow

1. Click Workflows.

2. Right-click the workflow | All Tasks | Execute.

3. Select whether the workflow is executed against specific clients or specific client computer groups.

4. From the list, select the computers to upgrade.

5. Select the agent type to upgrade.

6. Define the number of clients to upgrade in parallel.

7. Click OK to launch the workflow.

8. The workflow progress is displayed in the job controller.



STORAGE DESIGN


Storage Infrastructure Design


Commvault® software logically addresses storage systems to allow virtually any library type to be used. The three primary
library types are disk, tape, and cloud.

Disk libraries best practices:

• If using DAS or SAN, format mount paths using a 64KB block size.
• If using DAS or SAN, try to create multiple mount paths. For instance, if there are 10 mount paths
and a maintenance job, such as a defrag, is running on one of them, that mount path can be set to
read-only, leaving 90% of the disk library available for backup jobs.
• Set mount path usage to Spill and Fill, even if using only one mount path. If additional mount paths
are added later, the streams will spill as expected.
• Share the disk library if required.
• From the CommCell® console, validate the mount path speed and document for future reference.


Disk Library Design


A disk library is a logical container used to define one or more paths to storage, called mount paths. These paths are
defined explicitly as the location of the storage, using a drive letter or a UNC path. Within each mount path, writers are
allocated, which define the number of concurrent streams for the mount path.
There are three primary types of disk libraries:

• Dedicated – disk libraries are created by first adding a disk library entity to the MediaAgent using
either the right-click All Tasks menu or the Control Panel's Expert Storage Configuration tool. One or
more mount paths can be created/added to the library. Mount Paths are configured as Shared Disk
Devices. The Shared Disk Device in a dedicated disk library has only one Primary Sharing Folder.
• Shared – disk libraries are libraries with more than one Primary Sharing Folder configured on a
Shared Disk Device. This gives other MediaAgents access to the same shared volume resource. A
shared disk library can then be created and the Shared Disk Devices added to the library. One path
to the shared folder can be direct while the others are Common Internet File System (CIFS) shared
directory paths. The CIFS protocol is used to manage access by multiple MediaAgents to the same
directory. For UNIX-hosted MediaAgents, the Network File System (NFS) protocol can be used. NFS
shared disks appear to the MediaAgent as local drives.
• Replicated – disk libraries are configured like a shared disk library, with the exception that the
Shared Disk Device has a replicated data path defined to a volume accessible via another
MediaAgent. Replicated folders are read-only, and replication can be configured for use with third-
party replication hardware.


There are three methods by which disk library data paths can be configured:

• Network Attached Storage (NAS)
• Storage Area Network (SAN)
• Direct Attached Storage (DAS)

The following explanations assume Commvault deduplication is being used.

Network-Attached Storage (NAS)


Network-Attached Storage provides the best connection method from a resiliency standpoint, since the storage is
accessed directly through the NAS device. This means that by using the Common Internet File System (CIFS) or Network
File System (NFS) protocols, Universal Naming Convention (UNC) paths can be configured to read and write directly to
storage. In this case, the library can be configured as a shared library, where all MediaAgents can see stored data for data
protection and recovery operations.

Disk library using Network Attached Storage (NAS)

Storage Area Network (SAN)


Storage Area Networks (SANs) are very common in many data centers. SAN storage can be zoned and presented to
MediaAgents using either Fibre Channel or iSCSI. In this case, the zoned storage is presented directly to the MediaAgent,
providing read/write access to the disks.

When using SAN storage, each building block should use a dedicated MediaAgent, DDB, and disk library. Although the
backend disk storage in the SAN can reside on the same disk array, it should be configured in the Commvault® software
as separate libraries, where logical unit numbers (LUNs) are presented as mount paths in dedicated libraries for
specific MediaAgents.

SAN storage provides fast and efficient movement of data but, if the building block MediaAgent fails, data cannot be
restored. When using SAN storage, either the MediaAgent can be rebuilt or the disk library can be re-zoned to a different
MediaAgent. If the disk library is rezoned, it must be reconfigured in the Commvault® software to the MediaAgent that has
access to the LUN.


Disk library using Storage Area Network (SAN)

Direct Attached Storage (DAS)


Direct attached storage is when the disk library is physically attached to the MediaAgent. In this case, each building block
is completely self-contained. This provides high performance but does not provide resiliency. If the MediaAgent
controlling the building block fails, data stored in the disk library cannot be recovered until the MediaAgent is repaired or
replaced. Keep in mind that, in this case, all the data in the disk library is still completely indexed and recoverable, even if
the index directory is lost. Once the MediaAgent is rebuilt, data from the disk library can be restored.

Disk library using Direct Attached Storage (DAS)


Data Server (SAN, iSCSI, IP)


The Data Server feature allows the sharing of block-based storage among multiple MediaAgents. It also addresses the
traditional limitation that prevents sharing disk libraries between Linux and Windows® MediaAgents. Security is
increased since the Data Server uses a local service account to access the storage and then presents it to other
MediaAgents on an as-needed basis.

When configuring the Data Server feature, there are three types of connections to storage/MediaAgent:

• Data Server IP - A MediaAgent presents local storage to other MediaAgents through the IP network
as an NFS volume.
• Data Server SAN - A Linux MediaAgent acts as a proxy to present storage to other MediaAgents
using Fibre Channel connections.
• Data Server iSCSI - A Linux MediaAgent acts as a proxy to present storage to other MediaAgents
using iSCSI connections.


Tape Library Design

A tape library is a library where media can be added, removed, and moved between multiple libraries. The term removable
media is used to specify various types of removable media supported by Commvault® software, including tape and USB
disk drives, which can be moved between MediaAgents for data protection and recovery operations.

Tape libraries best practices:

• Configure the tape library cleaning method to use. Software cleaning (Commvault) or hardware
cleaning (library) can be used, but not both. A choice must be made.
• Share the tape library if required.
• Create a barcode pattern for cleaning tapes and assign it to the Cleaning Media group.
• If using multiple scratch media groups, create scratch groups and barcode patterns to use.
• Validate drive speed (from the CommCell console) and document for future reference.
Tape libraries are divided into the following components:

• Library – is the logical representation of a library within a CommCell® environment. A library can be dedicated to a
MediaAgent or shared between multiple MediaAgents. Sharing of removable media libraries can be static or
dynamic depending on the library type and the network connection method between the MediaAgents and the
library.
• Master drive pool – is a physical representation of drives of the same technology within a library. An example of
master drive pools would be a tape library with different drive types, such as LTO4 and LTO5 drives, within the same
library.
• Drive pool – is used to logically divide drives within a library. The drives can then be assigned to protect
different jobs.


• Scratch pool – is defined to manage scratch media, also referred to as spare media, which can then be assigned
to different data protection jobs.
o Custom scratch pools – can be defined, and media can be assigned to each pool.
o Custom barcode patterns – can be defined to automatically assign specific media to different
scratch pools, or media can be manually moved between scratch pools in the library.
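
Barcode-pattern routing can be pictured with a small sketch. The patterns and pool names below are hypothetical examples, not Commvault defaults; the point is simply that patterns are matched against barcodes to pick a pool.

import fnmatch

# Hypothetical pattern-to-pool mapping, evaluated top-down; first match wins.
SCRATCH_POOL_PATTERNS = [
    ("Cleaning Media", "CLN*"),    # cleaning tapes
    ("Offsite Scratch", "OS*"),    # a custom scratch pool
    ("Default Scratch", "*"),      # catch-all
]

def assign_scratch_pool(barcode):
    for pool, pattern in SCRATCH_POOL_PATTERNS:
        if fnmatch.fnmatch(barcode, pattern):
            return pool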


GridStor® Technology
Storage policies are used to define one or more paths that data takes from source to destination. When a MediaAgent and a
client agent are installed on the same server, a 'LAN Free' or 'preferred' path can be used to back up data directly to
storage. Network-based clients can back up through a MediaAgent using a 'default' path, a 'failover' path, or 'round-robin'
load-balancing paths.

Configure the following data paths for the MediaAgent:

• Preferred Data Path


• Default Data Path
• Alternate Data Path
• Data Path Override

Preferred Data Path


If the client and MediaAgent code are co-hosted on the same system, and the system has direct access to the target
library through Direct Attached Storage (DAS) or a Storage Area Network (SAN), the MediaAgent always uses that direct
connectivity to write data. This is called the 'preferred data path,' which overrides any data path configurations on the
storage policy copy.


Preferred path concept

Default Data Path


Right-click the desired storage policy copy | Click Properties | Data Path tab

When configuring storage policy copy data paths, by default, the first data path defined becomes the 'Default Data Path.' If
multiple data paths are defined, the 'Default Data Path' is the first one to be used. This path can be modified later.

Alternate Data Path Configuration


Right-click the desired storage policy copy | Click Properties | Data Path tab

Two alternate data path modes are available:

• Failover
• Round-Robin

This Commvault® software feature is called GridStor® technology. For more information about GridStor features, refer
to the Commvault® Online Documentation.

Failover Alternate Data Path


When used in failover mode, an alternate data path is used only when the default data path becomes unavailable or
overloaded. The data path fails over automatically, either immediately or after a configurable number of minutes.

Failover alternate data path concept


Round-Robin Alternate Data Path


When configured as round-robin, client streams are sent alternately to the default data path and all available alternate
data paths. This provides a load-balancing mechanism that takes full advantage of all available resources.

Round-robin alternate data path concept
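
Both selection modes can be sketched in a few lines of Python. This is illustrative only; 'is_available()' is a hypothetical method standing in for the MediaAgent availability check GridStor performs.

import itertools

def failover_path(default, alternates):
    # Failover mode: stay on the default path; fall through to the first
    # available alternate only when the default is unavailable.
    for path in [default] + list(alternates):
        if path.is_available():
            return path
    raise RuntimeError("no data path available")

def round_robin_paths(paths):
    # Round-robin mode: alternate client streams across all paths,
    # skipping any path that is currently unavailable
    # (loops until a path becomes available).
    for path in itertools.cycle(paths):
        if path.is_available():
            yield path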


Data Path Properties


Data path properties can be individually configured in the data path tab of the storage policy copy. To configure options,
highlight the path and click the Properties button.

The following settings can be customized for a data path:

• Hardware compression
• Hardware encryption
• Chunk size
• Block size

Hardware Compression
For data paths defined to write to tape libraries, the 'Hardware Compression' option is enabled by default. If a tape drive
supports hardware compression, then this option is enabled in the General tab of the Data Path Properties.

Hardware Encryption
For tape drives that support hardware encryption, Commvault® software manages configuration settings and keys. Keys
are stored in the CommServe® database and can optionally be placed on the media to allow recovery of data if the
CommServe database is not available at time of recovery. The data path option 'Via Media Password' places the keys on
the media. The 'No Access' option only stores the keys in the CommServe database.

If the 'Via Media Password' option is chosen, it is essential that a Media Password be configured; otherwise, the encrypted
data can be recovered without entering any password during the recovery process. A global Media Password can be set in
the 'System Settings' in the Control Panel applet. Optionally, a storage policy level password can be set in the Advanced
tab of the Storage Policy Properties.


Chunk Size
Chunk size defines the size of the data chunks written to media; each completed chunk also acts as a checkpoint in a job.
The default size for disk is 4GB. The default size for tape is 8GB for indexed operations or 16GB for non-indexed database
backups. The data path 'Chunk Size' setting can override the default settings. A higher chunk size results in a more
efficient data movement process. In highly reliable networks, increasing chunk size can improve performance. However,
on unreliable networks, any failed chunks must be rewritten, so a larger chunk size could have a negative effect on
performance.

Block Size
The default block size Commvault® software uses to move and write data to media is 64KB. This setting can range from
32KB to 2048KB. Like chunk size, a higher block size can increase performance. However, block size is hardware
dependent. Before modifying this setting, ensure all hardware used at your production and DR sites supports the
higher block size. If you are not sure, do not change this value.

When writing to tape media, changing the block size only becomes effective when Commvault software rewrites the OML
header on the tape. This is done when new media is added to the library, or existing media is recycled into a scratch pool.
Media with existing jobs continue to use the block size established by its OML setting.

When writing to disk, it is important to match the block size data path setting to the formatted block size of the disk.
Matching block sizes can greatly improve disk performance. The default block sizes operating systems use to format disks
are usually much smaller than the default setting in the Commvault software.

It is strongly recommended to format disks to the block size being used in Commvault software.
Consult with your hardware vendor’s documentation and operating system settings to properly format
disks.


Cloud

What is Cloud?
Commvault® is a leader in the protection, management, and migration of cloud infrastructure. Whether it is a public cloud
environment (cloud provider), a private cloud infrastructure (on-premises), or a hybrid cloud made of both cloud and
on-premises components, Commvault® software offers tools to handle ever-growing cloud environments.

Here are some examples of available tools:

• Application Agents
• Virtual Server Agents
• Application-Aware features
• Workflows
Before deciding which options to use, it is important to collect information about the environment to protect and to
understand the differences between cloud offerings. These differences can significantly impact the features available to use.

What is a Cloud?
Several cloud offerings and technologies can be used when building a cloud infrastructure. They are classified in the
following major categories, which define the responsibility boundaries between the customer and the cloud
provider:

• Private cloud (or on-premises) - a cloud infrastructure hosted on-premises where the customer is
responsible for managing the entire stack (hardware and software).
• Infrastructure-as-a-Service (or IaaS) - A public cloud environment hosted by a cloud provider,
allowing a customer to run virtual machines. The cloud vendor is responsible for managing the
hardware (physical servers, storage, and networking), while the customer is responsible for creating
and maintaining virtual machines. This includes maintaining the operating system, applications, and
data.

• Platform-as-a-Service (or PaaS) - As the name suggests, the cloud vendor provides a platform that
typically includes the hardware, the operating system, the database engine, a programming
language execution environment, and web servers. The customer does not maintain any virtual
servers and can focus on using the framework to develop applications using databases. The
customer is therefore responsible for maintaining the applications and the data. Good examples of
PaaS are Microsoft® Azure Database services and Amazon Relational Database Service (RDS).
• Software-as-a-Service (or SaaS) - A cloud-based application for which the cloud provider is
responsible in its entirety. This includes the application itself, which is offered 'on-demand' to the
customer. A good example of SaaS is Microsoft® Office 365.
Responsibility boundaries by cloud offering


General Commvault® Features


The cloud offerings in use within the cloud infrastructure dictate which Commvault® software agents and features to use. A
clear understanding of these features is important, since feature parity is not the same across all cloud offerings. For instance,
when using Platform-as-a-Service, Commvault® software is bound to the cloud vendor's APIs, which can limit the
capabilities of the software. When using Infrastructure-as-a-Service, access to storage may be limited, preventing the use of
advanced features such as Commvault IntelliSnap® snapshots. The following graphic provides an overview of feature
parity across offerings.

Disaster Recovery and Cloud


Over the last few years, cloud computing has not only been included in disaster recovery plans; for some organizations, it is
the main disaster recovery solution. Cloud computing billing is built on a resource usage model: the more resources you
use, the more you pay. This makes it an ideal solution for hosting a standby disaster recovery environment that can be
brought online when needed. In several cases, it is less costly than maintaining a complete disaster recovery infrastructure
in a secondary site. Cloud storage can be leveraged to host a copy of the backup data, ready to be restored if needed.
Furthermore, the Commvault® Live Sync feature can be used to recover the backup data automatically, significantly
reducing recovery time objectives (RTO).

Disaster Recovery using Cloud Computing


In this scenario, the data center is protected, and cloud computing is used for the recovery of the entire data center,
should a disaster occur. A copy of the backup data is kept locally in the data center. A secondary copy is also sent to a
cloud library. The disaster recovery workflow in such a scenario is as follows:

1. The main data center VMs, physical servers, and applications are backed up to a local deduplicated
library.
2. A predefined schedule (e.g., every 30 minutes) copies the backup data to a deduplicated cloud
library using the Commvault® DASH Copy feature.
3. If the data center is lost in a disaster, the data recovery is initiated from the cloud library.

4. The virtual machines are recovered and converted into cloud provider VMs. For instance, VMware
virtual machines protected in the data center could be recovered and converted into Microsoft®
Azure VMs.
5. If needed, the file system of physical servers is restored in cloud provider VMs.
6. Applications are restored either in VMs, Platform-as-a-Service (PaaS) or Software-as-a-Service (SaaS)
instances.
7. Applications are brought online and users can connect.
Disaster Recovery to Cloud Workflow


Cloud Computing and Storage


Cloud storage is an emerging technology that is quickly being integrated into data centers for its availability and, in some
cases, lower Total Cost of Ownership (TCO). As a DR solution, however, there are still significant questions about its
effectiveness. The two biggest questions regarding cloud storage for DR are bandwidth availability and data security.

Using advanced features such as Commvault deduplication can greatly reduce the bandwidth requirements of backing up
to cloud storage. However, in a disaster situation where a significant amount of data must be restored, bandwidth can
become a serious bottleneck.

Data transfers are achieved using secured channels (HTTPS) and are optionally encrypted to further secure the data sent
to the cloud.

Cloud libraries best practices:


• Properly plan and analyze whether the cloud library scenario meets the needs (e.g., restoring an entire
data center).
• If the link is shared with users, consider throttling Commvault ® bandwidth usage during business
hours.
• If the MediaAgent does not have direct access to the internet, define the proxy settings in the
Advanced tab of the cloud library configuration page.
• If the cloud library is accessed through a high-speed internet link (1Gbps or higher), consider tuning
the connection. For more information, refer to the Commvault Online Documentation, 'Cloud
Connection Performance Tuning' section.
• If using deduplication, by default, jobs are not aged and pruned unless the DDB is sealed. If you
want to age and prune jobs as soon as retention is met, configure micro pruning. For more
information, refer to the Commvault Online Documentation, 'Configuring Micro Pruning on Cloud
Storage' section.

The list of supported cloud providers for Commvault® software has grown over the years, reaching 30 providers
as of Service Pack 14. For a complete list of supported providers, refer to the Commvault Online
Documentation.

Add a Cloud Library


If a cloud provider is used for the cloud library, access information is given by the provider. This includes the URL,
username, password or keys, and the container or bucket in which to store the data. This information is required when
adding the cloud library in Commvault® software.

A MediaAgent must be defined to act as a gateway and send the data to the cloud. If the library is used for secondary
copies of data stored in a local library, it is recommended, whenever possible, to use the MediaAgent hosting the primary
copy to avoid unnecessary traffic. If the MediaAgent requires a proxy to reach the cloud, it can be defined during the cloud
library creation process using the Advanced tab.


To create a cloud library

1. Right-click Libraries | Add | Cloud Storage Library.

2. Provide a name for the library.

3. Select the cloud provider storage type.

4. Select the MediaAgent that will access the cloud storage.

5. Select the authentication type to use to access the cloud storage.

6. Provide the DNS name of the provider storage service.

7. Provide the cloud storage connection credentials from the list or click create if they were not already configured.

8. Provide a meaningful name for the saved credentials.

9. Provide the connection credentials.

10. Click OK to save the credentials.

11. Provide the bucket name.

12. Select the storage class from the list.

13. Click OK to create the cloud storage.


Disaster Recovery to Cloud using Live Sync


In this scenario, an additional automation layer is added to the disaster recovery workflow described earlier. Instead of
waiting until after a disaster to recover VMs and applications, data is automatically restored as soon as it reaches the cloud
library. This significantly decreases the recovery time objective (RTO) of systems but incurs higher costs, as cloud
resource usage is increased. In this situation, the 'Disaster Recovery' workflow is used:

• The main data center VMs, physical servers, and applications are backed up to a local deduplicated
library.
• As soon as a backup completes, the data is copied to a deduplicated cloud library using the
Commvault® DASH Copy feature.
• As soon as the copy to the cloud library completes, a recovery process is automatically initiated.
• The virtual machines are recovered and converted into cloud provider VMs. For instance, VMware
virtual machines protected in the data center could be recovered and converted into Microsoft®
Azure VMs.
• If needed, the file system of physical servers is restored in cloud provider VMs.
• Applications are restored either in VMs, Platform-as-a-Service (PaaS) or Software-as-a-Service (SaaS)
instances.
• If a disaster occurs, applications are brought online and users can connect.

Deduplication


Components and Terminology


There are several components that comprise the Commvault® deduplication architecture:

• The Global Deduplication Policy – defines the rules for the Deduplication Engine. These rules
include:
o Deduplication Store location and configuration settings
o The Deduplication Database (DDB) location and configuration settings
• A Data Management Storage Policy – is configured like a traditional storage policy and also
manages subclient associations and retention. Storage policy copies defined within the
Data Management policy are associated with Global Deduplication storage policies. This association
of a Data Management Storage Policy copy to a Global Deduplication Policy determines in which
Deduplication Store the protected data resides.
• Deduplication Database (DDB) – is the database that maintains records of all signatures for data
blocks in the Deduplication Store.
• Deduplication Store – contains the protected storage using Commvault deduplication. The store
is a disk library which contains non-duplicate blocks, along with block indexing information, job
metadata, and job indexes.
• Client – is the production client where data is being protected. The client has a file system and/or
an application agent installed. The agent contains the functionality to conduct deduplication
operations, such as creating data blocks and generating signatures.


• MediaAgent – coordinates signature lookups in the DDB and writes data to protected storage.
The signature lookup operation is performed using the DDB on the MediaAgent.

Deduplication Database Reconstruction


The Deduplication Database (DDB) is highly resilient and reconstruct operations can rebuild the database to match the
latest job and chunk information maintained in the CommServe® database.

In the unlikely event that the DDB becomes corrupt, the system automatically recovers the DDB from the most recent
backup. Once the DDB backup has been restored, a reconstruct process occurs which will ‘crawl’ job data since the last
DDB backup point. This brings the restored DDB to the most up-to-date state. Keep in mind that the more frequently DDB
backups are conducted, the shorter the ‘crawl’ period lasts to completely restore the DDB. Note that during this entire
recovery process, jobs that require the DDB must not be running.

How the DDB Reconstruct Works


During data protection jobs, as each chunk completes, it is logged in the CommServe database. If the Deduplication
Database (DDB) needs to be restored, this chunk information is used to re-read signatures and add them back to the DDB.
Upon initial restore of the DDB, the checkpoint taken at backup time is used to determine which chunks are more recent
than the restored database. An auxiliary copy operation then processes the chunk data, extracts block signatures from the
job metadata, and adds the entries back into the DDB.

When using a transactional DDB, the system checks the integrity of the database and the 'DiskDB' logs and attempts to
bring the database to an online, consistent state. If this process succeeds, it takes only a few minutes to bring the database
online. If the process is not successful, as is the case if the entire disk was lost, the process automatically switches
to full reconstruct mode.

There are three methods available to reconstruct the deduplication database:


• Delta Reconstruction – When using transactional deduplication, in the event of an unclean DDB
shutdown due to a MediaAgent reboot or system crash, the 'DiskDB' logs can be used to bring the
DDB to a consistent state.
• Partial Database Reconstruction – If the DDB is lost or corrupt, a backup copy of the database is
restored and the database is reconstructed using chunk metadata.
• Full Database Reconstruction – If the DDB is lost and no backup copy is available, the entire
database is reconstructed from chunk metadata.


Content Aware Deduplication


The concept of content aware deduplication is to identify what type of data is being protected and adjust how
deduplication is applied. Consider a deduplication appliance that receives data from a backup application. The
appliance cannot distinguish the files, databases, or metadata generated by the backup application. Commvault
deduplication is integrated into the agents, so it understands what is being protected. Content aware deduplication
provides significant space-saving benefits and results in faster backup, restore, and synthetic full backup operations.

Object-Based Content Aware Deduplication


Since most file objects are not evenly divisible by a set block size, such as 128KB, Commvault® deduplication uses a
content aware approach to generate signatures. If an object that is 272KB in size is deduplicated, it divides into two full
128KB blocks with a 16KB remainder. In this case, the two 128KB deduplication blocks are hashed and compared.

The remaining 16KB is hashed in its entirety. In other words, Commvault® deduplication does not add more data to the
deduplication buffer to pad out the block. The result is that if the object containing the three deduplication blocks never
changes, all three blocks always deduplicate against themselves.

The minimum fallback size to deduplicate the trailing block of an object is 4,096 bytes (4KB). Any trailing block smaller
than 4,096 bytes is protected but is not deduplicated.
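
The 272KB example can be worked through numerically with a minimal sketch, assuming the default 128KB block size and the 4KB fallback described above:

BLOCK_SIZE = 128 * 1024   # default deduplication block size
MIN_TRAILING = 4096       # minimum trailing block that is deduplicated

def dedup_blocks(object_size):
    # Split an object into full 128KB blocks plus one trailing block
    # that is hashed as-is (never padded out with other data).
    blocks = [BLOCK_SIZE] * (object_size // BLOCK_SIZE)
    trailing = object_size % BLOCK_SIZE
    if trailing:
        blocks.append(trailing)
    return blocks

print(dedup_blocks(272 * 1024))   # [131072, 131072, 16384]
# The 16KB trailing block is >= 4KB, so it is deduplicated as-is.
# A trailing block under 4KB would be protected but not deduplicated.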

Database and Log Content Aware Deduplication


Database applications often provide built-in compression, which compresses blocks before Commvault generates
signatures on them. This application-level compression can result in inconsistent blocks being deduplicated each time
a backup runs, resulting in poor deduplication ratios.

When using Commvault compression during backups instead of application compression, the application agent can be
configured to detect the database backup and generate signatures on uncompressed data. After the signature has been
generated, the block is then compressed, which leads to improved deduplication ratios. By default, Commvault® software
always compresses prior to signature generation. Note that an additional setting can be added to the database client to
generate the signature prior to compression.

Log files constantly change as new information is added and old information is truncated. Since the state of the data is
constantly changing, deduplication provides no space-saving benefit. During log backup jobs, the application agent
detects the log backup and no signatures are generated. This saves CPU and memory resources on the production
system and speeds up backups by eliminating signature lookups in the DDB.

Source and Target Side Deduplication


There are two types of deduplication that can be performed:

• Source-side (client-side) deduplication


• Target-side deduplication

Source-Side Deduplication
Source-side deduplication, also referred to as 'client-side deduplication,' occurs when signatures are generated on
deduplication blocks by the client and the signature is sent to a MediaAgent hosting the DDB. The MediaAgent looks up
the signature within the DDB. If the signature is unique, a message is sent back to the client to transmit the block to the
MediaAgent, which then writes it to the disk library. The signature is logged in the DDB to signify the deduplication block is
now in storage.

If the signature already exists in the DDB then the block already exists in the disk library. The MediaAgent communicates
back to the client agent to discard the block and only send metadata information.
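
Conceptually, the source-side exchange works like the sketch below. This is illustrative only: 'ddb_lookup', 'write_metadata', and 'write_block' are hypothetical method names, and the hash algorithm shown is only a placeholder for the signature generation the agent performs.

import hashlib

def backup_block(block, media_agent):
    # Client side: hash the block and send only the signature first.
    # The DDB lookup itself happens on the MediaAgent hosting the DDB.
    signature = hashlib.sha512(block).digest()   # hash choice illustrative
    if media_agent.ddb_lookup(signature):
        media_agent.write_metadata(signature)    # duplicate: block discarded
    else:
        media_agent.write_block(signature, block)  # unique: block transmitted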

Target-Side Deduplication
Target-side deduplication requires all data to be transmitted to the MediaAgent. Signatures are generated on the client or
on the MediaAgent. The MediaAgent checks each signature in the DDB. If the signature does not exist, it is registered in
the database and the deduplication block is written to the disk library.

If the signature does exist in the DDB, then the block already exists in the library. The deduplication block is discarded and
only metadata associated with the block is written to disk.

Source-Side or Target-Side Deduplication?

Commvault® software can be configured to perform deduplication either on the client or on the MediaAgent, but which is
best? This depends on several environmental variables, including network bandwidth, client performance, and MediaAgent
performance.

Which method is the best?

• Both source-side and target-side deduplication reduce storage requirements.


• Source-side deduplication also reduces network traffic by transmitting only the deduplication blocks
that have changed since the last backup. Target-side deduplication does not.
• Target-side deduplication can be used to reduce client CPU processing by generating signatures on the
MediaAgent instead of the client. With source-side deduplication, the signatures must be generated
on the client.
• For most network-based clients, source-side deduplication is the preferred method, since it reduces
network and storage requirements.
In certain situations, such as underpowered clients or high-transaction clients such as production database servers,
target-side deduplication may be preferable. Keep in mind that if target-side deduplication is used and the MediaAgent
generates signatures, adequate CPU power is required on the MediaAgent. If the MediaAgent is not scaled properly,
performance will suffer.


Partitioned Deduplication Database


Partitioned deduplication provides higher scalability and deduplication efficiency by allowing more than one Deduplication
Database (DDB) partition to exist within a single deduplication engine. It works by logically dividing signatures between
multiple databases. If two deduplication partitions are used, the effective size of the deduplication store doubles.
Currently, Commvault® software supports up to four database partitions.

How Partitioned Databases Work


During data protection jobs, partitioned DDBs and the data protection operation work using the following logic:

1. A signature is generated at the source – for primary data protection jobs using client-side
deduplication, the source location is the client. For auxiliary DASH Copy jobs, the source MediaAgent
generates signatures.
2. Based on its value, the signature is sent to its respective database partition. The partition compares
the signature to determine whether the block is duplicate or unique.
3. The defined storage policy data path is used to protect data – regardless of which partition the
signature is compared in, the data path remains consistent throughout the job. If GridStor®
round-robin has been enabled for the storage policy primary copy, jobs load balance across
MediaAgents.
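
Signature routing to a partition can be pictured as a simple hash split. The modulo scheme below is only an assumption used for illustration; the actual partitioning function is internal to Commvault.

NUM_PARTITIONS = 4   # Commvault supports up to four partitions

def partition_for(signature: bytes) -> int:
    # Route a block signature to exactly one DDB partition based on
    # its value, so each lookup touches a single partition.
    return int.from_bytes(signature[:8], "big") % NUM_PARTITIONS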

Partitioned Databases and Network-Attached Storage (NAS)


If partitioned deduplication is implemented using two MediaAgents, it is recommended to use a shared disk
library on a Network-Attached Storage (NAS) device. NAS storage allows either MediaAgent to recover data even if
the other MediaAgent is not available.


Partitioned Database for Scalability


The primary purpose of partitioned DDBs is to provide higher scalability. By balancing signatures between database
partitions, you can scale up the size of a single deduplication store: two partitions double the size of the store, and
four partitions quadruple it.

Partitioned Database for Resiliency


Using partitioned databases also provides resiliency. For instance, if one MediaAgent hosting a Deduplication Database (DDB)
partition goes offline, the other MediaAgent continues data protection jobs, with the available partition handling signature
lookups. However, with the loss of one partition, all signatures previously managed by the offline partition are looked
up in the remaining online partition. Because those signatures are unknown to the online partition, they are treated as
unique, and additional data is written to the library.


Data Movement of Deduplicated Data


During data protection jobs, processes on the client compress the data (if compression is enabled), fill the
deduplication buffer (default 128KB), generate a signature on the data, and then optionally encrypt the block.

Deduplication technical processes during a data protection job:

1. JobMgr on the CommServe® server initiates the job.


2. The CLBackup process uses the Commvault Communications (CVD) service to initiate communication
with the CVD process on the MediaAgent.
3. The CVD process on the MediaAgent launches the SIDB2 process to access the Deduplication Database
(DDB).
4. The SIDB2 process communicates with the CommServe server to retrieve deduplication parameters.
5. The CLBackup process begins processing by buffering data based on the deduplication block factor and
generates signatures on each deduplication block.
6. The signature is checked in the DDB:
a. If the signature exists, the primary record counter is increased, and secondary tables are updated
with detailed job information for the block. The block metadata is sent to the MediaAgent,
but the data block is discarded.
b. If the signature does not exist, it is added to the primary table, and detailed job information
related to the block is added to the secondary table. Block data and metadata are sent to
the MediaAgent.


Deduplicated data movement during a data protection job

DASH Full Jobs


A read-optimized synthetic full, or DASH Full, uses the Commvault® deduplication feature to logically perform synthesized
full backups without moving any data. This is possible because Commvault deduplication tracks the location of all
blocks in disk storage. After the initial base full and subsequent incremental jobs have run, all block data required for
the synthetic full is already present in the deduplicated disk storage location. Since deduplication stores a unique
block only once, the DASH Full operation only makes references to the blocks in storage rather than actually copying
them. The DASH Full operation generates a new index file signifying that a full backup was run and updates the
Deduplication Database (DDB) with block record data that is used for data aging purposes. DASH Full backups are the
preferred method of running full backup jobs and can dramatically reduce backup windows.

When enabling Commvault deduplication for a primary copy, the ‘Enable DASH Full’ option is selected
by default.
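
The reference-only nature of a DASH Full can be sketched in pseudocode. All names here ('job.index', 'increment_reference') are hypothetical; the sketch only mirrors the idea that the new full is synthesized from references, not copied blocks.

def dash_full(cycle_jobs, ddb):
    # Build the new full's index by referencing blocks that are already
    # in storage; no block data is read or copied.
    new_index = {}
    for job in cycle_jobs:                  # base full first, then incrementals
        for obj, block_refs in job.index.items():
            new_index[obj] = block_refs     # latest version of each object wins
    for refs in new_index.values():
        for ref in refs:
            ddb.increment_reference(ref)    # reference counts drive data aging
    return new_index                        # written out as the new full's index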


DASH Full process flow

Auxiliary Copy Jobs and Deduplication


An auxiliary copy job is a non-indexed, chunk-level copy operation. Chunks belonging to jobs that are required to be copied
during the auxiliary copy job are flagged. As each chunk is copied successfully to the destination MediaAgent, its flag is
removed. This means that if the auxiliary copy fails or is killed for any reason, only the still-flagged chunks require copying
when the job restarts.
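As a rough illustration of the flagging mechanism (hypothetical structures, not Commvault internals):

    def run_auxiliary_copy(chunks_to_copy, copy_chunk):
        """Copy only flagged chunks; clear each flag once the chunk commits."""
        for chunk, flagged in chunks_to_copy.items():
            if not flagged:
                continue  # already copied before an earlier failure or kill
            copy_chunk(chunk)              # may raise if the job is killed
            chunks_to_copy[chunk] = False  # checkpoint: flag removed

    # Three flagged chunks; a restart after a failure re-copies only what is left.
    chunks = {"chunk1": True, "chunk2": True, "chunk3": True}
    run_auxiliary_copy(chunks, lambda c: print("copied", c))

If the job dies after chunk1 commits, a restart walks the same table and re-copies only chunk2 and chunk3.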

DASH Copy Jobs


A DASH Copy is an optimized auxiliary copy operation that transmits only unique blocks from the source library to the
destination library. It can be thought of as an intelligent replication, ideal for consolidating data from remote sites
to a central data center and for copying backups to DR sites.

DASH Copy has several advantages over traditional replication methods:

• DASH Copies are auxiliary copy operations, so they can be scheduled to run at optimal time periods
when network bandwidth is readily available. Traditional replication replicates data blocks as they
arrive at the source.
• Not all data on the source disk needs to be copied to the target disk. Using the subclient
associations of the secondary copy, only the data required to be copied is selected.
Traditional replication requires all data on the source to be replicated to the destination.
• Different retention values can be set for each copy. Traditional replication uses the same
retention settings for both the source and target.
• DASH Copy is more resilient: if the source disk data becomes corrupt, the target is still aware
of all data blocks existing on the disk. This means that after the source disk is repopulated with data
blocks, duplicate blocks are not sent to the target, only changed blocks. Traditional replication
would require the entire replication process to start over if the source data became corrupt.


Disk and Network Optimized DASH Copy


Disk optimized, which is the default setting, should always be used when the source library is using Commvault®
deduplication. Network optimized should only be used if the source library is not using Commvault deduplication.

Disk optimized DASH Copy extracts signatures from chunk metadata during the auxiliary copy process, which reduces
the load on the source disks and the MediaAgent, since blocks do not need to be read back to the MediaAgent and
signatures do not need to be regenerated on the blocks.

Network optimized DASH Copy reads all blocks required for the auxiliary copy job back to the MediaAgent, which
generates signatures on each block.
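The difference between the two modes can be sketched as follows; the functions and data shapes are hypothetical illustrations, not MediaAgent internals:

    import hashlib

    def disk_optimized_signatures(chunk_metadata):
        # Disk optimized: reuse the signatures already stored in chunk
        # metadata; blocks are never read back from the source disks.
        return [record["signature"] for record in chunk_metadata]

    def network_optimized_signatures(blocks):
        # Network optimized: read every block back to the MediaAgent and
        # generate a signature for each one.
        return [hashlib.md5(block).digest() for block in blocks]

    metadata = [{"signature": b"sig1"}, {"signature": b"sig2"}]
    print(disk_optimized_signatures(metadata))
    print(network_optimized_signatures([b"block1", b"block2"]))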

To schedule an auxiliary copy job as a DASH Copy, first go to the Secondary Copy Properties Deduplication tab and, from
the Advanced subtab, select the ‘Enable DASH Copy’ check box and ensure that 'Disk Optimized' is also checked.

Data Movement and Job Checkpoints


During primary data protection and auxiliary copy jobs, the completion of each chunk represents a checkpoint in the job.
This checkpoint will do the following:

1. Commit the chunk metadata to the CommServe® database.
2. Commit signature records to the Deduplication Database (DDB).
These two steps are essential to ensure data integrity. If, for any reason, a job fails or is killed, committed chunks are
reflected both in the CommServe database and in the DDB. Any chunks that did not complete are not registered in the
CommServe database, and their records are not committed to the DDB.

This results in two important points:

1. No additional block data that generates the same signature will reference a block in an incomplete
chunk.
2. Once the chunk and signatures are committed, any signatures that match ones from the committed
chunk can immediately start deduplicating against the blocks within the chunk.
Another way to look at this is that Commvault® software deduplicates on chunk boundaries. If multiple identical signatures
appear in the same chunk, each signature is registered in the DDB and the blocks are written multiple times. Once
the chunk is committed, duplicate signatures only increase the record counter on the first occurrence of the signature.
All the other duplicate signatures registered in the DDB remain until the job is aged and pruned from storage.

It is also important to note that the chunk data is written as part of the job. Once the chunk is committed, SFiles that make
up the chunk are no longer bound to the job since other jobs can reference blocks within the SFile.


DASH Copy process for disk and network optimized auxiliary copy jobs

Source Side Disk Cache


During DASH Copy operations, a source side cache can be enabled on the source MediaAgent to hold all signatures
locally for auxiliary copy jobs. When an auxiliary copy job runs, each signature is checked locally in the source cache to
determine if the block exists on the destination MediaAgent. Using the source side disk cache is recommended to improve
auxiliary copy performance over WAN links.

'Optimize for high latency network' is an optional setting that relies on the local MediaAgent disk cache alone. If a
signature is not found in the local cache, the process assumes the block is unique and sends both the block and the
signature to the destination MediaAgent, rather than querying the destination DDB first.
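The lookup logic can be sketched as follows (a simplified Python model with hypothetical names; the actual implementation is internal to the MediaAgent processes):

    def should_send_block(signature, local_cache, query_destination,
                          high_latency_mode=False):
        """Decide whether a block must be sent to the destination MediaAgent."""
        if signature in local_cache:
            return False  # found locally: block already exists on destination
        local_cache.add(signature)
        if high_latency_mode:
            # 'Optimize for high latency network': a local cache miss is
            # assumed unique; no round trip to the destination DDB is made.
            return True
        # Default: on a local miss, query the destination before sending.
        return not query_destination(signature)

    cache = {b"known-sig"}
    print(should_send_block(b"known-sig", cache, lambda s: False))       # False
    print(should_send_block(b"new-sig", cache, lambda s: False, True))   # True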

To enable source side disk cache

1. Right-click the deduplicated secondary copy | Properties.

2. Select the 'Enable source side disk cache' option to create a small cache used for initial lookups on the source
MediaAgent, before querying the destination MediaAgent.

3. Set a size limit for the source side cache.


Deduplicated Data Aging and Pruning Process


Data aging is a logical operation that compares what is in protected storage against defined retention settings. Jobs that
have exceeded retention are logically marked as aged. Jobs can also be manually marked as aged by the Commvault®
administrator. Aged jobs are registered in the MMDeletedAF table in the CommServe database.


Pruning is the process of physically deleting data from disk storage. During normal data aging operations, all chunks
related to an aged job are marked as aged and pruned from disk. With Commvault deduplication, data blocks within
SFILES can be referenced by multiple jobs. If the entire SFILE was pruned, jobs referencing blocks within the SFILE
would not be recoverable. Commvault software uses a different mechanism when performing pruning operations for
deduplicated storage.

Aging and Pruning Process


To prune data from deduplicated storage, a counter system is used in the Deduplication Database (DDB) primary table to
determine the number of times a deduplication block is being referenced. Each time a duplicate block is written to disk
during a data protection job, a reference counter in the primary table is incremented. When the data aging operation runs,
each time a deduplication block is no longer being referenced by an aged job, the counter is decremented. When the
counter for the block reaches zero, it indicates that no jobs are referencing the block. The signature record is removed
from the primary table and placed in the zero reference table.
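A minimal Python sketch of this counter mechanism, using hypothetical in-memory structures in place of the real DDB tables:

    primary_table = {"sigA": 3, "sigB": 1}  # signature -> reference counter
    zero_ref_table = set()                  # records awaiting physical pruning

    def decrement_reference(signature):
        """Called once per occurrence of the signature within an aged job."""
        primary_table[signature] -= 1
        if primary_table[signature] == 0:
            # No job references the block any more: move the record from
            # the primary table to the zero reference table.
            del primary_table[signature]
            zero_ref_table.add(signature)

    decrement_reference("sigB")
    print(primary_table)    # {'sigA': 3}
    print(zero_ref_table)   # {'sigB'}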

The aging and pruning process for deduplicated data is made up of several steps. When the data aging operation runs, it
appears in the Job Controller and may run for several minutes. This aging process logically marks data as aged. Behind
the scenes on the MediaAgent, the pruning process runs, which can take considerably more time depending on the
performance characteristics of the MediaAgent and DDB, as well as how many records need to be deleted.

Pruning Methods
Commvault® software supports the following pruning methods:

• Drill Holes – For disk libraries and MediaAgent operating systems that support the Sparse file
attribute, data blocks are pruned from within the SFILE. This frees up space at the block level
(default 128 KB) but over time can lead to disk fragmentation.
• SFILE truncation – If all trailing blocks in an SFILE are marked to be pruned, the End of File (EOF)
marker is reset, reclaiming disk space.
• SFILE deletion – If all blocks in an SFILE are marked to be pruned, the SFILE is deleted.
• Store pruning – If all jobs within a store are aged and the DDB is sealed and a new DDB is created,
all data within the sealed store folders is deleted. This pruning method is a last resort measure and
requires sealing the DDB, which is strongly NOT recommended. This process should only be done
with Commvault Support and Development assistance.

Aging and Pruning Steps:

1. Jobs are logically aged, which results in the job metadata stored in the CommServe database as archive
files being moved into the MMDeletedAF table. This occurs based on one of two conditions:
a. The data aging operation runs and jobs that have exceeded retention are logically aged.
b. Jobs are manually deleted, which logically marks the job as aged.
2. Job metadata is sent to the MediaAgent to start the pruning process.
3. Metadata chunks are pruned from disk. Metadata chunks contain metadata associated with each
job, so once the job is aged the metadata is no longer needed.
4. Signature references in the primary and secondary tables are adjusted as follows:
o Primary table – The record counter for each signature is decremented for each occurrence of the block.
o Secondary table – Job information related to the aged job is deleted from the secondary table
files.


5. Signatures no longer referenced are moved into the zero reference table.
6. Signatures for blocks no longer being referenced are updated in the chunk metadata information.
Blocks are then deleted using the drill holes, truncation, or SFILE deletion method.

Deduplication Database Seeding


Commvault® deduplication efficiently backs up data from remote sites to the main data center, or sends a copy of the
backup data from the main data center to a secondary data center. Duplicate blocks are dropped at the source, and only
changed blocks are sent across the Wide Area Network (WAN). However, running the initial backup or auxiliary copy can be a
challenge since all blocks must be sent. For instance, a large amount of data combined with limited bandwidth can cause
an initial backup or auxiliary copy to take days or even months to complete.

To avoid that initial transfer over the WAN, Commvault® software offers a procedure called DDB Seeding. This procedure
transfers the initial baseline backup between two sites using available removable storage such as tapes, USB drives or
an iSCSI appliance.

Use DDB Seeding when remote office sites are separated from the data center across a WAN and data needs to be either
backed up remotely or replicated periodically to a central data center site. Once the initial baseline is established, all
subsequent backups and auxiliary copy operations consume less network bandwidth because only the changes are
transferred.

Note that this procedure is used to transfer only the initial baseline backup between two sites. It cannot be used for
subsequent backups.

DDB Seeding can be used in two scenarios:

• The initial backup of a large remote client or a large remote site with several clients.
• The initial auxiliary (DASH) copy between the main data center and the secondary data center.


DDB Seeding for Initial Backup


The deduplication database seeding process for the initial backup leverages removable storage (USB drives or iSCSI
appliance) to transfer the data. The steps for this operation are as follows:

1. Attach the removable storage to a client at the remote site.
2. Temporarily install the MediaAgent software on the client to which the removable storage is
attached.
3. Define a library for the removable storage using the client/MediaAgent installed in the previous
step.
4. Create a storage policy for the remote site with the following copies:
a. Primary copy using the removable storage (can use deduplication if needed).
b. Secondary copy using the main data center disk library (copy typically using deduplication).
5. Associate the remote client or all of the remote site clients with the storage policy.
6. Execute the initial backup, which writes the data to the removable storage.
7. Ship the removable storage to the main data center and attach it to the MediaAgent.
8. Modify the removable storage library properties to use the main data center MediaAgent from this
point.
9. Execute an auxiliary copy, which copies the data from the removable storage to the disk library.
10. Once complete, validate that the data is accessible from the secondary copy.
11. Promote the secondary copy to be the primary copy of the storage policy, resulting in the following:
a. Primary copy using the main data center disk library.
b. Secondary copy using the removable storage.
12. Delete the secondary copy using the removable storage.
13. Uninstall the MediaAgent software from the remote site client.
From that point on, traditional client-side deduplicated backups are used for the remote site, sending the data directly to
the main data center MediaAgent. But since the baseline is now complete, only changed blocks travel across the
network.

Commvault® software also offers a workflow that automates most of those steps. For more information about the workflow,
consult the Commvault Online Documentation.


DDB seeding process for an initial backup

DDB Seeding for Initial Auxiliary (DASH) Copy


A similar process is also used for the initial auxiliary copy between the main site and a secondary site. Removable storage
such as tapes, USB drives, or an iSCSI appliance can be used to transfer the data. In this scenario, the steps are as follows:

1. If not done already, attach the storage to the source MediaAgent.
2. If not done already, define a library for the removable storage using the source MediaAgent (can
use deduplication if needed, unless using tapes).
3. Typically, the storage policy has a primary copy in the source MediaAgent disk library and a
secondary copy in the target MediaAgent disk library. Add another secondary copy using the
removable storage library. This will result in the following copies:
a. Primary copy using the source MediaAgent library.
b. Secondary copy using the target MediaAgent library.
c. Secondary copy using the removable storage.
4. By default, a secondary copy uses the primary copy as a source during an auxiliary copy job. Modify
the properties of the copy using the target MediaAgent library to now use the removable storage
copy as a source for the auxiliary copy instead of the primary copy.
5. Run an auxiliary copy for the removable storage copy. This will copy the data from the source disk
library to the removable storage.
6. Once completed, ship the removable storage to the secondary data center.
7. If using tapes, simply insert in the library. If using other storage, attach it to the target MediaAgent.
8. If using any storage other than tapes, modify the library data path to point to the target
MediaAgent. If using tapes, skip this step.


9. Run an auxiliary copy for the target library copy. This will copy the data from the removable storage
to the target disk library.
10. Once completed, validate that the data is accessible from the target disk library.
11. Modify the storage policy target library copy to use the primary copy as a source for auxiliary copy.
12. Delete the removable storage copy from the storage policy.
From this point on, traditional DASH copies will be used to transfer the data between the two sites. But since the baseline
exists in the target library, only blocks that have changed will be sent over the WAN.

DDB seeding process for an initial DASH copy


Deduplication Database Synchronization


Sometimes there can be inconsistencies between the deduplication database entries and the CommServe® server
database job history. If this happens, the deduplication database is switched to maintenance mode. When in this mode,
data aging, backup, and auxiliary copy jobs cannot run using that DDB. Recovering the CommServe server database
to a previous point-in-time can also lead to inconsistencies between the two databases.

For example, client backups are executed every hour and the DR backup is scheduled for 10:00 a.m. If the CommServe®
server crashes at 1:00 p.m. and is restored, it uses the 10:00 a.m. DR backup. However, since some client backups ran
between 10:00 a.m. and 1:00 p.m., the deduplication database contains block entries created after 10:00 a.m. These
orphaned block entries are not known to the CommServe® server database.
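Conceptually, the discrepancy is a set difference between what each database knows about, as in this illustrative sketch with hypothetical job IDs:

    # Jobs known to the restored CommServe database (10:00 a.m. DR backup)
    # versus jobs referenced by DDB entries written up to the 1:00 p.m. crash.
    commserve_jobs = {101, 102, 103}
    ddb_job_references = {101, 102, 103, 104, 105}  # 104, 105 ran after 10:00 a.m.

    orphaned = ddb_job_references - commserve_jobs
    print(orphaned)  # {104, 105}: the entries a resync must reconcile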

To resolve any discrepancies, a deduplication database resynchronization must be executed.


DDB resynchronization following a CommServe® server database restore

Note that after a CommServe® server database restore, the deduplication databases may be in maintenance mode, which
requires resynchronization. But the resync process will work only if the CommServe server database is restored from a
DR backup that is less than five days old. If the DR backup used is older than five days, the deduplication databases must
be sealed, leading to a re-baseline of the deduplication store.

To resync deduplication databases

1. Expand Deduplication Engines | Expand any DDB | Click a partition.

2. Validate that the partition status is set to maintenance mode.


3. Right-click Deduplication Engines and synchronize all databases.

4. Acknowledge the warning.

5. Open the Event Viewer view.

6. Validate that all partitions are re-synced and online.


Commvault HyperScale Technology


Commvault HyperScale Technology Overview

Data Protection Scaling Challenges


Information technology environments are growing at a rapid pace, forcing backup infrastructures to grow just as quickly. As
hardware and software components increase in size, scalability becomes a challenge. Moreover, the location and use of
data (e.g., cloud and roaming users) require the technology industry to quickly re-invent itself while protecting data and
providing immediate access in the event of a data loss.

In a traditional backup environment, scalability is achieved by scaling up to increase resources. For instance, if a lack of
resources is detected for a media server, memory or processors can be added. If the server has used all of its resources,
then it must be replaced. Depending on the controller-based technology used (i.e., SAN, DAS, NAS), options are available
to add disks or an additional shelf of disks when storage space is low. But if the unit is already saturated, it must be
replaced by a larger one. This situation can involve high costs, significant planning, and migration efforts.

Using Commvault® HyperScale™ technology mitigates these costly endeavors by providing on-premises scale-out backup and
recovery that delivers "cloud-like" scale and flexibility.

What is Commvault HyperScale™ Technology


Commvault HyperScale™ technology is a private cloud-based technology using hyper-convergence to pool a set of
disparate resources. This infrastructure scales out as needed by simply adding more commodity servers to the pool in
blocks of three or six nodes. There is no need for costly powerhouse media servers that are difficult to scale.

Commvault HyperScale™ technology allows you to start small and grow as needed, significantly reducing
costs in the long run. For instance, deploying a block of three nodes provides 80 TB of available space as a storage
target. When space is low, another block of nodes can be added to the pool. This new set of nodes expands the existing
pool and is used automatically. Scaling an environment becomes a simple and easy task with no need for reconfiguration.
Data is spread across all nodes using erasure coding, which provides resiliency. A disk can fail, or a node can go
offline, without affecting operations or losing any data.


Example of a storage pool expansion

Commvault® HyperScale™ technology offers the following benefits:

• Cost savings - Commodity servers are used as nodes.
• Ease of deployment - A node can be fully configured and usable in 30 minutes.
• Ease of management - No need to create additional pools of resources when introducing blocks of
nodes, since they expand the current one.
• Resiliency - Depending on the configuration, one or more disks, or one or more nodes, can be lost
without disrupting operations or losing data.

Infrastructure Models
The Commvault HyperScale™ environment can be implemented using the following two models:

• Commvault HyperScale™ appliance (HS1300) - All-in-one appliance sold by Commvault®.
• Reference architecture - Commvault HyperScale™ sold as software by Commvault®, installed on
a set of servers provided by a third-party vendor.

Both models can co-exist in the same Commvault® environment. For instance, the following graphic shows a deployment of
two appliance blocks with three nodes each. When expanding storage, reference architecture blocks can also be
introduced.


Illustration of a hybrid environment

Commvault HyperScale™ Appliance (HS1300)


The Commvault HyperScale™ Appliance is an all-in-one 1U server that runs RedHat® Linux and includes all required
hardware, as well as the Commvault® software. It is configured in blocks of three or six nodes, providing a storage
capacity ranging from 32 TB to 160 TB. This solution is well suited for small to medium organizations and remote office
protection.

Support is provided by Commvault® not only for the software but also for the operating system, the firmware, and part
replacement.

Illustration of an HS1300 block

Commvault® HyperScale™ on Reference Architecture


Commvault HyperScale™ on Reference Architecture includes the entire Commvault HyperScale™ software stack installed
on a set of validated servers provided by a third-party vendor, such as Cisco, HPE, or Dell. It provides all the benefits of
using Commvault HyperScale™ technology, such as ease of deployment, management, and resiliency. The difference with


the Commvault HyperScale™ appliance is that the support for firmware and part replacement is provided by the vendor.
Commvault is still responsible for the software and the operating system (RedHat® Linux) support.

The Reference Architecture can scale to hundreds of terabytes and is therefore suited to protect large organizations and
data centers.

For more information on the supported servers and vendors, consult the Commvault® online documentation. The number
of validated servers constantly grows with each service pack.

Specification chart comparing models


Commvault HyperScale Architecture – High Level


Deploying a Commvault HyperScale™ environment is the same as deploying a traditional Commvault® environment (or
CommCell®). The Commvault HyperScale™ block can also be deployed in an existing environment with components
installed on a block of nodes. Those components are:

• CommServe® server
• MediaAgents
• Deduplication Database partitions

CommServe® Server
In a Commvault HyperScale™ environment, the CommServe® server is required to control all operations, with the nodes
running a RedHat® Linux operating system. Since the CommServe® server is a Windows-only component, it cannot be
installed directly on a node. Therefore, a Linux virtualization platform clustered across all nodes using GlusterFS (a
Linux clustered file system) is leveraged to run the CommServe® server as a virtual machine. If anything happens to
the active node running the CommServe® server, it fails over to the next node of the block.

MediaAgents
Each node within a block acts as a MediaAgent, a data mover that ingests data received from servers and sends the data
back to them during restore operations. Data is spread across disks on all nodes of the block. Catalogs of protected
objects are stored in the index directory, which is present on each node. The streams received from servers are load
balanced across all MediaAgents that are part of the storage pool.

Note that if there is a need to achieve LAN free backups or to create a tape copy, an additional controller can be added to
connect to the storage or tape library.

Deduplication Database Partitions


The Commvault® HyperScale™ environment also takes full advantage of Commvault partitioned deduplication. When
implementing the first three-node block, a Deduplication Engine is automatically created using two deduplication database
partitions on each node, for a total of six. The first storage pool being created leverages all six partitions.

When adding an additional three-node block to expand the storage pool, one database partition from each of the initial
nodes is automatically moved to one of the three additional nodes. This results in a single partition per node. If the
storage pool is expanded again with another block, the new nodes become part of the storage pool to increase the
storage capacity but do not host any database partitions. However, these additional nodes could host a deduplication
database partition for another storage pool, such as one using a cloud storage target. This is ideal for offering an offsite copy
of the data.

Deduplication layout for the initial three-node block


Deduplication layout when adding an additional three-node block

Commvault HyperScale Architecture – Network


The Commvault HyperScale™ infrastructure is based on a strict network architecture. The entire solution is based on
three or four networks, depending on the configuration. It relies on two 10 GB networks and one 1 GB network. An additional 1 GB


network can be used for DNS resolution purposes. All the switching, routing, and VLANs are not part of the reference
architecture nor the Commvault HyperScale™ appliance and must be provided and configured by the customer.

The network configuration also relies heavily on DNS resolution, both forward and reverse. Entries must be created in the
DNS server for each node. Alternatively, hosts files on each node can be configured, but this increases the chance of human
error and misconfiguration. Using DNS resolution is recommended instead.

The required networks are as follows:

• The backup network (10 GB)
• The storage network (10 GB)
• The iRMC (Remote Control) network (1 GB)
• The management network (1 GB) - Optional

The Data Protection Network


The first 10 GB network is the data protection network (backup network), used to receive backup data from servers,
whether they are physical servers or virtual machines. From a network configuration perspective, this VLAN needs to be
configured to communicate with every client machine and the proxy servers used to protect VMs. If a secondary copy of
the data is sent to the cloud, this network is used to reach the Internet. If the Commvault HyperScale™ servers are an
expansion of a traditional Commvault CommCell®, this network is also used for communication with the CommServe®
server and other MediaAgents. Finally, if the CommServe® server is provisioned in the Commvault
HyperScale™ environment, it is hosted in the dedicated GlusterFS file system, which also uses the data protection network
for communication and failover of the CommServe® server if needed.

When configuring the Commvault HyperScale™ node, the interface for this 10 GB network is represented as eno3 at the
Linux operating system level. If you run a Linux ifconfig command, it returns that eno3 interface configuration.

A representation of the backup network

Storage Network
The second 10 GB interface is used for the storage network (backend network). This isolated network is used for
communication of the clustered file system (GlusterFS) acting as a storage target to write backup data. This network can
use any arbitrary VLAN and does not require any routing towards other networks, nor a network providing DNS resolution.
All communications are handled internally by the Commvault HyperScale™ technology.

When configuring the Commvault HyperScale™ node, the interface used for the storage network is identified as eno4. If
you run a Linux ifconfig command, it will return that eno4 interface configuration.


A representation of the storage network

The iRMC Network


The first 1 GB network is the iRMC network (remote management controller network). This network is used only in a
deployment where the CommServe® server is provisioned in the Commvault HyperScale™ environment. This network is
used to send signals and validate the status of the hardware. A failed response would initiate a CommServe® server
failover.

An important requirement for this mechanism to work is that this network must be routed to communicate with the data
protection network. For instance, if network connectivity is lost on one of the nodes on the data protection network, the
software uses the iRMC network to automatically shut down the node, avoiding any inconsistencies (split brain).

A representation of the iRMC network

The Management Network (Optional)


When configuring the Commvault HyperScale™ block of nodes, an available Advanced Networking Configuration option is
the use of a management network. This network is useful in a scenario where a flat network is used for data


protection to isolate data transfer from the production network. If the data protection network has no accessible DNS
services, client backup configuration becomes much harder. In this case, an optional 1 GB network can be configured to
access the production network DNS services. The CommServe® server uses that interface to query network services.
The data transfer between clients and MediaAgents still travels on the data protection network.

At the node operating system level, the management network interface is represented as eno2. Running an ifconfig
command gives configuration information for that network.

A representation of the management network


Storage Architecture
Commvault HyperScale™ technology relies on a resilient storage system using erasure coding. The data is therefore
scattered across multiple disks and nodes. When using the Commvault HyperScale™ reference architecture, nodes can have
six, twelve, or twenty-four disk drives each. The Commvault HyperScale™ appliance (HS1300) uses four disk drives per
node. Depending on the configuration, one or more disks can be lost without losing data.

What is Erasure Coding


Erasure coding, as Wikipedia describes it, is a Forward Error Correction (FEC) mechanism based on bit erasures rather than bit
errors. In other words, the software encodes the data with parity, which results in a written file larger than its actual
size (parity accounts for 33% of the space consumed). But how does it work exactly?

First, a choice must be made in the parity scheme to use. Commvault HyperScale™ technology offers two options:

• 4,2 (4 data segments + 2 parity segments)
• 8,4 (8 data segments + 4 parity segments)

Let's take a 2 MB chunk file written by a MediaAgent, processed using the 4,2 parity model. The erasure code splits the
file into four data segments, each about 1/4 of the original file size. In this example, the four segments are 0.5 MB each,
for a total of 2 MB. Two parity segments of 0.5 MB each are then added, for a grand total of 3 MB (parity accounts for
33% of the space consumed).

These segments are then scattered across different disks and nodes. Since the 4,2 model is used, the data remains
available as long as any four of the six segments are available.
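The arithmetic generalizes to both parity schemes, as this small Python sketch shows (illustrative only):

    def erasure_coding_overhead(file_mb, data_segments, parity_segments):
        """Worked example: segment size, total written, parity share."""
        segment_mb = file_mb / data_segments
        total_mb = segment_mb * (data_segments + parity_segments)
        parity_share = parity_segments / (data_segments + parity_segments)
        return segment_mb, total_mb, parity_share

    # 2 MB file with 4,2 parity: four 0.5 MB data segments plus two 0.5 MB
    # parity segments, 3 MB written in total, parity being 33% of it.
    print(erasure_coding_overhead(2.0, 4, 2))   # (0.5, 3.0, 0.333...)
    print(erasure_coding_overhead(2.0, 8, 4))   # 8,4 scheme: same 33% share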


Erasure coding process on a 2 MB file

Erasure Coding Relationship to Storage


Once erasure coding has created the six segments, they must be written to storage in a fashion that provides the best resiliency
possible. This is handled by how Commvault HyperScale™ technology logically addresses and segregates storage disks. A
Commvault HyperScale™ appliance can have four disks per node, for a total of twelve for the block. The number of disks
in the reference architecture depends on the number of nodes and the number of disks per node, which can vary.

The storage is logically divided into subvolumes. Each subvolume is made up of two physical disks per node from three
different nodes, for a total of six. As many logical subvolumes as needed are created until all disks are consumed.

Commvault HyperScale™ logical division of storage

Files encoded by Commvault HyperScale™ erasure coding are written to storage following a simple rule: the six
segments of a file must be written to the same subvolume, one segment per disk. This rule ensures that the segments of a
file do not all end up on the same node or, even worse, the same disk. The segments of the next file can be written to
another subvolume or even the same one, but the segments of a single file are never split across multiple subvolumes.

File segments being written to a storage subvolume
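A minimal sketch of the placement rule, assuming hypothetical node and disk names:

    def place_segments(segments, subvolume_disks):
        """Place all segments of one file on a single subvolume, one per disk."""
        assert len(segments) == len(subvolume_disks)
        return list(zip(segments, subvolume_disks))

    # One subvolume: two disks per node from three different nodes.
    subvolume = [("node1", "disk1"), ("node1", "disk2"),
                 ("node2", "disk1"), ("node2", "disk2"),
                 ("node3", "disk1"), ("node3", "disk2")]
    placement = place_segments(["s1", "s2", "s3", "s4", "p1", "p2"], subvolume)
    print(placement)
    # Losing node2 removes only two of the six segments, so with 4,2 parity
    # the file can still be read from the remaining four.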


Therefore, using 4,2 parity means that as long as four segments of a file are still available, the data is valid. Up to two
disks, or even a complete node, could fail without impacting operations. Using 8,4 parity means that if 8 of the 12
segments of the file are available, the data can be read. If a disk or a node fails, it is important to address the issue
as soon as possible to avoid reaching the failure threshold, which would corrupt data.

Resiliency when using 4,2 parity


Resiliency information and best practices for Commvault HyperScale™ environments


Storage Policies


Storage Policy Design Methodology


Properly designing a CommCell® environment can be a difficult process. In some environments, a simple design may
suffice, but in more complex environments, careful planning must be done to ensure data is properly protected and the
CommCell® environment can properly scale to meet future requirements.

There are three phases to designing and implementing a proper solution:

1. Plan
2. Build
3. Assess & Modify
The following highlights the key elements of each phase:

• The Planning Phase – focuses on gathering all information to properly determine the minimum
number of storage policies required. Careful planning in this step makes it easier to build or modify
policies and subclients. The objective is to determine the basic structure required to meet
protection objectives. Modifications can later be made to meet additional requirements.
• There are three design methods that can be used during the plan phase:
o Basic Planning Methodology, which focuses on generic guidelines for building storage policies
and subclients.
o Technical Planning Methodology, which focuses on technical requirements for providing a
basic design strategy.
o Content Based Planning Methodology, which takes a comprehensive end-to-end approach,
taking into consideration all aspects of business and IT requirements as well as integrating
multiple technologies for a complete solution.
• The Build Phase – focuses on configuring storage policies, policy copies, and subclients. Proper
implementation in this phase is based on proper planning and documentation from the design
phase.
• The Modification Phase – focuses on key points for meeting backup/recovery windows, media
management requirements, and environmental/procedural changes to modify, remove, or add any
additional storage policy or subclient components.

It is important to note that the 'Design-Build-Modify' approach is a cyclical process since an
environment is always changing. Not only is this important for data growth and procedural changes,
but it also allows you to modify your CommCell environment and protection strategies based on
emerging technologies. This provides greater speed and flexibility for managing protected data as
our industry continues to change at a rapid pace.


Approaching Storage Policy Design


There is no one-size-fits-all methodology for designing and configuring a Commvault® environment. For many, it is more of
an art than a science, where administrators and engineers use experience and intuition for proper implementation and
configuration. The balance of performance, media management, data retention, and ease of administration must be
considered throughout the design and implementation process.

Consider these four basic rules for approaching storage policy design:

1. Keep it simple
2. Meet protection requirements
3. Meet media management requirements
4. Meet recovery requirements

Rule 1: Keep it Simple


This section describes several different methods for protecting data. It is designed to provide in-depth explanations and
solutions for the most complex environments. But before overanalyzing and over-architecting the Commvault
environment, use this one simple rule: KEEP IT SIMPLE! If rules 2 – 4 are being satisfied, then there is really no reason to
change anything. A complex environment leads to more complex problems.

Rule 2: Meet Protection Requirements


Data protection requirements MUST be met. Though it is true that the only reason we protect data is to recover it, if you are
not meeting your protection windows, then you are not protecting data. You cannot recover something that never finished
backing up, so ensure protection windows are being met. Performance always starts with an adequately designed physical
environment. Before tweaking Commvault software to improve performance, ensure that Clients, MediaAgents, and
networks are scaled appropriately.


Rule 3: Meet Media Management Requirements


In an ideal world, data would simply be preserved forever. With the dropping cost of disk storage and deduplication, most
data can be retained longer. As with anything, this comes at a price. The best way to approach media management is to
ensure the business understands your capabilities and limitations for preserving data.

Sometimes a 'pie in the sky' vision of protecting data can be brought down to reality through a little education and a
cost association for the business requirements. Although you understand the capabilities and limitations of your storage,
non-technical people may not. Provide basic guidance and education so they better understand what you and the
Commvault® software suite can do. You may not have the power to make the final decisions, but you do have the power to
influence the decision process.

Rule 4: Meet Recovery Windows


Recovery windows are determined based on Service Level Agreements (SLAs).

For data protection and recovery an SLA is made up of three components:

• Protection Windows
• Recovery Time Objectives (RTO)
• Recovery Point Objectives (RPO)
When designing a CommCell environment, focus should always be placed on how data will be recovered. Does an entire
server need to be recovered, or does only certain critical data on the server require recovery? What other systems are required
for the data to be accessible by users? What is the business function that the data relies on? What is the associated cost
of that system being down for long periods of time? The following sections address RTO and RPO and methods for
improving recovery performance.


Basic Planning Methodology Approach

Data Locations
In a distributed CommCell® architecture where different physical locations use local storage, different storage
policies should be used. This avoids the potential for improper data path configurations within the policy copy resulting in
data being unintentionally moved over WAN connections. It also provides the ability to delegate control of local policies
to administrators at that location without potentially giving them full control of all policies.


Storage policy for data location concept

Data Paths
For simplicity of managing a CommCell® environment, different libraries as well as location of the libraries may require
separate storage policies. This allows for easier policy management, security configurations, and media management.

Consider the following when determining storage policy strategies for libraries and data paths:

• When using Commvault® deduplication, for performance and scalability reasons different policies
should be used for each MediaAgent data path. This allows the deduplication database to be locally
accessible by each MediaAgent providing better throughput, higher scalability, and more streams to
be run concurrently.
• If a shared disk (not using Commvault deduplication) or shared tape library is being used where
multiple Client / MediaAgents have LAN free (Preferred) paths to storage, a single storage policy can
be used. Add each path in the Data Path Properties tab of the Primary Copy. Each Client /
MediaAgent will use the LAN Free path to write to the shared library. This allows for simplified
storage policy management and the consolidation of data to tape media during auxiliary copy
operations.
• If a shared disk (not using Commvault deduplication) or tape library is protecting LAN-based client
data where multiple MediaAgents can see the library, each data path can be added to the primary
copy. GridStor® Round Robin or failover can be implemented to provide data path availability and
load balancing for data protection jobs.


Retention Requirements for Contents


Retention requirements should be based on specific contents within a file system or application. All too often, determining
retention requirements is not easy, especially when data owners do not want to commit to specific numbers.

Considerations for Retention Requirements:

• Keep it simple. Unless specific content within an application or file system requires special retention
requirements, don't over-design subclients.
• Consider using default retention policies providing several levels of protection. Provide the options
to the data owners and allow them to choose. Also, stipulate that if they do not make a choice, then
a primary default retention will be used. State a deadline in which they must provide their retention
requirements. It is important to note that this is a basic recommendation and you should always
follow policies based on company and compliance guidelines.
Consider defining retention rules for the following:

Disaster Recovery requirements should be based on the number of Cycles of data that should be retained. This should
also include how many copies (on-site/off-site) are kept for each cycle.

Data Recovery requirements should be based on how far back in time (Days) data may be required for recovery.

Data Preservation/Compliance requirements should be based on the frequency of point-in-time copies (Monthly, Quarterly,
Yearly) and how long the copies should be kept (Days).

Storage policies for retention requirement

Data Isolation
A storage policy creates logical boundaries for protected data. Data associated with and managed by a storage policy is
bound to that policy. Protected data can be moved between copies within the same storage policy, but the data cannot be


moved from one storage policy to another. This data isolation can be crucial when considering the management of data by
different departments, data types, retention needs, or storage locations.

Compliance
Compliance requirements often dictate the long-term preservation of specific business data. There are multiple features
built into Commvault® software that provide business data isolation and long-term storage for compliance data.

Reference Copy and legal hold provide methods to extract data from standard data protection jobs and associate the data
with storage policies configured to meet compliance retention requirements. When using these features, it is
recommended to configure separate storage policies to manage compliance data in isolation.


Guidelines for Custom Storage Policies

Microsoft SQL Log Storage Policy


MS SQL subclients have a unique configuration where Full and Differential backups can be directed to one storage policy
and Log backups can be directed to a second policy. This is the same concept as Incremental Storage Policies except that
instead of linking the policies together, the two policies are defined in the Storage Device tab of the SQL subclient.

Legal Hold Policy


When using the Content Indexing and compliance search feature, auditors can perform content searches on end user
data. The search results can be incorporated into a legal hold. By designating a storage policy as a Legal Hold policy, the
auditor will have the ability to associate selected items required for legal hold with designated Legal Hold policies. It is
recommended to use dedicated legal hold policies when using this feature.

Legal Hold Storage Policies can also be used with Content Director for records management policies. This allows content
searches to be scheduled and results of the searches can be automatically copied into a designated Legal Hold Policy.

To use a legal hold storage policy, simply create a storage policy with the required legal hold retention. Then, enable it as
a legal hold policy, and the compliance officers and legal team members will be able to use it from the Compliance Search
portal.

Erase Data
Erase Data is a powerful tool that allows end users or Commvault® administrators to granularly mark objects as
unrecoverable within the CommCell® environment. For object-level archiving, such as files and email messages, if an end
user deletes a stub, the corresponding object in Commvault protected storage can be marked as unrecoverable.
Administrators can also browse or search for data through the CommCell® console and mark the data as unrecoverable.

It is technically not possible to erase specific data from within a job. The way 'Erase Data' works is by logically marking the
data unrecoverable. If a Browse or Find operation is conducted, the data does not appear. For this feature to be effective,


any media managed by a storage policy with the 'Erase Data' option enabled cannot be recovered through
Media Explorer, Restore by Job, or by cataloging media.

It is important to note that enabling or disabling this feature cannot be applied retroactively to media already written to. If
this option is enabled, all media managed by the policy can be recovered only through the CommCell
console. If it is not enabled, all data managed by the policy can also be recovered through Media Explorer, Restore by Job,
or by cataloging media.

If this feature is going to be used, it is recommended to use dedicated storage policies for all data that may require the
'Erase Data' option to be applied. Disable this feature for data that is known to not require this option.

Global Secondary Copy


Global Secondary copy policies allow multiple storage policy secondary copies using a tape data path to be associated
with a single global secondary copy. This is based on the same concept as global deduplication policies, but global
secondary copies only apply to tape copies. If multiple secondary copies require the same retention and encryption
settings, using a global secondary copy reduces the number of tapes required during auxiliary copy operations and
improves performance.

To configure and use a Global Secondary Copy, the Global Secondary Copy Policy first needs to be created. Then, in
every storage policy for which you want to use it, a secondary copy associated to the Global Secondary Copy Policy must
be created.

Security
If specific users or groups need rights to manage a storage policy, it is recommended to use different policies for each
group. Each group can be granted management capabilities to their own storage policies.

Media Password
The Media Password is used when recovering data through Media Explorer or by Cataloging media. When using
hardware encryption or Commvault copy based encryption with the 'Direct Media Access' option set to 'Via Media
Password,' a media password is essential. By default, the password is set for the entire CommCell environment in the
System applet located in the Control Panel. Storage policy level media passwords can be set to override the CommCell
password settings. For a higher level of security or if a department requires specific passwords, use the 'Policy level'
password setting which is configured in the Advanced tab of the Storage Policy Properties.


Retention


Retention Overview
A data retention strategy is important for managing storage in your CommCell® environment. With Commvault® software,
you can define retention for multiple copies of data, with each copy having different retention requirements. Additionally,
retention may be required at the object level and not just at the data protection operation level. Commvault software makes this
strategy straightforward to implement by using storage policy copies, subclient object-level retention, and Exchange
configuration retention policies.

In Version 11, Commvault software has three primary retention methods:

• Job based retention – Configured at the storage policy copy level, job schedule level, or manually by
selecting jobs or media to retain, and applying different retention.
• Subclient object-based retention – Configured at the subclient level, it applies retention based on
the deletion point of an object. Object-based retention is based on the retention setting in the
subclient properties plus the storage policy copy retention settings.
• Configuration policies – Currently used for Exchange mailbox protection. These policies include
archive, retention, cleanup, and journaling. Configuration policies provide the ability to define
complete retention and destruction policies, including the capability of deleting messages from the
production Exchange environment.

Retention Basics
Commvault® software provides extensive retention control for protected data. For basic retention requirements, follow the
general guidelines and best practices for retention configuration.

Retention general guidelines:

• Disk storage:
o Leave the Cycles retention set at the default of two.
o Use the Days retention to govern retention policies for each copy.
o Never use extended retention rules when using Commvault deduplication.
• Tape storage:
o Set the Cycles retention based on the number of complete sets of tape copies you want
to retain. For example, if you want 30 days of data stored off-site, which includes at least
four full backups and all dependent jobs (incremental or differential), for complete
recovery from any tape set, set the Cycles retention to four.
o Set the Days retention based on standard retention requirements.


Job Based Retention


Job-based retention applies a standard retention to an entire job. Jobs are retained based on storage policy copy retention
rules. Additionally, job-based retention can be applied through the job schedule or modified after the job completes.

Storage Policy Copy Retention Rules


Policy-based retention settings are configured in the storage policy copy Retention tab. The settings for backup data are
Days and Cycles. For archive data, the retention is configured in Days. Retention is also set through schedules or applied
retroactively to a job in a storage policy copy.

Days
A day is a 24-hour time-period defined by the start time of the job. Each 24-hour time period is complete whether a backup
runs or not. This way, a day is considered a constant.

Cycles
A cycle is defined as all backup jobs required to restore a system to a specific point-in-time. Traditionally, cycles are
defined as a complete full backup and all dependent incremental, differential, or log backups; up to, but not
including, the subsequent full backup. A cycle is referenced as Active or Complete: as soon as a full
backup completes successfully, it starts a new cycle, which becomes the active cycle. The previous active cycle is marked as a
complete cycle.

An active cycle is marked complete only if a new full backup finishes successfully. If a scheduled full backup does not
complete successfully, the active cycle remains active until such time that a full backup does complete. On the other hand,
a new active cycle begins and the previous active cycle is marked complete when a full backup completes successfully
regardless of scheduling. In this way, a cycle can be thought of as a variable value based on the successful completion or
failure of a full backup. This also helps to break away from the traditional thought of a cycle being a week long, or even a
specified period of time.


Days and Cycles Relation


Cycles and days should directly or indirectly equal each other:

• 2 cycles and 14 days with weekly full backups
• 4 cycles and 30 days being approximately 1 month
• 12 cycles and 365 days for month-end full backups being retained for a year

But what about 52 cycles and 365 days? In situations like this, it is rather irrelevant how many cycles are set. The truth is,
2 cycles and 365 days is good enough. You will meet your retention requirements since you are keeping data for one year.
If backups don't run for over a year, you are still guaranteed to have at least 2 cycles of data in storage based on the rule
that entire cycles are aged.

When setting retention in the policy copy, base it on the primary reason data is being protected. If it is for disaster
recovery, ensure the proper number of cycles is set to guarantee a minimum number of backup sets for full backup
restore. If you are retaining data for data recovery, then set the days to the required length of time determined by retention
policies. If the data recovery policy is for three months, 12 cycles and 90 days or 1 cycle and 90 days will still meet the
retention requirements.
With the release of Commvault Version 11 software, the default retention for a storage policy primary
copy is 15 days and 2 cycles. A secondary copy's default retention is 30 days and 4 cycles.
Retention Rules for Storage Policy Copy Retention
There are several retention rules that are applied to jobs:
• Both Days and Cycles criteria must be met for aging to occur
• Data is aged in complete cycles
• Days criteria is not dependent on jobs running on a given day
Rule 1: Both CYCLES and DAYS criteria must be met
Commvault® software uses AND logic to ensure that both retention parameters are satisfied. Another way of looking at this: the longer of the two values, cycles or days, always determines how long data within a policy copy is retained.
Example: Retention for a storage policy copy is set to 3 days and 2 cycles. This is not a typical configuration, but it logically proves that both days and cycles criteria must be met for data to age. By Monday, 3 full backups have been performed. If Friday's full backup were aged, 2 full backups would remain, meeting the criteria of 2 cycles. However, the days criteria calls for 3 days, and if the Friday full backup were aged, only 2 days would be counted. The Friday full backup therefore ages on Tuesday.
Monday at 12 PM the data aging operation runs and determines no data can be marked aged
Tuesday at 12 PM the data aging operation runs and determines the Friday full backup can be marked aged
Rule 2: Data is retained based on complete cycles
Backup data is managed within a storage policy copy as a cycle, or set of backups. This includes the full backup, which designates the beginning of a cycle, and all dependent incremental or differential backups. When data aging is performed and retention criteria allow data to be aged, the entire cycle is marked as aged. This ensures that jobs do not become orphaned, which would result in dependent jobs (incremental or differential) existing without the associated full backup.
Example: This is another retention example used to prove the rule. Retention is configured for 7 days and 2 cycles. Full backups are performed on Fridays and Mondays, and incremental backups on all other days. On Saturday, the cycles criteria of 2 has been met, since there are 3 full backups. If a cycle were removed, 2 would remain: a complete cycle (Monday – Thursday) and the Friday night full backup. However, since entire cycles are pruned, the Friday full backup and the incremental backups from Saturday and Sunday would have to be aged. This leaves only 5 days, which does not meet the days retention requirement of 7. So on Monday, when the data aging operation runs (default 12 PM daily), there are 7 days and 2 cycles, which allows the first cycle to be aged.
Retention has been defined for 7 Days and 2 Cycles. When the data aging operation runs on Saturday, the
cycles criteria has been met but not the days criteria
Retention has been defined for 7 Days and 2 Cycles. When the data aging operation runs on Monday both
Cycles and Days criteria have been met and the first cycle will be marked as aged
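These first two rules can be expressed compactly in code. The following minimal Python sketch (not Commvault code; the job structures and function name are invented for illustration) shows how a data aging pass might decide whether the oldest complete cycle can be marked aged:

    from datetime import datetime, timedelta

    def oldest_cycle_can_age(cycles, retention_days, retention_cycles, now):
        # cycles: oldest first; each cycle is a list of job start times
        # (full backup first). Rule 2: only the entire oldest cycle can age.
        if len(cycles) <= retention_cycles:
            return False                      # cycles criteria not met
        remaining = [t for cycle in cycles[1:] for t in cycle]
        # Rule 1 (AND logic): data left behind must still span 'days'
        return now - min(remaining) >= timedelta(days=retention_days)

    # 7 days / 2 cycles: fulls on Friday and Monday, incrementals between
    fri = datetime(2020, 2, 7)
    day = lambda n: fri + timedelta(days=n)
    cycles = [
        [day(0), day(1), day(2)],             # Fri full, Sat/Sun incrementals
        [day(3), day(4), day(5), day(6)],     # Mon full, Tue-Thu incrementals
        [day(7), day(8)],                     # next Fri full starts the active cycle
    ]
    print(oldest_cycle_can_age(cycles, 7, 2, now=day(8)))   # Saturday: False
    print(oldest_cycle_can_age(cycles, 7, 2, now=day(10)))  # Monday: True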
Rule 3: Day is based on a 24 hour time period
A day is measured as a 24-hour time period from the start time of a data protection job. Days are considered constants: regardless of whether a backup is performed or completes successfully, the time period is always counted. If a backup fails, backups are not scheduled, or power goes out, a day still counts towards retention. This is why it is critical to measure retention in both cycles and days. If retention were managed by days alone and no backups ran for a few weeks, all backup data could age off, leaving no backups at all.
Example: Defining retention in both days and cycles is very important. For example, during a Friday night backup, power is lost in the building. Power is restored on Sunday, resulting in two days elapsing and counting towards retention. Note that since the Friday full backup failed, the cycle continues into the next scheduled full (the following Friday).
A failure of a full backup on Friday due to a power outage results in a cycle continuing until a valid full is
completed
Spool Copy
Right-click the primary storage policy copy | Click Properties | Retention tab
The Spool Copy option takes advantage of a disk library's fast read/write access and multi-streaming capabilities when there is limited capacity available on the disks. A spool copy is not a retention copy. Data is spooled to disk and then copied to a secondary copy. Once the data is successfully copied to the secondary copy, the data on disk is pruned, immediately freeing up space for new backups.
The Spool Copy option is not available when using deduplication.
Extended Retention
Right-click the desired storage policy copy | Click Properties | Retention tab
Standard retention defines the length of time, based on cycles and days, that data is retained. Extended retention defines additional retention, in days, for full backups. It extends the basic retention by assigning specific retention to full backups based on criteria configured in the extended retention settings. In essence, it allows you to implement a grandfather-father-son (GFS) tape rotation scheme.
Extended retention rules are not designed to be used with disk storage and will have significant negative effects on aging
and pruning of deduplicated data.
Example: You want to retain backups for 4 cycles and 28 days. You also want to retain a monthly full backup for three
months, a quarterly full backup for a year, and a yearly full backup infinitely.
To accomplish this, you configure retention as follows:
• Standard retention is set for (4,28)
• Extended retention is configured for:
o 90 days – keep monthly full backups
o 365 days – keep quarterly full backups
o Infinite – keep yearly full backups
Extended retention rules are like selective copies in that they only apply to full backups. However, a selective copy creates
an additional copy of a full backup and assigns it a specific retention. Extended retention applies retention to an existing
full backup and does not create an additional copy. Determine which solution is more appropriate when planning retention
strategies.
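To make the example concrete, the following hypothetical Python sketch classifies a full backup under the GFS scheme above. The function name and date rules are illustrative assumptions, not Commvault's implementation; the sketch simply applies the longest matching rule first:

    from datetime import date, timedelta

    def retention_days_for_full(full_date, standard_days=28):
        # Returns retention in days for a full backup; None means infinite.
        month_end = (full_date + timedelta(days=1)).month != full_date.month
        if month_end and full_date.month == 12:
            return None                      # yearly full: keep infinitely
        if month_end and full_date.month in (3, 6, 9):
            return 365                       # quarterly full: keep one year
        if month_end:
            return 90                        # monthly full: keep three months
        return standard_days                 # all other fulls: standard (4,28)

    print(retention_days_for_full(date(2020, 1, 31)))   # 90
    print(retention_days_for_full(date(2020, 3, 31)))   # 365
    print(retention_days_for_full(date(2020, 12, 31)))  # None (infinite)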
Zero Cycle Retention
It is possible to configure a storage policy copy with zero-cycle retention. However, this can cause the undesired result of data being pruned while no other copy of it exists in storage. This is a common mistake made when administrators do not fully understand how Commvault cycles and days retention works.
It is NOT recommended to set zero cycles for a policy copy unless another copy has been configured
with at least one cycle defined.
Item Based Retention
Item based retention is used to apply retention to protected data based on individual files and email messages. This
provides granular retention to meet data recovery requirements, compliance requirements, and optimize storage media.
The following Commvault® agents support item based retention:
• File system agents using subclient retention settings
• Exchange Mailbox agent using Configuration policies

Depending on the agent being used, one of two methods is used to implement item based retention:
• Synthetic full item carry forward – this method does not directly prune items that have
exceeded retention. Instead, upon deletion of an item either by the user or the agent, items are
carried forward with each synthetic full backup until its 'days' retention is exceeded. Once the
synthetic full ages based on storage policy copy retention, the item no longer exists. This method is
used for file system agents using V1 indexing and is configured in the Subclient Properties.
• Index masking – this method marks the item as unrecoverable by masking the item in the index.
This method requires V2 indexing. This method is implemented for file system agents using V2
indexing in the Subclient Retention tab and for Exchange Mailbox agent using Configuration
policies.
Item Based Retention Benefits:
• Compliance – certain compliance regulations require item based retention. Using job based
retention can result in items being retained beyond their required retention policies.
• Defensible deletion – some items, specifically email messages, must be destroyed when they are deleted from the production mail server. Item based retention can provide defensible deletion of items.
• Efficient media usage – consider the benefit of managing one year of off-site data on considerably fewer tapes. Typically, when data is sent off-site on tapes, the same stale data exists each time a set of tapes is exported. If data is sent off-site weekly on tape, 52 versions of the same stale item exist.
Example: When item-based retention is used and secondary tape copies are created, only the items contained within the most recent synthetic full backup are copied to tape. If the retention is set to 365 days, each tape set contains all items from the past year. This means that with a standard off-site tape rotation of 30 days, 365 days of data exists on each set.
Synthetic Full Item Carry Forward Using V1 Indexing
Retention settings defined in the Subclient Properties use the 'synthetic full carry forward' method. To understand how this method works, an understanding of synthetic full protection jobs is first required.
Synthetic Full Protection Jobs
A synthetic full backup synthesizes a full backup by using previous data protection jobs to generate a new full backup.
Objects required for the synthetic full backup are pulled from previous incremental or differential backups and the most
recent full. To determine which objects are required for the synthetic full, an image file is used. An image file is a logical
view of the folder structure including all objects within the folders and is generated every time a traditional backup is
executed. The synthetic full backup uses the image file from the most recent traditional backup that was conducted on the
production data to determine which objects are required for the new synthetic full.

When an image file is generated, all objects that exist at the time of the scan phase of the backup job are logged in the
image file. This information includes date/time stamp and journal counter information, which is used to select the proper
version of the object when the synthetic full runs. If an object is deleted prior to the image file being generated, it is not
included in the image file and is not backed up in the next synthetic full operation. The concept of synthetic full backups
and deleted objects not being carried over in the next synthetic full is the key aspect of how object based retention works.
Synthetic full concept diagram
Deleted Items Carry Forward
When subclient retention is configured, items that have been deleted by the user, or by the system during an archive job, are carried forward to the next synthetic full based on the number of days specified. Once the days have been exceeded, the item is no longer carried forward in the next synthetic full job. The item still exists in the synthetic fulls already generated until the 'days and cycles' criteria defined in the primary copy are exceeded. This means that the total retention time of the item upon deletion is the sum of the days defined in the subclient and the 'days and cycles' defined in the primary copy.
Multiple Versions Carry Forward
Multiple versions of an item can also be carried forward. This allows an item that has been modified to have all modified
versions moved forward with each synthetic full. If the number of versions is set to five, five versions are carried forward. If
the item is modified again, upon the next synthetic full, the oldest version is dropped and the most recent five are carried
forward. If the item is deleted from the production system, all five items are carried forward until the defined days have
been exceeded.
The synthetic full carry forward method is used for V1 file system subclients using subclient retention
rules.
Synthetic full operation using subclient retention
Subclient and Storage Policy Retention Combination
It is important to note that subclient retention is not used in place of storage policy based retention, instead the two
retentions are added to determine when an object is pruned from protected storage. If an object is carried forward for 90
days upon deletion, each time a synthetic full job runs, it is carried forward until the 90 days elapses.
The synthetic full backups themselves are retained based on the storage policy copy retention rules. So, if the storage
policy copy has a retention of 30 days and 4 cycles, then a synthetic full remains in storage until the job exceeds retention.
In this instance, the object is carried forward for 90 days and the last synthetic full that copies the object over is retained
for 30 days, the object therefore remains in storage from the time of deletion for 120 days – 90 day subclient retention and
30 days storage policy copy retention.
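The arithmetic is simply additive. A trivial Python sketch (illustrative only; the function name is invented):

    def total_retention_days(subclient_days, policy_copy_days):
        # A deleted item is carried forward for 'subclient_days', then the
        # last synthetic full containing it is retained for the storage
        # policy copy's days (assuming the cycles criteria is also met).
        return subclient_days + policy_copy_days

    print(total_retention_days(90, 30))  # 120 days from the time of deletion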
Storage Policy Secondary Copies
Item based retention applies to how long an item is carried forward when synthetic full backups are executed. This applies
to backup jobs managed by the storage policy primary copy. Secondary copies always have retention applied to the copy
in the traditional manner. If subclient retention is set to 90 days, storage policy primary copy retention is 1 cycle and 0
days, and synthetic full backups are being run daily; a deleted item will be retained for 91 days. If a secondary copy has
been configured with a retention of 8 cycles and 90 days, the object may be retained for up to an additional 90 days.
How long a deleted object is potentially retained in a secondary copy depends on the copy type. If the secondary copy is a synchronous copy, the deleted object is always retained for the retention defined in the secondary copy, since all synthetic full backups are copied to the secondary copy. Selective copies, however, allow the selection of full backups at a time interval. If synthetic full backups are run daily and a selective copy is set to select the month-end full, any items that are not present in the month-end synthetic full are not copied to the selective copy. To ensure all items are preserved in a secondary copy, it is recommended to use synchronous copies and not selective copies.
Index Masking Using V2 Indexing
Index masking masks deleted items from all restore operations. The V2 index tracks all messages and files at a granular
level. When an item is protected, a field in the database is set to 'visible' for each item. When the item exceeds retention,
the field is marked to 'mask' the item. When browse or find operations are run, the masked items do not appear. If aging
activity is disabled at a client or client group level, all messages belonging to the client or group are not aged during the
aging process.
By default, a cleanup process runs every 24 hours. This process checks the Retention Policy's 'Retain for' setting for messages, or the subclient retention for files, and marks all items exceeding retention as invisible. It is important to note that if the 'Retain for' setting or the subclient retention is changed (for example, by decreasing the number of days), the next aging process immediately applies the new retention value.
If Exchange Mailbox agent data is copied to secondary copy locations, the days setting defined in the Retention Policy is
not honored. Instead, standard storage policy copy retention determines how long the messages are retained. In other
words, the primary copy manages all items at a granular level and secondary copies manage the retention at the job level.
From a compliance standpoint, this is an important distinction and should be taken into consideration when defining data
retention and destruction policies.
If the V2 index is lost and restored to a previous point-in-time, it is possible that previously masked items will be set to visible. The next time the aging process runs, these items are re-masked, making them unrecoverable.
From a compliance standpoint, defensible deletion of items is crucial. There is the possibility that email messages or files
copied to secondary storage such as tape media, could potentially be recovered using the Media Explorer tool. To ensure
that this cannot occur, enable the 'Erase Data' checkbox for any storage policies managing Exchange Mailbox agent data.
Note that the 'Erase Data' option is enabled by default for all data management storage policies.
Subclient Retention
Right-click Subclient | properties | Advanced | Retention tab
Subclient retention should only be used for users' data. When using synthetic full backups, subclient retention applies only to backup and archive operations.
These settings only apply to files (or stubs) that are deleted from the system.
Enable subclient retention key points:
• Blocks the use of traditional full backups, only synthetic full backups are allowed.
• Enables the use of archiver or backup retention options.
• Enables the selection of older versions or number of versions of files.
• Enables the subclient 'Disk Cleanup' tab, which allows configuring Commvault OnePass® archive settings.
Enable Archiver Retention Only
If only 'Archiver Retention' is enabled, both full and synthetic full backup jobs are disabled. Retention is data protection job-based, measured by time only; the cycle retention criteria specified in the storage policy copy are ignored. The retention time is the longer of the 'Archiver Retention' time and the storage policy copy's 'Days' retention. As each data protection job exceeds the time criteria, that job becomes eligible for aging and pruning.
Enable Backup Retention Only
If only 'Backup Retention' is selected, synthetic full backups are allowed. Retention is job-based, measured by both time and cycles. The time specified for 'Backup Retention' is additive to the days criteria specified in the associated storage policy copy.
Example: You enable 'Backup Retention' on the subclient Retention tab and set the 'After deletion keep items for <period
of time>' option time value to 1 month. The 1 month (30 day) count starts from the last time the deleted file appeared in a
data protection job's scan. Appearance in a data protection job scan means the file is considered to be "in image." An "in
image" file always has a copy in protected storage. A synthetic full backup job keeps the deleted file "in image" for the
specified time. Once the backup retention time has passed, storage policy retention is applied. The deleted file appears
last in the most recently completed synthetic full backup job. Storage policy copy retention then retains that job for its
cycle and days retention criteria. Synthetic full backup jobs must be run to enable aging and pruning of data from media.
Enable Archiver and Backup Retention
If both 'Archiver' and 'Backup' retention are selected, synthetic full backups are allowed. Retention is either time-based or job-based, depending on whether the file is deleted or not.
For files and stubbed files:
Retention is cycle and time-based. Files or stubbed files are extended on media by both the archiver and backup retention
time based on their file modification time. Once this retention has been exceeded, the storage policy copy retention 'Days
and Cycles' criteria are applied. Synthetic full backups must be run to allow aging and pruning of data from media.
Note: A stub file supports non-browse recovery operations (i.e., stub recalls) and acts as a place holder to persist the
associated file on media through synthetic full backups. Stub files have the same modification time as the associated file.
Deleting a stub is equivalent to deleting the file.
For deleted files:
The 'Retention of deleted files on media' is time-based only using the deleted file's modification time (MTIME). Based on
the MTIME, the deleted file is retained on media for the 'Archiver Retention' time plus the 'Backup Retention' time. So, if
'Archiver Retention' was set to 2 years and 'Backup Retention' set to 1 month, the total retention time on media for deleted
files would be 2 years and 1 month from the deleted file's last modification time.
Note: If 'Archiver Retention' is set to 'Extend Retention' indefinitely (the default), 'Backup Retention' cannot be selected. To select both options, set the 'Archiver Retention' option to 'Extend Retention' for <a period of time>.
File Versions Retention
The 'Retention of File versions' is either number-based or time-based. For example, you can retain the last 3 versions of a
file or you can retain any versions created in the past 90 days.
Retaining previous file versions essentially applies the same retention clock basis (file modification time) used for the
current version to all versions qualified by the criteria.
Deleting Subclients Configured with Subclient Retention
When a file system subclient that has 'Subclient Retention Settings' enabled is deleted, infinite retention is applied to the last cycle. This ensures a lock down of all existing protected data, since the retention settings defined in the subclient no longer exist. If data within the last cycle is no longer needed, delete the jobs by viewing the job history in the storage policy primary copy. The contents of the deleted subclient are included in the default subclient for future data protection jobs.
VIRTUALIZATION
Virtualization Primer
Virtualization has become the standard for data center consolidation, whether on-premises or in the cloud. As the number of virtual machines and the physical hosts they run on grows, a comprehensive protection strategy is required to ensure proper protection. Commvault® software provides several protection methods for virtual environments on-premises and in the cloud. Together, these methods provide a comprehensive enterprise hybrid protection strategy.
There are four primary methods Commvault® software can use to protect virtual environments:
• Virtual Server Agent (VSA)
• Application Aware backup integrating the VSA and application plugins
• Agents installed within virtual machines
• IntelliSnap® Technology
Which method is best depends on the virtual infrastructure, the type of virtual machines being protected, and the data contained within the virtual machines. In most cases, the Virtual Server Agent (VSA) is the preferred protection method. For specific virtual machines, 'application aware' backups or an agent installed directly within the VM is preferred. For mission critical virtual machines, large virtual machines, or virtual machines with high I/O processes, the IntelliSnap feature coordinates hypervisor software snapshots with array hardware snapshots to efficiently protect virtual machines while minimizing the performance impact on the virtual infrastructure.
Virtual Server Agent (VSA)
The Commvault Virtual Server Agent (VSA) interacts with the hosting hypervisor to provide protection at the virtual machine level. This means agents do not need to be installed directly on the virtual machines, although installing restore-only agents provides a simplified method for restoring data back to the VM.
Depending on the hypervisor application being used and the virtual machine's operating system, different features and
capabilities are available. The VSA interfaces with the hypervisor's APIs and provides capabilities inherent to the
application. As hypervisor capabilities improve, the Commvault VSA agent is enhanced to take advantage of new
capabilities.
Agent-Based Protection
Agent-based protection uses Commvault agents installed directly in the virtual machine. When an agent is installed in the
VM, it appears in the CommCell® console just like a regular client and the functionality is the same as an agent installed
on a physical host.
The main advantage with this configuration is that all the features available with Commvault agents are used to protect
data on the VM. For applications, using a Commvault agent provides complete application awareness of all data
protection operations including streaming log backups, granular item-based protection, archiving and content indexing.
VSA Application Aware Protection
VSA application aware backups insert an 'application plugin' into the VM during a VSA backup. When a VM backup runs,
the plugin quiesces the application using a VSS snapshot. The VSA coordinator then communicates with the hypervisor to
conduct a VM snapshot. This protection method provides a hybrid approach using the VSA to conduct data protection
jobs, and agent-based functionality for recovery, similar to installing an agent directly in the VM.
IntelliSnap® for VSA
The Commvault IntelliSnap® feature provides integration with supported hardware vendors to conduct, manage, and
create backup copies of snapshots. This technology is used to snap VMs at the Datastore level and back them up to
protected storage.

The process for protecting virtual machines is similar to performing snapshots with the VSA agent directly interfacing with the hosting hypervisor application. The VSA first quiesces the virtual machine, and the IntelliSnap feature then uses vendor APIs to perform a hardware snapshot of the Datastore. The Datastore is then mounted on an ESX proxy and all VMs are registered. Finally, the VMs are backed up and indexes are generated for granular level recovery. The snapshots can also
be maintained for live browse and recovery. The backup copies are used for longer term retention and granular browse
and recovery.
Transport Modes
The VMware® VADP framework provides three transport modes to protect virtual machines:
• SAN transport mode
• HotAdd mode
• NBD and NBD SSL mode
Each of these modes has its advantages and disadvantages. Variables such as physical architecture, source data location, ESX resources, network resources, and VSA proximity to MediaAgents and storage all affect which mode is best to use. It is also recommended to consult with Commvault for design guidance when deploying Commvault® software in a VMware environment.
SAN Transport Mode
SAN transport mode is used on a VSA proxy with direct Fibre Channel or iSCSI access to snapshot VMs in the source storage location. This mode provides the advantage of avoiding network movement of VM data and eliminates load on production ESX servers.
Virtual machines are backed up through the VSA and to the MediaAgent. If the VSA is installed on a proxy server
configured as a MediaAgent with direct access to storage, LAN-Free backups can be performed. For best performance,
Commvault recommends that the VSA have a dedicated HBA to access the VMDK files. If an iSCSI SAN is used, we
recommend a dedicated Network Interface Card on the VSA for access to the SAN.
VSA backup process using SAN transport mode
HotAdd Mode
HotAdd mode uses a virtual VSA in the VMware environment. This requires all data to be processed and moved through
the VSA proxy on the ESX server. HotAdd mode has the advantage of not requiring a physical VSA proxy and does not
require direct SAN access to storage. It works by 'hot adding' virtual disks to the VSA proxy and backing up the disks and
configuration files to protected storage.
A common method of using HotAdd mode is to combine Commvault® deduplication with client-side deduplication, DASH Full, and an incremental forever protection strategy. Using Changed Block Tracking (CBT), signatures are generated only for changed blocks within the virtual disk, and only unique block data is protected.
This mode is also useful when there is no physical connectivity between the physical VSA proxy and the Datastore storage, preventing the use of SAN transport mode, for example, when using NFS Datastores or when Datastores are hosted on ESX hosts' local disk storage.
VSA backup process using HotAdd transport mode
NBD Mode
NBD mode uses a VSA proxy installed on a physical host. The VSA connects to VMware, and snapshot data is moved from the ESX server over the network to the VSA proxy. This method requires adequate network resources, but NBD mode is the simplest method to protect virtual machines.
VSA backup process using NBD transport mode
Hyper-V Transport Modes
Commvault® software uses VSA proxies to facilitate the movement of virtual machine data during Hyper-V backup
operations. The VSA proxies are identified in the instance properties. For Microsoft Hyper-V, the VSA is installed on each
hypervisor host. VMs can be protected from each host or a VSA proxy can be designated to protect VMs. The proxy must
have access to all clustered shared volumes where VMs reside.
Hyper-V Transport Mode
Virtual Server Agent Backup Process
The VSA works by communicating with the hosting hypervisor to initiate software snapshots of virtual machines. Once the
VMs are snapped, the VSA backs them up to protected storage.
The following steps illustrate the process of backing up VMware® virtual machines:
1. The Virtual Server Agent communicates with the hypervisor instance to locate the virtual machines defined in the subclient that require protection.
2. Once a virtual machine is located, the hypervisor prepares the virtual machine for the snapshot process.
3. The virtual machine is placed in a quiescent state. For Windows® VMs, VSS is engaged to quiesce disks.
4. The hypervisor then conducts a software snapshot of the virtual machine.
5. The virtual machine metadata is extracted.
6. The backup process then backs up all virtual disk files and VM configuration files.
7. Once the disks are backed up, indexes can optionally be generated for granular recovery.
8. Finally, the hypervisor deletes the snapshots.
Virtual Server Agent Proxy Roles
Virtual Server Agent (VSA) proxies are defined at the instance level of the VSA pseudo client. The top listed VSA proxy is
designated as the coordinator and all other proxies are designated as data movers. Note that the Coordinator proxy also
acts as a data mover. The coordinator is responsible for communicating with the hypervisor to get information about VMs
and distribute VM backups to data mover proxies. Data mover proxies communicate with the coordinator proxy and
provide information on available resources and job status. If the coordinator proxy is unavailable, the next proxy in the list
assumes the role of coordinator. If a data mover proxy becomes unavailable, the coordinator proxy assigns jobs to other
available proxies.
Virtual Machine Distribution Process
When a VSA subclient backup starts, the coordinator receives a list of all virtual machines listed in the subclient. Based on
a defined set of rules, the coordinator creates a dynamic VM queue to determine the order in which virtual machines will
be protected and which VSA proxies will back up each virtual machine.
Subclient Data Readers
The data readers setting in the Advanced tab of the subclient defines the maximum number of streams used for the backup. When the job starts, if there are more VMs than available streams, each VM is allocated a single stream. If there are more streams than VMs, the coordinator automatically instructs the data mover proxy to use multiple streams for the VM backups; depending on the number of available streams, each virtual disk in the VM is backed up as a single stream. This process is dynamic, so as a job progresses and more streams become available while fewer VMs require protection, multiple streams can be used to protect individual VMs.
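The following minimal Python sketch (illustrative only; not Commvault code) captures this allocation logic:

    def allocate_streams(num_vms, data_readers):
        # More VMs than streams: one stream per VM until streams run out.
        if num_vms >= data_readers:
            return {f"vm{i}": 1 for i in range(data_readers)}
        # More streams than VMs: spread the spare streams across the VMs,
        # so individual virtual disks can be backed up in parallel.
        alloc = {f"vm{i}": 1 for i in range(num_vms)}
        for i in range(data_readers - num_vms):
            alloc[f"vm{i % num_vms}"] += 1
        return alloc

    print(allocate_streams(6, 4))  # {'vm0': 1, 'vm1': 1, 'vm2': 1, 'vm3': 1}
    print(allocate_streams(2, 6))  # {'vm0': 3, 'vm1': 3}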
Stream allocation when there are more VMs than data readers
Stream allocation when there are more data readers than VMs
DataStore Distribution
If VMs within a subclient exist across multiple Datastores, the coordinator assigns VMs to proxies, one VM per Datastore, until the maximum stream count is reached. Each VM is assigned to a different data mover proxy, balancing stream loads across proxies based on proxy resources. This distributes the load across multiple Datastores, which improves backup performance and maintains a healthy Datastore state. In addition to the subclient Data Readers setting, a hard limit can be set for the maximum number of concurrent VMs that can be protected within a single Datastore using the nVolumeActivityLimit additional setting.
DataStore stream allocation
VSA Proxies
Commvault® software uses VSA proxies to facilitate the movement of virtual machine data during backup and recovery operations. The VSA proxies are identified in the instance properties. For Microsoft Hyper-V, each VSA proxy is designated to protect the virtual machines hosted on its physical Hyper-V server. For VMware, the VSA proxies are used as a pooled resource, meaning that depending on resource availability, different proxies may be used to back up VSA subclients each time a job runs. This method of backing up virtual machines provides higher scalability and resiliency.
VSA Proxy placement for VMware® and Hyper-V®
VM and VSA Proxy Distribution Rules
Datastore distribution is the primary rule that determines the order in which VMs are backed up. Additional rules that
determine VM backup order are:
1. Number of proxies available to back up a VM – the fewer proxies available, the higher the VM is placed in the queue. This is also dependent on transport mode. If the transport mode is set to Auto (default), SAN has the highest priority, followed by HotAdd and then NBD mode. If a specific transport mode is defined in the subclient, only proxies that can protect the VM can be used; this could reduce the number of available proxies, resulting in a higher queue priority.
2. Number of virtual disks – VMs with more virtual disks are higher in the queue.
3. Size of virtual machine – larger VMs are higher in the queue.
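These ordering rules amount to a compound sort key. A hypothetical Python sketch (the VM fields are invented for illustration, not Commvault's internal representation):

    def backup_priority_key(vm):
        # Python sorts ascending, so fewer eligible proxies, more disks,
        # and larger size all push a VM toward the front of the queue.
        return (vm["eligible_proxies"], -vm["num_disks"], -vm["size_gb"])

    vms = [
        {"name": "web01", "eligible_proxies": 3, "num_disks": 1, "size_gb": 60},
        {"name": "db01",  "eligible_proxies": 1, "num_disks": 4, "size_gb": 800},
        {"name": "app01", "eligible_proxies": 3, "num_disks": 2, "size_gb": 200},
    ]
    queue = sorted(vms, key=backup_priority_key)
    print([vm["name"] for vm in queue])  # ['db01', 'app01', 'web01']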
Stream Allocation and Proxy Throttling
During backup operations, the coordinator proxy gathers information on each data mover proxy to determine the default
maximum stream count each proxy can handle. This is based on the following:
• 10 streams per CPU
• 1 stream per 100 MB available RAM
When the coordinator assigns jobs to the data mover proxies, it evenly distributes jobs until the default maximum number of streams on a proxy is reached. Once the threshold is reached, it no longer assigns additional jobs to the proxy. If all proxies are handling their maximum number of streams and there are still streams available, the coordinator assigns additional jobs to proxies using a round-robin method.
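As a rough Python sketch, and assuming the effective ceiling is the lower of the two values (an assumption, since the source lists the two factors without stating how they combine), the default per-proxy stream limit could be computed as:

    def default_max_streams(num_cpus, available_ram_mb):
        # 10 streams per CPU, but no more than 1 stream per 100 MB of free RAM
        return min(10 * num_cpus, available_ram_mb // 100)

    print(default_max_streams(num_cpus=4, available_ram_mb=8192))  # min(40, 81) -> 40
    print(default_max_streams(num_cpus=8, available_ram_mb=4096))  # min(80, 40) -> 40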
Throttling can be hard set on a per-proxy basis using the following registry keys:
• nStreamsPerCPU – limits the number of streams per CPU on the proxy
• nMemoryMBPerStream – sets the required memory on the proxy for each stream
• nStreamLimit – sets a limit on the total number of streams for a proxy
• bHardStreamLimit – sets a hard stream limit across all proxies within the VSA instance
To create the Coordinator Additional Settings key
1. Right-click the VSA Coordinator or failover candidate | Click Properties.
2. Click Advanced.
3. Click Add to create the key.
4. Click Lookup and find the key.
5. Type the value for the key.
6. The key is displayed in the Additional Settings tab.
Disable Dynamic Assignment
To disable dynamic VM assignment and force static assignment, configure the
DynamicVMAssignmentAllowed additional setting on the coordinator (and failover candidates) and set the value to
'false.' If this additional setting is configured with a value of 'true,' dynamic VM assignment is allowed, but is not forced.
When dynamic VM assignment is disabled, virtual machines are assigned to proxies at the beginning of the job and
assignments are not modified during the job.
Hyper-V Dynamic Distribution
There are several differences in the dynamic distribution of VMs in a Hyper-V environment:
• Cluster Shared Volume (CSV) owner – VMs are protected by the VSA proxy on the node that owns the CSV.
• Cluster – if the CSV owner is not in the proxy list, VMs are dispatched to any node in the cluster.
• Host – when the hypervisor host is a VSA proxy and in the proxy list, the host VSA proxy is used.
• Any proxy – if the hypervisor host is not a proxy or not in the list, VMs are distributed to any available proxy.
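This fallback order can be sketched in a few lines of Python (illustrative only; the VM and proxy fields are invented for the example):

    def pick_hyperv_proxy(vm, proxy_list):
        if vm["csv_owner"] in proxy_list:
            return vm["csv_owner"]                    # 1. CSV owner node
        for node in vm["cluster_nodes"]:
            if node in proxy_list:
                return node                           # 2. any cluster node
        if vm["host"] in proxy_list:
            return vm["host"]                         # 3. the hypervisor host
        return proxy_list[0] if proxy_list else None  # 4. any available proxy

    vm = {"csv_owner": "nodeB", "cluster_nodes": ["nodeA", "nodeB"], "host": "nodeA"}
    print(pick_hyperv_proxy(vm, ["nodeC", "nodeA"]))  # 'nodeA' (cluster node)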
VSA Coordinator or Proxy Failover
If a VSA proxy protecting a virtual machine goes offline, VMs are returned to the priority queue. The next available proxy is
assigned to the re-queued VM.
If the VSA coordinator proxy goes offline, VSA backup jobs managed by the coordinator are placed in a pending state. The next proxy in the list assumes the role of the active coordinator proxy, and jobs return to a running state. Any VMs that were in the process of being protected are re-queued and restarted.
Stream Management and VSA
Using the VSA coordinator and data mover proxies, along with the intelligent load distribution, the number of streams that
can be used to protect VMs can be set higher than in previous software versions. However, it is still important to
understand possible negative consequences of misconfigurations.
The data readers setting in the subclient is the primary governor determining the maximum number of VMs or virtual disks that can be protected at a given time. The load distribution attempts to balance VM backups across disk volumes. However, if the VMs requiring protection reside on only a few volumes and the data readers value is set too high, problems can occur.
When a VM backup runs, a software snapshot is taken in which block changes are cached, the frozen disk data is read from the volume, and normal I/O still occurs on the volume. With these three actions happening simultaneously, too many concurrent snap and backup operations can cause significant performance degradation. This can also cause major issues during snapshot cleanup operations.
As a general rule of thumb, each disk volume should have two concurrent snap and backup operations as a starting point.
This number may vary greatly based on whether the disks are SAN or NAS, the size of the disks, and the performance
characteristics. Consider the significant performance difference between spinning versus solid state disks. The two data
readers per disk volume is a starting point. Adjust these numbers to a point where backup windows are being met. Also,
consider mapping subclients to specific disk volumes and adjusting the data readers based on the performance
characteristics of the disks.
Virtual Server Agent Settings
Virtual Machine Swap File Filtering
When backing up VMware® or Hyper-V virtual machines, the VSA filters out the Windows page file or Linux swap file by default. To achieve this, the system maps the virtual machine disk blocks that make up the page file or swap file. These blocks are skipped during backups, significantly reducing the storage footprint and the backup time.
It is possible to disable the skipping of page and swap files by creating the bSkipPageFileExtent additional setting on
the VSA proxy and by setting its value to 0 (zero).
Swap or Page file filtering during VSA backups
Virtual Machine Filtering
Virtual machines can be filtered by browsing for VMs or by adding specific criteria for VM filtering. This is useful when content is defined at a parent level but specific virtual machines are to be excluded from backup. For instance, if the subclient is configured to auto-discover and protect all VMs within a specific Datastore, but a few virtual machines do not require protection, they can be added as filters. Virtual machines can be defined as filters at the subclient or at the backup set level.
If your subclient’s content is defined using auto-discovery rules, it is recommended to define VM filters
at the backup set level to ensure that none of the subclients back up the VM.
Virtual Disk Filtering
For some hypervisors, such as VMware and Hyper-V, disk level filtering can also be applied. This provides the ability to
filter disks based on host, Datastore, VMDK, VHD or VHDX name pattern or hard disk number. This can be useful when
certain disks do not require protection or if Commvault agents installed within the VM are used to protect data.
Example: A database server requires protection. For shorter recovery points and more granular backup and recovery functionality, a database agent can be used to protect the application's database and log files. For system drives, the Virtual Server Agent can be used for quick backup and recovery. Disks containing the database and logs should be filtered from the VSA subclient. The VSA protects the system drives, while the database agent protects the database daily and the logs every 15 minutes. This solution provides shorter recovery points through frequent log backups, application aware backups and restores, and protection of system drives using the Virtual Server Agent.
VSA Instance Configuration
Once the VSA software has been installed on all the desired proxies, the VSA pseudo client, or instance, can be
configured. When configuring the instance, a list of proxies must be defined. The first proxy in the list acts as the VSA
proxy coordinator.
Default Subclient Content
Right-click the default subclient | Click Properties | Content tab
The default subclient content tab contains a backslash entry, like the Windows® File System agent's, to signify that the subclient is a catch-all. Any VMs not protected in other subclients are automatically protected by the default subclient. It is recommended that the default subclient contents are not changed, activity is not disabled, and the default subclient is regularly scheduled to back up, even if there are no VMs in the subclient.
To avoid protecting VMs that do not need to be backed up, use the backup set level filters and add all VMs that don't
require protection. Complying with these best practices ensures that if a VM is added in the virtualization environment,
even if the Commvault® system administrator is unaware of the VM, it gets protected by the default subclient.
VM Content Tab
Right-click the desired subclient | Click Properties | Content tab
VSA subclient contents are defined using the Browse or Add buttons. Browse provides a vCenter like tree structure where
resources can be selected at different levels including Cluster or Datastore. For most environments, it is recommended to
select subclient contents at the cluster level. For smaller environments, or for optimal performance, defining subclient
contents at the Datastore level can be used to distribute the backup load across multiple Datastores.
The Add option is used to define discovery rules for VM content definition. Multiple rules can be nested such as all
Windows® VMs in a specific Datastore.
Discovery Rules
Right-click the desired subclient | Click Properties | Content tab | Add
You can refine the selection of virtual machines for subclient content by defining rules that identify specific virtual
machines based on their properties. These rules are used in conjunction with other discovery rules that identify virtual
machines based on operating system, server, and storage location.
Custom virtual machine properties can include:
• VM Name/Pattern – enter the display name of the virtual machine or a pattern using wildcards (for example, Test* to identify VMs whose name begins with "Test"). You can also click ... to browse for a VM.
• Host – enter the host name as it appears in vCenter, the IP address of the host, or a host name pattern using wildcards. You can also click ... to open the Browse dialog box. When you add a host, all virtual machines on the host are included in the backup.
• DataStore – enter the DataStore name or a pattern. You can also click ... to open the Browse dialog box.
• Guest OS – enter the exact name of the operating system or a pattern to identify an operating system group (for example, Win* to identify any virtual machine that has a version of the Windows® operating system).
• Guest DNS Hostname – enter a hostname or a pattern to identify a hostname or domain (for example, myhost.mycompany.com to identify a specific host, or *mycompany.com to identify all hosts on that domain).
• Power State – select the power on status of virtual machines to be included in the subclient content. You can select one of the following options:
o Powered On – identifies VMs that are powered on
o Powered Off – identifies VMs that are powered off
o Other – identifies VMs with a different power on status, such as Suspended
• Notes – enter a pattern to identify virtual machines based on notes text contained in vCenter annotations for the VM summary (for example, Test* to identify VMs with a note that begins with "Test").
• Custom Attribute – enter a pattern to identify virtual machines based on custom attributes in vCenter annotations for the VM summary. You can enter search values for the names and values of custom attributes. For example, Name Contains *resize* identifies VMs where the name of a custom attribute contains the word "resize"; Value Contains *128* identifies VMs where the value of a custom attribute contains the number "128."
Transport Modes (VMware)
Right-click the desired subclient | Click Properties | General tab
The VMware transport mode is configured in the General tab of the subclient. The default setting is Auto, which attempts to use SAN or HotAdd mode and falls back to NBD mode if the other modes are not available. To configure a specific transport mode with no fallback, select the desired mode from the drop-down box.
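The Auto selection logic can be summarized in a short Python sketch (illustrative only; the function name and flags are invented):

    def choose_transport_mode(configured, san_available, hotadd_available):
        if configured != "Auto":
            return configured        # explicit mode: used with no fallback
        if san_available:
            return "SAN"             # preferred when direct storage access exists
        if hotadd_available:
            return "HotAdd"          # virtual proxy on the ESX server
        return "NBD"                 # network transport, always available

    print(choose_transport_mode("Auto", False, True))   # HotAdd
    print(choose_transport_mode("NBD", True, True))     # NBD (no fallback)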
Data Readers
Right-click the desired subclient | Click Properties | Advanced Options tab
The data readers setting in the advanced tab of the subclient properties is used to determine the number of streams used
for the subclient backup. This value must be set to meet backup windows while avoiding overloading DataStore, network,
and proxy resources.
Subclient Proxies
Right-click the desired subclient | Click Properties | Advanced Options tab
Proxies are defined in the VSA instance but can be overridden at the subclient level. This is useful when specific subclient
VM contents are not accessible from all VSA proxies. Proxies can be added, removed, and moved up or down to set proxy
priority.
Subclient and Backup Set Filters
Right-click the desired subclient | Click Properties | Filters tab
Subclient or backup set filters can be used to filter virtual machines or virtual machine disks for both Hyper-V and VMware.
If auto-discovery rules are used to define content, it is recommended to apply filters at the backup set level to ensure that
no subclients protect the VM.
Backup Options
Right-click the desired subclient | Click Properties | Backup Options tab
There are several subclient options that are specific to the VMware® and Hyper-V® VSA subclient.
• Quiesce guest file system and applications – configured in the Quiesce Options tab, this enables (default) or disables the use of VSS to quiesce disks and VSS-aware applications for Windows® virtual machines.
• Application aware backup for item-based recovery – configured in the Quiesce Options tab, this is available only when using the IntelliSnap feature and is used to conduct application aware snapshots of virtualized Microsoft SQL and Exchange servers.
• Perform Datastore free space check (VMware only) – configured in the Quiesce Options tab, this sets a minimum free space (default 10%) for the Datastore to ensure there is enough free space to conduct and manage software snapshots during the VM data protection process.
Auto Detect VM Owner
Right-click the desired subclient | Click Properties | Advanced Options tab
Virtual machine owners can be assigned automatically during virtual machine discovery, based on privileges and roles
defined in vCenter that indicate rights to virtual machines. When this feature is enabled, users and user groups who have
appropriate capabilities in vCenter and are also defined in the CommCell® console are automatically assigned as VM
owners in the client computer properties for the virtual machine.
This feature enables administrators and end users to access virtual machine data without requiring that they be assigned
as VM owners manually. Depending on the permissions and role a user has in vCenter, they can view virtual machine
data or recover VM data. Any user with Remove VM, VM Power On, and VM Power Off capabilities for a virtual machine is
assigned as an owner of that VM during VM discovery.
Owner IDs are only assigned during discovery for a streaming or IntelliSnap backup and are not modified by backup copy
or auxiliary copy operations.
Single sign on must be enabled on the vCenter and required vCenter capabilities must be configured for users and
groups.
Users or user groups defined in vCenter must also be defined in the CommCell interface, either through a local user
definition or a Name Server user definition (such as an Active Directory user or group).
VSA Advanced Restore Options
The VSA agent offers multiple Live Recovery features.
• Live File Recovery – allows Commvault software to break open a backup or snapshot copy of a virtual machine and recover individual files. This feature provides extended support for various file system types. Use this feature to reduce backup times without sacrificing the capability to recover individual files.
• Live Recovery for Virtual Machines – provides the ability to start a virtual machine almost instantaneously while recovering it in the background. This effectively enhances the RTO, since there is no need to wait for the full recovery operation to complete before accessing the virtual machine.
• Live Mount – allows powering up virtual machines directly from the backup copy without having to restore them or commit any changes. This provides access to the virtual machine for validation purposes, testing, or application-level recovery via the provided mining tools.
• Live Sync – takes changed blocks from the standard VSA protection copy and overlays those blocks onto a warm standby VM at an alternate location, thereby providing VM-level replication. Live Sync can be used to create and maintain warm recovery sites for virtual machines running critical business applications.
Not all VSA features are supported on all hypervisors. For more information on supported features for your hypervisor,
refer to the Commvault Online documentation.
Live Mount
Expand Client Computer Groups | VSA instance | Right-click the desired VM | All Tasks | Live Mount
The Live Mount feature enables you to run a virtual machine directly from a stored backup. You can use this feature to validate that backups are usable in a disaster recovery scenario, to validate the content of the backup, for testing purposes, or to access data from the virtual machine directly instead of restoring guest files.
Virtual machines that are live mounted are intended for short term usage and should not be used for production; changes
to live mounted VMs or their data are not retained when the virtual machine expires. The VM expiration period is set
through a Virtual Machine policy.
When a live mount is initiated, an ESX server is selected to host the virtual machine, based on the criteria set in the live
mount virtual machine policy. The backup is exposed to the ESX server as a temporary Datastore. The configuration file
for the live mounted VM is updated to reflect the name of the new VM, disks are redirected to the Datastore, and network
connections are cleared and reconfigured to the network selected for the live mounted VM. When this reconfiguration is
complete, the VM is powered on.
Tip: Using Live Mount for update validation
Situation: You are about to apply updates to a critical system and are concerned about the impacts on the
system.
Solution: Use Live Mount to power on the same system from the backups. Isolate it on its own network to avoid
duplicate hostname and IP address. Install and validate the update.
Live File Recovery
Right-click the desired subclient or backup set | Click All Tasks | Browse and Restore | Virtual Server tab
Live File Recovery provides expanded file system support, including ext4, and enables live browse of backup data without
requiring granular metadata collection during backups. This option supports restores of files and folders from backups of
Windows VMs and of UNIX VMs that use ext2, ext3, ext4, XFS, JFS, or Btrfs file systems.
Live File Recovery can also be used to reduce backup times. This is a trade-off; using this feature reduces backup time
but increases the time required to browse files and folders. It is only supported for backups to disk storage targets.
To recover files or folders from a backup, you can enable backup data to be mounted as a temporary NFS Datastore that can be used to browse and restore files and folders. The process is similar to an ISO file that you right-click and mount on a Windows computer: the operating system virtually mounts the ISO file and cracks it open to display the content. In the case of Live File Recovery, the Windows MediaAgent locates the virtual machine's blocks in the disk library. These blocks are presented to the Windows operating system through a virtual mount driver. The VM file system is then cracked open and the content is displayed in the console.
For Linux virtual machines, the file system cannot be mounted by the Windows MediaAgent. It requires a virtual Linux MediaAgent on which the File Recovery Enabler for Linux (FREL) component is installed.
For Service Pack 6 and earlier, a Linux VMware template containing the MediaAgent and FREL (downloadable from
Commvault cloud) needs to be deployed. Refer to the Commvault Online Documentation VMWare section.
Since Service Pack 7, simply deploy a Linux VM and install the MediaAgent code. If the system requirements are in place,
the FREL component is automatically installed with the MediaAgent software.

Enabling or disabling the Live File Recovery method is controlled by the 'Collect File Details' backup option of a subclient.
If it is checked, traditional file recovery is used; if unchecked, Live File Recovery is used.

The default for a new backup or schedule is to use Live File Recovery.

If 'Collect File Details' was enabled, but you still want to use Live File Recovery, configure the following additional setting
key on the VSA proxy:

nEnforceLivebrowse with a value of 1

Performing a Live File Recovery is achieved through the usual guest files and folders recovery screens. The difference is
in the system mechanics.

Live VM Recovery
Right-click the desired subclient or backup set | Click All Tasks | Browse and Restore | Virtual Server tab

The Live Recovery feature enables virtual machines (VMs) to be recovered and powered on from a backup without waiting
for a full restore of the VM. This feature can be used to recover a VM that has failed and needs to be placed back in
production quickly, or to validate that a backup can be used in a disaster recovery scenario.

Basically, the disk library is presented to the virtualization environment and the VM is powered on directly from the disk
library. While it runs, the VM is moved back into the production Datastore using a storage vMotion operation. All of these
tasks are accomplished automatically by Commvault® software.

Live Sync
The Live Sync feature enables incremental replication from a backup of a virtual machine (source VM) to a synchronized
copy of the virtual machine (destination VM).

The Live Sync operation opens the destination VM and applies changes from the source VM backups since the last sync
point. It is important to understand that since it is achieved from the backups, Live Sync is not a real-time synchronization.

The Live Sync feature can initiate replication automatically after backups or on a scheduled basis (for example, daily or
once a week), without requiring any additional action from users. Using backup data for replication minimizes the impact
on the production workload by avoiding the need to read the source VM again. Additionally, in cases where corruption on
the source VM is replicated to the destination VM, users can still recover a point-in-time version of the source VM from
older backups.

If no new backups have been run since the last Live Sync, the scheduled Live Sync does not run.

When using Live Sync, it is recommended to use an incremental forever strategy. Run a first full backup, which gets
replicated to the destination. Then, only run incremental backups to apply the smallest changes possible to the
destination. Periodically, such as once a week, run a synthetic DASH full backup to consolidate backups in a new full
backup, without impacting the replication. If you execute a real full backup, the entire machine must replicate to the
destination.

Live Sync Configuration


Right-click the desired subclient or backup set | Live Sync | Configuration

Before you configure Live Sync, configure the vCenter client in the CommCell® console. If the destination uses a different
vCenter server, it must also be defined as a vCenter client. Then run the initial VM backups. The VM must be backed up
once and can then be added to a Live Sync schedule.

Live Sync from a Secondary Copy


Right-click the desired subclient or backup set | Live Sync | Configuration | Advanced | Copy Precedence tab

By default, Live Sync replicates from backups in the primary copy of a storage policy. It is possible to modify this behavior
to restore from a secondary copy. This can be useful when the VM is backed up to a disk library that is replicated to a
remote site where the replicated machine resides.

When Live Sync is configured to use an auxiliary copy or backup copy, the Live Sync operation uses the copy as the
source rather than the primary backup. If the 'After Backup Job Completes' option is selected in the schedule, Live Sync
automatically waits until the data is ready on the secondary copy before running the Live Sync job.

Live Sync Monitor


Right-click the desired subclient or backup set | Live Sync | Monitor

The Live Sync Monitor tool is used to monitor and control live sync replication. In addition to the replication status of VMs,
replication can be enabled/disabled and VM failover/failback can be initiated.

Live Sync Failover


From the Live Sync Monitor | Right-click the desired VM | Failover

From the Live Sync Monitor, the failover of a virtual machine can be initiated. It can be a planned failover, for
testing purposes for instance, or an unplanned failover, such as in a disaster situation. Once a VM has been failed over, a
failback operation can be executed. In a failback, the VM at the failover location is backed up and synced back to the
primary site.

Prerequisites to use failover feature:

• The Workflow engine must be installed on the CommServe® server.


• The 'allowToolsToExecuteOnServerOrClient' additional settings key with a value of seven (7) must be
created on the CommServe server.
• The VMs must have been synced at least once.
The failover of a VM provides the following options:

• Test Boot VM – Powers on the replicated VM. This is useful to test and ensure that the VM is usable in the
case of a disaster. The destination VM is not modified, to avoid any conflicts with the production VM.


• Planned Failover – The planned failover is useful to test the complete failover scenario or to
conduct maintenance on the primary site. A planned failover achieves the following tasks:
1. Powers off the source VMs.
2. Performs an incremental backup of the source VMs.
3. Runs Live Sync to synchronize the destination VMs with the latest changes.
4. Disables Live Sync.
5. Powers on the destination VMs with the appropriate network connections and IP addresses.
• Unplanned Failover – The unplanned failover is used in a real disaster scenario where the primary
site is unavailable. In this scenario, the unplanned failover does not depend on the primary site and
achieves the following tasks:
1. Disables Live Sync.
2. Powers on the destination VMs with the appropriate network connections and IP addresses.

VIRTUAL APPLICATION PROTECTION


Virtual Application Protection Overview


Before determining which Commvault option is best to protect application data, an understanding of crash consistency
and application consistency is required. A consistent state of application data is essential to provide a backup that
can be restored in a proper state. Many applications have a built-in reconciliation process that can return application
data to a consistent state, but this process can take a long time and application experts may be
required to assist. Using Commvault features to ensure a consistent state makes restore operations faster
and simpler.

Crash Consistent
Crash consistent backups are based on point-in-time software snapshots and backup operations of a virtual machine that
allow the VM to be restored to the point at which it was snapped. When the snapshot occurs, all blocks on the virtual
disks are frozen to give a consistent point-in-time view. The application is not aware that this process is occurring.

There are several issues when performing crash consistent snapshot and backup operations. The first issue is that if an
application is running on the virtual machine, it is not aware the snapshot is being taken. VSA communicates with the
hosting hypervisor to initiate snapshots at the VM level and there is no communication with the application. Any I/O
processes being conducted by the application will continue without any knowledge that the snap has been performed.
This can cause issues during restore operations as the application data will be restored to the exact point where the
software snapshot was conducted.

Example: a database application is performing maintenance to defragment and reorganize data within its files. In the
middle of this process, the software snapshot occurs. When the VM is restored, it will be placed back in the state of that
maintenance period.

Another issue in this case would be larger than normal snapshots, as all the block changes are cached to keep the
production virtual disk in a consistent state. This causes a longer than normal cleanup process when the snapshot is
released and may cause storage space issues on the production volume.


Application Consistent
With Application Consistent protection, the application itself is aware that it is being snapped. This awareness allows for
the data to be protected and restored in a consistent and usable state. Application aware protection works by
communicating with the application to quiesce data or by using scripts to properly quiesce the data. Application consistent
protection is not critical for file data but is critical for application databases.

There are several methods to provide application consistent protection:

• Commvault® agents
• Application Aware VSA Backup
• Application Consistent VSA Backup
• Scripting Database Shutdowns

Commvault Agents
An agent installed in the VM communicates directly with the application running in the VM. The agent communicates
with the application to properly quiesce databases, and a streaming backup of application data is then conducted. If the
application data is on an RDM volume, the application agent can be used with the IntelliSnap feature to quiesce the data
and snap the volume. A proxy host can be used to back up the data, avoiding load on the VM or hypervisor. Using
application agents in the VM also provides database and log backup operations and a simplified restore method using the
standard browse and recovery options in the CommCell® console. Commvault agents in the hosting VM are
recommended for mission-critical, high I/O applications.

Application Aware VSA backup


An application plugin is pushed to the VM to properly quiesce application data. The plugin communicates directly with the
application and the VSA to ensure a proper quiesce of application data. The quiesce process uses VSS and is supported
for Windows-based, VSS-aware applications including SQL, Exchange, SharePoint, and Oracle on Windows. It is
important to note there are certain limitations for Exchange DAG and SQL Always On configurations. Check the
Commvault Online Documentation for the latest support and enhancements for Application Aware VSA backups.

This protection method is recommended on low to medium I/O applications.

Application Consistent VSA backup


The Volume Shadow Copy Service (VSS) is used to quiesce application data. This method works for Windows-based
applications that are VSS aware, including SQL, Exchange, and Oracle on Windows. When the VSS call is made to the
VM, any VSS-aware applications attempt to quiesce. If the attempt is successful, the backup is application consistent.
However, if the VSS quiesce fails, which can occur if there is too much application I/O at the time of the quiesce, the
backup is only crash consistent. This method is not recommended for high I/O virtual applications.

Scripting Database Shutdowns


Using external scripts which can be inserted in the Pre/Post processes of a subclient, or executed as part of a Workflow,
application data can be placed in an offline state to allow for a consistent point-in-time snap and backup operation. This
requires the application to remain in the offline state for the entire time of the snapshot operation. When the VM is
recovered, the application must be restarted after the restore operation completes. This method is only recommended
when Commvault agents are not available for the application.

Impact on Software Snapshots and Volumes during VM Backups


It is important to note that even with an application consistent backup, problems can still occur. For high I/O applications
running in virtual machines, software snapshots managed by the hypervisor can grow beyond a manageable level. This
can result in running out of disk space on the volume or a failure to clean up software snapshots.

Agent Based Application Protection


Agent-based protection uses Commvault® agents installed directly in the virtual machine. When an agent is installed in the
VM, it appears in the CommCell® console just like a regular client and the functionality is the same as an agent installed on
a physical host.

The main advantage with this configuration is that all the features available with Commvault agents are used to protect
data on the VM. For applications, using a Commvault agent provides complete application awareness of all data
protection operations including streaming log backups, granular item-based protection, archiving and content indexing.


Virtual Server Agent Application Aware Backup


Right-click the desired subclient | Click Properties | Backup Options tab

Application aware VSA backups insert an 'application plugin' into the VM during VSA backup and IntelliSnap® operations.
When a VM backup runs, the plugin quiesces the application using a VSS snapshot. The VSA coordinator then
communicates with the hypervisor to conduct a VM snapshot. If IntelliSnap is used, a hardware snapshot is taken on the
Datastore, and then the software snapshot and VSS snap are released.

VSA Application Aware backup support as of SP14


Hypervisor | Microsoft Exchange | Microsoft SharePoint | Microsoft SQL Server | Oracle database for Windows | Oracle database for Linux
Amazon (streaming) | Yes | Yes | Yes | Yes | No
Amazon (IntelliSnap) | Yes | Yes | Yes | Yes | No
Microsoft Hyper-V (streaming) | Yes | Yes | Yes | Yes | No
Microsoft Hyper-V (IntelliSnap with nonpersistent snap engines) | Yes | Yes | Yes | Yes | No
Nutanix AHV (streaming) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy
Nutanix AHV (IntelliSnap) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy
OpenStack (streaming) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy
Oracle VM (streaming) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy
Red Hat Virtualization (streaming) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy
VMware (streaming) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy
VMware (IntelliSnap) | Yes | Yes | Yes | Only with Windows proxy | Only with Linux proxy

To enable application aware VSA backups, a user account with administrative privileges for the application must be used.
This account can be entered at the instance or subclient level. When the VSA backup runs, the system detects if any
supported agents are installed in the VM and automatically installs the application plugin. After the backup completes, the
plugin remains in the VM for subsequent backup operations. Application data recovery is conducted using the agent in the
CommCell® console, providing full agent level recovery options.

Application Aware Backup additional prerequisites:

• MediaAgent software must be installed on the VSA proxy


• A snap copy must be created in the storage policy receiving the backup
When the first backup is initiated, the 'VSAAppAwareBackupWorkflow' workflow runs. The workflow executes the tasks
required to properly protect the application.

VSAAppAwareBackupWorkflow high level phases:

1. Validates that the MediaAgent software is installed on the VSA proxy server.
2. Validates that the snap copy is created for the storage policy.
3. Discovers whether a supported application is installed in the VM.
4. Pushes the application plugin.
5. Protects the application.

SQL Transaction Log Backup Support


VSA application aware backups for SQL Server have been enhanced to include an automatic schedule for transaction
log backups.

This provides the following advantages:

• Allows point-in-time restores of SQL databases, also known as log replays.


• Since the automatic schedule uses a free space threshold, it ensures that the volume containing the
SQL logs does not fill up between VSA backups.

The schedule default setting can be modified as desired.
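To illustrate how a free-space-threshold schedule behaves, here is a minimal conceptual sketch (not Commvault's implementation; the 20% threshold and the 'L:\' log volume are hypothetical values):

```python
import shutil

# Hypothetical threshold: trigger a transaction log backup when the volume
# holding the SQL transaction logs drops below 20% free space.
FREE_SPACE_THRESHOLD_PCT = 20

def log_volume_needs_backup(log_volume_path: str) -> bool:
    """Return True when free space on the log volume falls below the threshold."""
    usage = shutil.disk_usage(log_volume_path)
    free_pct = usage.free / usage.total * 100
    return free_pct < FREE_SPACE_THRESHOLD_PCT

if __name__ == "__main__":
    # 'L:\\' is a placeholder for the volume containing the SQL logs.
    if log_volume_needs_backup("L:\\"):
        print("Below threshold: a transaction log backup would be triggered now.")
    else:
        print("Free space healthy: wait for the next scheduled check.")
```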


Additional Application Protection Methods

Using Scripts to Quiesce Applications


Another method to protect virtualized application servers is to use scripts to quiesce application data prior to the snapshot
process. This method requires careful planning and communication with application administrators to ensure the
process works properly. Scripts can be created and placed on each virtual machine, and the Pre/Post Process
tab can be used to insert Pre-Snap and Post-Snap scripts. In this case, a Pre-Snap process script calls local
scripts on each virtual machine in the Datastore defined by the subclient; these local scripts quiesce application
data within each virtual machine. A Post-Snap process script can then unquiesce the applications once the snap
process completes.
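As an illustration, here is a minimal sketch of what such a local quiesce script pair could look like for a PostgreSQL database, assuming a PostgreSQL version that still supports the exclusive pg_start_backup()/pg_stop_backup() API (removed in PostgreSQL 15). The connection settings are placeholders, and in practice these scripts are often written as shell or batch files:

```python
import subprocess
import sys

# Placeholder connection settings for the database inside the guest VM.
PSQL = ["psql", "-U", "postgres", "-d", "postgres", "-c"]

def quiesce():
    """Put PostgreSQL into exclusive backup mode; the on-disk state stays
    consistent until pg_stop_backup() is called, even after this script exits."""
    subprocess.run(PSQL + ["SELECT pg_start_backup('pre-snap', true);"], check=True)

def unquiesce():
    """Take PostgreSQL out of backup mode once the snapshot completes."""
    subprocess.run(PSQL + ["SELECT pg_stop_backup();"], check=True)

if __name__ == "__main__":
    # The Pre-Snap process would call this script with 'pre', the
    # Post-Snap process with 'post'; the argument convention is arbitrary.
    quiesce() if sys.argv[1] == "pre" else unquiesce()
```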

Database Dumps
In many organizations, DBAs continue to rely on database dumps for their backups. Although this is not the most efficient
method of protecting databases and is not truly a backup, it does result in a consistent-state dump of a production
database. If the dump files are backed up, application aware restores can be conducted. This requires someone
with knowledge of the application in order to restore the database to an online state.

Application Data on Raw Device Mapping (RDM) Volumes


When the VSA agent protects VMware virtual machines, it conducts software snapshots of VMDK files; it does not protect
any volumes using RDM. This can be used as an advantage when designing solutions for protecting large databases. The
VSA agent is used to snap and back up the virtual disks as VMDK files but skips RDM volumes. An application
agent can then be installed in the VM, and subclients can be configured to protect databases on RDM volumes. The
application agent provides application consistent, point-in-time backups of application data.


INTELLISNAP® TECHNOLOGY


IntelliSnap® Technology Overview


Snapshots provide a method of snapping a view of the block structure of a disk to provide point-in-time revert, mount, or
restore operations as well as a consistent state of a disk structure for backup operations. Snapshots can be implemented
through hardware or software. Software snapshot technologies using Commvault® software include Microsoft® VSS and
Commvault block level backups.

Hardware based snapshot technology provides the ability to use optimized hardware and disk appliances to snap data on
disk arrays providing quick recovery by reverting or mounting the snapshots. This protection method significantly reduces
protection and recovery times while requiring minimal additional disk storage to maintain snaps. Since minimal storage is
required to hold snapshots, they can frequently be conducted to provide multiple recovery points to minimize potential
data loss. Snapshot technology can also be used to snap and replicate data to additional disk storage using minimal
bandwidth, providing physical data separation and a complete disaster recovery solution.

Technology is rapidly evolving, and more capabilities are being added to snapshot hardware with every new generation.
However, hardware-based snapshot technologies without enterprise data protection software to manage the snaps
have several disadvantages. IntelliSnap® technology overcomes these limitations by providing a single interface to
conduct, manage, revert, and back up snapshots.

The following lists the key highlights for the IntelliSnap feature:

• Full Application Awareness – By using Commvault agents to communicate with hosting
applications, application consistent snapshots can be performed. The application agent
communicates with the hosting application to quiesce databases prior to the snap occurring. This is
a significant benefit when protecting large databases where traditional backup methods are not
adequate to meet protection windows.
• Snapshot backups to reclaim disk cache space – By managing the snapshots, Commvault
software can also be used to back up the snapped data. As older snapshots are backed up to
protected storage, the snaps can be released on the source disk and the space can be freed for new
snap operations.

• Granular recovery - Snapshots can be mounted for Live Browse and indexed during backup
operations for granular recovery of objects within the snap. Whether using live browse or a restore
from a backup, the method to restore the data is consistent. Using the proper iDataAgent you can
browse the snapped data and select objects for recovery. This process is especially useful when
multiple databases or virtual machines are in the same snap and a full revert cannot be done. In
this case, just the objects required for recovery can be selected and restored.
• Clone support – Commvault software supports clone, mirror and vault capabilities for certain
hardware vendors and is adding support for additional vendors as its software continues to evolve.
• Simplified management – Multiple hardware vendors supported by the IntelliSnap feature can all
be managed through the Commvault interface. Little additional training is involved since the same
subclient and storage policy strategies used for backing up data are extended when using
snapshots. Just a few additional settings are configured to enable snapshots within the CommCell ®
environment.


The IntelliSnap feature is rapidly evolving to incorporate increased capabilities as well as expanded
hardware support. Check Commvault documentation for a current list of supported features,
applications and vendors.

Copy on Write
The copy on write method uses snapshots to gather reference markers for blocks on the snapped volume. A 'copy on
write' (COW) cache is created, which caches the original blocks when the blocks are overwritten. This requires a
read-write-write operation to complete: when a block update of a snapped volume is required, the original block is read
from the source volume, then written to the cache location; once the original block has been cached, the new block is
committed to the production volume, overwriting the original block. This method has the advantage of keeping production
blocks contiguous in the volume, which provides faster read access. The disadvantage is that the read-write-write
process increases I/O load on the disks.

Allocate on Write (Write Optimized)


Allocate on write uses additional space on a volume to write updated blocks when the original block is modified. In this
case, the original block remains in place and the new block is written to another section of the volume. Markers are used
to reference the new block for read requests of the production data. This has an advantage over copy on write in that
there is only a single write operation, decreasing I/O load on the disks. The disadvantage is that, over time, higher
fragmentation may exist on the volume.
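A simplified model of the two write paths, using Python lists as stand-ins for disk blocks (illustrative only, not a vendor implementation):

```python
# Simplified model: a volume is a list of blocks; a snapshot must preserve
# the point-in-time view while new writes land on the production volume.

def cow_write(volume, snap_cache, index, new_block):
    """Copy on write: read the original block, write it to the cache, then
    overwrite in place. Three I/Os per update (read-write-write), but the
    production blocks stay contiguous for fast reads."""
    if index not in snap_cache:
        snap_cache[index] = volume[index]   # read original, write to cache
    volume[index] = new_block               # overwrite in place

def aow_write(volume, block_map, index, new_block):
    """Allocate on write: leave the original block alone and write the new
    block to fresh space. A single write per update, but reads must follow
    the markers, and fragmentation grows over time."""
    volume.append(new_block)                # one write to new space
    block_map[index] = len(volume) - 1      # marker redirects future reads

# Copy on write: the snapshot view is the cache overlaid on the volume.
vol, cache = ["A", "B", "C"], {}
cow_write(vol, cache, 1, "B2")
print([cache.get(i, b) for i, b in enumerate(vol)])  # ['A', 'B', 'C'] snapshot
print(vol)                                           # ['A', 'B2', 'C'] production

# Allocate on write: originals stay put; current reads follow the block map.
vol2, block_map = ["A", "B", "C"], {0: 0, 1: 1, 2: 2}
aow_write(vol2, block_map, 1, "B2")
print([vol2[block_map[i]] for i in range(3)])        # ['A', 'B2', 'C'] production
print(vol2[:3])                                      # ['A', 'B', 'C'] snapshot
```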

IntelliSnap® for VSA


IntelliSnap technology can integrate with the Commvault® Virtual Server Agent (VSA) to provide hardware and software
integration for conducting volume snapshots, managing the retention of the snapshots, and backing up virtual machines
from the array. In large virtual environments, this integration can provide greater scalability by using hardware snapshots
for mission critical and highly transactional virtual machines.

The IntelliSnap for VSA feature provides the following benefits:



• Fast hardware snapshots result in shortened VM quiesce times and faster software snapshot
deletes. This is ideal for high transaction virtual machines.
• Live browse feature allows administrators to seamlessly mount and browse contents of virtual
machines for file and folder based recovery.
• Revert operations can be conducted in the event of DataStore corruption. For NetApp arrays,
individual virtual machine reverts can also be conducted.
• Hardware snapshots can be mounted to an ESX proxy server for streaming backup operations
eliminating the data movement load on production ESX hosts.

IntelliSnap for VSA Snap Process


The IntelliSnap feature requires several components to operate. A VSA agent is installed on a physical or virtual proxy
host. The VSA proxy communicates with the hypervisor to coordinate VM quiescing and snapshot operations. For backup
operations, an ESX proxy server is recommended to mount the snapped volume to create backup copies of the VMs. For
hardware vendors supported by IntelliSnap technology, the VSA can be installed on a virtual machine running on the ESX
proxy server. When using a physical VSA proxy, MediaAgent software can be installed on the proxy to provide LAN free
backups.

The IntelliSnap for VSA snap and backup process uses the following steps:

1. The VSA communicates with the hypervisor to locate VMs and initiate snap operations.
2. The hypervisor quiesces the virtual machines listed in the subclient contents.
3. The hypervisor initiates software snapshots.
4. The IntelliSnap feature uses MediaAgent processes to initiate a hardware snapshot of the volume.
5. Once the snapshot is complete, the VSA proxy communicates with the hypervisor to remove the
software snapshots.
6. VMs are mounted to a hypervisor proxy for backup operations.
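The flow can be sketched as follows; every object and method here is hypothetical and stands in for work done by the VSA proxy, the hypervisor, the MediaAgent, and the array firmware:

```python
def intellisnap_vsa_backup(hypervisor, array, esx_proxy, subclient):
    """Hypothetical outline of the IntelliSnap for VSA snap-and-backup flow;
    none of these objects or methods are a real Commvault API."""
    # 1. Locate the VMs defined in the subclient contents.
    vms = hypervisor.discover(subclient.contents)

    # 2-3. Quiesce each VM, then take hypervisor software snapshots.
    for vm in vms:
        hypervisor.quiesce(vm)
        hypervisor.software_snapshot(vm)

    # 4. Hardware snapshot of the underlying volume via MediaAgent processes.
    hw_snap = array.snapshot(subclient.datastore_volume)

    # 5. Software snapshots are removed quickly, keeping quiesce time short.
    for vm in vms:
        hypervisor.remove_software_snapshot(vm)

    # 6. Mount the hardware snapshot to a proxy host and stream the VMs to
    #    protected storage without loading the production ESX hosts.
    esx_proxy.mount(hw_snap)
    esx_proxy.backup(vms)
```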


Block Level Backups


Commvault® software provides block level protection for several agents. Block level backups work just like hardware
snapshots but use VSS snapshots and Commvault block tracking technology to coordinate and manage snap operations.
Like a hardware snap using IntelliSnap technology, the application agent and MediaAgent software must be installed on
the client. Block level backups must be enabled in the subclient and the entire volume or database must be defined within
the subclient contents.

Block Level Use Cases

File System
File system block level backups are used to protect large volumes where the number of objects in the volume makes it
impractical to conduct traditional index-based backups, which require a scan, backup, and index phase to complete.

Exchange Database
Exchange database block level backups are used to conduct database mining for mailbox data without requiring a staging
area for the database. Since the block level backup appears to Commvault software as a snapshot, it can be mounted and
mined directly from the Content Store.

MySQL and PostgreSQL Databases


MySQL and PostgreSQL database block level backups are used to conduct database mining for table restores. As with
Exchange block level backups, MySQL and PostgreSQL block level backups appear to Commvault software as a
snapshot and can be mounted and mined directly from the Content Store.

How Block Level Backups Work


1. The application or file system agent quiesces the data.
2. A VSS snapshot is taken on the volume or database.


3. The primary snap copy of the storage policy manages the snap.
4. A backup copy operation is used to copy the snapshot to protected storage.
5. The VSS snapshot is released.


IntelliSnap® Configuration

Array Configuration
Hardware arrays are configured from the Array Management applet which can be accessed from the Control Panel or from
the Manage Array button in the subclient. All configured arrays will be displayed in the Array Management window.
Multiple arrays can be configured, each with their specific credentials. For some arrays, a Snap Configuration tab is
available to further customize the array options.

Storage Policy Configuration


Storage policies can be used to manage both traditional data protection operations and snapshot operations. A storage
policy can have a primary (classic) copy and one or more snap copies.

A primary snap copy can be added to any storage policy by right-clicking the policy, selecting All Tasks, and then Create
New Snapshot Copy. The copy can be given a name, a data path location to maintain indexing data can be defined, and
retention settings can be configured.

Retention can be configured to maintain a specific number of snapshots, to retain by days, or to retain by cycles. Note
that if the days or cycles criteria are going to be used, it is critical to have a complete understanding of how the days and
cycles criteria operate.

Snapshot Retention
Just like traditional protection methods, storage policies are used to manage the retention of snapshots. There are three
methods by which retention can be configured for snapshot data:

• Retain snaps by the number of jobs


• Days retention


• Cycles retention
It is important to note that although snap operations can be scheduled as full, incremental, or differential, a snapshot is
always the same. The backup type is in fact applied to the subsequent snap backup copy job, which copies the content
of the snapshot to Commvault® storage. For instance, if an incremental job was selected, only changes since the last
snap backup copy job are sent to the Commvault library.

Retain Snaps by Number of Jobs


This feature allows the retention to be based on the number of snap jobs that are conducted. The number of snapshots
that can be retained is based on the incremental block change rate and the amount of snap cache space available.

Days Retention
The days retention rule determines how many days of snapshots are retained. Careful planning should be done before
configuring the number of days for snap retention to ensure there is adequate disk cache space. This factor is determined
by the number of snaps performed and the incremental block change rates. Performing hourly snapshots with a high
incremental change rate and a two-day retention may require more cache space than performing daily snapshots with
low change rates and a seven-day retention.

Cycles Retention
Cycles can also be used to manage snapshots. When using this option, it is important to ensure backup copies are
properly running to protect all full and incremental jobs. When using cycles to define snapshot retention, the basic
retention rules of cycles apply just as if a backup operation was conducted. This means that if the cycles criteria is set to
two, then a third full snapshot must run before the first full snap, and any incremental or differential snaps that depend on
it, are released from disk.
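To make the cycles rule concrete, the following sketch (a simplified model, not Commvault's aging logic) shows why, with retention set to two cycles, the first cycle is only released after a third full snapshot runs:

```python
def released_cycles(jobs, retain_cycles):
    """Simplified model: a cycle is a full snapshot plus the incremental or
    differential snaps that depend on it. With retention set to N cycles,
    the oldest cycle is released only once more than N fulls exist."""
    cycles = []
    for job in jobs:
        if job == "FULL":
            cycles.append([job])        # a new full starts a new cycle
        elif cycles:
            cycles[-1].append(job)      # incrementals attach to the latest full
    n_released = max(0, len(cycles) - retain_cycles)
    return cycles[:n_released]

history = ["FULL", "INCR", "INCR",      # cycle 1
           "FULL", "INCR",              # cycle 2
           "FULL"]                      # cycle 3: the third full
print(released_cycles(history, retain_cycles=2))
# [['FULL', 'INCR', 'INCR']] -- cycle 1 ages only after the third full runs
```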

Subclient Configuration
To protect production data using IntelliSnap technology, the IntelliSnap feature must be enabled for the client, and a
subclient must be configured that defines the content to be snapped, with the IntelliSnap feature enabled on that
subclient.

To enable the IntelliSnap feature for the client: select the client properties, click the Advanced button and check the Enable
IntelliSnap option.

Once the IntelliSnap feature has been enabled for the client, the IntelliSnap tab is used to enable snapshot operations.
Enabling the IntelliSnap check box designates the contents of the subclient to be snapped when schedules for the
subclient are executed. The snap engine must be selected from the drop-down box; use the Manage Array button to
configure a new array if one has not already been configured. A specific proxy can be designated for backup copy
operations. This proxy must have the appropriate software and hardware configurations to conduct the backup copies.
Refer to Commvault's documentation for specific hardware and software requirements for the array and the application
data being snapped.

Once IntelliSnap operations have been configured for the subclient, ensure the subclient is associated with a snap
enabled Storage Policy.

When defining content for the subclient, ensure that only data sitting on the array volume is defined, since no snapshot
can be conducted on data outside of the array.


PERFORMANCE


Performance Overview
Commvault® software is a high-performance solution for protecting all data in any environment within defined protection
windows. The software also provides many settings to improve performance. Before considering tuning Commvault
software, it is important to understand the capabilities and limitations of all hardware and software deployed within the
environment.

There is no such thing as a static data center: network infrastructures are constantly changing, new servers are added,
and mission critical business systems are moving to hybrid cloud or public cloud infrastructures. Before considering
Commvault tunables, it is first important to understand your environment, including the capabilities and limitations of the
infrastructure; specifically, the ability to transfer large amounts of data over production or backup networks.

When making modifications to an environment, changes that positively impact one aspect of the environment can
negatively affect another. This is also true of Commvault settings. For example, enabling multiplexing when writing to a
tape drive can improve backup speeds, but it may have a negative impact on restores if dissimilar data types are
multiplexed to the same tape. Another example is using Commvault deduplication with a high number of data streams:
since client side deduplication is being used, there is a low impact on the network, but if the deduplication database
needs to be sealed, the next set of backup operations may oversaturate the network while re-baselining blocks in
storage.

Performance Benchmarks
Benchmarks can be divided into two kinds, component and system. Component benchmarks measure the performance of
specific parts of a process, such as the network, tape or hard disk drive, while system benchmarks typically measure the
performance of the entire process end-to-end.

Establishing a benchmark focuses your performance tuning and quantifies the effects of your efforts. Building a
benchmark consists of the following five steps:

• Understand the process


• Identify the resources involved
• Minimize outside influence
• Periodic test
• Write it down
Understand the process
You can’t document or improve something if you don’t know what’s going on. More importantly, you need to understand
what phases a job goes through and how much each phase affects the overall outcome.

For example, a backup job over a network to a tape library takes two hours to complete. You think it should take a lot
less, so you spend time, effort, and money to improve your network and tape drives and to parallelize the movement of
data. The job now takes 1.8 hours to complete. You gained a 10% improvement.

Looking at the job in more detail, we find that the scan phase is taking 1.5 hours and the rest is the actual data
movement. Switching the scan method reduces the scan phase to 12 minutes. The job now takes 0.5 hours. You gained
a further 72% improvement.
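The arithmetic, spelled out with the values from the example above:

```python
# Phase timings from the example, in hours.
total_before = 2.0                  # original end-to-end job
total_after_hw = 1.8                # after network/tape improvements
scan_before, scan_after = 1.5, 0.2  # scan phase: 1.5 hours -> 12 minutes

# Improvement from the hardware spend alone.
print((total_before - total_after_hw) / total_before)       # 0.10 -> 10%

# Improvement from fixing the dominant phase instead.
total_after_scan = total_after_hw - scan_before + scan_after
print(total_after_scan)                                      # 0.5 hours
print((total_after_hw - total_after_scan) / total_after_hw)  # ~0.72 -> 72%
```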

Knowing what phases a job goes through and how much each phase impacts the overall performance can help you focus
your time, effort, and money on the real problems.

Identify the resources involved


Each hardware component is going to have a theoretical performance limit and a practical one. Attempting to get
improvement beyond these limits without changing the resources involved is a waste of time. Consider using newer vs.
older technologies, such as tape drives.

Minimize outside influence

Large data movements are usually done during non-production hours for two reasons: one, they can degrade production
work, and two, production work can degrade the movement of data. You want to minimize competition for resources to
get a fair benchmark of what performance is achievable. In cases where competition cannot be eliminated, you must
accept the impact to performance or invest in more resources.

Periodic Test

A single measurement is not a benchmark. Tape devices have burst speeds that are not sustainable over the long run.
Networks have varying degrees of bandwidth availability over time, so a single snapshot check of bandwidth will not give
you a realistic expectation. Do periodic testing over the actual usage of a resource to determine its average performance.
Try to level out the peaks and valleys, or at least try to identify what causes these variations.

Multiple measurements scattered over a day can also help establish whether an unexpected external process is
impacting the environment. For example, if a database server backs up slowly at night but achieves expected
performance when you sample during the day, you can suspect an external process is impacting the backup, such as a
database administrator dumping the database and copying it to another server at the same time.

Write it down

The hardest lessons are the ones you must learn twice. Once you’ve established your acceptable and/or expected
performance levels for each resource and end-to-end, write them down and use them as the baseline for comparing future
performance.
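A minimal sketch of periodic sampling plus a recorded baseline; the sample count, interval, and file path are arbitrary placeholder values, and 'measure_once' stands in for whatever throughput test fits your environment:

```python
import json
import statistics
import time

def sample_throughput(measure_once, samples=12, interval_sec=300):
    """Measure a transfer repeatedly (here every 5 minutes) and summarize.
    'measure_once' is any callable returning a throughput figure in MB/s."""
    readings = []
    for _ in range(samples):
        readings.append(measure_once())
        time.sleep(interval_sec)
    return {
        "min_mb_s": min(readings),
        "max_mb_s": max(readings),
        "avg_mb_s": statistics.mean(readings),
        "readings": readings,   # keep raw values to spot peaks and valleys
    }

def write_baseline(name, summary, path="benchmarks.json"):
    """Persist the result so future measurements have a baseline to compare to."""
    try:
        with open(path) as f:
            baselines = json.load(f)
    except FileNotFoundError:
        baselines = {}
    baselines[name] = summary
    with open(path, "w") as f:
        json.dump(baselines, f, indent=2)
```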

Environment Considerations
Before modifying Commvault® software settings to improve performance, consider environmental capabilities and
limitations. Ensure the environment is optimized to the best of your team’s abilities. Commvault software can move data at
high rates of speed, but it will ultimately be limited by bottlenecks on servers and network devices.

TCP/IP
TCP/IP is the most common network transmission protocol. Factors that can degrade TCP/IP performance are:

• Latency - Packet retransmissions over distance take longer and negatively impact overall
throughput for a transmission path.

• Concurrency - TCP/IP was intended to provide multiple users with a shared transmission media. For
a single user, it is an extremely inefficient means to move data.

• Line Quality - Transmission packet sizes are negotiated between sender/receiver based on line
quality. A poor line connection can degrade a single link’s performance.

• Duplex setting - Automatic detection of connection speed and duplex setting can result in a half-
duplex connection. Full duplex is needed for best performance.

• Switches - Each switch in the data path is a potential performance degrader if not properly
configured.
• Firewalls – Firewalls are the first line of defense against hackers, malware, and viruses. There are
hardware firewall appliances and software firewalls, such as operating system firewalls. Firewalls
can have minor to moderate impacts on transfer performance.


SCSI/RAID
SCSI is the most common device protocol used and provides the highest direct connection speed. An individual SCSI
drive's speed is determined by spindle speed, access time, latency, and buffer. Overall SCSI throughput also depends on
how many devices are on the controller and in what type of configuration. The limitations of SCSI are the distance
between devices and the number of devices per controller.

• RAID arrays extend the single addressable capacity and random access performance of a set of
disks. The fundamental difference between reading and writing under RAID is this: when you write
data in a redundant environment, you must access every place where that data is stored; when you
read the data back, you only need to read the minimum amount of data necessary to retrieve the
actual data, and the redundant information does not need to be accessed. In short, writes are
slower than reads.

• RAID 0 (striping), RAID 1 (mirroring), or RAID 1+0 with narrow striping are the fastest configurations
for sequential write performance. Wider striping is better for concurrent use. A RAID 5 configured
array can have poor write performance; the tradeoff for slower writes is redundancy should a disk
fail.

Fine tuning a RAID controller for sequential read/write may be counterproductive to concurrent
read/write. If backup/archive performance is an issue, a compromise must be arranged.

iSCSI/Fibre Channel
iSCSI and Fibre Channel Protocol (FCP) are essentially serial SCSI with increased distance and device support. SCSI
commands and data are assembled into packets and transmitted to devices, where the SCSI command is reassembled
and executed. Both protocols are more efficient than TCP/IP, and FCP has slightly better statistics than iSCSI for moving
data. Performance tuning usually amounts to setting the correct Host Bus Adapter configuration (as recommended by the
vendor for sequential I/O) or correcting a hardware mismatch. Best performance is achieved when the hardware involved
is from the same vendor. Given optimum configuration and hardware, performance for both iSCSI and FCP is inhibited
only by available server CPU resources.

Disk I/O
Performing I/O to disks is a slow process because disks are physical devices that require time to move the heads to the
correct position on the disk before reading or writing. This re-positioning of the head is exacerbated by having many files
or having fragmented files. You can significantly improve read performance of the source data by de-fragmenting the data
on a regular basis.

Anti-Virus
Anti-virus software protects a system against malicious and corrupted data by periodically scanning file systems and
ensuring that every file accessed or opened by any process running on the system is legitimate (and not a virus). When a
backup runs and protects every file on the system, this anti-virus validation can significantly decrease backup
performance. Anti-virus software might also access and lock Commvault files, such as log files. On all systems where
Commvault software is installed, it is recommended to add exclusions to the anti-virus software for Commvault® software
folders, so that Commvault related processes do not trigger the anti-virus validation process.


Stream Management
Data streams are used to move data from a source to a destination. The source can be production data or Commvault
protected data; a destination stream always moves data to Commvault protected storage. Understanding the data stream
concept allows a CommCell® environment to be optimally configured to meet protection and recovery windows.

Stream settings are configured in various places within the CommCell® console including the storage policy, MediaAgent,
subclient, and library. The system always uses the lowest setting. If a MediaAgent is configured to receive as many as 100
streams and one storage policy is writing through the MediaAgent and is configured to use 50 streams, then only 50
streams will be sent through the MediaAgent.

During a data protection job, streams originate at the source file or application being protected. One or more read
operations are used to read the source data. The number of read operations is determined by the number of subclients
and, within each subclient, the number of data readers or data streams, depending on which agent is managing the data.
Once the data is read from the source, it is processed by the agent and then sent to the MediaAgent as job streams. The
MediaAgent then processes the data, arranges it into chunks, and writes it to storage as device streams. The data is
written based on the number of writers for a disk library, or the number of devices (tape drives) for a tape library.
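Because the lowest setting in the path always wins, the effective stream concurrency can be estimated as a simple minimum across the components involved (illustrative values only):

```python
def effective_streams(subclient_readers, policy_device_streams,
                      mediaagent_max_streams, library_writers):
    """The component with the lowest stream setting caps the entire path."""
    return min(subclient_readers, policy_device_streams,
               mediaagent_max_streams, library_writers)

# The example from the text: a MediaAgent allowing 100 streams fed by a
# storage policy configured for 50 device streams still moves at most 50.
print(effective_streams(subclient_readers=60, policy_device_streams=50,
                        mediaagent_max_streams=100, library_writers=50))  # 50
```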

Stream Settings Summary Table


Subclients
• Subclients are independent jobs, meaning each subclient has one or more streams associated with
each job.

Multi-stream subclients
• Most subclients can be multi-streamed. For subclients that do not support multiple streams, multiple
subclients are used to multi-stream data protection jobs.
• Data readers are configured in the General tab of the subclient.
• Data streams are configured in the Storage Device tab for MS-SQL and Oracle subclients.

Non-subclient based agents
• Agents such as the new Exchange Mailbox agent manage streams at the object level. For
Exchange, each mailbox is protected as a single stream.
• The default subclient data readers setting is still used as the primary stream governor for the
maximum number of concurrent objects that can be protected.

Job Streams
• Job streams are active network streams moving from source (client or MediaAgent) to destination
(MediaAgent).
• The Job Controller shows the total number of job streams currently in use at the bottom of the
window, along with the job stream 'high watermark' for the CommCell environment.
• Add the 'Number of Readers in Use' field in the Job Controller to view the number of streams being
used for each active job.

Device Streams
• Configured in the Storage Policy properties.
• Determine how many concurrent write operations are performed to a library. This number should be
set to equal the number of drives or writers in the library to maximize throughput.
• Multiplexing is used to consolidate multiple job streams into single device streams.

Drives
• For a removable media library writing data sequentially to devices, there is one device stream per
drive.

Writers
• For a disk library where random read/write operations can be performed, the number of writers
should be set to allow maximum throughput without creating bottlenecks in your network,
MediaAgents, or disks.


Meeting Protection Windows


It is critical to meet data protection windows; if they are not met, restore windows may not be met either. If data is
scheduled to go off-site daily but it takes four days to back up the data, then the data cannot be sent off-site until the job
completes.

If you are currently meeting protection windows, there is no need to modify anything. Improving windows from six to
four hours when your window is eight hours just creates more work and a more complex environment. The following
recommendations are intended to improve performance when protection windows are NOT being met.

Storage policy settings and modifications to help meet protection windows:

• Device Streams – Increase device streams to allow more concurrent job streams to write, if
adequate resources are available.
  o MediaAgent – ensure the MediaAgent is properly scaled to accommodate higher stream
concurrency.
  o Network – ensure network bandwidth can manage the higher traffic.
  o Disk Library (not using Commvault deduplication) – ensure the library can handle a higher
number of write operations. Increase the number of mount path writers so the total number
of writers across all mount paths equals the number of device streams.
  o Disk Library (with Commvault deduplication) – if not using Client Side Deduplication, enable
it. Each deduplication database can manage 50 or more concurrent streams. If using Client
Side Deduplication, after the initial full is complete, most data processing is done locally on
each client, so minimal bandwidth, MediaAgent, and disk resources are required for data
protection operations.
• Tape Library – If tape write speeds are slow, enable multiplexing. Note: enabling multiplexing can
have a positive effect on data protection jobs but may have a negative effect on restore and
auxiliary copy performance.


• Commvault Deduplication:
  o Ensure the deduplication database is on high speed disks. Use the SIDB2 utility tool to
simulate database performance before implementing; check Commvault® documentation for
guidance on using this tool.
  o For primary backups, use Client Side Deduplication and DASH Full backups.
  o For secondary copies, use DASH Copy backups to a destination disk target enabled for
deduplication.

• Data path property settings:
  o Increase chunk size to improve performance.
  o Increase block size to improve performance. Note: block size is hardware dependent. Before
changing the block size, ensure all NICs, HBAs, switches, routers, MediaAgent operating
systems, and storage devices at your primary and alternate sites (including DR sites)
support the block size setting.

Subclient settings and modifications to help meet protection windows:

General recommendations:
• Ensure all data is being properly filtered. Use the job history for the client to obtain a list of all
objects being protected. View the failed items in the log to determine whether files are being
skipped because they are open or because they existed at the time of the scan but not at the time
of backup; this is common with temp files. Filters should be set to eliminate failed objects as much
as possible.
• For file systems and applications with granular object access (Exchange, Domino, SharePoint),
consider using data archiving. This moves older and infrequently accessed data to protected
storage, which reduces backup and recovery windows.

File Backup recommendations:


• For backups on Windows operating systems ensure source disks are defragmented.
• Ensure all global and local filters are properly configured.
• If source data is on multiple physical drives increase the number of data readers to multi-stream
protection jobs.
• If source data is on a RAID volume, create subclient(s) for the volume and increase the number of
data readers to improve performance. Enable the Allow Multiple Data Readers within a Drive or
Mount Point option.
• For large volumes containing millions of objects:
o Consider using multiple subclients and stagger scheduling backup operations over a weekly
or even monthly time period.
o For supported hardware consider using the Commvault IntelliSnap feature to snap and
backup volumes using a MediaAgent proxy server.
o Consider using File System Block Level Backup.

Database applications
• For large databases that are being dumped by application administrators consider using Commvault
database agents to provide multi-streamed backup and restores.
• When using Commvault database agents for instances with multiple databases consider creating
multiple subclients to manage databases.


• For large databases, consider increasing the number of data streams used to back up the
database. Note: for multi-streamed subclient backups of SQL and Sybase databases, the streams
cannot be multiplexed. During auxiliary copy operations to tape, if the streams are combined onto a
tape, they must be pre-staged to a secondary disk target before they can be restored.
• For MS-SQL databases using file/folder groups, separate subclients can be configured to manage
databases and file/folder groups.

Virtual Machine Backups


• General Guidelines
o Consider using the Commvault® Virtual Server Agent (VSA).

o Determine which virtual machines DO NOT require protection and do not back them up.

• When using the VSA agent to protect a VMware environment:
  o It is preferred to use physical VSA MediaAgent proxies rather than virtual MediaAgent
proxies.
  o Ensure enough proxies are being used to handle the load.
  o Use Commvault Client Side Deduplication and DASH Full backups.
  o The data readers setting determines the number of simultaneous snap and backup
operations performed. Increase this number to improve performance. NOTE: ensure the
disks where virtual machines are stored can handle the number of concurrent snapshots, or
the snapshot process may fail.
• When using file system agents in virtual machines:
  o Consider having a base VM image that can be used to recreate the virtual machine. Use the
default subclient filters to filter out any volumes and folders that do not require protection.
Note: it is STRONGLY recommended NOT to alter the contents of the default subclient. If
you explicitly map default subclient data, the auto-detect feature is disabled, which means
any new volumes added to the machine must be explicitly added to the content of the
subclient.
• When protecting applications in a VMware environment:
  o Use application agents inside the VMs. It is strongly recommended NOT to perform VSA
crash consistent backups of application database data.
  o Consider the pros and cons of using Commvault compression and client side deduplication.
Application level compression may yield a better compression ratio, but deduplication
efficiency can suffer.
• Commvault IntelliSnap for VSA:
o Define subclients by DataStore affinity. When hardware snaps are performed the entire
DataStore is snapped regardless of whether the VM is being backed up.
o For smaller Exchange or MS-SQL databases (less than 500GB), application consistent
snapshots can be performed using the IntelliSnap feature and VSA.
o For large databases, install the application agent in the VM and configure the IntelliSnap
options in the subclient. Hardware snapshots will be performed at the database level
providing better scalability and application awareness.


Meeting Media Management Requirements


Media management is critical to ensure all data is properly protected. Improper use of media can lead to insufficient
storage for new jobs and can cost your company significant money. The following section focuses on methods to
improve media management.

Considerations to meet media management requirements:


• Define subclient content specifically for data requiring longer retention. Use subclient associations
to associate subclients with the policy copy that has the proper retention requirements. If a server
contains 500 GB of data and 50 GB needs to be kept for five years, it doesn't make sense to keep
all 500 GB for five years.
• Considerations for tape media:
o Use the Combine to Stream option to consolidate streams to fewer media.

o Consider consolidating the number of storage policies. Each policy copy will manage its own
set of media. The more policies and policy copies you have, the more tapes you will need to
manage data for all copies.
o If most jobs on a tape have aged but a few jobs have not, the tape will not recycle. Use the
media refresh option to copy un-aged jobs to new media so the tape can recycle.
• Considerations for Disk Usage:
o Use Commvault deduplication.

o Do not use extended retention on a deduplication enabled copy.


Meeting Restore Requirements


The only reason we back up is to restore. Recovery windows should be established for different data types, and test restores should be performed to determine whether those windows can be met. If recovery windows are not being met, adjustments can be made to improve restore performance. Recovery windows can be greatly affected by the level of disaster or disruption that occurs. Service Level Agreements (SLAs) regarding recovery windows are based on two key requirements:
• Recovery Time Objective (RTO) is the time required to recover a business system after a disruption or disaster.
• Recovery Point Objective (RPO) is the interval at which recovery points are created. Each recovery point is created by a backup, snapshot, or replication operation. The RPO corresponds to the acceptable amount of data loss a business system can tolerate (see the sketch after these definitions).
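The relationship between schedule frequency and worst-case data loss can be shown with a simple sketch. The schedules below are illustrative assumptions:

    # Illustrative only: worst-case data loss equals the recovery point
    # interval, assuming the most recent recovery point is always recoverable.
    schedules = {
        "daily full only": 24 * 60,
        "hourly incremental": 60,
        "transaction logs every 15 minutes": 15,
    }

    for name, interval_minutes in schedules.items():
        print(f"{name}: worst-case data loss ~ {interval_minutes} minutes")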
RTO and RPO should not be defined as a single blanket requirement. Different types of disruptions or disasters affect both the time to recover and the data loss that may occur. Consider these four basic levels of disaster and how they may affect recovery objectives:

• Disruption of a business system: This can affect a single system, such as a database or email server, where end users can still function but will not have access to the system. High availability solutions such as clustering, virtualization, data mirroring, and data replication should be considered. For critical business systems, a disruption should have very short RTO and RPO requirements defined.
• Limited site disaster: This may affect the datacenter, routers, switches, or other components, and can have a larger effect on end users' ability to perform their jobs. Consider the classic air conditioner leak that forces power to be cut or systems to be shut down. Users still have access to facilities, but their ability to access business systems may be down for longer periods of time. In this case a longer RTO may be defined, but the RPO should still be relatively low.
• Site disaster: This forces the shutdown of the entire building; end users can work from home or take the day off. It can be quite difficult to define accurate RTO and RPO requirements for this scenario, since the disaster may result from circumstances beyond your control. Consider a gas pipe leak that forces power to be cut from the building for safety reasons: when power is restored is out of your hands. This is a strong reason to have an active DR facility. In this case the RTO and RPO would be based on the readiness and availability of equipment at the DR facility and the frequency with which data is sent there.
• Regional disaster: Major regional disasters can have a large impact on a business's ability to continue. This scenario affects not only the IT department's ability to restore services but also the users' ability to access those services. A DR facility is a requirement for a regional disaster, and it should be located at a proper distance based on the perceived risk of the types of disaster that may occur. The bigger picture here is business continuity, as it falls largely on management to ensure the continuation of the business.
Within the Commvault® software suite there are methods by which RTO and RPO can be improved. The following section explains some of the ways you can configure your CommCell environment to improve recovery windows.
Considerations to meet restore requirements:
• Test restores to establish realistic restore windows (a timing sketch follows this list). This is especially important when using Commvault Client Side Deduplication, as backups may run considerably faster than restore operations. Establish benchmarks and determine what can be changed to improve recovery windows.
• Considerations for tape media:
o If streams from different data sets are multiplexed or combined onto a tape, only one data set can be restored at a time. Consider isolating different data set streams on different media by using separate secondary copies for each data set along with the Combine to Streams option.
o For large amounts of data that are multi-streamed during backups, do not multiplex or combine the streams to tape. If the streams are on separate tapes, the Restore by Job option can be used to multi-stream restore operations, improving performance.
• Considerations for disk media:
o When using Commvault deduplication, use the minimum recommended 128 KB block size. Smaller block sizes result in heavier data fragmentation on disk, which can reduce restore performance.
• Improving Recovery Time Objectives (RTO):
o Filter out data that is not required for data protection operations. The less you back up, the less you have to restore.
o Strongly consider data archiving; it will improve both backup and restore performance. Note that although deduplication improves backups and reduces storage requirements, it can actually have a negative effect on restore performance.
o If a subclient job was multi-streamed, you can restore it using multiple streams through the Restore by Job option.
o Consider assigning different RTOs to different business data; it is not always about restoring everything. Consider a database server with five databases: each one can be defined in a separate subclient, allowing each database its own RTO so they can be recovered by priority.
• Improving Recovery Point Objectives (RPO):
o Run point-in-time backups, such as incrementals or transaction log backups, more frequently for a shorter RPO.
o Consider prioritizing data by RPO requirement, defining that data as a separate subclient, and assigning it separate schedules. For example, a critical database with frequent changes can be configured in a separate subclient and scheduled to run transaction log backups every fifteen minutes. To provide short off-site RPO windows, consider running synchronous copies with the automatic schedule enabled.
o Consider using hardware snapshots with the Commvault IntelliSnap feature to manage and back up snapshots.
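As a minimal sketch of scripted restore testing, again assuming the open-source cvpysdk package (hostnames, credentials, paths, and the four-hour window below are placeholders), a benchmark might time an in-place restore and compare the result against the RTO:

    # Minimal sketch using cvpysdk (pip install cvpysdk); all names are
    # placeholders -- verify calls against the SDK version in use.
    import time
    from cvpysdk.commcell import Commcell

    RTO_SECONDS = 4 * 3600  # assumed 4-hour recovery window for this data set

    commcell = Commcell('webconsole.example.com', 'admin', 'password')
    subclient = (commcell.clients.get('fileserver01')
                 .agents.get('File System')
                 .backupsets.get('defaultBackupSet')
                 .subclients.get('default'))

    start = time.time()
    job = subclient.restore_in_place(paths=[r'C:\TestRestore'])
    job.wait_for_completion()  # blocks until the restore job finishes
    elapsed = time.time() - start

    verdict = "within" if elapsed <= RTO_SECONDS else "exceeds"
    print(f"Restore took {elapsed / 60:.1f} minutes ({verdict} the 4-hour RTO)")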
COMMVAULT.COM | 888.746.3849 | EA.COMMVAULT.COM

©2020 COMMVAULT SYSTEMS, INC. ALL RIGHTS RESERVED.