
NetBackup Training Module1

The document provides an overview and agenda for a NetBackup training module. It will cover topics such as client and policy configuration, troubleshooting, and a brief history of NetBackup. The training assumes familiarity with NetBackup and will demonstrate how to manage components like jobs, storage, media, and clients using the NetBackup Administration Console.

Uploaded by

Mehves Demirel
Copyright
© © All Rights Reserved

NetBackup Training

KEEPING PEOPLE AND INFORMATION CONNECTED.®

Module 1:
Brief Overview, Client/Policy Configuration,
Troubleshooting
For Internal SunGard Use Only
Agenda
 Introduction
 Purpose & Assumptions
 History
 Terminology and Concepts
 Architecture
 Standards
 How Backups Work
 Managing NetBackup
 Client Implementation and Configuration
 Policies
 Troubleshooting
 Reporting
 Monitoring Overall Environment
 Shutdown/Restart NetBackup
 Tips and Tricks
 Education/Further Reading
 Q&A
Purpose and Assumptions

 Purpose
– Increase knowledge of NetBackup product
 Assumptions
– Presentation assumes 6.5.3
– Basic familiarity with NetBackup
– Know how to access environments
– Windows and/or Unix admin experience
– Please write down your questions for the Q&A session at
the end



History
Corporate
 1987 - proprietary software solution written by engineers at Control
Data for Chrysler Corp.
 1993 - renamed to BackupPlus (‘bp’ prefix)
 Late 1993 - OpenVision acquisition (/usr/openv/ install path) and
rebranded product “NetBackup”
 1997 - Veritas acquired OpenVision
 2005 - Symantec acquired Veritas
Version
 1993 – BackupPlus 1.0 (Control Data)
 1994 – NetBackup 1.6 (OpenVision)
 1996 – NetBackup 2.0
 1997 – NetBackup 3.0 (Veritas)
 2000 – NetBackup 3.4
 2002 – NetBackup 4.5
 2003 – NetBackup 5.0
 2005 – NetBackup 6.0 (Symantec)
 2007 – NetBackup 6.5
Terminology and Concepts
 Master Server – brains of the operation, houses catalog
 Media Server – where storage units exist, pushes data
 Client – device providing data to be backed up
 Enterprise Media Manager (EMM) – manages device and media information;
typically installed on Master
 Catalog – database of backup images and other information
 Metadata – info of files backed up (name, path, size, date, image location, etc.)
 Duration - time it takes to perform the backup
 Exit Code – final status of job
– 0 = Successful with NO files missed
– 1 = Successful with files missed
– 2+ = Backup Failed
 Start Window – time when a backup can START
 Frequency – how often the backup should execute
 Retention – length of time backups are valid
 Policy – grouping of like clients sharing similar attributes
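
The exit code rules above can be sketched in a few lines of Python. The function name `classify_exit_code` and the labels it returns are illustrative only and are not part of NetBackup.

```python
def classify_exit_code(code: int) -> str:
    """Map a job exit code to the three outcomes listed above."""
    if code < 0:
        raise ValueError("exit codes are non-negative")
    if code == 0:
        return "successful"   # no files missed
    if code == 1:
        return "partial"      # successful, but some files were missed
    return "failed"           # 2 and above


for code in (0, 1, 41):
    print(code, classify_exit_code(code))
```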



Terminology and Concepts continued
 Schedule – subset of Policy, defines Start Window, retention,
storage unit, etc.
 Storage Unit – location defined to store backups, can be disk/tape,
exist only on a Media Server
 Backup Image – one backup job comprised of all files backed up;
job must complete
 Disk Storage – primary landing zone for jobs; destage to tape later;
removes older images as needed; can be configured many ways,
current standard is Basic Disk; optional
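 Multiplexing – interleaving of multiple jobs on tape to prevent
‘shoe-shining’ (the drive repeatedly stopping and repositioning
because data arrives too slowly to keep it streaming)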
 Long Term Data Retention – utilizes media and marginally increases
catalog size; non-issue
 Name Resolution – NetBackup depends on proper forward and reverse lookups
 Scaling – horizontally by adding more media servers
Terminology and Concepts continued
 Files to Backup – Part of policy config
– Exclude List – files to skip
– Include List – files to include after processing excludes
– Additional config on client; granular to policy or schedule level; no stacking
 Backup Type
– Full – all files captured
– Differential Incremental – all changes since last backup of any type
– Cumulative Incremental – all changes since last full
– User – Allows user to run backups from client side; most used for child jobs
of DB Agents - “Default-Application-Backup”
 Database Agents
– Exchange, Notes, Oracle, SQL, SAP, etc.
 Options
– NDMP, Off-site Management (Vault), Tape/Disk Sharing, Bare Metal Restore,
Snapshot, VMWare etc.
 Licensing - gold key with many options; SunGard pays for ‘Protected Data’
 Recovery – restore catalog or import all images manually
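
A minimal sketch of the exclude/include ordering described above: excludes are applied first, and the include list can then add specific files back. It assumes shell-style wildcards via Python's `fnmatch`, which only approximates NetBackup's own pattern rules. The function name and sample paths are hypothetical.

```python
from fnmatch import fnmatch


def is_backed_up(path, excludes, includes):
    """Apply the exclude list first, then let the include list add files back."""
    excluded = any(fnmatch(path, pattern) for pattern in excludes)
    if not excluded:
        return True
    # An excluded file is still backed up if the include list names it.
    return any(fnmatch(path, pattern) for pattern in includes)


excludes = ["/var/tmp/*", "*.iso"]          # hypothetical exclude_list contents
includes = ["/var/tmp/keep.txt"]            # hypothetical include_list contents

for path in ("/etc/hosts", "/var/tmp/scratch.dat", "/var/tmp/keep.txt"):
    print(path, is_backed_up(path, excludes, includes))
```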
NetBackup Tiered Architecture

Master Server (Top Tier)


Scheduler
Stores Catalog (Metadata,
Images), Volume Information
Vaulting Management
Media Server(s) (Mid Tier)
Data Mover
Sends Metadata to Master
Can be located on Master
Clients (Lower Tier)
Configured via GUI/Registry
(Win) or config files (*nix)



Example NetBackup Architecture Diagram

[Diagram components: Master Server, Metadata Network, Media Servers 1 through N, Backup Network, Client Hosts, FC Switch Fabric A, FC Switch Fabric B, Disk Storage, Enterprise Class Tape Library]
Standards
 Infrastructure
– Server/OS Types
 Unix – Solaris 10
 T2000
 Naming Standards Defined
 Network Configuration Standards (Metadata, backup, mgmt)
– Robot Types
 Quantum Scalar i2000
 STK SL8500
 Small Robots for legacy restores
– LTO3/4 Tape Drives / Media Types
– Volume Serial Numbers (VolSers/bar codes)
– SAN connectivity
– Disk Array Standards
– DSSU Configuration
 Application/Configuration
– Documented on LiveLink



How Backups Work (simplified)
 Scheduler on Master tells the Media Server to back up its client
 Media server is granted storage unit resource (disk or tape)
 Media server connects to client software and tells it to start
backing up
 Client creates list of files to back up
– Full – everything
– Differential – changes since last backup
– Cumulative – changes since last full
 Copies of files are sent to buffer
 Buffer contents sent to Media Server
 Media server writes buffer contents to storage unit
 Media server sends metadata to Master server to update
catalog
 Backup completes
 Storage unit resource released
 Backup image is completed and closed
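
The file-list step above can be illustrated with a small sketch. It assumes the decision is made on modification time alone, which is a simplification (a real client may also use other indicators, such as the Windows archive bit). `FileEntry` and `select_files` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    mtime: float  # last-modified time, seconds since the epoch


def select_files(files, backup_type, last_full, last_backup):
    """Return the paths to back up for 'full', 'differential' or 'cumulative'."""
    if backup_type == "full":
        return [f.path for f in files]
    if backup_type == "differential":
        cutoff = last_backup      # most recent backup of any type
    elif backup_type == "cumulative":
        cutoff = last_full        # most recent full backup
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    return [f.path for f in files if f.mtime > cutoff]


files = [FileEntry("a", 50), FileEntry("b", 150), FileEntry("c", 250)]
for kind in ("full", "differential", "cumulative"):
    print(kind, select_files(files, kind, last_full=100, last_backup=200))
```

With the last full at time 100 and the last incremental at time 200, the cumulative run picks up more files than the differential run, which is why cumulative backups grow until the next full.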
Managing NetBackup (Demonstration)
NBU Administration Console – 99.9% of daily administration occurs here
 Activity Monitor – Overall job status
– Jobs tab
 Job details
 State - Queued, Active, Partial, Failed
 Type – Backup, Restore, Catalog, Duplicate, Vault
 Status – Exit Code of job
– 0 = All files backed up, no problems
– 1 = Some files skipped (open/locked)
– >1 = Failure
 Additional info
 Suspend/kill jobs
 Sorting/Filtering - Be aware of any filters you have set
 Exporting
– Daemons tab
– Processes tab
– Help



Managing NetBackup (cont’d)
 Storage
– Storage Units – defined target for backups (similar to storage pool in TSM)
– Disk or Tape
– Storage Unit Groups
 Media
– Volume Pools – logical grouping of tapes
 Various defined pools
 Scratch
 SG_SHARED_xxx
 Policy defines Volume Pool
– Volume Groups – locational grouping of tapes
 Robot groups
 Onsite group
 Offsite groups
 Vault moves media between volume pools
– Robots – media currently in robot
– Standalone – tapes no longer associated with robot/volume group
– Inventory Robot
– Ejecting media
– States – Active, Full, Frozen, Suspended, Imported
Managing NetBackup (cont’d)
 Device Monitor
– Up/Down/Reset drive
 Devices
– Drives
– Robots
 SCSI Robots have single Control Host
 ACS – any server can control
– Media Servers
– Topology



Managing NetBackup (cont’d)
 Backup Archive Restore
– Used for restoring files
 Host Properties
– Master Server
– Media Servers
– Clients
 Include/Exclude Lists
 Server authorization
 Catalog
– Offline backup (legacy method)
– Import images
– Verify Images
– Duplicate images
 Reports
 Vault – option that processes and tracks volumes sent offsite



Client Implementation and Configuration
 All systems
– Install client binaries
 Agents included for Windows, not for Unix
– Verify network communication
 Client configuration
– Unix
 Configuration files
– bp.conf (the Master must be listed first)
SERVER = backup01-dal
SERVER = backup02-dal
SERVER = backup03-dal
SERVER = backup0N-dal
CLIENT_NAME = jumpstart01-dal
– exclude_list and include_list
» exclude_list.policyName.scheduleName
» include_list.policyName.scheduleName
» Exclude/Include lists do not stack
– Windows
 Backup, Archive, Restore GUI or Registry
 Some configuration available from Admin Console>Host Properties>Clients
 Changing open file backup for Windows
– Demonstration of Windows client configuration
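
As a rough illustration of the Unix client configuration above, the sketch below reads `KEY = value` lines in the bp.conf layout and treats the first SERVER entry as the Master. It also picks a single exclude list file, most specific name first, which is one reading of "do not stack" and should be treated as an assumption. The helper names are hypothetical, and the sample text reuses the hostnames from the slide.

```python
from typing import Optional

SAMPLE_BP_CONF = """\
SERVER = backup01-dal
SERVER = backup02-dal
CLIENT_NAME = jumpstart01-dal
"""


def parse_bp_conf(text: str) -> dict:
    """Parse 'KEY = value' lines; repeated keys such as SERVER keep their order."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


def pick_exclude_list(available, policy: str, schedule: str) -> Optional[str]:
    """Return the one exclude list file that applies (assumed: most specific wins)."""
    candidates = (
        f"exclude_list.{policy}.{schedule}",
        f"exclude_list.{policy}",
        "exclude_list",
    )
    for name in candidates:
        if name in available:
            return name
    return None


conf = parse_bp_conf(SAMPLE_BP_CONF)
print("master:", conf["SERVER"][0])
print("client:", conf["CLIENT_NAME"][0])
```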



Policies (Demonstration)
Policies - A backup policy allows the admin to configure how and when backups are to
be performed for a group of clients. This group of clients share similar backup
requirements (type, backup window, retention, etc.)

 Attributes
– Policy Type
– Destination
 Classification
 Storage Unit
 Volume Pool
– Check Points
– Limit Jobs per Policy
– Job Priority
– Media Owner
– Active/Inactive
– Follow NFS
– Cross mount points
– Compression
– Encryption
– Collect DR Info
– Allow Multiple Data Streams
– Keyword Phrase
– Snapshot Client



Policies (cont’d)

 Schedules
– Attributes Tab
 Name
 Type of Backup
 Full, Incremental, Differential, Cumulative, User
 Synthetic
 Schedule Type
– Calendar Based
– Frequency Based
– Destination
– Multiple Copies
– Override Policy Storage
– Override Policy Vol Pool
– Override Media Owner
– Retention
– Media Multiplexing
– Start Window Tab
 Defines when backup can START
– Exclude Dates Tab
 Defines when backup cannot run
– Calendar Schedule
 Only available when calendar sched type chosen
 Retries allowed after runday
 Specific Days or Recurring Days
– Summary of All Policies
Policies (cont’d)

 Clients
– Know hardware/OS type
 Backup Selections – what to backup
– ALL_LOCAL_DRIVES
– System_State:\ or Shadow Copy Components:\
– NEW_STREAM for multistreaming
 Manual backups
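
The effect of NEW_STREAM in a backup selection list can be sketched as a simple grouping step: each directive starts a new stream, and the entries that follow belong to it. This models only the grouping, not how NetBackup schedules the resulting jobs, and `split_streams` is a hypothetical helper.

```python
def split_streams(selections):
    """Group backup selection entries into streams at each NEW_STREAM directive."""
    streams = []
    current = []
    for entry in selections:
        if entry.strip() == "NEW_STREAM":
            if current:
                streams.append(current)
            current = []
        else:
            current.append(entry)
    if current:
        streams.append(current)
    return streams


selections = ["NEW_STREAM", "/usr", "NEW_STREAM", "/var", "/home"]
print(split_streams(selections))
```

Splitting a large client this way is one of the options for breaking up long-running jobs, provided Allow Multiple Data Streams is enabled on the policy.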



Troubleshooting

 MSS Document
 When in doubt, ASK!
 Windows client Troubleshooting



Windows Clients

 Over 3000 servers across all environments


 77% of all servers
 85% of all failures



Error Codes

 Media related (8x)


 Network Communication related (4x)
 Configuration/Hardware related (5x)
 Most Common Codes:
– 41, 196, 5x, 219, 13, 14, 2x
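
The grouping above is a rule of thumb keyed on the tens digit of two-digit codes. A small sketch, with a hypothetical `error_family` helper, shows how it could be used to narrow the first line of investigation. Codes outside the 4x, 5x and 8x ranges (such as 196 or 219) are simply reported as "other".

```python
FAMILIES = {
    4: "network communication",
    5: "configuration/hardware",
    8: "media",
}


def error_family(code: int) -> str:
    """Return the family for two-digit codes in the 4x, 5x and 8x ranges."""
    if 10 <= code <= 99:
        return FAMILIES.get(code // 10, "other")
    return "other"


for code in (41, 54, 84, 196):
    print(code, error_family(code))
```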



Check the Simple Stuff
 Is Server On and Cabled
– Decommissioned
– Maintenance
 Hosts Files or DNS correct
– Host
– All backup servers
– All backup interfaces on backup servers
 Network
– Functional
– Routing
 Library/Media Problem
 Server Hardware
 Windows Event Log
 Correlation
 Telnet
– To Master/Media from Client
– To Client from Master/Media
– telnet <hostname> bpcd (or 13782)
– telnet <hostname> vnetd (or 13724)
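
Where telnet is not installed, the same reachability test can be approximated with a short script. This sketch only checks that a TCP connection to the bpcd and vnetd ports succeeds. It does not speak the NetBackup protocol, and the function names are hypothetical.

```python
import socket

NETBACKUP_PORTS = {"bpcd": 13782, "vnetd": 13724}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_host(host: str) -> dict:
    """Probe both NetBackup ports on one host."""
    return {name: port_open(host, port) for name, port in NETBACKUP_PORTS.items()}
```

As with telnet, run the check in both directions: `check_host("backup01-dal")` from the client, and `check_host` against the client name from the Master and Media servers.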
Check the Simple Stuff (cont’d)
 BPCLNTCMD
– Command Options

-sv – returns version of Master
5.1

-pn – communicates back to Master
expecting response from server backup01-dal
backup03-dal backup03-dal 10.229.133.233 56618

-self – returns info about local system
gethostname() returned: backup03-dal
host backup03-dal: backup03-dal at 10.229.133.233 (0xae585e9)
checkhname: aliases:

-hn <hostname> – returns info resolved from hostname
host backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)
checkhname: aliases:

-ip <IP address> – returns info resolved from IP
checkhaddr: host : backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)
checkhaddr: aliases:

-server <Master> – see -hn option
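
The -hn and -ip checks amount to a forward lookup followed by a reverse lookup. A rough analogue using the system resolver is sketched below. It does not call bpclntcmd, and comparing only the short host name is an assumption made for the example. The resolver functions are parameters so the check can be exercised without a live DNS.

```python
import socket


def check_lookups(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare the names."""
    ip = forward(hostname)
    primary, aliases, _addresses = reverse(ip)
    resolved = [primary, *aliases]
    short = hostname.split(".")[0].lower()
    consistent = any(name.split(".")[0].lower() == short for name in resolved)
    return {"ip": ip, "reverse_names": resolved, "consistent": consistent}


# Stand-in resolvers using the sample address shown above.
def fake_forward(name):
    return "10.229.133.229"


def fake_reverse(ip):
    return ("backup01-dal", [], [ip])


print(check_lookups("backup01-dal", fake_forward, fake_reverse))
```

Calling `check_lookups("backup01-dal")` without the stand-ins uses the host's real resolver. A result with `consistent` set to False points at a hosts file or DNS mismatch.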



In Depth Client Troubleshooting

 Turn up logging on client


– Host properties or client BAR GUI
– Must have <install>\netbackup\logs\* dirs created
 Client Logs and Directories:
– bpbkar\<date>.log – Backup/Archive process (BPBKAR32)
– bpcd\<date>.log – Client Daemon (BPCDW32)
– tar\<date>.log – Restores (TAR32)



In Depth Client Troubleshooting (cont’d)

 Run test backup/restore


 Examine logs after failure
 Logs structured as such:
00:00:03.125 [3652] <2> bpcd exit_bpcd: exit status 0 ------>exiting
09:55:33.941 [6092] <16> bpfsmap: ERR - open_snapdisk: NBU snapshot failed

 Search for <#> entries:


– <2>, <4>, <8>, <16>, <32>: <2>=informational and <32>=Critical Failure
 Search error message on Google and Symantec
 Test recommended solution
 Lather, rinse, repeat
 Last resort/time sensitive – open case with Symantec
– (800) 342-0652
– Customer Number 3680-5196-9875
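
Searching the logs for higher severity entries can be scripted. The sketch below assumes the `time [pid] <severity> message` layout of the two sample lines in this section. Logs written in a different layout (for example with an AM/PM timestamp or a `[pid.tid]` pair) would need a different pattern. The names `parse_line` and `errors` are hypothetical.

```python
import re

LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<pid>\d+)\]\s+"
    r"<(?P<severity>\d+)>\s+"
    r"(?P<message>.*)$"
)


def parse_line(line: str):
    """Return a dict for a line in the assumed layout, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "time": match["time"],
        "pid": int(match["pid"]),
        "severity": int(match["severity"]),
        "message": match["message"],
    }


def errors(lines, min_severity: int = 16):
    """Keep only parsed entries at or above the given severity."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["severity"] >= min_severity]


sample = [
    "00:00:03.125 [3652] <2> bpcd exit_bpcd: exit status 0 ------>exiting",
    "09:55:33.941 [6092] <16> bpfsmap: ERR - open_snapdisk: NBU snapshot failed",
]
for entry in errors(sample):
    print(entry["time"], entry["severity"], entry["message"])
```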



Example Log

 Error 41
5:20:55.454 PM: [1656.2600] <16> dtcp_write: TCP - failure: send socket (904) (TCP 10053: Software caused connection abort)
5:20:55.454 PM: [1656.2600] <16> dtcp_write: TCP - failure: attempted to send 6 bytes
5:20:55.486 PM: [1656.2600] <16> dtcp_write: TCP - failure: send socket (904) (TCP 10053: Software caused connection abort)

– The connection is being reset internally to the host.
Recommendation is to reload the NIC driver or replace the NIC.
– Error 41 can also produce TCP 10054 errors in the logs, but this
is an external closing of the connection. These can be caused by
loss of network connectivity, crashes or reboots.
– Error 41 has also been the result of corrupted VSS. Check the
Event Log for any related error messages and consult with
Systems Engineers, if necessary



Windows Client Troubleshooting Checklist

 Narrow your effort based on error code
 Check the simple stuff:
 Is server cabled, decomm’ed, under maint.
 Verify hosts file(s) or DNS on all involved servers
 Network functional?
 Verify routing
 Library or Media problem?
 Server hardware problem?
 Check Windows event log
 Correlate any issues
 Run BPCLNTCMD on all involved servers using each option:
– -sv
– -pn
– -self
– -hn <hostname>
– -ip <ip address>
– -server <name of Master>
 Maximize logging values for client
 Verify log dirs created in <install>\netbackup\logs\*
– bpbkar
– bpcd
– tar
 Start backup/restore
 Review logs searching for errors (look for <4> <8> <16> <32>)
 Search error message on Google and Symantec sites
 Test solution
 Repeat until resolved
 Open case with Symantec
– (800) 342-0652
– Cust. #: 3680-5196-9875



Reporting

 NetBackup Reports
 Aptare
– In depth historical reporting and trending
– Supports several backup products, incl. TSM
– Command Center Dashboard
– Job Reports
– The Dot Report – “Don’t agitate the Dots”
– Billing – yes we can be a profit center IF we are
successful
– Media Reports



Keeping Tabs on the Infrastructure

 Use Aptare
 Check for down drives/stuck tapes regularly
 Verify Drive Configuration
 Scratch
 Destaging
 Balance Jobs
 Tape Injects/Ejects



How To Shutdown/Restart NetBackup
 Shutdown
– Suspend/Cancel jobs
– Stop Aptare
– ‘netbackup stop’
– ‘bpps -a’ to see what’s running
– ‘kill -9 <pid>’ to kill hung processes
– Optionally rename startup script
– Use ‘init 6’ to restart server if processes will not die
– Ensure drives are empty
 ‘robtest’
 ACSLS server
 Startup
– ‘netbackup start’
– Resume/Restart all jobs
– Start Aptare
– Verify environment functions



Management Tips and Tricks
 Use Activity Monitor, Restore, Policies, Device Monitor, Clients Properties most
often
 Policies – Use Summary of All Policies
 Sorting/Filtering
 Sort by State – long running jobs?
 Export to Excel – Selected rows or all rows
 Column Fields – Move, Hide, Show
 Built-in NetBackup Reports
 Help
 Use multiple windows
 Break up long running jobs
– Multiple streams per policy
– Multiple policies
– Watch jobs per policy and client settings
 Don’t forget about Aptare!
 It isn’t always clear, look at it, correlate it, think about it



Education and Further Reading

 Google
 Symantec
– Detailed PDFs on EC troubleshooting
– Manuals/Troubleshooting Guide
– Technotes
 NetBackup Mailing List/Forums
– List: http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
– Forums
 Backup Central (mirrors the mail lists): http://www.backupcentral.com/phpBB2/
 Symantec: https://forums.symantec.com/syment/board?board.id=21
 Tek-Tips: http://www.tek-tips.com/threadminder.cfm?pid=776



Questions and Answers
Altered Lyrics to the tune of the Beatles “Yesterday”

Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.

Suddenly,
There's not half the files there used to be.
And there's a milestone hanging over me.
The system crashed, so suddenly.

I pushed something wrong,
What it was, I could not say.
Now all my data's gone,
And I long for yesterday-ay-ay-ay.

Yesterday, the need for back-ups seemed so far away.
I knew my data was all here to stay,
Now I believe in yesterday.



Thanks for attending!
