EMC Synthetic Full Backups For Incremental File Backups US10078555
(74) Attorney, Agent, or Firm: Staniford Tomita LLP

G06F 11/14 (2006.01)
H04L 29/06 (2006.01)
G06F 17/30 (2006.01)

(52) U.S. Cl.
CPC ... G06F 11/1451 (2013.01); G06F 17/30233 (2013.01); H04L 65/60 (2013.01); G06F 2201/815 (2013.01)

(58) Field of Classification Search
CPC ... G06F 11/1451; G06F 17/30233; G06F 2201/815; H04L 65/60
USPC ... 707/645, 646, 640, 656
See application file for complete search history.

(57) ABSTRACT

First and second virtual hard disk files are accessed. The first virtual hard disk file corresponds to a backup of a file and includes a first set of payload blocks to store data associated with the backup. The second virtual hard disk file corresponds to an incremental backup of the file and includes a second set of payload blocks to store data associated with the incremental backup. Data from a payload block of the first set of payload blocks is merged with data from a payload block of a corresponding payload block of the second set of payload blocks to form a merged payload block. The merged payload block is streamed for storage as a synthetic full backup of the first and second virtual hard disk files. The merging does not alter the first and second virtual hard disk files.

20 Claims, 19 Drawing Sheets
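
The merge described in the abstract can be illustrated with a minimal Python sketch. It assumes, as in VHDx differencing disks, that each payload block of the incremental backup is accompanied by per-sector flags marking which sectors it actually holds. The function name, the 512-byte sector size, and the list-of-booleans bitmap are illustrative assumptions and do not come from the patent. The inputs are never modified, mirroring the statement that the merging does not alter the first and second virtual hard disk files.

```python
SECTOR_SIZE = 512  # assumed logical sector size, for illustration only


def merge_payload_block(parent: bytes, child: bytes, present: list[bool]) -> bytes:
    """Return a new payload block: child sectors where present, else parent.

    `parent` is a payload block from the first (full) backup, `child` is the
    corresponding block from the incremental backup, and `present[i]` says
    whether sector i was written in the incremental backup (a hypothetical
    in-memory stand-in for a sector bitmap).
    """
    if len(parent) != len(child):
        raise ValueError("payload blocks must have the same size")
    if len(parent) != len(present) * SECTOR_SIZE:
        raise ValueError("bitmap must have one flag per sector")

    merged = bytearray(parent)  # work on a copy so neither input is altered
    for i, is_present in enumerate(present):
        if is_present:
            start = i * SECTOR_SIZE
            merged[start:start + SECTOR_SIZE] = child[start:start + SECTOR_SIZE]
    return bytes(merged)


full_block = b"A" * (4 * SECTOR_SIZE)
incr_block = b"B" * (4 * SECTOR_SIZE)
merged_block = merge_payload_block(full_block, incr_block, [False, True, False, True])
```

In this toy run, sectors 1 and 3 of the merged block come from the incremental backup and sectors 0 and 2 from the full backup; a real implementation would stream the merged block to the backup target rather than hold it in memory.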
[FIG. 1 (drawing not reproduced): network system 100, including storage server 114 and synthetic full backup process 116.]
[FIG. 2 (drawing not reproduced): system 205 with backup storage server 220 and backup media 240. Legible labels include a full backup of file B (T0), a first incremental backup of file B (T1), a second incremental backup of file B (T2), a synthetic full backup generator 280, and a synthetic full backup of file B (T3).]
U.S. Patent    Sep. 18, 2018    US 10,078,555 B1
[FIG. 3 (drawing not reproduced): block-based backup image format. Legible labels include MBR 305, GPT header 310, primary GPT entries 315, GPT entries 325, and GPT header 330.]
[FIG. 4 (drawing not reproduced): flow for backing up one or more files. A legible step reads "Determine whether the file extent is identified in the map."]
[FIG. 5 (drawing not reproduced): flow for creating a virtual disk container. A legible step reads "Create virtual disk on local machine."]
[FIGS. 6A-6C (drawings not reproduced): virtual disk (VHD/VHDx) 605, a volume on the virtual disk, and sections of the volume.]
[FIGS. 7 and 8 (drawings not reproduced): backup flows in which a VHD/VHDx stream 730 is written to backup media 735.]
[FIGS. 9-12 (drawings not reproduced): block diagrams of file blocks and file relative blocks in the target and source volumes, annotated with start offsets and lengths.]
[FIG. 13 (drawing not reproduced): schematic of a full and an incremental backup, with labels for a volume, a volume snapshot, file data, and changed file data.]
[FIGS. 14 and 15A (drawings not reproduced): flows for an incremental backup of a file.]
[FIG. 15B (drawing not reproduced): Block Allocation Table (BAT) layout with payload block (PB) and sector bitmap block (SB) entries.]
[FIG. 16A (drawing not reproduced): layout of a virtual hard disk. Legible labels include MBR 305, GPT header 310, primary GPT entries 315, GPT entries 325, and GPT header 330.]
[FIG. 16B (drawing not reproduced): incremental backup of file blocks into a VHDX backup 1640, with payload blocks (PB0, PB1, PB2, ...), start/length annotations, and a link to the parent backup.]
[FIG. 17 (drawing not reproduced): flow for creating a synthetic full backup of a file. Step 1705 reads "Access a first virtual hard disk file corresponding to a backup of a file, the first virtual hard disk file including a first set of payload blocks to store data associated with the backup."]
[FIGS. 18-21 (drawings not reproduced): first file to VHDX backup 1810, first incremental file to VHDX backup 1910, second incremental file to VHDX backup 2010, and synthetic full file merge 2110, each shown with payload blocks and start/length annotations.]
[FIG. 22 (drawing not reproduced): VHDx stream layout listing, in order, File Identifier, Header 1 (64 KB), Header 2 (128 KB), Region Table 1 (192 KB), Region Table 2 (256 KB), Reserved (320 KB), Log (1 MB), BAT (2 MB), and Metadata (3 MB), ending at 4 MB.]
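
The region order legible in FIG. 22 can be written down as a small table. The following Python sketch is an illustration under stated assumptions: the offsets for the file identifier, the two headers, and the two region tables follow the fixed header section of the VHDX format, while placing the log, BAT, and metadata regions at 1 MB, 2 MB, and 3 MB, and starting payload data at 4 MB, simply mirrors the labels in the figure. All names are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# (region name, start offset) in stream order; names are illustrative only
VHDX_STREAM_LAYOUT = [
    ("file_identifier", 0),
    ("header_1", 64 * KB),
    ("header_2", 128 * KB),
    ("region_table_1", 192 * KB),
    ("region_table_2", 256 * KB),
    ("reserved", 320 * KB),
    ("log", 1 * MB),
    ("bat", 2 * MB),
    ("metadata", 3 * MB),
]
PAYLOAD_START = 4 * MB  # assumed from the last label in the figure


def region_extents(layout, end):
    """Return (name, start, length) triples; each region runs to the next start."""
    starts = [offset for _, offset in layout] + [end]
    return [(name, start, starts[i + 1] - start)
            for i, (name, start) in enumerate(layout)]


extents = region_extents(VHDX_STREAM_LAYOUT, PAYLOAD_START)
```

Expressing the layout as (start, length) pairs makes it easy to check that the metadata regions are contiguous and that they end where the payload area is assumed to begin.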
[FIG. 23 (drawing not reproduced): flow to determine common payload blocks across an incremental chain and generate a new BAT table. A legible step reads "Start with first incremental and loop through incremental chain."]
[FIG. 24 (drawing not reproduced): design elements used during the merge, including a BAT table with entries for payload blocks (PB) and sector bitmap blocks (SB), and a distributed stream extent structure with the fields Int64_t start, Int64_t length, and Int64_t id.]
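
FIG. 24 names a distributed stream extent with start, length, and id fields, and FIG. 25 mentions converting a sector bitmap into extents. A minimal Python sketch of such a conversion follows. It assumes that `id` identifies the backup in the chain that holds the data and that the bitmap is available as a sequence of booleans; both points, along with the 512-byte sector size and the function name, are assumptions made for illustration.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size


@dataclass(frozen=True)
class StreamExtent:
    start: int   # byte offset within the payload block
    length: int  # number of bytes
    id: int      # assumed meaning: index of the backup holding the data


def bitmap_to_extents(bitmap, backup_id, sector_size=SECTOR_SIZE):
    """Collapse runs of set bits into contiguous byte extents."""
    extents = []
    run_start = None
    for i, bit in enumerate(bitmap):
        if bit and run_start is None:
            run_start = i  # a new run of present sectors begins
        elif not bit and run_start is not None:
            extents.append(StreamExtent(run_start * sector_size,
                                        (i - run_start) * sector_size,
                                        backup_id))
            run_start = None
    if run_start is not None:  # close a run that reaches the end of the bitmap
        extents.append(StreamExtent(run_start * sector_size,
                                    (len(bitmap) - run_start) * sector_size,
                                    backup_id))
    return extents


example = bitmap_to_extents([True, True, False, True], backup_id=1)
```

Collapsing adjacent sectors into one extent keeps the number of reads proportional to the number of changed regions rather than the number of changed sectors.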
[FIG. 25 (drawing not reproduced): flow for determining merged payload blocks. Legible steps read "Read sector bitmap corresponding to the payload block" and "Convert bitmap into extents of sector size."]
[FIG. 26 (drawing not reproduced): rows labeled Full, Level 1, Level 2, and Merged, showing per-payload-block extents (PB0 through PB6) given as start and length values with a backup level.]
[FIG. 27 (drawing not reproduced): flow for distributed stream extents representing merged payload block areas across an entire incremental chain.]
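
FIG. 26 pairs rows for a full backup and two incremental levels with a merged result. One way to compute such a merged row is to overlay each level's extents on the previous ones so that the newest backup wins for every byte range. The Python sketch below assumes extents are (start, length, id) tuples within one payload block, that extents within a single level do not overlap, and that levels are supplied oldest first. It is an illustration, not the patented procedure, and the sample values are invented.

```python
def overlay(base, newer):
    """Overlay `newer` extents on `base`; both are lists of (start, length, id)."""
    result = []
    for b_start, b_len, b_id in base:
        pieces = [(b_start, b_start + b_len)]  # surviving parts of this extent
        for n_start, n_len, _ in newer:
            n_end = n_start + n_len
            next_pieces = []
            for p_start, p_end in pieces:
                if n_end <= p_start or n_start >= p_end:
                    next_pieces.append((p_start, p_end))    # no overlap
                    continue
                if p_start < n_start:
                    next_pieces.append((p_start, n_start))  # left remainder
                if n_end < p_end:
                    next_pieces.append((n_end, p_end))      # right remainder
            pieces = next_pieces
        result.extend((s, e - s, b_id) for s, e in pieces)
    result.extend(newer)
    return sorted(result)


def merge_chain(levels):
    """Fold a chain (oldest first) into one list of non-overlapping extents."""
    merged = []
    for level in levels:
        merged = overlay(merged, level)
    return merged


K = 1024
full = [(0, 2048 * K, 0)]       # full backup covers the whole 2 MB block
level1 = [(16 * K, 32 * K, 1)]  # first incremental changed 32K at offset 16K
level2 = [(20 * K, 4 * K, 2)]   # second incremental changed 4K at offset 20K
merged = merge_chain([full, level1, level2])
```

Because the result only records where each range should be read from, the source backups stay untouched and the merged block can be produced by streaming the listed ranges in order.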
SYNTHETIC FULL BACKUPS FOR INCREMENTAL FILE BACKUPS

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is related to U.S. patent application Ser. Nos. 14/686,400; 14/686,438; and Ser. No. 14/686,468, all filed Apr. 14, 2015, which are all incorporated by reference along with all other references cited herein.

TECHNICAL FIELD

The present invention relates generally to the field of backing up computer data, and, more particularly, synthesizing a full backup of a file.

BACKGROUND

In today's digital society organizations depend on having ready access to their data. Data, however, can be lost in a variety of ways such as through disasters and catastrophes (e.g., fires or flooding), media failures (e.g., disk crash), computer viruses, accidental deletion, and so forth. Thus, it is important that the data be backed up. An organization may have an immense amount of data that is critical to the organization's operation. Backing up data and subsequently recovering backed up data, however, can involve large amounts of computing resources such as network bandwidth, processing cycles, and storage due to the complexity of data to be backed up and the amount of data that is backed up.

In some cases, it is desirable to selectively backup one or more individual files of a volume in a mountable format in order to, for example, speed recoveries, enable the replay of logs, and ensure data consistency. Excluding other files in the volume from the backup helps to conserve computing resources because a backup of a single file (or subset of files) in the volume is faster than backing up the entire volume. Computing resources such as network bandwidth and storage on the backup media will also be conserved. It is also desirable to perform incremental backups of a particular file so that changes to the file are also backed up. Further, having a full backup of a file in a mountable format (e.g., can be assigned a drive letter and accessed through the computer's file system) helps to ensure a smooth recovery and reduce administrative overhead.

The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.

BRIEF DESCRIPTION OF THE FIGURES

In the following drawings like reference numerals designate like structural elements. Although the figures depict various examples, the one or more embodiments and implementations described herein are not limited to the examples depicted in the figures.

FIG. 1 is a diagram of a large-scale network implementing a data backup and recovery process that provides for full and incremental backups of one or more files and for the synthesis of a full file backup, under some embodiments.
FIG. 2 shows an overall architecture of a system for backup and recovery.
FIG. 3 shows an example of a block-based backup image format.
FIG. 4 shows a flow diagram for backing up one or more files in a mountable format.
FIG. 5 shows a flow for creating a virtual disk container.
FIG. 6A shows a block diagram of a virtual disk.
FIG. 6B shows a block diagram of a volume on the virtual disk.
FIG. 6C shows a block diagram of various sections of the volume.
FIG. 7 shows another flow for backing up one or more files.
FIG. 8 shows a flow for processing a backup of a file into a VHD/VHDx stream.
FIG. 9 shows a block diagram of file blocks in a target volume.
FIG. 10 shows a block diagram of target file relative blocks.
FIG. 11 shows a block diagram of file blocks in a source volume.
FIG. 12 shows a block diagram of file relative blocks in a source volume.
FIG. 13 shows a schematic of a full and incremental backup.
FIG. 14 shows a flow for an incremental backup of a file.
FIG. 15A shows another flow for an incremental backup of a file.
FIG. 15B shows an example of a Block Allocation Table (BAT) layout.
FIG. 16A shows the structure or layout of a virtual hard disk.
FIG. 16B shows an incremental backup of file blocks.
FIG. 17 shows a flow for creating a synthetic full backup of a file.
FIG. 18 shows a full backup of a file at a time T0.
FIG. 19 shows a first incremental backup of the file at a time T1.
FIG. 20 shows a second incremental backup of the file at a time T2.
FIG. 21 shows a synthetic full file backup at a time T3.
FIG. 22 shows an example of a new VHDx stream.
FIG. 23 shows a flow to determine common payload blocks across an incremental chain and generate a new BAT table.
FIG. 24 shows the design elements and constructs used during the merge process.
FIG. 25 shows a flow for determining merged payload blocks block-by-block from the merged BAT table.
FIG. 26 shows a block diagram showing merged distributed stream extents of a full backup followed by two incremental backups.
FIG. 27 shows a flow for distributed stream extents representing merged payload block areas across an entire incremental chain.

DETAILED DESCRIPTION

A detailed description of one or more embodiments is provided below along with accompanying figures that illustrate the principles of the described embodiments. While aspects of the invention are described in conjunction with
such embodiment(s), it should be understood that it is not limited to any one embodiment. On the contrary, the scope is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the described embodiments, which may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the embodiments has not been described in detail so that the described embodiments are not unnecessarily obscured.

It should be appreciated that the described embodiments can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer-readable medium such as a computer-readable storage medium containing computer-readable instructions or computer program code, or as a computer program product, comprising a computer-usable medium having a computer-readable program code embodied therein. In the context of this disclosure, a computer-usable medium or computer-readable medium may be any physical medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus or device. For example, the computer-readable storage medium or computer-usable medium may be, but is not limited to, a random access memory (RAM), read-only memory (ROM), or a persistent store, such as a mass storage device, hard drives, CDROM, DVDROM, tape, erasable programmable read-only memory (EPROM or flash memory), or any magnetic, electromagnetic, optical, or electrical means or system, apparatus or device for storing information. Alternatively or additionally, the computer-readable storage medium or computer-usable medium may be any combination of these devices or even paper or another suitable medium upon which the program code is printed, as the program code can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. Applications, software programs or computer-readable instructions may be referred to as components or modules. Applications may be hardwired or hard coded in hardware or take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware such that when the software is loaded into and/or executed by the computer, the computer becomes an apparatus for practicing the invention. Applications may also be downloaded, in whole or in part, through the use of a software development kit or toolkit that enables the creation and implementation of the described embodiments. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.

Disclosed herein are methods and systems of a mountable container for performing full and incremental backups of one or more files and methods and systems for artificially creating a full backup of the one or more files that can be used as part of a disaster recovery solution for large-scale networks.

Some embodiments of the invention involve automated backup recovery techniques in a distributed system, such as a very large-scale wide area network (WAN), metropolitan area network (MAN), or cloud based network system, however, those skilled in the art will appreciate that embodiments are not limited thereto, and may include smaller-scale networks, such as LANs (local area networks). Thus, aspects of the one or more embodiments described herein may be implemented on one or more computers executing software instructions, and the computers may be networked in a client-server arrangement or similar distributed computer network.

FIG. 1 illustrates a computer network system 100 that implements one or more embodiments of a mountable container for full and incremental file backups and synthesizing a full backup of a file. In system 100, a number of clients 104 are provided to serve as backup clients or nodes. A network server computer 102 is coupled directly or indirectly to these clients through network 110, which may be a cloud network, LAN, WAN or other appropriate network. Network 110 provides connectivity to the various systems, components, and resources of system 100, and may be implemented using protocols such as Transmission Control Protocol (TCP) and/or Internet Protocol (IP), well known in the relevant arts. In a distributed network environment, network 110 may represent a cloud-based network environment in which applications, servers and data are maintained and provided through a centralized cloud computing platform. In an embodiment, system 100 may represent a multi-tenant network in which a server computer runs a single instance of a program serving multiple clients (tenants) in which the program is designed to virtually partition its data so that each client works with its own customized virtual application, with each virtual machine (VM) representing virtual clients that may be supported by one or more servers within each VM, or other type of centralized network server.

The data generated within system 100 may be stored in any number of persistent storage locations and devices, such as local client storage, server storage 114, or network storage, which may at least be partially implemented through storage device arrays, such as RAID components. In an embodiment network 100 may be implemented to provide support for various storage architectures such as storage area network (SAN), Network-attached Storage (NAS), or Direct-attached Storage (DAS) that make use of large-scale network accessible storage devices, such as large capacity tape or drive (optical or magnetic) arrays. In an embodiment, the target storage devices, such as tape or disk array may represent any practical storage device or set of devices, such as tape libraries, virtual tape libraries (VTL), fiber-channel (FC) storage area network devices, and OST (OpenStorage) devices. In a specific embodiment, however, the target storage devices represent disk-based targets implemented through virtual machine technology.

For the embodiment of FIG. 1, network system 100 includes a server 102, one or more backup clients 104 that execute a process 112 for a full backup of a file, an incremental backup of the file, or both, and storage server 114 that executes a synthetic full backup process 116 of a file.

In an embodiment, system 100 may represent a Data Domain Restorer (DDR)-based deduplication storage system, and storage server 114 may be implemented as a DDR Deduplication Storage server provided by EMC Corporation. However, other similar backup and storage systems are also possible. System 100 may utilize certain protocol-specific namespaces that are the external interface to applications and include NFS (network file system) and CIFS (common internet file system) namespaces, as well as a virtual tape library (VTL) or DD Boost provided by EMC Corporation. In general, DD Boost (Data Domain Boost) is
a system that distributes parts of the deduplication process to the backup server or application clients, enabling client-side deduplication for faster, more efficient backup and recovery. A data storage deployment may use any combination of these interfaces simultaneously to store and access data. Data Domain (DD) devices in system 100 may use the DD Boost backup protocol to provide access from servers to DD devices. The DD Boost library exposes APIs (application programming interfaces) to integrate with a Data Domain system using an optimized transport mechanism. These API interfaces exported by the DD Boost Library provide mechanisms to access or manipulate the functionality of a Data Domain file system, and DD devices generally support both NFS and CIFS protocol for accessing files.

FIG. 2 shows a system 205 for backing up one or more specific files from a client to a backup storage server in a mountable format. In other words, the backed up file can be presented to an operating system of a computer hosting the backed up file as a volume or mounted as a volume in the host computer. For example, in a Windows OS, the backed up file may be assigned a drive letter and may be accessed through the assigned drive letter.

In a specific embodiment, the file is backed up as a virtual hard disk file that may be formatted as a VHD (Microsoft Virtual Hard Disk Image) or Microsoft VHDx file (a Microsoft Hyper-V virtual hard disk). The VHDx format is a container format which can contain disk related information. VHDx files can be mounted and used as a regular disk. Volumes such as NTFS (New Technology File System), ReFS (Resilient File System), FAT32 (32-bit File Allocation Table), or any file system which the OS supports on the mounted disk can also be created. Differencing VHDx's can be created which will have internal references to the parent VHDx. Further discussion of the VHDx format is provided in "VHDX Format Specification," Version 0.95, Apr. 25, 2012, from Microsoft Corporation, and is incorporated by reference. The file to be backed up may be in any file format and the format may be the same as or different from the resulting backup file. For example, the file to be backed up may be formatted as a VHD/VHDx file, a Microsoft Exchange DataBase (EDB) file, a Microsoft SQL Server (MDF) file, Oracle database file (DBF), or any other file format.

As shown in the example of FIG. 2, this system includes a backup server 210, one or more backup clients 215, and a backup storage server 220, each of which are connected via a network 225. The network may be as shown in FIG. 1 and described above. The servers, clients, or both can be general purpose computers having hardware and software. For example, the client may include an operating system 226 (e.g., Microsoft Windows OS), and storage 227. The storage includes a volume 228 that stores any number of files 229 (e.g., file A, file B, file C . . . file N). Volume 228 may be referred to as a source volume.

Although FIG. 2 shows a single client, it should be appreciated that there can be any number of clients. For example, there may be tens, hundreds, or even thousands of clients to be backed up. Similarly, there can be multiple backup storage servers or nodes to help increase performance, provide redundancy, or both.

In a specific embodiment, there is a backup application that includes a backup application server module 230A and a backup application client module 230B. The backup application client and server modules communicate with each other to backup data on the client. For example, the backup application client module, when instructed by the backup application server module, backs up client data to the backup storage server or protection storage managed by the backup storage server. The backup storage server and protection storage may include disk, tape, a deduplication storage system (e.g., EMC Data Domain), or combinations of these.

A feature of the system shown in FIG. 2 allows for the backup of a single file (or a subset of files) from the source volume in or to a mountable format rather than the entire source volume. For example, volume image backups can be performed for full and incremental backups. Such backups can be advantageous in environments where there are millions of files to be backed up such as in High Density File System (HDFS) environments. Since volume backups read data from volumes and not from files, the number of metadata operations during backup is much less. Since I/O's are in sequential order, they also improve performance and use less resource.

In some cases, however, it is desirable to backup a single file or subset of files from the source file system in or to a mountable format rather than performing an entire volume block transfer. In particular, if only a file or a subset of files from the source file system has to be backed up in or to a mountable format, then significant space and time will be saved because other files in the source file system not needing backup will not be copied over to the backup storage server. In modern backup systems where the primary target is disk-based the option of mounting these backup images as it was in the source presents a major challenge.

One example of a use case for near instant restore ready backups includes applications where read-write access is desired to achieve instant uptime of application files. For example, such features can be desirable for database type backups of, for example, Microsoft Exchange, SQL, SharePoint and Hyper-V which require the backups be exposed as files (source file system NTFS (New Technology File System)/ReFS (Resilient File System), etc.) which have read-write permission so that they replay logs and other activities to make the database online and consistent.

The system shown in FIG. 2 addresses the above problem in an efficient manner. The system can be used in scenarios where there is a need to backup a database (e.g., Microsoft Exchange database), or where there is an application in which it would be desirable to backup multiple files (e.g., two or more files) present in a particular folder on the source volume. Such files can be very large. The system can be used to backup data at the block-level, e.g., a block-based sub-file backup. As discussed in further detail below, the system backs up the used blocks of a file by identifying the file extents occupied by the file. A file extent is a contiguous area of storage reserved for a file in the file system, represented as a range, and a file can have zero or more extents. The file extents provide the starting offset and the length of the particular extent occupied by the file (e.g., an initial block address and the number of blocks that make up the extent). In other specific embodiments, the system further provides for incremental backups and artificially synthesizing full backups at the file or sub-file level.

In a specific embodiment, techniques are provided for creating a full and incremental backup of a target file by copying all or only changed blocks of the target file into a VHD/VHDx format. The software module (e.g., backup application client module) creates a VHD/VHDx stream which contains all the VHD/VHDx related metadata and the disk metadata such as MBR, GPT and the file contents on the fly, which is then streamed to the backup medium such as tape or disk targets as a single stream. The resulting saveset can then be mounted which will contain the file backed up
for recovery purposes. The resulting VHD/VHDx file may contain only one backed up file, which makes it easier to chain incremental backups of a particular file, which will be linked to its parent.

The backup storage server includes a catalog 235 and backup media 240. The backup media stores data backed up from the clients. The backup media may be referred to as a target. The storage may be local to the server or may be external such as in the form of a deduplication appliance, or other storage configuration. The backed up data may include a volume, portion of a volume, applications, services, user-generated or user data, logs, files, directories, databases, operating system information, configuration files, machine data, system data, and so forth.

The catalog provides an index of the data stored on the backup storage server or protection storage managed by the backup storage server. The backed up data may be stored as a logical entity referred to as a saveset. The catalog may include metadata associated with the backup (e.g., saveset) such as an identification of the file or files stored on the backup storage server (e.g., globally unique identifier (GUID) of a backed up database), the time and date of backup, size of the backup, path information, and so forth.

In the example shown in FIG. 2, a file B 265A residing on the source volume has been backed up at a time T0 and saved to the backup media as part of a saveset 270. In this example, the backup is a full backup of the file and saveset 270 may be referred to as a "parent" saveset. The backup media may further include any number of incremental backups for a particular file which are linked to the parent backup of that particular file. These incremental backups may be stored as separate savesets and may be referred to as "child" savesets. For example, FIG. 2 shows a first child saveset 275A, and a second child saveset 275B which are linked to parent saveset 270. The first child saveset includes a first incremental backup of file B taken at a time T1 after time T0. The second child saveset includes a second incremental backup of file B taken at a time T2 after time T1 and T0.

FIG. 3 shows an example of the block based backup image format within a VHD/VHDx formatted file. The stream layout for volume backup shown in FIG. 3 includes a master boot record (MBR) 305, a GUID partition table (GPT) 310, a GPT primary entries section 315, a disk and volume contents section 320, a GPT entries section 325, and a GPT header section 330. The data of the backed up file is stored in the disk and volume contents section. In a specific embodiment, one volume is embedded in one VHD container. The GPT partitioning style helps to avoid disk signature collision when the virtual disks are mounted. The GPT partitioning style is supported in both the client and server versions of, for example, the Microsoft Windows 8 platforms.

Referring back to FIG. 2, in a specific embodiment, the backup application client module includes a backup manager and stream generator 235, a template generator 240, a translation and mapper engine 245, a changed block tracker (CBT) driver 250, and a changed block tracker filter 255. The backup storage server includes a synthetic full generator 280.

The backup manager is responsible for coordinating the various components of the backup application client module including creating a block-based backup stream for the data to be backed up. The template generator is responsible for creating a template virtual disk or volume 260 on the client that corresponds structurally to source volume 228. The template virtual disk/volume may be referred to as a backup container from which the block-based backup is streamed.

The translation and mapper engine (which may be referred to as a file/block extents mapper engine) is responsible for converting or managing the translation from virtual cluster numbers (VCNs) to logical cluster numbers (LCNs) or converting from Target Logical File Blocks (TLFB) to Target File Relative Blocks (TFRB) when a data block of the file needs to be read from disk for backup. File data is read from the file residing in the source volume (TFRB to Source File Relative Blocks (SFRB)). In other words, a mapping may be performed to translate, correlate, or convert between high-level logical identifiers and lower-level identifiers of data. In a specific embodiment, the conversion is facilitated by the file system application programming interface (API) FSCTL_GET_RETRIEVAL_POINTERS as provided by the Windows OS.

The changed block tracking (CBT) driver is responsible for tracking the blocks that have changed in a volume since the last backup. The CBT is an OS driver module that tracks the writes to a particular volume. The CBT driver can provide all the cumulative changes of a volume since the last backup. The CBT driver can identify, for a particular volume, the blocks that have changed since the last backup. This includes changes that include more than one file. The CBT driver can monitor changes since the last file backup. The CBT filter is responsible for filtering the set of changed blocks provided by the CBT driver in order to identify the changed blocks associated with the file or set of files to be incrementally backed up.

The synthetic full generator is responsible for merging one or more incremental backups of a parent backup of a file to generate a synthetic full saveset. In the example shown in FIG. 2, the synthetic full generator has merged first and second incremental backups of backed up file B with the parent backup of file B to create a synthetic full saveset 285 at a time T3, after times T0, T1, and T2.

The components of the backup system shown in FIG. 2 are functional entities where the implementation of the functions may vary. For example, in some cases the backup manager stream generator and template generator are combined into one code module. In other cases, the generators reside in separate code modules. A component of the backup application client module may function at the application program level or the operating system level in order to carry out its functions.

FIG. 4 shows an overall flow 405 for backing up one or more files from a client to a backup storage server in or to a mountable format. Some specific flows are presented in this application, but it should be understood that the process is not limited to the specific flows and steps presented. For example, a flow may have additional steps (not necessarily described in this application), different steps which replace some of the steps presented, fewer steps or a subset of the steps presented, or steps in a different order than presented, or any combination of these. Further, the steps in other embodiments may not be exactly the same as the steps presented and may be modified or altered as appropriate for a particular process, application or based on the data.

In a step 410, one or more files stored in a volume of the client are identified for backup in or to a mountable format to the backup storage server. For example, the backup server may generate and send to the client a backup request specifying the file to backup. The client receives the backup request and parses the request to determine the file to be backed up.
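The role of the CBT filter described above (narrowing the volume-wide set of changed blocks to those belonging to the file of interest) can be illustrated with a minimal sketch. This is only an illustration under stated assumptions, not the patented implementation: the function name `filter_changed_blocks`, the representation of changed blocks and file extents as (offset, length) pairs in bytes, and the choice to clip a changed range to the part that overlaps the file are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (starting offset, length), both in bytes,
# relative to the start of the source volume.
Extent = Tuple[int, int]


def filter_changed_blocks(changed: List[Extent],
                          file_extents: List[Extent]) -> List[Extent]:
    """Keep only the parts of the changed ranges that fall inside the file.

    `changed` stands in for the volume-wide list reported by a CBT driver;
    `file_extents` stands in for the extents occupied by the file of interest.
    """
    result = []
    for c_off, c_len in changed:
        c_end = c_off + c_len
        for f_off, f_len in file_extents:
            # Intersect the changed range with this file extent.
            start = max(c_off, f_off)
            end = min(c_end, f_off + f_len)
            if start < end:
                result.append((start, end - start))
    return sorted(result)


# Example: three changed ranges on the volume, two extents for the file.
changed_on_volume = [(0, 4096), (8192, 4096), (20480, 4096)]
extents_of_file = [(2048, 8192), (16384, 2048)]
file_changes = filter_changed_blocks(changed_on_volume, extents_of_file)
```

In this sketch the third changed range touches no extent of the file, so it is dropped, which mirrors the idea that changes belonging to other files on the volume are excluded from the incremental backup.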
In a step 415, a virtual volume is created on the local client with one or more files that emulate the one or more files to be backed up. The template generator creates on the client a virtual volume that corresponds to the volume in which the file to be backed up is stored. This virtual volume may be referred to as a template, dummy, or container. The container can be used to store objects in an organized way following specific access rules. The container can be a class, a data structure, or an abstract data type. In a specific embodiment, the virtual volume is referred to as a VHD/VHDx container. That is, in a specific embodiment, the virtual disk or volume includes the VHD/VHDx format.

Creating a correspondence between the virtual volume and the volume storing the file to be backed up includes creating a file system on the virtual volume that is similar to the file system of the source volume having the one or more files to be backed up. More particularly, creating the correspondence includes replicating in the virtual volume the directory structure in the volume associated with the file and creating a template file that corresponds to the file to be backed up. The virtual volume, and in particular the template file, however, do not include the actual data of the file to be backed up. That is, they are blank, empty, or without the actual data of the file.

Thus, the virtual volume may be referred to as a dummy volume or dummy VHD/VHDx and represents a temporary or interim data storage element. The template file may be referred to as a dummy file. These dummy files help to ensure a recreation of the exact virtual disk structure to be backed up. During the backup cycle, only the metadata information in the virtual disk is copied from the template file (e.g., dummy VHD/VHDx file). The actual data of the file to be backed up is read from the file extent on the actual source volume. When the backup is complete, the dummy virtual disk and dummy files can be deleted from the local client. The dummy virtual disk and dummy files are temporary constructs used during the backup operation.

In a step 420, the backup manager identifies a set of file extents occupied by the one or more files in the source volume to be backed up. In a step 425, the identified set of file extents are stored in a map. The map may be stored at the local client. The map thus identifies the source extents occupied by the files. During the actual backup, these file extents will then be read by the backup manager.

In other words, the used blocks of a file are backed up by identifying the file extents occupied by the file. The file extents provide the starting offset and length of the particular extent occupied by the file. Typically, the file extents of a particular file will not necessarily be contiguous. The system obtains the extents occupied by the file with respect to the volume storing the file. Consider, as an example, that the particular file to be backed up occupies ten extents. The system creates or maintains a map that includes a starting offset and the length for each extent of the ten extents.

In a step 430, the backup manager creates a stream or backup stream from the virtual volume. In a step 435, the backup manager reads from the stream to identify a file extent. In a step 440, the backup manager determines whether the file extent is identified in the map. In a step 445, if the extent is not in the map the backup manager accesses the template virtual volume to backup a data block associated with the template virtual volume. Alternatively, in a step 450, if the extent is in the map the backup manager accesses the file to backup a data block of the file. The creation of the virtual volume container and template file facilitates the backing up of metadata (e.g., directory structure information) associated with the file to allow the backed up file to be mounted as a virtual hard disk.

FIG. 5 shows a more detailed flow 505 for creating the corresponding template virtual volume including a template file in the virtual volume for each file to be backed up. In a step 510, a virtual disk or volume is created on the local machine. For example, as shown in the example of FIG. 2, there is a virtual volume 260 that has been created on the client. The virtual volume is formatted with a file system of the volume storing the file to be backed up. For example, if source volume 228 that stores the file to be backed up is an NTFS file system, virtual volume 260 is formatted with NTFS.

In a step 515 (FIG. 5), saveset or information about the file to be backed up is obtained. The information includes a size or current size of each file in the volume to be backed up. In a step 520, the virtual volume is sized based on the total size of the files in the volume to be backed up. That is, the virtual volume is sized to accommodate the total size of the files to be backed up. For example, the system may add or sum the file sizes of each backup file to compute the total size of the backup files. The size of the virtual volume may be set so that it is equal to or greater than the total size of the backup files.

In a step 525, the template generator creates for each file in the volume to be backed up a template file in the virtual volume having a size that matches the corresponding backup file size, but where each template file is without data. The filenames of these template or dummy files may be set to some dummy values. For example, FIG. 2 shows an example in which a file B 265A stored in the source volume has been identified for backup. A corresponding template file B 265B has been created in the virtual volume.

Thus, the number of template or dummy files created in the virtual volume may be equal to the number of files in the source volume that are to be backed up. For example, if a single file has been identified in the source volume for backup, a single corresponding template file may be created in the virtual volume. If two files have been identified for backup, two corresponding template files may be created in the virtual volume, and so forth.

Similarly, the size of the template or dummy file is configured to be equal to the size of the corresponding size of the file in the source volume to be backed up. For example, if file B 265A (FIG. 2) has a size of Y bytes, the size of corresponding template or dummy file B 265B will be configured to have a size of Y bytes. Thus, the extents for that particular dummy file will have been created, but the extents will not contain the actual data of the file to be backed up. During the backup operation, when there is a read of the actual data the read will be from the source. When, however, there is metadata associated with the backup file to read, the read will be from the dummy file or disk.

The total number of files in the source volume may be different from the total number of template files in the virtual volume. For example, the total number of files in the source volume may be greater than the total number of template files in the virtual volume when, for example, only a subset of the files in the source volume are to be backed up.

Directories are created on the local virtual volume that replicate or match the exact directory structure of the given file path of the file to be backed up. The template generator creates within the virtual volume a directory structure that matches a directory structure of the file (or volume storing the file) to be backed up.
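The read routing of steps 435 through 450 (check each extent read from the stream against the map of file extents, then read either from the file on the source volume or from the template virtual volume) can be sketched as follows. This is a simplified illustration, not the patented code: the function name `plan_reads`, the string labels, and the rule that an extent counts as "in the map" when it lies entirely inside a recorded file extent are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (starting offset, length) in bytes.
Extent = Tuple[int, int]


def plan_reads(stream_extents: List[Extent],
               file_extent_map: List[Extent]) -> List[Tuple[int, int, str]]:
    """Decide, for each extent in the backup stream, where to read it from.

    Extents covered by the map hold file data and are read from the file on
    the source volume; all other extents hold disk or volume metadata and are
    read from the template (dummy) virtual volume.
    """
    plan = []
    for off, length in stream_extents:
        in_map = any(
            f_off <= off and off + length <= f_off + f_len
            for f_off, f_len in file_extent_map
        )
        source = "source_file" if in_map else "template_volume"
        plan.append((off, length, source))
    return plan


# Example: the file occupies two extents; the stream is read in three pieces.
file_map = [(1024, 1024), (4096, 1024)]
stream = [(0, 512), (1024, 512), (4096, 1024)]
read_plan = plan_reads(stream, file_map)
```

The first piece lies outside every file extent and is treated as metadata from the template volume, while the other two are redirected to the real file. A production implementation would likely use a sorted structure or interval lookup instead of the linear scan shown here.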
In a step 530, the backup manager obtains location and extent information of the file to be backed up. The file extents associated with the backup file are gathered and stored or updated in a known extents table.

Table A below shows a specific embodiment of a flow for creating the template virtual disk and volume with files.

TABLE A

STEP  DESCRIPTION
1     Create VHD/VHDx container on the client machine equal to the source volume disk
2     Mount the VHD/VHDx
3     Create volume
4     Query volume offset relative to disk (a), e.g., store the volume offset value in a variable (a)
5     Create the directory and the file, e.g., create template file in the VHD/VHDx container
6     Set attributes and security information
7     Get free clusters of the source
8     Reserve file clusters for the largest file size possible on the source volume
9     Query extents reserved for file (b), e.g., store the value in a variable (b)
10    Add (a) offset to list (b) and store it (c), e.g., store the sum in a variable (c)
11    Sync the file system, e.g., synchronize the file system of the VHD/VHDx container to the file system of the source volume.

Synchronizing the file systems involves creating or generating a file system structure that is similar to that of the source volume. Consider, as an example, a database file to be backed up resides in the directory "C:\Microsoft\database\file1." A similar file system is then created within the dummy virtual disk (e.g., VHD/VHDx container) that includes the same directory structure, e.g., Microsoft\database. A dummy file, e.g., dummy_file1, is then created within the Microsoft\database folder that has a size equal to "file1." Using the dummy virtual disk and file during the backup operation facilitates recoveries by helping to ensure that the particular backup file includes a file structure similar or identical to the source volume. In a specific embodiment, the file system structure thus created in the template volume is brought to a consistent state by flushing file buffers. This can be achieved by system calls APIs such as fsync (UNIX), or FlushFileBuffers (Windows).

FIGS. 6A-C are block diagrams showing the template or dummy virtual disk. In particular, FIG. 6A shows a virtual disk 605 as created on the local client. In a specific embodiment, the virtual disk includes a VHD/VHDx format. FIG. 6B shows a volume 610 created within the virtual disk. FIG. 6C shows various sections of the virtual disk and volume. Sections 615 having a pattern of vertical lines represent disk information or virtual container/volume information. Sections 620 having a pattern of diagonal lines represent the volume information of the virtual disk. Sections having a pattern of dots 625 represent the extents of the template file corresponding to the file to be backed up.

More particularly, after creating a template or dummy virtual disk similar to the volume containing the file to be backed up, extents of the file in the source volume are located and stored in a vector of file extents. During the backup process while the dummy virtual disk is read, if the read happens to be in the region having the pattern of vertical lines (615), it would correspond to the disk information which is read from the dummy disk as it is. So is the case when the read is in the region depicted as having the pattern of slanted lines (620). This is the volume information of the virtual disk.

When the read is identified to be in regions 625 (dotted pattern), the read is done from the actual file to be backed up. The actual read is not done from the dummy volume but is done from the actual file on the snapshot of the source volume. In other words, the actual read is done on the file that is remote from the dummy volume, i.e., the file on the client to be backed up. This process involves conversion from virtual cluster numbers (VCNs) to logical cluster numbers (LCNs) (TLFB to TFRB). This data has to be eventually read from the file residing in the source volume (TFRB to SFRB). This conversion is achieved by file/block extents mapper engine shown in a flow 705 of FIG. 7.

As shown in FIG. 7, in step 710, a source file list is identified. The list includes the one or more files to be backed up. In a step 715, source file/block extents occupied by the source files are gathered. In a step 720, the file/block mapper engine is called to convert from VCNs to LCNs. A process of the conversion is shown in the example below.

1 Target File Relative block: (1 MB, 512), (3 MB, 1024), (5 MB, 512)
2 Source File Relative Blocks: (0, 512), (512, 1024), (1536, 512)

Here, a first value in the pair represents offset. A second value of the pair represents the length. 1 represents target volume extents, and 2 represents the corresponding file extent.

FIGS. 9-10 are block diagrams showing the extents (including starting offset and length) of a particular file, e.g., file 1. The extents of the particular file may be spread across the source file system and will include different offsets. FIG. 9 shows file blocks in the target volume. FIG. 10 shows target file relative blocks. FIG. 11 shows file blocks in the source volume. FIG. 12 shows file relative blocks in the source volume. In the example shown in FIGS. 9-10, file 1 includes first, second, and third extents 910, 915, and 920, respectively. Each extent has been filled with a particular pattern to identify the extent. For example, first extent 910 is drawn with a pattern of slanted lines. Second extent 915 is drawn with a pattern of dots. Third extent 920 is drawn with a pattern of cross-hatches.

A query is executed to identify the extents occupied by the particular file to be backed up. The query results including the starting offsets and lengths are received and stored in a map. The extents may be arranged so that they present in a contiguous location in the actual destination volume. For example, in the target file relative block, there is the first extent which is immediately followed by the third extent, starting at offset 16. Thus, even though the extents may be spread across different offsets, they are read from a particular location and placed in a contiguous location in the actual destination.

Referring now to FIG. 7, in a step 725, the file/block and volume backup stream generator streams 730 a VHD/VHDx stream to a backup media 735. More particularly, once a backup is initiated, the backup manager examines or analyzes an extent to determine whether the extent lies within the extents occupied by the file to be backed up. If the extent lies within the extents occupied by the file, the data block associated with the extent is then copied (e.g., backed up). This process includes converting the extent to the logical file extent of the file on the actual source file system to backup those particular blocks of the file. That is, there is a conversion of the data to be read from the actual extent to the file extent present in the volume. The mapper engine converts the actual data to be read with respect to the current volume offset to identify where exactly the data has to be read to the backup file extents occupied by the current file to
be backed up. If the offset resides in that particular file block, the backup manager reads from that particular file block. If the extent is outside of the extents occupied by the file, the copy is made from the template or dummy virtual disk, e.g., read from dummy VHDx. After the backup process is complete, the dummy VHDx can be deleted.

FIG. 8 shows another example of a flow 805 for processing the backup of a file to a VHD/VHDx format stream. A step 810 indicates a start of a read. In a step 815, the volume offset and length are obtained. In a step 820, a check is made as to whether the offset is within the file boundary. If the offset is not within the file boundary, in a step 825, a read is made of the volume (e.g., the virtual volume). The read operation then ends 830 and may loop back 835 to the start of a next read. Alternatively, if the offset is within the file boundary, in a step 840, a determination is made of the file offset from the volume offset. The flow then proceeds to a step 845 in which data is read from the remote backup file, from the file offset (e.g., a read of the data block of the file to be backed up).

Table B below summarizes some steps involved in the file and volume backup stream process.

TABLE B

STEP  DESCRIPTION
1     Create block based backup stream from backup container
2     Read from stream
3     Check if file extents are present in stream boundary (i.e., check whether the data currently to be backed up resides in the boundary of the files identified for backup)
4     If yes:
      i) Convert TLFB to TFRB (Target Logical File Blocks to Target File Relative Blocks)
      ii) Convert TFRB to SFRB (Target File Relative Blocks to Source File Relative Blocks)
      iii) Read from source file
5     Else:
      i) Read from backup container volume
6     Write to target stream in sequential order
7     End backup
8     Commit various attributes to the media for subsequent incremental backups such as
      a. VOLUME_SIZE
      b. VOLUME_START_OFFSET
      c. VDISK_SIZE
      d. VDISK_SECTOR_SIZE
      e. VDISK_LOG_SECTOR_SIZE
      f. FILE_EXTENTS: ex: 4 MB: 16 KB, 16 MB, 32 KB
      g. SFILE_SIZE: 48 KB
      h. SFILE_MAX_SIZE: 10 GB
      i. RELATIVE_PATH_ON_TARGET: .. ..\10 . 31 \GUID

Some benefits of the system include the ability to mount the resulting backup image directly using, for example, the standard Microsoft Windows VHD/VHDx mount API; support for any target media in addition to disk-based as the backup is stream-based; support for file level restores in the case of a non-disk medium such as tapes provided extents are known; instant access of the backup file to the host in a native file system with recovery being instantaneous; no need to hop through for recovery; and the backed up file can be exposed directly to any host to help ensure that recovery time objectives (RTO) are met.

Referring back to FIG. 2, a feature of the system further provides for incremental backups at the file or sub-file level. In other words, rather than performing an entire volume block transfer, only a file, its incremental file blocks, or both may be copied. Incremental file blocks are only those changes from a last backup. Changes may include a block that has changed since the last or previous backup, a new block since the last or previous backup, or both. Copying only the changed blocks of the desired file to backup is especially advantageous where the file is very large and constantly or continuously grows or increases in size as time progresses. It is desirable to backup only incremental file changes because sub-file backups are faster than performing an entire volume block transfer. In addition to conserving network bandwidth, storage space on the backup media is also saved because only changed blocks of a file may be backed up. Once a particular file has been backed up, it is desirable to be able to perform subsequent incremental file backups to backup changes to the file.

For example, as shown in FIG. 2, the backup manager with the assistance of the changed block tracker and filter has backed up to the backup media incremental changes made to file B. In particular, first child saveset 275A includes first incremental backup file B taken at time T1. More particularly, the first child saveset includes changes made to file B between a time of the previous backup (e.g., T0) and T1. Blocks of file B that have not changed between time T0 and T1 may not be included with the saveset. In other words, the unchanged blocks of the file may be excluded from the saveset.

Second child saveset 275B includes second incremental backup file B taken at time T2. More particularly, the second child saveset includes changes made to file B between a time of the previous backup (e.g., T1) and T2. The previous backup may be a last backup or a backup immediately before the current backup. Blocks of file B that have not changed between time T1 and T2 may not be included with the saveset. Thus, subsequent incrementals may include a differencing disk that includes one of a block from a current version of the file that has changed with respect to a previous version of the file or a newly occupied block from the current version of the file, where the newly occupied block is not in the previous version of the file.

FIG. 13 is a simplified block diagram showing the overall process of a container based mountable single file backup including an incremental file backup. There is a client 1305 and a backup media 1310. At a time T0 a full backup 1315 of a particular file is performed. To perform the full backup, there can be a volume snapshot 1320 of the source volume which includes data of a file 1325 for backup.

A template or dummy virtual disk 1330 is created along with a dummy volume 1335 and dummy file 1340. In other words, a backup volume is created on the client machine with the required parameters for which the file is to be backed up. In particular, in a specific embodiment, a virtual disk of a dummy VHD/VHDx container is created at the client. An NTFS or ReFS volume is created on the disk. A dummy file is created within the volume that represents the file to be backed up. The dummy file, however, is not a copy of the file to be backed up because the dummy file will not include the file data.

File blocks are redirected 1337 depending on whether the backup of a particular block is associated with metadata (e.g., virtual disk information, or volume information of the virtual disk) or actual data of the file. If the block is associated with metadata, the backup reads from the virtual disk. If the block is associated with actual file data, the read is from the source volume. The block is then streamed in a container stream 1345 to the backup media and stored as a full backup or saveset 1350. The full saveset may be referred to as a parent.

In other words, during the container streaming of the VHD/VHDx container, the system interprets, analyzes, or examines a particular extent of the VHD/VHDx stream. If
the particular extent is associated with the dummy file, rather than reading from the dummy file, the system reads from the file data that is residing on the volume snapshot.
A snapshot of the volume may be taken to initiate changed block tracking of the volume. After time T0, changes may be made to the file. For example, information may be added to the file, deleted or removed from the file, modified or altered in the file, or combinations of these. At a time T1, after time T0, an incremental file backup 1355 is performed. To perform the incremental backup, there can be another volume snapshot 1360 of the source volume which includes changed file data 1365.
During an incremental backup a dummy file corresponding to the file to be incrementally backed up does not have to be created because the system stores or can determine exactly where the file starts in the full or parent VHD/VHDx. In other words, the structure associated with the file has been stored in the previous full backup of the file. Thus, the data blocks to be streamed in an incremental backup can include the blocks of the file, and blocks associated with metadata of the file (e.g., directory structure information, disk information, or volume information) can be excluded from the stream.
The set of changed blocks since the previous backup of the file at T0 are filtered to identify changed blocks associated with the file and exclude other changed blocks of the volume not associated with the file to be backed up. The changed data blocks of the file are streamed in a container stream 1370 to the backup media and stored as an incremental VHD/VHDx in an incremental backup or saveset 1375. The incremental saveset may be referred to as a "child" and is linked or associated 1380 to the full or "parent" saveset.
More particularly, as discussed above, the changed block tracking driver tracks changes to a particular volume. However, not all the changed blocks of a volume may be relevant to an incremental backup of a particular file because the volume may include other files that have also been changed. The system can identify the blocks occupied by the particular file to be backed up and back up only those changed blocks.
Other changed blocks of the volume identified by the CBT driver that may be associated with other files not of interest may be excluded from the incremental backup of the particular file (or files) of interest. In a specific embodiment, identifying the blocks occupied by the particular file includes calling or querying an API (e.g., Windows API) to obtain the extents occupied by the particular file. The system can then perform a comparison or cross-referencing between the information returned by the API regarding the extents occupied by the particular file and the set of changed blocks of the volume identified by the changed block tracking driver to identify which changed blocks are associated with the particular file.
In a specific embodiment, a child VHD/VHDx file is created on the backup media target for the parent VHD/VHDx file which was saved during the previous full backup of the file. Blocks of the file that have changed since the last backup (e.g., last full or incremental backup) are stored as part of the child VHD/VHDx. In other words, in a specific embodiment, a method includes receiving from the CBT information identifying a set of changed blocks. The set of changed blocks are blocks on a volume that have changed since a previous backup. The previous backup may be a full or incremental backup.
The CBT driver tracks the changed blocks at the volume level. A subset of the set of changed blocks is identified in order to perform a backup at the file or sub-file level. More particularly, the changed blocks in the subset are changed blocks occupied by the particular file of interest on the volume to be backed up. The set of changed blocks may be filtered where the filter criteria includes information identifying blocks associated with the file of interest. Changed blocks associated with the file of interest are included in the subset of changed blocks. Changed blocks not associated with the file of interest are excluded from the subset and may not be streamed or backed up to the backup storage server. A child VHD/VHDx is created for the subset of changed blocks and chained or linked to the parent VHD/VHDx file. Consistency with respect to a particular file can be maintained because the parent (or full) saveset and child (or incremental) saveset will include blocks associated with that particular file. Blocks not associated with that particular file, such as blocks from other files not of interest, will be excluded or omitted from the parent and child savesets.
FIG. 14 shows an overall flow 1405 for an incremental backup of a file. In a step 1410, previous backup information metadata is obtained. The previous attributes obtained may include the source volume size, volume start offset, and file extents relative to the volume. In a step 1415, the system obtains the parent container on the target. For example, the system may obtain an identifier associated with the parent saveset on the backup media.
In a step 1420, the backup manager determines the changed blocks since the previous backup. In a step 1425, the backup manager identifies blocks that correspond to the file. More specifically, a listing or identification of changed blocks is provided by the changed block tracker. The backup manager obtains a changed block bitmap for the file, the current size of the file, and generates incremental file relative blocks relative to the previous backup.
The steps for the incremental backup may be as shown in Table B above. In particular, the incremental backup workflow process includes a conversion from virtual cluster numbers (VCNs) to logical cluster numbers (LCNs). The logic for this conversion is the same as the conversion logic used for a full backup of a file. However, rather than backing up all the blocks occupied by the file (or files), changed blocks returned by the CBT volume filter driver are scanned to identify the modified blocks corresponding to the file. Only these blocks are used to create a differencing disk of the previous virtual disk, thus creating a chain of differencing disks containing the delta changes.
In a step 1430, the backup manager creates a link between the child virtual container and the parent container. In a step 1435, the backup manager streams the child virtual container data and file data to the target (e.g., backup media). In particular, the backup manager prepares an incremental target volume stream in reference to a previous backup, links to the previous backup, and places file blocks as appropriate with respect to the target stream.
FIG. 15A shows a more detailed flow 1505 of a technique for an incremental file container backup stream process. The flow shows the steps to merge delta file blocks into an incremental target volume stream. In a step 1510, the system obtains the target volume blocks. In a step 1515, the system obtains target file relative blocks. In a step 1520, the system creates a target volume stream. In a step 1525, the target volume extents are read. In a step 1530, a check or determination is made as to whether the volume offset is within the file boundary. In a step 1535, if the volume offset is not within the file boundary, the system reads from a zero stream. Alternatively, in a step 1540, if the volume offset is within the file boundary the system determines the file
relative offset from the volume offset. In a step 1545, a seek and read is performed from the snapshot file. In a step 1550, the incremental backup ends.
Some benefits of the system with regard to incremental backups include very fast incremental backups because only changed files are backed up rather than the entire volume; granular restores from incremental backups; instant access to a file or set of files, with recovery that is instantaneous or near instantaneous if the target is disk; a backup method that is also suitable for a sequential backup medium such as tape-based backups; and optimized or improved backup for deduplication targets.
Referring now to FIG. 2, a feature of the system further provides for artificially creating at the backup storage server a current full backup of a file. The artificially created full backup may be referred to as a synthetic full backup. Synthetic full generator 280 can merge a previous full backup of a file (e.g., a full backup of file B taken at time T0) with one or more incremental backups of the file (e.g., first incremental backup of file B taken at time T1, second incremental backup of file B taken at time T2, or both) to create a synthetic full backup of file B. In other words, the synthetic full generator may merge a previous full backup of a file and one or more incremental backups of the file to create a synthetic full backup of the file.
Techniques for synthesizing full backups are applicable and advantageous in backup systems that provide for incremental backups. For example, over time there may be an increasing number of incremental backups stored at the backup storage server (or other centralized server). These backup copies are dependent savesets. That is, they depend on the previous backup copy and cannot be recovered separately or without the previous backup copy. The number of incremental backups is inversely proportional to recovery performance. Thus, as the number of incremental backups increases the restore performance decreases. Further, the management of separate incremental savesets in the media (e.g., managing retention periods and expiration times and dates) becomes cumbersome.
Some advantages of the system shown in FIG. 2 with respect to the synthetic full feature include enhancing restore performance, particularly when a given full backup cycle contains many incremental backups; conserving computing resources such as in cases where the remote media is too slow or is not well-suited to take a periodic full backup; and facilitating periodic archiving to tape (e.g., weekly archiving to tape). For example, when archiving or sending to tape, the system may consolidate a full backup and any number of incremental backups (e.g., 1, 2, 3, 4, 5, 6, 7, or more than 7 incremental backups). In a specific embodiment, the synthetic full operation of existing full backups and incremental backups runs on the storage or the media server; the processing is not done at the client node. Running the synthetic full operation on the storage or media server allows the client to perform other tasks (e.g., servicing production requests). In another specific embodiment, the synthetic full operation may be run on the client if desired.
Further benefits of the system include preserving the existing full and incremental block based file backups. Preserving the existing full and incrementals can allow for rollbacks to particular points in time. Support is provided for the creation of an incremental block based file backup before running a synthetic full so that any recent changes are captured in the incremental backup. Support is provided for the creation of only synthetic full file block based backup from existing full and incremental block based file backups. Support is provided for immediate creation or for scheduling a synthetic full operation at a later date. The merge operation of existing full and incremental block based file backups may be done "on the fly" (i.e., during runtime of the backup operation) and can be streamed to any backup media such as tape or again to a disk. Block level file restores can be done from the synthesized full backup. Individual file level restores can be done from the synthesized full block level file backup. Cloning to a different target and staging to a different target can be performed for the synthesized full block level file backup. A Data Domain native virtual synthetics feature may include not reading from the existing full and incremental file backups; only offsets are rebased to the new synthetic full. This enables a fast synthetic full.
Full and incremental file changes may be scattered across multiple backup copies. In a specific embodiment, an artificial or synthetic full backup of a file is created by inspecting each of the backup copies and merging those. The virtual disk format allows changes to be represented within the format itself in terms of sector bitmap and Block Allocation Table (BAT).
The artificial full backup of the file can be created without altering the backup copies. Consider, as an example, a scenario where there is a full backup (e.g., parent VHDx) followed by two incremental backups (e.g., two differencing or child VHDxs). A synthetic full operation is performed involving the parent and child VHDxs. The synthetic full operation, however, does not alter or modify the parent disk. After the synthetic full operation, the parent VHDx is still available. In other words, the parent VHDx before the synthetic full operation may be the same as the parent VHDx after the synthetic full operation. The parent VHDx before the synthetic full operation may be identical to the parent VHDx after the synthetic full operation. In a specific embodiment, the parent VHDx (or a copy of the parent VHDx) is preserved during the synthetic full operation. Preserving the parent VHDx (or a copy of the parent VHDx) allows for intermediate recoveries. For example, after a synthetic full operation involving the parent VHDx and the two child VHDxs, an administrator may perform another synthetic full operation to generate another synthetic full backup including the parent VHDx and the first child VHDx, but not including the second child VHDx.
FIG. 15B shows an example of a BAT layout. The BAT is a region having a single array of 64-bit values, with an entry for each block that determines the state and file offset of that block. The entries for the payload block and sector bitmap block are interleaved in a way that the sector bitmap block entry associated with a chunk follows the entries for the payload blocks in that chunk. For example, if the chunk ratio is 4, the table's interleaving would be as shown in the example of FIG. 15B. Other layouts and configurations are also possible.
FIG. 16A is a block diagram showing the structure or layout of a virtual hard disk file such as a VHDx file under an example embodiment. Although specific example formats and configurations are shown, it should be noted that embodiments are not so limited and other alternative formats are also possible.
In the example shown in FIG. 16A, a VHDx file 1605 includes a set of payload blocks (PBs) 1610 that are each 2 MB. The size of the payload blocks can range from 1 MB to 256 MB. The payload blocks may be located by the BAT which also forms a part of the layout of the VHDx file. There can be sector bitmap blocks which are 1 MB in size and include pieces of the sector bitmap.
A set of file blocks 1615 in the source volume are mapped 1620 to the payload blocks. In this example, a first extent
1625A (shown with a pattern of slanted lines) is mapped to PB0 1630A. A second extent 1625B (shown with a pattern of dots) is mapped to PB1 1630B. A third extent 1625C (shown with a pattern of cross hatches) is mapped to PB2 1630C.
FIG. 16B is a block diagram showing an incremental backup 1640 of file blocks 1645 of the file to an incremental VHDx file. In this example, an incremental backup included extents 1650A and 1650B. Extent 1650A is mapped to PB1 1655A of a set of payload blocks 1660 of the incremental VHDx file. Extent 1650B is mapped to PB2 1655B. The incremental VHDx file is linked 1665 to the parent backup.
FIG. 17 shows a flow 1705 for creating a synthetic full backup of a file using a full file backup and one or more incremental file backups. In a specific embodiment, the technique includes a single pass approach that generates a single target stream which contains the merged data of the previous full backup and its changed blocks in a sequential manner, which can then be streamed to any backup media. The system identifies the merged data zones from the entire chain. Since the VHDx is itself described in terms of payload blocks, a technique of the system first determines what payload blocks need to be merged in the entire chain of backups. The merge granularity is a payload block which can vary from 1 MB to 256 MB. This technique provides for merging one payload block at a time and then proceeding to the next.
More particularly, in a step 1710, the system (e.g., synthetic full backup generator 280, FIG. 2) accesses a first virtual hard disk file. The first virtual hard disk file corresponds to a backup of a file from a source volume at a time T0. The first virtual hard disk file includes a first set of payload blocks to store data associated with the backup of the file. The backup may be a full backup of the file.
In a step 1715, the system accesses a second virtual hard disk file. The second virtual hard disk file corresponds to an incremental backup of the file from the source volume at a time T1, after time T0. The second virtual hard disk file includes a second set of payload blocks to store data associated with the incremental backup of the file.
In a step 1720, a determination is made for whether a payload block of the first set of payload blocks (or first payload block) and a payload block of the second set of payload blocks (or second payload block) should be merged. Payload blocks may be merged when, for example, there is a corresponding payload block of the second set of payload blocks (i.e., the incremental backup) having changes (e.g., new data). The determination may include scanning or searching for a payload block of the second set of payload blocks that corresponds to the payload block of the first set of payload blocks.
In a step 1725, if the second set of payload blocks includes a corresponding payload block having changes, the payload blocks are merged to form a merged payload block. Data from the payload block of the first set of payload blocks may be merged or combined with data from the corresponding payload block of the second set of payload blocks. In a specific embodiment, the merging is performed without altering or modifying the first and second virtual disks (or copies of the virtual disks) so as to allow for intermediate recoveries. Merging may include copying or placing an extent of the first payload block and an extent of the second payload block into the same payload block. Merging may include copying or placing an extent of the second payload block into a merged payload block and not copying or not placing an extent of the first payload block into the merged payload block, the extent of the first payload block thus having been replaced by the extent of the second payload block.
Merging may include copying or placing an extent of the first payload block and an extent of the second payload block into the same payload block, where the same payload block is the merged payload block and the extent of the second payload block overwrites the extent of the first payload block in the merged payload block. Merging may include copying or placing an extent of the first payload block and an extent of the second payload block into the same payload block, where the same payload block is the merged payload block and the extent of the second payload block does not overwrite the extent of the first payload block in the merged payload block.
The merged payload block may be a payload block that is maintained, distinct, or stored separate from the full and incremental backups. For example, the merged payload block may be stored in a file (e.g., synthetic full backup file) that is separate from the first or original full and incremental backup files. Maintaining or storing the merged payload block separate from the full and incremental backups allows for intermediate recoveries. For example, after a synthetic full operation involving a full backup and one or more incremental backups of a file to generate a synthetic full backup of the file, the file may be recovered to its first full backup even though there may have been one or more incremental backups. The system may maintain or store a synthetic full backup of a file and a first or original full backup of a file along with any number of incremental backups.
In a step 1730, upon or after the payload block merging, the merged payload block is streamed for storage as a synthetic full backup of the first and second virtual hard disk files. The process may then loop 1732 back to perform another determination for a next payload block of the first set of payload blocks.
Alternatively, in a step 1740, if there is no corresponding payload block of the second set of payload blocks having changes to be merged into the payload block of the first set of payload blocks, the payload block (or a copy of data in the payload block) of the first set of payload blocks is streamed for storage as the synthetic full backup of the first and second virtual hard disk files. More particularly, in a specific embodiment, for the child differencing disk (e.g., the second virtual hard disk file corresponding to the incremental backup), if there are no changes then there would be no payload blocks corresponding to them. Hence, if the payload blocks of the incremental are not present, during the merge process, blocks are taken from the last non-empty payload block in the chain (e.g., first virtual hard disk file). In a specific embodiment, empty payload blocks indicate that no changes have been made. The process may then loop 1745 back to perform another determination for a next payload block of the first set of payload blocks.
FIGS. 18-21 are block diagrams showing an example of a synthetic full file merge. Specifically, FIG. 18 shows a full backup of a file at a time T0. Extents 1805A, 1805B, and 1805C, having starting offsets and lengths as shown in FIG. 18, have been streamed 1810 in a first full file backup to a parent VHDx file 1815. Extent 1805A has been drawn with a pattern of slanted lines. Extent 1805B has been drawn with a pattern of dots. Extent 1805C has been drawn with a pattern of cross hatches.
The parent VHDx file includes a parent set of payload blocks. Extent 1805A is stored in a payload block (PB 0) 1820A of the parent. Extent 1805B is stored in a payload
block (PB 1) 1820B of the parent. Extent 1805C is stored in a payload block (PB 2) 1820C of the parent.
FIG. 19 shows a first incremental backup of the file at a time T1, after time T0. An extent 1905A having a starting offset and length as shown in FIG. 19 has been streamed 1910 in a first incremental backup to a first child VHDx file 1915. Extent 1905A has been drawn with a pattern of horizontal lines.
The first child VHDx file includes a first child set of payload blocks. Extent 1905A is stored in a payload block (PB 1) 1920A of the first child incremental backup. The remaining payload blocks of the first child set of payload blocks may be empty or not present, thus indicating that no changes with respect to those payload blocks have been made.
FIG. 20 shows a second incremental backup of the file at a time T2, after times T0 and T1. An extent 2005A having a starting offset and length as shown in FIG. 20 has been streamed 2010 in a second incremental backup to a second child VHDx file 2015. Extent 2005A has been drawn with a pattern of grid lines.
The second child VHDx file includes a second child set of payload blocks. Extent 2005A is stored in a payload block (PB 0) 2020A of the second child incremental backup. The remaining payload blocks of the second child set of payload blocks may be empty or not present, thus indicating that no changes with respect to those payload blocks have been made.
FIG. 21 shows a synthetic full file merge of the full and incremental backups at a time T3, after times T0, T1, and T2. In this example, extents 1905A, 1805B, 2005A, and 1805C are included in a virtual hard disk file representing a synthetic full backup based on the full backup, first incremental backup, and second incremental backup. Extent 1905A is from PB 1 1920A of the first incremental backup. Extent 1805B is from PB 1 1820B of the full backup. Extent 2005A is from PB 0 2020A of the second incremental backup. Extent 1805C is from PB 2 1820C of the full backup.
More particularly, payload blocks PB 1 1820B from the full backup and corresponding PB 1 1920A from the first incremental backup have been merged to form a merged payload block PB 1 2105 in the synthesized full backup file. Payload block PB 1 2105 includes both extent 1905A and extent 1805B. In this example, however, extent 1805A stored in PB 0 1820A of the full backup is not included in PB 0 2110 of the synthesized full backup file because it has been replaced or overwritten by extent 2005A from corresponding payload block PB 0 2020A of the second incremental backup. A payload block PB 2 2115 of the synthetic full file merge includes extent 1805C from payload block PB 2 1820C of the full file backup, which has not changed.
In a specific embodiment, the size of a payload block of a VHDx file can range from 1 MB to 256 MB, and the size of a sector bitmap block is 1 MB. Since 1 byte = 8 bits, a 1 MB sector bitmap can represent 8x1024x1024 (2^23) sectors. The size of the logical sector is typically 512 or 4096 bytes. Each payload block includes multiple logical sectors. In FIG. 21, extents 1905A and 1805B represent changed sectors which do not overlap. Hence, after merging PB 1, both the sectors are shown in payload block PB 1 2105 of the synthetic full backup file. Extents 1805A and 2005A represent changed sectors which do overlap. Hence, after merging PB 0, the sector represented by extent 1805A has been replaced with the sector represented by extent 2005A, which is more recent than the sector represented by extent 1805A, and payload block PB 0 2110 of the synthetic full backup file does not include the overwritten data.
FIGS. 18-21 show an example of merging a full backup with two incremental file backups to create a synthetic full file merge. It should be appreciated, however, that any number of incremental file backups may be merged with a full backup. For example, there can be one, two, three, four, five, six, seven, eight, nine, ten, or more than ten incremental backups of a file that are merged with a full backup of the file to create a synthetic full file merge of the file.
Depending upon factors such as the type of backup media, computing resources available, and other factors, a synthetic full file merge may be performed as soon as the first incremental backup of the file is made, after a threshold number of incremental backups have been made, periodically (e.g., weekly), or on demand. In a specific embodiment, an incremental backup is performed in which changed blocks associated with a file are obtained. In this specific embodiment, rather than creating a child VHDx, the changed blocks are merged with a previous full or parent backup to artificially create a current full file backup. The newly synthesized full file backup then includes original unchanged blocks from the parent backup and new incremental or changed blocks. Thus, recovery of the file does not have to depend on any previous incremental backups.
As discussed above, a technique of the system includes a single pass approach that generates a single target stream which contains the merged data of the previous full and its changed blocks in a sequential manner, which can then be streamed to any backup media. Merged data zones from the entire chain are identified. Since the VHDx is itself described in terms of payload blocks, the method first determines what payload blocks need to be merged in the entire chain of backups. The merge granularity is a payload block which can vary from 1 MB to 256 MB according to the VHDx specification. This method merges one payload block at a time and proceeds to the next.
FIG. 22 shows an example of a new VHDx stream which contains a new header, region, log, and merged BAT which is streamed first to the new file, i.e., the synthesized file. The BAT is a region listed in the region table and includes a single contiguous array of entries specifying the state and the physical file offset for each block. The entries for payload blocks and sector bitmap blocks in the BAT are interleaved at regular intervals. Any updates to the BAT may be made using the log to ensure that the updates are safe from corruption caused by system power failure events.
The new merged BAT table includes offsets relative to the new target file which will be eventually streamed to the new synthesized file once the new empty VHDx file is streamed out to the target. The new merged BAT table is prepared by inspecting the BAT entries of each of the backups, starting from the full backup through the N-1 incremental in the chain. If there is a BAT entry that contains a non-zero offset, that means the payload block which the index corresponds to needs to be merged.
FIG. 23 shows a detailed view of a process 2305 to determine common payload blocks across an incremental chain and generate a new BAT table. In a step 2310, the system prepares an empty VHDx header, log, BAT, and metadata section. In a step 2315, the system notes or identifies the current offset, initializes the new BAT table, and initializes the merged indexes array. In a step 2320, the system loops through each entry up to the number of entries in the BAT table. In a step 2325, the system starts with the first incremental and loops through the incremental chain. In a step 2330, the system checks if the BAT entry for the chain is non zero. In a step 2335, if the BAT entry is zero, the system loops 2337 back to step 2325. Alternatively, if the BAT entry is non zero, in a step 2340 the system sets the
corresponding entry in the new BAT table with the current backup . The payload blocks which need to or should be
offset , adds the index to the merged indexes array , and merged are shown with a pattern of slanted lines.
advances the current offset by the block size. The process FIG . 27 shows a flow 2705 for distributed stream extents
then loops back 2345 to the top of the loop . representing merged payload block areas across an entire
FIG . 24 shows the design elements and constructs used 5 incremental chain . In a step 2710, the system obtains a
during the merge process. In a specific embodiment, the current payload block number from a current position . In a
distributed stream extent includes the values Int64 _ t start; step 2715 , a determination is made as to whether it is a next
Int64 _ t length ; and Int64 _ t id . An array of file descriptors to payload block . If yes, in a step 2717 , the system prepares a
read from the oth entry includes the full backup and the N - 1 final distributed stream merge extents for payload block . In
entry includes a descriptor for the last incremental chain. There is an array of BAT tables used to prepare the distributed stream extents. A 0th entry includes a full backup's BAT table. An N-1 entry includes the BAT table for the last incremental chain.
FIG. 25 shows a flow 2505 for determining merged payload blocks block-by-block from the merged BAT table. In a step 2510, the system prepares an initial distributed stream extent list for each payload block, e.g., (0,512,0) (512,512,0) (1024,512,0) . . . . In a step 2515, the system loops through the first incremental to the Nth incremental. In a step 2520, the system reads a sector bitmap corresponding to the payload block. In a step 2525, the system converts the bitmap into extents of sector size. In a step 2530, the system shortens the list if adjacent extents are contiguous and have the same index. In a step 2535, the system adds the absolute base offset of the payload block for each of the entries in the list. If the offset is zero then the index is set to -1. In a step 2540, for each extent found, the system sets the equivalent backup index in the distributed stream extent list. The process then loops 2545 back to step 2515.
a step 2718, the system resets the top index to zero. Alternatively, if it is not the next payload block, in a step 2725, the system takes a minimum length of extent length and count. In a step 2730, a determination is made as to whether the process conforms to a particular architecture of the backup system. In a specific embodiment, the particular architecture includes a Data Domain (DD) architecture as provided by EMC Corporation of Hopkinton, Mass. If the architecture is not a DD architecture, in a step 2733, the system sets the position and reads from the stream which denotes the index and writes it. Alternatively, in a step 2735, the system rebases the current stream range to the file pointed by the index and subtracts the count. If in a step 2740 the count is not zero, the system advances 2745 to the top index counter if the extent is consumed. Alternatively, if the count is zero, the process ends.
Table D below shows a flow of a specific embodiment of read and merge steps during streaming a merged payload block for a first type of synthetic merge. This first type of synthetic merge may be referred to as a regular synthetic merge.
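As an illustration only, the two flows described above (building the merged extent list for one payload block, then streaming the block by reading each extent from the backup level that owns it) can be sketched in Python. Everything named here is hypothetical: `bitmap_to_extents`, `merge_extents`, `stream_block`, the `base_offsets` list standing in for the BAT lookups, and the in-memory streams standing in for the virtual hard disk files. The sketch tracks one owner per 512-byte sector instead of editing an extent list in place, which is a simplification of the described flow, and it covers only the regular (non-DD) read-and-write path.

```python
import io

SECTOR = 512  # extent granularity used in the description


def bitmap_to_extents(bitmap):
    """Turn a per-sector bitmap (sequence of 0/1) into (start, length) byte extents."""
    extents, run_start = [], None
    for i, bit in enumerate(list(bitmap) + [0]):  # trailing 0 closes a final run
        if bit and run_start is None:
            run_start = i
        elif not bit and run_start is not None:
            extents.append((run_start * SECTOR, (i - run_start) * SECTOR))
            run_start = None
    return extents


def merge_extents(block_size, incremental_bitmaps):
    """Build merged (start, length, level) extents for one payload block.

    Level 0 is the full backup; level k is the k-th incremental (assumed numbering).
    """
    owner = [0] * (block_size // SECTOR)  # initially every sector points to the full
    for level, bitmap in enumerate(incremental_bitmaps, start=1):
        for start, length in bitmap_to_extents(bitmap):
            for sector in range(start // SECTOR, (start + length) // SECTOR):
                owner[sector] = level  # a later incremental overrides earlier levels
    merged = []
    for sector, level in enumerate(owner):
        if merged and merged[-1][2] == level:  # contiguous and same level: join
            start, length, _ = merged[-1]
            merged[-1] = (start, length + SECTOR, level)
        else:
            merged.append((sector * SECTOR, SECTOR, level))
    return merged


def stream_block(merged, streams, base_offsets, target):
    """Copy each merged extent from the level that owns it to the target stream.

    base_offsets[level] is the absolute offset of this payload block inside that
    level's file; 0 is treated as "not allocated" and is filled with zeros.
    The source streams are only read, never written.
    """
    for start, length, level in merged:
        base = base_offsets[level]
        if base == 0:
            target.write(b"\0" * length)
            continue
        source = streams[level]
        source.seek(base + start)
        target.write(source.read(length))


if __name__ == "__main__":
    # One 2048-byte payload block, a full backup and two incrementals.
    print(merge_extents(2048, [[0, 1, 0, 0], [0, 0, 1, 1]]))
    # [(0, 512, 0), (512, 512, 1), (1024, 1024, 2)]
```

Because `stream_block` only seeks and reads on the source streams, the full and incremental inputs stay unchanged, which mirrors the statement that the merging does not alter the first and second virtual hard disk files.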
TABLE D

STEP      DESCRIPTION
1         Set bytes to read to payload block size
2         Get the target file position
3         Loop until the entire payload block is merged
3(a)      Get the top index extent from the final merged array generated
          for this payload block zone.
3(b)      Take minimum of extent length and bytes to read.
3(c)      Get extent index to which stream it belongs in the incremental
          chain.
3(d)      Get the corresponding stream object from the stream table array
3(e)      Set the stream position
3(f)      Read the minimum length fully
3(g)      Write it to the target file at the target position
3(h)      Advance the target position by 3(b)
3(i)      Add the extent start offset by 3(b)
3(j)      Decrement the extent length by 3(b)
3(k)      Decrement the bytes read by 3(b)
3(l)      If extent length is zero then move the top index to the next
          distributed stream extent.
4         End of loop

Table C below shows a flow of a specific embodiment for determining merged payload blocks.

TABLE C

STEP      DESCRIPTION
1         Get the next merged BAT index from the merged BAT indexes array
2         Prepare distributed stream extent array in offsets of 512 which
          points to the base full backup, e.g., (0, 512, 0), (512, 512, 0),
          (1024, 512, 0) up to (2 MB-512, 512, 0).
3         Start with first incremental and loop up to N-1 incremental
3(a)      Get corresponding stream object and BAT table from the global
          object table
3(b)      Get the sector bitmap offset identified in step (1)
3(c)      Read the sector bitmap fully
3(d)      Convert the sector bitmap into extents of size 512's
3(e)      Loop through each of the extent
3(e)(i)   Get the corresponding extent from the array generated in step (2)
3(e)(ii)  Change the id to point to this stream
4         Loop back to step (3)
5         Inspect array (2) and join adjacent extents if it belongs to the
          same stream and create a new final array.
6         Set the absolute payload offset for each of the extent generated
          in step (5)
7         Check if the absolute payload offset is zero. If zero then change
          the id to point to -1 so that zero's are filled for that range during
          actual read. In some cases, the sub range within a payload
          block may not be found in any of the incremental backup chains
          so it will be pointing to full so the system sets the range to zero.
          This helps to avoid seeks and reads to the base file. It also
          increases the speed and the de-duplication performance.

FIG. 26 is a block diagram showing merged distributed stream extents of a full backup followed by two incremental backups. In the example shown in FIG. 26, the payload block size is 2 MB and all payload blocks are fully occupied, e.g., (0, 2048K) for a full backup level. Merged extents within common payload block are in the following format “(start, length, id)” where id refers to which level of the

Table E below shows a flow of a specific embodiment of read and merge steps during streaming a merged payload block for a second type of synthetic merge. This second type of synthetic merge may be referred to as a DD native synthetic merge.

TABLE E

STEP      DESCRIPTION
1         Set bytes to read to payload block size
2         Get the target file position
3         Loop until the entire payload block is merged
3(a)      Get the top index extent from the final merged array generated
          for this payload block zone.
3(b)      Take minimum of extent length and bytes to read.
TABLE E - continued

STEP      DESCRIPTION
3(c)      Get extent index to which stream it belongs in the incremental
          chain.
3(d)      Get the corresponding stream object from the stream table array
3(e)      set ddp_synthesize extent to current extent start and length
3(f)      call ddp_synthesize_file API
3(g)      Advance the target position by 3(b)
3(h)      Add the extent start offset by 3(b)
3(i)      Decrement the extent length by 3(b)
3(j)      Decrement the bytes read by 3(b)
3(k)      If extent length is zero then move the top index to the next
          distributed stream extent.
5         End of loop

In the description above, certain embodiments were discussed in the context of a VHD formatted file, VHDx formatted file, or both. It should be appreciated, however, that aspects and principles of the system can be applied to other virtual disk formats such as VMDK formatted files (e.g., VMware virtual disk file) which may be used in the Linux OS.
In a specific embodiment, a method includes identifying a file, stored in a volume of a client, for backup in a mountable format to a backup storage server, creating on the client a template virtual volume that corresponds to the volume of the client in which the file is stored, identifying a set of file extents occupied by the file to be backed up, creating a backup stream from the template virtual volume, if a file extent from the backup stream is not within the set of file extents, accessing the template virtual volume to backup a data block associated with the template virtual volume, and if the file extent is within the set of file extents, accessing the file to backup a data block associated with the file.
The creating on the client a template virtual volume that corresponds to the volume of the client in which the file is stored may include formatting the template virtual volume with a file system of the volume, creating within the template virtual volume a template file having a size that matches a size of the file to be backed up, and creating within the template virtual volume a directory structure that matches a directory structure of the file to be backed up. The file may be backed up as a Virtual Hard Disk (VHD) formatted file or a Hyper-V (VHDx) formatted file.
Accessing the file to backup a data block associated with the file may include converting from a virtual cluster number (VCN) to a logical cluster number (LCN). The method may further include after backing up a last data block of the template virtual volume and a last data block of the file, deleting the template virtual volume created on the client. The file to be backed up may include a database.
In another specific embodiment, there is a system for backing up a file, the system including a processor-based system executed on a computer system and configured to: identify a file, stored in a volume of a client, for backup in a mountable format to a backup storage server, create on the client a template virtual volume that corresponds to the volume of the client in which the file is stored, identify a set of file extents occupied by the file to be backed up, create a backup stream from the template virtual volume, if a file extent from the backup stream is not within the set of file extents, access the template virtual volume to backup a data block associated with the template virtual volume, and if the file extent is within the set of file extents, access the file to backup a data block associated with the file.
In another specific embodiment, there is a computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method including identifying a file, stored in a volume of a client, for backup in a mountable format to a backup storage server, creating on the client a template virtual volume that corresponds to the volume of the client in which the file is stored, identifying a set of file extents occupied by the file to be backed up, creating a backup stream from the template virtual volume, if a file extent from the backup stream is not within the set of file extents, accessing the template virtual volume to backup a data block associated with the template virtual volume, and if the file extent is within the set of file extents, accessing the file to backup a data block associated with the file.
In a specific embodiment, a method for making an incremental backup of changes to a particular file includes receiving from a change block tracking (CBT) module information identifying a plurality of changed blocks on a volume of a client, the changed blocks being blocks of the volume that have changed since a previous backup of the client, filtering the plurality of changed blocks to identify a subset of changed blocks that are associated with the particular file, streaming the subset of changed blocks to a backup storage server for storage as an incremental virtual hard disk file, and associating the incremental virtual hard disk file to a full backup virtual hard disk file, the full backup virtual hard disk file being a full backup of a previous version of the particular file.
The method may further include not streaming blocks of the plurality of changed blocks that are outside the subset of changed blocks. Filtering the plurality of changed blocks may include identifying a set of extents on the client occupied by the particular file, comparing the identified set of extents to the information identifying the plurality of changed blocks, and based on the comparison, if a changed block maps to an extent of the set of extents, determining that the changed block is associated with the particular file.
Filtering the plurality of changed blocks may include identifying a set of extents on the client occupied by the particular file, comparing the identified set of extents to the information identifying the plurality of changed blocks, and based on the comparison, if the changed block does not map to any extent of the set of extents, determining that the changed block is not associated with the particular file. The file may include a database file.
In a specific embodiment, the method further includes before the receiving from a change block tracking module information identifying a plurality of changed blocks, creating a container having a size that accommodates a size of the previous version of the particular file, creating a temporary file having a size that corresponds to the size of the previous version of the particular file, the size of the temporary file being less than the size of the container, backing up, using the container and the temporary file, the previous version of the particular file to create the full backup virtual hard disk file, and after the backing up, deleting the container and the temporary file, wherein the streaming the subset of changed blocks to a backup storage server for storage as an incremental virtual hard disk file comprises not creating another container.
The incremental virtual hard disk file may include a block from the particular file that has changed with respect to the previous version of the particular file, a newly occupied
block from the particular file, the newly occupied block not being in the previous version of the particular file, or both.
In another specific embodiment, there is a system for incrementally backing up a file, the system including a processor-based system executed on a computer system and configured to receive from a change block tracking (CBT) module information identifying a plurality of changed blocks on a volume of a client, the changed blocks being blocks of the volume that have changed since a previous backup of the client, filter the plurality of changed blocks to identify a subset of changed blocks that are associated with the particular file, stream the subset of changed blocks to a backup storage server for storage as an incremental virtual hard disk file, and associate the incremental virtual hard disk file to a full backup virtual hard disk file, the full backup virtual hard disk file being a full backup of a previous version of the particular file.
In another specific embodiment, there is a computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method including: receiving from a change block tracking (CBT) module information identifying a plurality of changed blocks on a volume of a client, the changed blocks being blocks of the volume that have changed since a previous backup of the client, filtering the plurality of changed blocks to identify a subset of changed blocks that are associated with the particular file, streaming the subset of changed blocks to a backup storage server for storage as an incremental virtual hard disk file, and associating the incremental virtual hard disk file to a full backup virtual hard disk file, the full backup virtual hard disk file being a full backup of a previous version of the particular file.
In a specific embodiment, a method includes accessing a first virtual hard disk file corresponding to a backup of a file, the first virtual hard disk file comprising a first plurality of payload blocks to store data associated with the backup, accessing a second virtual hard disk file corresponding to an incremental backup of the file, the second virtual hard disk file comprising a second plurality of payload blocks to store data associated with the incremental backup, merging data from a payload block of the first plurality payload blocks with data from a corresponding payload block of the second plurality of payload blocks to form a merged payload block, and streaming the merged payload block to a backup media for storage as a synthetic full backup of the first and second virtual hard disk files, wherein the merging does not alter the first and second virtual hard disk files.
The payload block of the first plurality of payload blocks may include a first extent, the corresponding payload block of the second plurality of payload blocks may include a second extent, and the merging may include placing the first and second extents in the merged payload block. The merging may include replacing the first extent with the second extent for the merged payload block, the merged payload block thereby having the second extent and not having the first extent.
In a specific embodiment, the method further includes after the merging, determining whether a next payload block of the first plurality of payload blocks should be merged, determining that the next payload block should not be merged because the second plurality of payload blocks do not include changes corresponding to the next payload block, and streaming data of the next payload block to the backup media for storage as the synthetic full backup of the first and second virtual hard disk files.
The method may include streaming data of the merged payload block in a stream, and streaming data of another payload block in the same stream, the data of the other payload block comprising data from a next payload block of the first plurality of payload blocks, data from a next payload block of the second plurality of payload blocks, or both.
In another specific embodiment, there is a system for synthesizing a full backup of a file in a mountable format, the system includes a processor-based system executed on a computer system and configured to: access a first virtual hard disk file corresponding to a backup of the file, the first virtual hard disk file comprising a first plurality of payload blocks to store data associated with the backup, access a second virtual hard disk file corresponding to an incremental backup of the file, the second virtual hard disk file comprising a second plurality of payload blocks to store data associated with the incremental backup, merge data from a payload block of the first plurality payload blocks with data from a corresponding payload block of the second plurality of payload blocks to form a merged payload block, and stream the merged payload block to a backup media for storage as a synthetic full backup of the first and second virtual hard disk files, wherein the merge does not alter the first and second virtual hard disk files.
In another specific embodiment, there is a computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method including accessing a first virtual hard disk file corresponding to a backup of a file, the first virtual hard disk file comprising a first plurality of payload blocks to store data associated with the backup, accessing a second virtual hard disk file corresponding to an incremental backup of the file, the second virtual hard disk file comprising a second plurality of payload blocks to store data associated with the incremental backup, merging data from a payload block of the first plurality payload blocks with data from a corresponding payload block of the second plurality of payload blocks to form a merged payload block, and streaming the merged payload block to a backup media for storage as a synthetic full backup of the first and second virtual hard disk files, wherein the merging does not alter the first and second virtual hard disk files.
In the description above and throughout, numerous specific details are set forth in order to provide a thorough understanding of an embodiment of this disclosure. It will be evident, however, to one of ordinary skill in the art, that an embodiment may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate explanation. The description of the preferred embodiments is not intended to limit the scope of the claims appended hereto. Further, in the methods disclosed herein, various steps are disclosed illustrating some of the functions of an embodiment. These steps are merely examples, and are not meant to be limiting in any way. Other steps and functions may be contemplated without departing from this disclosure or the scope of an embodiment. Other embodiments include systems and non-volatile media products that execute, embody or store processes that implement the methods described above.
What is claimed is:
1. A method comprising:
    receiving an identification of a file on a volume to be backed up;
    performing a backup of the file to a first virtual hard disk file, the performing a backup comprising:
        creating a template virtual volume;
        creating, on the template virtual volume, a file system structure that corresponds to a file system structure of the volume having the file to be backed up;
        backing up metadata of the file from the template virtual volume; and
        backing up content of the file from the volume;
    tracking changes to blocks of the volume;
    identifying a subset of tracked changed blocks on the volume as being associated with the file;
    performing an incremental backup of the file by backing up the subset of tracked changed blocks to a second virtual hard disk file and filtering other tracked changed blocks of the volume not associated with the file, wherein the performing an incremental backup comprises not creating another template virtual volume;
    accessing the first virtual hard disk file corresponding to the backup of the file, the first virtual hard disk file comprising a first plurality of payload blocks to store data associated with the backup;
    accessing the second virtual hard disk file corresponding to the incremental backup of the file, the second virtual hard disk file comprising a second plurality of payload blocks to store data associated with the incremental backup;
    merging data from a payload block of the first plurality payload blocks with data from a corresponding payload block of the second plurality of payload blocks to form a merged payload block; and
    streaming the merged payload block to a backup media for storage as a synthetic full backup of the first and second virtual hard disk files, wherein the merging does not alter the first and second virtual hard disk files.
2. The method of claim 1 wherein the payload block of the first plurality of payload blocks comprises a first extent, the corresponding payload block of the second plurality of payload blocks comprises a second extent, and the merging comprises:
    placing the first and second extents in the merged payload block.
3. The method of claim 1 wherein the payload block of the first plurality of payload blocks comprises a first extent, the corresponding payload block of the second plurality of payload blocks comprises a second extent, and the merging comprises:
    replacing the first extent with the second extent for the merged payload block, the merged payload block thereby having the second extent and not having the first extent.
4. The method of claim 1 comprising after the merging, determining whether a next payload block of the first plurality of payload blocks should be merged;
    determining that the next payload block should not be merged because the second plurality of payload blocks do not include changes corresponding to the next payload block; and
    streaming data of the next payload block to the backup media for storage as the synthetic full backup of the first and second virtual hard disk files.
5. The method of claim 1 comprising:
    streaming data of the merged payload block in a stream; and
    streaming data of another payload block in the same stream, the data of the other payload block comprising data from a next payload block of the first plurality of payload blocks, data from a next payload block of the second plurality of payload blocks, or both.
6. The method of claim 1 wherein the file comprises a database.
7. The method of claim 1 comprising:
    after the performing the backup of the file, deleting the template virtual volume.
8. The method of claim 1 wherein the creating on the template virtual volume a file system structure that corresponds to a file system structure of the volume having the file to be backed up comprises:
    creating on the template virtual volume a folder path that matches a folder path of the volume in which the file is located.
9. The method of claim 1 wherein the template virtual volume does not include the content of the file.
10. A system for synthesizing a full backup of a file in a mountable format, the system comprising:
    a processor-based system executed on a computer system and comprising a hardware processor, wherein the hardware processor is configured to:
    receive an identification of a file on a volume to be backed up;
    perform a backup of the file to a first virtual hard disk file, the performance of the backup comprising:
        creating a template virtual volume;
        creating, on the template virtual volume, a file system structure that corresponds to a file system structure of the volume having the file to be backed up;
        backing up metadata of the file from the template virtual volume; and
        backing up content of the file from the volume;
    track changes to blocks of the volume;
    identify a subset of tracked changed blocks on the volume as being associated with the file;
    perform an incremental backup of the file by backing up the subset of tracked changed blocks to a second virtual hard disk file and filter other tracked changed blocks of the volume not associated with the file, wherein the performance of the incremental backup comprises not creating another template virtual volume;
    access the first virtual hard disk file corresponding to the backup of the file, the first virtual hard disk file comprising a first plurality of payload blocks to store data associated with the backup;
    access the second virtual hard disk file corresponding to the incremental backup of the file, the second virtual hard disk file comprising a second plurality of payload blocks to store data associated with the incremental backup;
    merge data from a payload block of the first plurality payload blocks with data from a corresponding payload block of the second plurality of payload blocks to form a merged payload block; and
    stream the merged payload block to a backup media for storage as a synthetic full backup of the first and second virtual hard disk files, wherein the merge does not alter the first and second virtual hard disk files.
11. The system of claim 10 wherein the payload block of the first plurality of payload blocks comprises a first extent, the corresponding payload block of the second plurality of payload blocks comprises a second extent, and the processor-based system is configured to:
    place the first and second extents in the merged payload block.
12. The system of claim 10 wherein the payload block of the first plurality of payload blocks comprises a first extent,
the corresponding payload block of the second plurality of payload blocks comprises a second extent, and the processor-based system is configured to:
    replace the first extent with the second extent for the merged payload block, the merged payload block thereby having the second extent and not having the first extent.
13. The system of claim 10 wherein the processor-based system is configured to:
    after the merge, determine whether a next payload block of the first plurality of payload blocks should be merged;
    determine that the next payload block should not be merged because the second plurality of payload blocks do not include changes corresponding to the next payload block; and
    stream data of the next payload block to the backup media for storage as the synthetic full backup of the first and second virtual hard disk files.
14. The system of claim 10 wherein the processor-based system is configured to:
    stream data of the merged payload block in a stream; and
    stream data of another payload block in the same stream, the data of the other payload block comprising data from a next payload block of the first plurality of payload blocks, data from a next payload block of the second plurality of payload blocks, or both.
15. The system of claim 10 wherein the template virtual volume does not include the content of the file.
16. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method comprising:
    receiving an identification of a file on a volume to be backed up;
    performing a backup of the file to a first virtual hard disk file, the performing a backup comprising:
        creating a template virtual volume;
        creating, on the template virtual volume, a file system structure that corresponds to a file system structure of the volume having the file to be backed up;
        backing up metadata of the file from the template virtual volume; and
        backing up content of the file from the volume;
    tracking changes to blocks of the volume;
    identifying a subset of tracked changed blocks on the volume as being associated with the file;
    performing an incremental backup of the file by backing up the subset of tracked changed blocks to a second virtual hard disk file and filtering other tracked changed blocks of the volume not associated with the file, wherein the performing an incremental backup comprises not creating another template virtual volume;
    accessing the first virtual hard disk file corresponding to the backup of the file, the first virtual hard disk file comprising a first plurality of payload blocks to store data associated with the backup;
    accessing the second virtual hard disk file corresponding to the incremental backup of the file, the second virtual hard disk file comprising a second plurality of payload blocks to store data associated with the incremental backup;
    merging data from a payload block of the first plurality payload blocks with data from a corresponding payload block of the second plurality of payload blocks to form a merged payload block; and
    streaming the merged payload block to a backup media for storage as a synthetic full backup of the first and second virtual hard disk files, wherein the merging does not alter the first and second virtual hard disk files.
17. The computer program product of claim 16 wherein the payload block of the first plurality of payload blocks comprises a first extent, the corresponding payload block of the second plurality of payload blocks comprises a second extent, and the merging comprises:
    placing the first and second extents in the merged payload block.
18. The computer program product of claim 16 wherein the payload block of the first plurality of payload blocks comprises a first extent, the corresponding payload block of the second plurality of payload blocks comprises a second extent, and the merging comprises:
    replacing the first extent with the second extent for the merged payload block, the merged payload block thereby having the second extent and not having the first extent.
19. The computer program product of claim 16 wherein the method comprises after the merging, determining whether a next payload block of the first plurality of payload blocks should be merged;
    determining that the next payload block should not be merged because the second plurality of payload blocks do not include changes corresponding to the next payload block; and
    streaming data of the next payload block to the backup media for storage as the synthetic full backup of the first and second virtual hard disk files.
20. The computer program product of claim 16 wherein the method comprises:
    streaming data of the merged payload block in a stream; and
    streaming data of another payload block in the same stream, the data of the other payload block comprising data from a next payload block of the first plurality of payload blocks, data from a next payload block of the second plurality of payload blocks, or both.
* * * * *