1. Theory

loss.backward() computes dloss/dx for every parameter x which has requires_grad=True. These gradients are accumulated into x.grad for every such parameter x. In pseudo-code:

x.grad += dloss/dx
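
As a quick sanity check of the accumulation behaviour, here is a minimal sketch (the scalar parameter and the x ** 2 loss are made up for illustration, not taken from the original post):

import torch

x = torch.tensor(2.0, requires_grad=True)

loss = x ** 2          # dloss/dx = 2x = 4
loss.backward()
print(x.grad)          # tensor(4.)

loss = x ** 2
loss.backward()        # the second backward() adds into x.grad instead of overwriting it
print(x.grad)          # tensor(8.)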
optimizer.step() updates the value of x using the gradient x.grad. For example, the SGD optimizer performs:

x += -lr * x.grad
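
A minimal sketch of that equivalence, with an assumed single scalar parameter and a made-up loss: one opt.step() call moves x by exactly -lr * x.grad.

import torch
from torch import optim

x = torch.tensor(1.0, requires_grad=True)
opt = optim.SGD([x], lr=0.1)

loss = 3 * x                                # dloss/dx = 3
loss.backward()

expected = x.item() - 0.1 * x.grad.item()   # manual update: x += -lr * x.grad
opt.step()
print(x.item(), expected)                   # both print ~0.7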
optimizer.zero_grad() clears x.grad for every parameter x in the optimizer. It's important to call this before loss.backward(); otherwise you'll accumulate the gradients from multiple passes.
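
Putting the three calls together, a typical training step looks like the sketch below (the random parameter w and the (w ** 2).sum() loss are assumptions for illustration):

import torch
from torch import optim

w = torch.randn(3, requires_grad=True)
opt = optim.SGD([w], lr=0.1)

for step in range(5):
    opt.zero_grad()          # clear the gradients left over from the previous iteration
    loss = (w ** 2).sum()    # some scalar loss
    loss.backward()          # fills w.grad with dloss/dw = 2 * w
    opt.step()               # w += -lr * w.grad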
If you have multiple losses (loss1, loss2), you can sum them and then call backward() once:
loss3 = loss1 + loss2
loss3.backward()
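
For example, in the following sketch (with two made-up one-parameter losses) backpropagating the sum produces the same x.grad as calling loss1.backward() and loss2.backward() separately, because the gradients accumulate:

import torch

x = torch.tensor(1.0, requires_grad=True)

loss1 = 2 * x          # dloss1/dx = 2
loss2 = x ** 2         # dloss2/dx = 2x = 2
loss3 = loss1 + loss2
loss3.backward()
print(x.grad)          # tensor(4.)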
2. Example
Suppose you have a pretrained network "net2" and you want to backpropagate the gradients of the loss computed by "net2" into "net1". For those gradients to reach net1, net2 must take net1's output as its input and its forward pass must not run under torch.no_grad() (which would detach loss2 from the graph); net2's own weights can simply be frozen. In pseudo-code (model1, model2, and dataloader are placeholders, and net1 is assumed to return both its output and its own loss):

import torch
from torch import optim

def train(n_epoch):
    net1 = model1()                       # the network being trained
    net2 = model2(pretrained=True)        # pretrained network, kept frozen
    net2.eval()
    for p in net2.parameters():
        p.requires_grad_(False)           # no gradients are stored for net2's weights
    optim1 = optim.SGD(net1.parameters(), lr=0.1)

    for epoch in range(n_epoch):
        for data in dataloader:
            net1.train()
            optim1.zero_grad()            # clear net1's gradients from the previous step
            out1, loss1 = net1(data)      # net1's output and its own loss
            loss2 = net2(out1)            # net2 scores net1's output; the graph runs back through net2 into net1
            total_loss = loss1 + loss2
            total_loss.backward()         # gradients from both losses accumulate in net1's parameters
            optim1.step()                 # update net1 only
Reference: What does the backward() function do? - autograd - PyTorch Forums