Differences between nn.Conv1d and nn.Linear
1. The two layers compute the same result. The only difference is the expected memory layout: nn.Conv1d is channel-first, so the input must be arranged as (B, C, L), while nn.Linear expects the feature dimension last, i.e. (B, L, C).

Verification code:
```python
import torch

def count_parameters(model):
    """Count the number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())

conv = torch.nn.Conv1d(8, 32, 1)
print(count_parameters(conv))
# 288

linear = torch.nn.Linear(8, 32)
print(count_parameters(linear))
# 288

print(conv.weight.shape)
# torch.Size([32, 8, 1])
print(linear.weight.shape)
# torch.Size([32, 8])

# Use the same initialization for both layers
linear.weight = torch.nn.Parameter(conv.weight.squeeze(2))
linear.bias = torch.nn.Parameter(conv.bias)

tensor = torch.randn(128, 256, 8)                       # (B, L, C) for nn.Linear
permuted_tensor = tensor.permute(0, 2, 1).contiguous()  # (B, C, L): reordered to channel-first for nn.Conv1d

out_linear = linear(tensor)
print(out_linear.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)

out_conv = conv(permuted_tensor)
print(out_conv.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)
```
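Comparing only the means is a weak check. A stricter one (a minimal sketch, not part of the original answer; the tolerance is a chosen assumption) permutes the conv output back to feature-last and compares the full tensors with torch.allclose:

```python
# out_conv is (B, C, L); permute back to (B, L, C) to line up with out_linear.
print(torch.allclose(out_linear, out_conv.permute(0, 2, 1), atol=1e-6))
# True: the outputs match up to floating-point tolerance
```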
2. Because nn.Conv1d and nn.Linear are implemented with different computation paths, their execution speed also differs:
```python
# Speed test (run each %%timeit in its own IPython cell):
%%timeit
_ = linear(tensor)
# 151 µs ± 297 ns per loop

%%timeit
_ = conv(permuted_tensor)
# 1.43 ms ± 6.33 µs per loop
```
As the timings show, nn.Linear is considerably faster on this workload.
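The same comparison can be reproduced outside IPython; here is a minimal sketch using torch.utils.benchmark, reusing the linear, conv, tensor, and permuted_tensor objects defined in the verification code above:

```python
import torch.utils.benchmark as benchmark

# Time both layers on the tensors defined earlier.
t_linear = benchmark.Timer(stmt="linear(tensor)",
                           globals={"linear": linear, "tensor": tensor})
t_conv = benchmark.Timer(stmt="conv(permuted_tensor)",
                         globals={"conv": conv, "permuted_tensor": permuted_tensor})
print(t_linear.timeit(1000))
print(t_conv.timeit(1000))
```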
3. The forward and backward passes of nn.Conv1d and nn.Linear accumulate floating-point values in different orders, so their outputs are not bit-identical; the deeper the network, the larger the accumulated numerical difference between the two.
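To illustrate, here is a minimal sketch (not from the original answer) that stacks equivalently initialized Conv1d/Linear pairs and prints the worst-case discrepancy after each layer; depending on the backend, the drift may be tiny or even zero:

```python
import torch

depth = 20
pairs = []
for _ in range(depth):
    c = torch.nn.Conv1d(8, 8, 1)
    l = torch.nn.Linear(8, 8)
    # Copy the conv weights so each pair computes the "same" function.
    l.weight = torch.nn.Parameter(c.weight.squeeze(2).clone())
    l.bias = torch.nn.Parameter(c.bias.clone())
    pairs.append((c, l))

x = torch.randn(4, 16, 8)             # (B, L, C) for nn.Linear
y = x.permute(0, 2, 1).contiguous()   # (B, C, L) for nn.Conv1d
with torch.no_grad():
    for c, l in pairs:
        x, y = l(x), c(y)
        # Worst-case absolute discrepancy after each layer.
        print((x - y.permute(0, 2, 1)).abs().max().item())
```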
All of the above comes from [Stack Overflow](https://stackoverflow.com/questions/55576314/conv1d-with-kernel-size-1-vs-linear-layer?answertab=oldest#tab-top).