Torch Parameter vs Buffer
Torch Parameters
You can create a parameter in your Torch model by using the following in the __init__ method of your module.
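A minimal sketch of what that looks like (the module and the "my_param" name are just placeholders I've picked for illustration):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # A learnable tensor, registered as a parameter of the module;
        # gradients will be tracked for it during backprop
        self.my_param = nn.Parameter(torch.randn(3))
```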
When I save the model's state_dict, this tensor will be included, and I'll also find it in model.parameters(). However, what if we had something that didn't need gradients and hence didn't need to be a parameter? An example would be the running mean and variance used in batch normalization. That's where the next idea comes into play:
Torch Buffers
You can create a buffer in your Torch model by using the following in the __init__ method of your nn.Module.
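Here is a sketch of what that looks like (the surrounding module is my own minimal example; the k and v names match what the text below refers to):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.my_param = nn.Parameter(torch.randn(3))  # the parameter from above
        k = "some_tensor"
        v = torch.ones(3)
        # register_buffer stores v on the module under the name k,
        # so it becomes accessible as self.some_tensor
        self.register_buffer(k, v)
```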
In this example, k is the string "some_tensor" and v is a tensor of ones. I can now access this tensor via self.some_tensor, kind of like a Python dictionary lookup. The ones tensor never receives a gradient, yet it will still be stored in your state_dict.
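A quick check of those two claims, using the sketch above:

```python
model = MyModule()
print(list(model.state_dict().keys()))
# ['my_param', 'some_tensor'] -- the buffer is saved alongside the parameter
print([name for name, _ in model.named_parameters()])
# ['my_param'] -- the buffer is not handed to the optimizer
```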
Conclusion
So to wrap up, you should use parameters for things that require gradients and buffers for things that don't. Of course, instead of a buffer you can use an nn.Parameter with requires_grad = False, but then your optimizer will still need to check the requires_grad attribute of these tensors during the weight update, which is an unnecessary step.
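For completeness, that alternative would look something like this (again just a minimal sketch, with a made-up module name):

```python
import torch
import torch.nn as nn

class FrozenParamModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Behaves like a constant tensor, but it still shows up in
        # model.parameters(), so the optimizer has to skip over it
        self.some_tensor = nn.Parameter(torch.ones(3), requires_grad=False)
```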