Torch buffer vs Parameter
You can create a parameter in your torch model by wrapping a tensor in `nn.Parameter` inside your `__init__` method.
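A minimal sketch of what that looks like (the module name and weight shape here are just illustrative):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapping a tensor in nn.Parameter registers it on the module:
        # it shows up in model.parameters() and in the state_dict.
        self.weight = nn.Parameter(torch.randn(3, 3))
```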
When I save the model’s `state_dict`, I’ll find this parameter in it, and it will also show up in `model.parameters()`. But what if we had a tensor that didn’t need gradients and hence didn’t need to be a parameter? An example would be the running mean and variance used in batch normalization. That’s where the next idea comes into play:
You can create a buffer in your torch model by calling `register_buffer` in the `__init__` method of your module.
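A minimal sketch, using `"some_tensor"` and a tensor of ones as the example name and value:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # register_buffer(k, v) stores tensor v on the module under name k,
        # without making it a parameter. It still ends up in the state_dict.
        self.register_buffer("some_tensor", torch.ones(3))
```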
`register_buffer(k, v)` takes a string name `k` and a tensor `v`; in this example, `k` is the string `"some_tensor"` and `v` is a tensor of ones. I can now access this tensor via `self.some_tensor`, kind of like a python dictionary lookup. The ones tensor never receives a gradient, yet it will still be stored in your state dict.
So to wrap up: use parameters for things that require gradients and buffers for things that don’t. Of course, instead of a buffer you could use an `nn.Parameter` with `requires_grad=False`, but your optimizer would then still have to check the `requires_grad` attribute of that tensor during every weight update, which is an unnecessary step.
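To see the difference concretely, here is a small sketch (module and attribute names are just illustrative): a frozen `nn.Parameter` still shows up in `model.parameters()` and gets handed to the optimizer, while a buffer does not, even though both land in the state dict.

```python
import torch
import torch.nn as nn

class Compare(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen parameter: no gradient, but still passed to the optimizer.
        self.frozen = nn.Parameter(torch.ones(2), requires_grad=False)
        # Buffer: no gradient, and never seen by the optimizer.
        self.register_buffer("buf", torch.ones(2))

model = Compare()
# Only 'frozen' appears among the parameters; both are in the state_dict.
param_names = [name for name, _ in model.named_parameters()]
```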