Table of Contents
· Part 1: create models by functions
· Part 2: define models by class
· 2.1 Module properties
· 2.2 forward() method
· 2.3 Confusions
· 2.4 Learning rate
· Part 3: Transfer learning
· Part 4: nn.Module vs nn.functional
Part 1: create models by functions
Here is an example using nn.Sequential:
import torch.nn as nn

input_size = 784          # example sizes, e.g. flattened 28x28 MNIST images
hidden_sizes = [128, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
Or you can use an OrderedDict to give each layer a name:
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_sizes[0])),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
    ('relu2', nn.ReLU()),
    ('output', nn.Linear(hidden_sizes[1], output_size)),
    ('softmax', nn.Softmax(dim=1))]))
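One convenience of the named version (a small usage sketch, assuming the model defined above): each named layer is registered as an attribute of the Sequential container, so you can reach it directly.
# access a layer by the name given in the OrderedDict
print(model.fc1)               # Linear(in_features=784, out_features=128, bias=True)
print(model.fc1.weight.shape)  # torch.Size([128, 784])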
Part 2: define models by class
2.1 Module properties
A model class inherits from nn.Module. The module exposes parameters, named_parameters, modules, and named_modules, among others.
- The model parameters have already been properly initialized automatically when the module is constructed.
- The structure of nn.Module is similar to omegaconf: submodules and parameters can be accessed like entries of a dictionary or attributes of a nested structure.
- We can use the parameters() method to retrieve the trainable parameters:
a = model.parameters()
for t in a:
    print(t.size())
However, if you assign a plain tensor as an attribute of the nn.Module object, it won't show up in parameters() unless you define it as an nn.Parameter object. This behavior exists to facilitate scenarios where you need to cache a non-differentiable tensor, for example caching the previous output in an RNN.
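A minimal sketch of this behavior (the class name ParamDemo and the tensor sizes are made up for illustration):
import torch
import torch.nn as nn

class ParamDemo(nn.Module):
    def __init__(self):
        super().__init__()
        self.plain = torch.randn(3)                   # plain tensor: NOT registered
        self.learned = nn.Parameter(torch.randn(3))   # registered as a parameter

demo = ParamDemo()
print([name for name, _ in demo.named_parameters()])  # ['learned']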
An example of a neural network defined by subclassing nn.Module is listed as follows:
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure the input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
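A quick usage sketch (the batch shape 64 x 1 x 28 x 28 is an assumed example that matches the 784-dimensional input):
import torch

model = Classifier()
images = torch.randn(64, 1, 28, 28)   # dummy batch of 28x28 grayscale images
log_probs = model(images)             # calls forward() through __call__
print(log_probs.shape)                # torch.Size([64, 10])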
We can inspect the modules of a model in this way:
for name, layer in model.named_modules():
    if isinstance(layer, torch.nn.Conv2d):
        print(name, layer)
In this example, we pick out all the convolutional layers in the defined neural network. Similarly, we can plot the weight distribution of each convolutional layer:
import matplotlib.pyplot as plt

for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        weights = module.weight.reshape(-1).detach().cpu().numpy()
        plt.hist(weights)
        plt.show()
        plt.clf()
2.2 forward() method
The nn.Module class defines a forward() method, and a model built from nn.Module is itself callable. The relationship between the two is as follows: forward() accepts any type and number of arguments, and its goal is to encapsulate the forward computational steps. Inside forward(), you call the nested submodules to perform the forward pass. forward() itself is invoked from the module's __call__ method. It is encouraged to NOT call the forward(x) method directly. You should call the whole model, as in model(x), to perform a forward pass and output predictions.
What happens if you do not do that? If you call the .forward() method directly and you have hooks registered on your model, the hooks won't have any effect, because they are only run by __call__.
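A minimal sketch of that effect (the hook simply prints the output shape; the model is the Classifier defined above):
import torch

def print_shape_hook(module, inputs, output):
    print("hook fired, output shape:", output.shape)

model = Classifier()
model.fc4.register_forward_hook(print_shape_hook)

x = torch.randn(4, 784)
_ = model(x)           # goes through __call__, the hook fires
_ = model.forward(x)   # bypasses __call__, the hook does NOT fire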
2.3 Confusions
2.3.1 nn.ModuleList and nn.ParameterList
Compare the following:
layer_list = [nn.Conv2d(5, 5, 3), nn.BatchNorm2d(5), nn.Linear(5, 2)]

class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = layer_list   # plain Python list: the layers are NOT registered

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = myNet()
print(list(net.parameters()))  # []
layer_list = [nn.Conv2d(5, 5, 3), nn.BatchNorm2d(5), nn.Linear(5, 2)]

class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(layer_list)   # layers are now registered

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = myNet()
print(list(net.parameters()))
This time the parameters of every layer in the list are registered and show up in net.parameters().
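nn.ParameterList plays the same role for raw parameters: a plain Python list of nn.Parameter objects would not be registered, while nn.ParameterList registers them. A small sketch (the tensor sizes are arbitrary):
import torch
import torch.nn as nn

class myParamNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(4, 4)) for _ in range(3)])

net = myParamNet()
print(len(list(net.parameters())))  # 3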
2.3.2 nn.modules() and nn.children()
modules() yields every module in the network recursively (including the module itself), while children() only yields the direct child modules.
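A short sketch of the difference, using a nested Sequential (the structure is made up for illustration):
import torch.nn as nn

block = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
net = nn.Sequential(block, nn.Linear(8, 2))

print(len(list(net.children())))  # 2: the inner Sequential and the last Linear
print(len(list(net.modules())))   # 5: net itself, block, Linear, ReLU, Linear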
2.3.3 named_*
Each of these functions has a named_* counterpart that returns an iterator over (name, object) tuples:
- named_parameters: returns an iterator of tuples containing the name of each parameter (if a convolutional layer is assigned as self.conv1, its parameters would be conv1.weight and conv1.bias) and the nn.Parameter itself.
- named_modules: same as above, but the iterator yields modules, like the modules() function does.
- named_children: same as above, but the iterator yields the modules that children() returns.
- named_buffers: returns buffer tensors, such as the running mean of a BatchNorm layer.
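For example, on the Classifier defined earlier (a quick sketch):
model = Classifier()
for name, parameter in model.named_parameters():
    print(name, parameter.shape)
# fc1.weight torch.Size([256, 784])
# fc1.bias   torch.Size([256])
# ... and so on for fc2, fc3, fc4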
2.3.4 Initialization
Custom weight initialization can be implemented by iterating over modules() and applying the functions in torch.nn.init:
import matplotlib.pyplot as plt
%matplotlib inline

class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(10, 10, 3)
        self.bn = nn.BatchNorm2d(10)

    def weights_init(self):
        for module in self.modules():
            if isinstance(module, nn.Conv2d):
                nn.init.normal_(module.weight, mean=0, std=1)
                nn.init.constant_(module.bias, 0)

Net = myNet()
Net.weights_init()

for module in Net.modules():
    if isinstance(module, nn.Conv2d):
        weights = module.weight.reshape(-1).detach().cpu().numpy()
        print(module.bias)  # bias has been set to zero
        plt.hist(weights)
        plt.show()
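An equivalent and common idiom (a sketch, not from the original text) is to pass an initialization function to Module.apply(), which calls it recursively on every submodule:
def init_weights(module):
    if isinstance(module, nn.Conv2d):
        nn.init.normal_(module.weight, mean=0, std=1)
        nn.init.constant_(module.bias, 0)

Net = myNet()
Net.apply(init_weights)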
2.4 Learning rate
2.4.1 Schedule
The schedule your learning rate follows is a major hyperparameter you will want to tune. PyTorch provides support for scheduling learning rates with its torch.optim.lr_scheduler
module, which offers a variety of learning rate schedules. The following demonstrates one such schedule.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimiser, milestones = [10,20], gamma = 0.1)
The above scheduler multiplies the learning rate by gamma each time we reach an epoch contained in the milestones list. In our case, the learning rate is multiplied by 0.1 at the 10th and the 20th epoch. You also have to call scheduler.step() in the loop that goes over the epochs. Generally, a training loop is made of two nested loops: one goes over the epochs, and the nested one goes over the batches in that epoch. Make sure you call scheduler.step() once per epoch in the epoch loop so your learning rate is updated (in recent PyTorch versions it should come after the optimiser has stepped for that epoch). Be careful not to put it in the batch loop, otherwise your learning rate may be updated at the 10th batch rather than the 10th epoch.
Also remember that scheduler.step() is no replacement for optimiser.step(); you still have to call optimiser.step() every time you backpropagate, which happens in the batch loop.
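A minimal training-loop sketch showing where the two step() calls go (train_loader is a placeholder for your own DataLoader, and the Classifier from earlier stands in for any model):
model = Classifier()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimiser, milestones=[10, 20], gamma=0.1)
criterion = nn.NLLLoss()   # the Classifier outputs log-probabilities

for epoch in range(30):                    # epoch loop
    for images, labels in train_loader:    # batch loop
        optimiser.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimiser.step()                   # one optimiser step per batch
    scheduler.step()                       # one scheduler step per epoch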
2.4.2 Different learning rates
We can also give different parameter groups different learning rates, for example a smaller learning rate for the biases:
params_bias = []
params_wts = []   # separate the bias and weight parameters
for name, parameter in Net.named_parameters():
    if "bias" in name:
        params_bias.append(parameter)
    elif "weight" in name:
        params_wts.append(parameter)

optimiser = torch.optim.SGD(
    [{"params": params_bias, "lr": 0.001, "momentum": 0.99},
     {"params": params_wts}],
    lr=0.01, momentum=0.9)
Part 3: Transfer learning
We use the following code to illustrate transfer learning with a pretrained torchvision model:
from collections import OrderedDict
from torchvision import models

model = models.densenet121(pretrained=True)

# freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# replace the classifier head with a new, trainable one for our task
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
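Because the backbone is frozen, only the new classifier's parameters need to go to the optimiser. A short sketch (the learning rate is an assumed example value):
import torch

optimiser = torch.optim.Adam(model.classifier.parameters(), lr=0.003)
criterion = nn.NLLLoss()   # pairs with the LogSoftmax output

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # only the classifier.* parameters remain trainable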
Part 4: nn.Module vs nn.functional
This question comes up quite a lot, especially when you are reading open-source code. In PyTorch, layers are often implemented either as torch.nn.Module objects or as torch.nn.functional functions. Which one should you use? Which is better?
import torch
import torch.nn as nn
import torch.nn.functional as F
inp = torch.randn(1,3,64,64) # random input image
# Same thing using two approaches
# ---------------------------------------
# torch.nn
avg_pool = nn.AvgPool2d(4) # create an object
nn_out = avg_pool(inp) # invoke the forward method
# torch.nn.functional
f_out = F.avg_pool2d(inp, 4)
print(torch.equal(nn_out, f_out))
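Both approaches give the same result. A common rule of thumb (a general convention, not something specific to this article): layers with learnable state, such as Linear, Conv2d, or BatchNorm, are usually kept as nn.Module objects so their parameters are registered automatically, while stateless operations such as activations and pooling are often written with the functional API. A minimal sketch of the trade-off, comparing nn.Linear with F.linear where the weights must be managed by hand (the sizes are arbitrary):
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 8)

# Module form: weight and bias are created and registered for you
linear = nn.Linear(8, 4)
out_module = linear(x)

# Functional form: you must create and track the parameters yourself
weight = nn.Parameter(torch.randn(4, 8))
bias = nn.Parameter(torch.zeros(4))
out_functional = F.linear(x, weight, bias)

print(out_module.shape, out_functional.shape)  # torch.Size([2, 4]) for both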