Table of Contents
· Part 1: create models by functions
· Part 2: define models by class
· 2.1 Module properties
· 2.2 forward() method
· 2.3 Confusions
· 2.4 Learning rate
· Part 3: Transfer learning
· Part 4: nn.Module vs nn.functional
Part 1: create models by functions
Here is an example using nn.Sequential:
import torch.nn as nn

input_size = 784          # example sizes, e.g. flattened 28x28 MNIST images
hidden_sizes = [128, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
Or you can use an OrderedDict to give each layer a name:
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_sizes[0])),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
    ('relu2', nn.ReLU()),
    ('output', nn.Linear(hidden_sizes[1], output_size)),
    ('softmax', nn.Softmax(dim=1))]))
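One convenience of the named version (a small usage sketch, assuming the model defined above): each named layer is registered as an attribute of the Sequential container, so you can reach it directly.
# access a layer by the name given in the OrderedDict
print(model.fc1)               # Linear(in_features=784, out_features=128, bias=True)
print(model.fc1.weight.shape)  # torch.Size([128, 784])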
Part 2: define models by class
2.1 Module properties
A model class inherits from nn.Module. The module exposes parameters, named_parameters, modules, and named_modules, among others.
- The model parameters have already been properly initialized automatically when the module is constructed.
- The structure of nn.Module is similar to omegaconf: submodules and parameters can be accessed like entries of a dictionary or attributes of a nested structure.
- We can use the parameters() method to retrieve the trainable parameters:
a = model.parameters()
for t in a:
    print(t.size())
However, if you assign a plain tensor as an attribute of the nn.Module object, it won't show up in parameters() unless you define it as an nn.Parameter object. This behavior exists to facilitate scenarios where you need to cache a non-differentiable tensor, for example caching the previous output in an RNN.
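A minimal sketch of this behavior (the class name ParamDemo and the tensor sizes are made up for illustration):
import torch
import torch.nn as nn

class ParamDemo(nn.Module):
    def __init__(self):
        super().__init__()
        self.plain = torch.randn(3)                   # plain tensor: NOT registered
        self.learned = nn.Parameter(torch.randn(3))   # registered as a parameter

demo = ParamDemo()
print([name for name, _ in demo.named_parameters()])  # ['learned']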
An example of a neural network defined by subclassing nn.Module is listed as follows:
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure the input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
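A quick usage sketch (the batch shape 64 x 1 x 28 x 28 is an assumed example that matches the 784-dimensional input):
import torch

model = Classifier()
images = torch.randn(64, 1, 28, 28)   # dummy batch of 28x28 grayscale images
log_probs = model(images)             # calls forward() through __call__
print(log_probs.shape)                # torch.Size([64, 10])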
We can inspect the modules of a model in this way:
for name, layer in model.named_modules():
    if isinstance(layer, torch.nn.Conv2d):
        print(name, layer)
In this example, we pick out all the convolutional layers in the defined neural network. Similarly, we can plot the weight distribution of each convolutional layer:
import matplotlib.pyplot as plt

for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        weights = module.weight.reshape(-1).detach().cpu().numpy()
        plt.hist(weights)
        plt.show()
        plt.clf()
2.2 forward() method
The nn.Module class defines a forward() method, and a model built from nn.Module is itself callable. The relationship between the two is as follows: forward() accepts any type and number of arguments, and its goal is to encapsulate the forward computational steps. Inside forward(), you call the nested submodules to perform the forward pass. forward() itself is invoked from the module's __call__ method. It is encouraged to NOT call the forward(x) method directly. You should call the whole model, as in model(x), to perform a forward pass and output predictions.
What happens if you do not do that? If you call the .forward() method directly and you have hooks registered on your model, the hooks won't have any effect, because they are only run by __call__.
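A minimal sketch of that effect (the hook simply prints the output shape; the model is the Classifier defined above):
import torch

def print_shape_hook(module, inputs, output):
    print("hook fired, output shape:", output.shape)

model = Classifier()
model.fc4.register_forward_hook(print_shape_hook)

x = torch.randn(4, 784)
_ = model(x)           # goes through __call__, the hook fires
_ = model.forward(x)   # bypasses __call__, the hook does NOT fire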
2.3 Confusions
2.3.1 nn.ModuleList and nn.ParameterList
Compare the following:
layer_list = [nn.Conv2d(5, 5, 3), nn.BatchNorm2d(5), nn.Linear(5, 2)]

class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = layer_list   # plain Python list: the layers are NOT registered

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = myNet()
print(list(net.parameters()))  # []
layer_list = [nn.Conv2d(5, 5, 3), nn.BatchNorm2d(5), nn.Linear(5, 2)]

class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(layer_list)   # layers are now registered

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = myNet()
print(list(net.parameters()))
This time the parameters of every layer in the list are registered and show up in net.parameters().
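nn.ParameterList plays the same role for raw parameters: a plain Python list of nn.Parameter objects would not be registered, while nn.ParameterList registers them. A small sketch (the tensor sizes are arbitrary):
import torch
import torch.nn as nn

class myParamNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(4, 4)) for _ in range(3)])

net = myParamNet()
print(len(list(net.parameters())))  # 3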
2.3.2 nn.modules() and nn.children()
modules() yields every module in the network recursively (including the module itself), while children() only yields the direct child modules.
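A short sketch of the difference, using a nested Sequential (the structure is made up for illustration):
import torch.nn as nn

block = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
net = nn.Sequential(block, nn.Linear(8, 2))

print(len(list(net.children())))  # 2: the inner Sequential and the last Linear
print(len(list(net.modules())))   # 5: net itself, block, Linear, ReLU, Linear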
2.3.3 named_*
Each of these functions has a named_* counterpart that returns an iterator over (name, object) tuples:
- named_parameters: returns an iterator of tuples containing the name of each parameter (if a convolutional layer is assigned as self.conv1, its parameters would be conv1.weight and conv1.bias) and the nn.Parameter itself.
- named_modules: same as above, but the iterator yields modules, like the modules() function does.
- named_children: same as above, but the iterator yields the modules that children() returns.
- named_buffers: returns buffer tensors, such as the running mean of a BatchNorm layer.
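For example, on the Classifier defined earlier (a quick sketch):
model = Classifier()
for name, parameter in model.named_parameters():
    print(name, parameter.shape)
# fc1.weight torch.Size([256, 784])
# fc1.bias   torch.Size([256])
# ... and so on for fc2, fc3, fc4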
2.3.4 Initialization
Custom weight initialization can be implemented by iterating over modules() and applying the functions in torch.nn.init:
import matplotlib.pyplot as plt
%matplotlib inline

class myNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(10, 10, 3)
        self.bn = nn.BatchNorm2d(10)

    def weights_init(self):
        for module in self.modules():
            if isinstance(module, nn.Conv2d):
                nn.init.normal_(module.weight, mean=0, std=1)
                nn.init.constant_(module.bias, 0)

Net = myNet()
Net.weights_init()

for module in Net.modules():
    if isinstance(module, nn.Conv2d):
        weights = module.weight.reshape(-1).detach().cpu().numpy()
        print(module.bias)  # bias has been set to zero
        plt.hist(weights)
        plt.show()
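An equivalent and common idiom (a sketch, not from the original text) is to pass an initialization function to Module.apply(), which calls it recursively on every submodule:
def init_weights(module):
    if isinstance(module, nn.Conv2d):
        nn.init.normal_(module.weight, mean=0, std=1)
        nn.init.constant_(module.bias, 0)

Net = myNet()
Net.apply(init_weights)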
2.4 Learning rate
2.4.1 Schedule
The schedule your learning rate follows is a major hyperparameter you will want to tune. PyTorch provides support for scheduling learning rates with its torch.optim.lr_scheduler
module, which offers a variety of learning rate schedules. The following demonstrates one such schedule.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimiser, milestones = [10,20], gamma = 0.1)
The above scheduler multiplies the learning rate by gamma each time we reach an epoch contained in the milestones list. In our case, the learning rate is multiplied by 0.1 at the 10th and the 20th epoch. You also have to call scheduler.step() in the loop that goes over the epochs. Generally, a training loop is made of two nested loops: one goes over the epochs, and the nested one goes over the batches in that epoch. Make sure you call scheduler.step() once per epoch in the epoch loop so your learning rate is updated (in recent PyTorch versions it should come after the optimiser has stepped for that epoch). Be careful not to put it in the batch loop, otherwise your learning rate may be updated at the 10th batch rather than the 10th epoch.
Also remember that scheduler.step() is no replacement for optimiser.step(); you still have to call optimiser.step() every time you backpropagate, which happens in the batch loop.
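A minimal training-loop sketch showing where the two step() calls go (train_loader is a placeholder for your own DataLoader, and the Classifier from earlier stands in for any model):
model = Classifier()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimiser, milestones=[10, 20], gamma=0.1)
criterion = nn.NLLLoss()   # the Classifier outputs log-probabilities

for epoch in range(30):                    # epoch loop
    for images, labels in train_loader:    # batch loop
        optimiser.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimiser.step()                   # one optimiser step per batch
    scheduler.step()                       # one scheduler step per epoch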
2.4.2 Different learning rates
We can also give different parameter groups different learning rates, for example a smaller learning rate for the biases:
params_bias = []
params_wts = []   # separate the bias and weight parameters
for name, parameter in Net.named_parameters():
    if "bias" in name:
        params_bias.append(parameter)
    elif "weight" in name:
        params_wts.append(parameter)

optimiser = torch.optim.SGD(
    [{"params": params_bias, "lr": 0.001, "momentum": 0.99},
     {"params": params_wts}],
    lr=0.01, momentum=0.9)
Part 3: Transfer learning
We use the following code to illustrate transfer learning with a pretrained torchvision model:
from collections import OrderedDict
from torchvision import models

model = models.densenet121(pretrained=True)

# freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# replace the classifier head with a new, trainable one for our task
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
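Because the backbone is frozen, only the new classifier's parameters need to go to the optimiser. A short sketch (the learning rate is an assumed example value):
import torch

optimiser = torch.optim.Adam(model.classifier.parameters(), lr=0.003)
criterion = nn.NLLLoss()   # pairs with the LogSoftmax output

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # only the classifier.* parameters remain trainable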
Part 4: nn.Module vs nn.functional
This question comes up quite a lot, especially when you are reading open-source code. In PyTorch, layers are often implemented either as torch.nn.Module objects or as torch.nn.functional functions. Which one should you use? Which is better?
import torch
import torch.nn as nn
import torch.nn.functional as F
inp = torch.randn(1,3,64,64) # random input image
# Same thing using two approaches
# ---------------------------------------
# torch.nn
avg_pool = nn.AvgPool2d(4) # create an object
nn_out = avg_pool(inp) # invoke the forward method
# torch.nn.functional
f_out = F.avg_pool2d(inp, 4)
print(torch.equal(nn_out, f_out))
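Both approaches give the same result. A common rule of thumb (a general convention, not something specific to this article): layers with learnable state, such as Linear, Conv2d, or BatchNorm, are usually kept as nn.Module objects so their parameters are registered automatically, while stateless operations such as activations and pooling are often written with the functional API. A minimal sketch of the trade-off, comparing nn.Linear with F.linear where the weights must be managed by hand (the sizes are arbitrary):
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 8)

# Module form: weight and bias are created and registered for you
linear = nn.Linear(8, 4)
out_module = linear(x)

# Functional form: you must create and track the parameters yourself
weight = nn.Parameter(torch.randn(4, 8))
bias = nn.Parameter(torch.zeros(4))
out_functional = F.linear(x, weight, bias)

print(out_module.shape, out_functional.shape)  # torch.Size([2, 4]) for both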