Loss Functions in PyTorch

4 min read · May 3, 2021

Part 0: Loss Function

A loss function should return high values for bad predictions and low values for good ones.
In real applications the loss is often made up of several sub-losses, and there are two ways to handle them. One is to sum the sub-losses into a single scalar loss; the other is to keep them as separate entries of a loss tensor and pass a weight for each entry to backward(), as the following example shows:

import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)

y0 = torch.mul(a, b)    # y0 = (x+w) * (w+1)    dy0/dw = 5
y1 = torch.add(a, b)    # y1 = (x+w) + (w+1)    dy1/dw = 2

loss = torch.cat([y0, y1], dim=0)       # [y0, y1]

# One weight per sub-loss; the weighted gradients are summed into w.grad
grad_tensors = torch.tensor([1., 1.])
loss.backward(gradient=grad_tensors)
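As a quick check of the example above, the sketch below reads back the accumulated gradients and compares the two ways of handling sub-losses. The helper name grads and its combine flag are made up for illustration; the variant that sums the sub-losses into one scalar before calling backward() is an assumption about what "one general loss function" means here.

```python
import torch


def grads(combine):
    """Rebuild the small graph and return (dloss/dw, dloss/dx)."""
    w = torch.tensor([1.], requires_grad=True)
    x = torch.tensor([2.], requires_grad=True)
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y0 = torch.mul(a, b)                 # (x+w) * (w+1)
    y1 = torch.add(a, b)                 # (x+w) + (w+1)
    loss = torch.cat([y0, y1], dim=0)    # [y0, y1]
    if combine:
        # Option 1: collapse the sub-losses into one scalar first
        loss.sum().backward()
    else:
        # Option 2: keep the sub-losses and weight each one with 1
        loss.backward(gradient=torch.tensor([1., 1.]))
    return w.grad.item(), x.grad.item()


kept = grads(combine=False)
summed = grads(combine=True)
print(kept, summed)
```

With unit weights both options give the same gradients; changing the entries of the gradient tensor is what lets the sub-losses be weighted differently.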

Part 1: Binary Cross Entropy

1.1 Definition

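Binary cross entropy is defined as

$$H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\bigl(1 - p(y_i)\bigr) \right]$$

where y is the label (1 for green points and 0 for red points) and p(y) is the predicted probability of the point being green; the sum runs over all N points.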
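To make the definition concrete, here is a minimal sketch that evaluates binary cross entropy by hand and compares it with nn.BCELoss. The labels and probabilities are made-up illustrative values, not data from this article.

```python
import torch
import torch.nn as nn

# Illustrative values only: 4 points, label 1 = green, 0 = red
y = torch.tensor([0., 0., 1., 1.])
p = torch.tensor([0.1, 0.4, 0.8, 0.6])   # predicted probability of being green

# Average of -[y*log(p) + (1-y)*log(1-p)] over the N points
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

# nn.BCELoss uses mean reduction by default, so the two should agree
builtin = nn.BCELoss()(p, y)
print(manual.item(), builtin.item())
```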

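The log loss profile for a single point: when the label is 1 the loss is -log(p(y)), which is close to 0 when p(y) is close to 1 and grows without bound as p(y) approaches 0.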

1.2 Relation with entropy and cross entropy

Let q(y) be the true distribution and p(y) the predicted one. If we, somewhat miraculously, match p(y) to q(y) perfectly, the cross-entropy equals the entropy.

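The difference between cross-entropy and entropy is the Kullback-Leibler divergence:

$$D_{KL}(q \,\|\, p) = H_p(q) - H(q) = \sum_{c} q(y_c) \left[ \log q(y_c) - \log p(y_c) \right]$$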
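The relation between the three quantities can be checked numerically. The sketch below uses two made-up two-class distributions; the names q and p follow the notation above, and the values are purely illustrative.

```python
import torch

q = torch.tensor([0.5, 0.5])   # true distribution (illustrative)
p = torch.tensor([0.8, 0.2])   # predicted distribution (illustrative)

entropy = -(q * torch.log(q)).sum()                # H(q)
cross_entropy = -(q * torch.log(p)).sum()          # H_p(q)
kl = (q * (torch.log(q) - torch.log(p))).sum()     # D_KL(q || p)

# When the prediction matches the truth, the divergence vanishes
kl_same = (q * (torch.log(q) - torch.log(q))).sum()

print(entropy.item(), cross_entropy.item(), kl.item(), kl_same.item())
```

Since the divergence is never negative, cross-entropy is bounded below by the entropy, and minimizing cross-entropy over p pushes p toward q.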

1.3 Implementation

Below we compute binary cross entropy with both scikit-learn and PyTorch.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
import numpy as np
import torch
import torch.nn as nn

x = np.array([-2.2, -1.4, -.8, .2, .4, .8, 1.2, 2.2, 2.9, 4.6])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# Fit a logistic regression and take the predicted probability of class 1
logr = LogisticRegression(solver='lbfgs')
logr.fit(x.reshape(-1, 1), y)
y_pred = logr.predict_proba(x.reshape(-1, 1))[:, 1].ravel()
loss = log_loss(y, y_pred)

print('x = {}'.format(x))
print('y = {}'.format(y))
print('p(y) = {}'.format(np.round(y_pred, 2)))
print('Log Loss / Cross Entropy = {:.4f}'.format(loss))

# The same quantity computed with PyTorch
criterion = nn.BCELoss()
a = torch.from_numpy(y_pred)
b = torch.from_numpy(y)
output = criterion(a, b)
print('Log Loss from pytorch = {:.4f}'.format(output.item()))

A few words about BCELoss in PyTorch: it accepts targets that are floating-point values anywhere in the range [0, 1], whereas log_loss expects each target to be either 0 or 1. Here is an example with a soft target:

import numpy as np
import torch
import torch.nn as nn

loss = nn.BCELoss()
y = np.array([0.9])     # a soft target, neither 0 nor 1
y_pred = y              # prediction equal to the target
a = torch.from_numpy(y_pred)
b = torch.from_numpy(y)
output = loss(a, b)
print(output)           # about 0.3251

The value it returns can be reproduced by hand:

-0.9*np.log(0.9)-0.1*np.log(0.1)    # about 0.3251

BCELoss in PyTorch supports the case where the target is a floating value in the range [0, 1].

The following code compares BCELoss with BCEWithLogitsLoss, which applies the sigmoid internally and therefore takes raw logits:

target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
output = torch.full([10, 64], 2.0)  # A prediction (logit)
pos_weight = torch.ones([64])  # All weights are equal to 1
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
print(criterion(output, target))
m = nn.Sigmoid()
print(nn.BCELoss()(m(output), target))
m2 = torch.sigmoid
print(nn.BCELoss()(m2(output), target))

All three calls print the same value, about 0.1269, which is -log(sigmoid(2)).
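The agreement holds for moderate logits. For extreme logits the two can differ, because BCEWithLogitsLoss works on the logit directly while BCELoss only sees the already-saturated sigmoid output and, per the PyTorch documentation, clamps its log terms at -100. The sketch below is a small illustration with a deliberately extreme, made-up logit.

```python
import torch
import torch.nn as nn

logit = torch.tensor([-200.0])   # extreme logit, chosen only for illustration
target = torch.tensor([1.0])

# Works on the logit itself, so the loss keeps growing with |logit|
stable = nn.BCEWithLogitsLoss()(logit, target)

# sigmoid(-200) underflows to 0 in float32; BCELoss then clamps log(0)
naive = nn.BCELoss()(torch.sigmoid(logit), target)

print(stable.item(), naive.item())
```

This is one reason to prefer BCEWithLogitsLoss when the model outputs raw logits.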

1.4 Reference

Understanding binary cross-entropy / log loss: a visual explanation

Part 2: Cross Entropy

2.1 Basic idea

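To calculate how similar two vectors are, take their dot product. This is also the basic idea behind collaborative filtering. Cross entropy applies the same idea to the ground-truth vector and the log of the softmax output:

$$D(S(y), L) = -\sum_i L_i \log S(y)_i$$

S(y) is the output of your softmax function and L is the ground truth.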
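A minimal sketch of this idea, assuming a one-hot ground truth L and a single sample with three made-up scores: the loss is the negative dot product of L with the log of the softmax output, and it should match F.cross_entropy applied to the same scores.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores for one sample (illustrative)
L = torch.tensor([[1.0, 0.0, 0.0]])   # one-hot ground truth: class 0

S = F.softmax(y, dim=1)               # S(y): softmax output

# Negative dot product between L and log S(y), averaged over the batch
manual = -(L * torch.log(S)).sum(dim=1).mean()

# Built-in version takes raw scores and the class index
builtin = F.cross_entropy(y, torch.tensor([0]))
print(manual.item(), builtin.item())
```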

2.2 Implementation

import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossEntropyLoss2d(nn.Module):

    def __init__(self):
        super(CrossEntropyLoss2d, self).__init__()
        # nn.NLLLoss accepts (n, c, h, w) inputs; nn.NLLLoss2d is deprecated
        self.nll_loss = nn.NLLLoss()

    def forward(self, inputs, targets):
        # inputs: (n, c, h, w), targets: (n, h, w)
        # dim=1 normalizes over the class dimension
        return self.nll_loss(F.log_softmax(inputs, dim=1), targets)


if __name__ == "__main__":
    x = torch.tensor([[[1., 2., 1.],
                       [2., 3., 1.],
                       [0., 1., 1.]],
                      [[0., 1., 3.],
                       [2., 3., 1.],
                       [0., 0., 1.]],
                      [[0., 1., 3.],
                       [2., 3., 1.],
                       [0., 0., 1.]]])
    x = x.view(1, 3, 3, 3)
    print(x.shape)

    # Way 1: explicit LogSoftmax over the class dimension, then NLLLoss
    soft = nn.LogSoftmax(dim=1)(x)
    print(soft.shape)

    y = torch.LongTensor([[2, 0, 1], [0, 0, 1], [1, 1, 1]])
    y = y.view(1, 3, 3)
    loss = nn.NLLLoss()
    out = loss(soft, y)
    print(out)

    # Way 2: the module defined above; prints the same value
    a_fun = CrossEntropyLoss2d()
    print(a_fun(x, y))

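In the above code, we implement cross entropy in two ways.

First, inside the CrossEntropyLoss2d module, F.log_softmax (from torch.nn.functional) is combined with nn.NLLLoss.
Second, in the main block, nn.LogSoftmax is applied explicitly and its output is passed to an NLL loss module.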

Another example, which uses cross entropy for classification, is in cross_entory.py.

PyTorch also provides nn.CrossEntropyLoss, which combines log_softmax and nll_loss in a single call; see the example on my GitHub.
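The equivalence can be checked directly. The sketch below is not the GitHub example mentioned above; it is a small stand-alone check on random, made-up inputs with the same (n, c, h, w) layout used in this part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 3, 3)           # (n, c, h, w) raw scores
y = torch.randint(0, 3, (1, 3, 3))    # (n, h, w) class indices in [0, 3)

# One call: takes raw scores directly
combined = nn.CrossEntropyLoss()(x, y)

# Two steps: log-softmax over the class dimension, then NLL
two_step = F.nll_loss(F.log_softmax(x, dim=1), y)

print(combined.item(), two_step.item())
```

Because nn.CrossEntropyLoss applies log_softmax itself, it should be given raw scores rather than probabilities.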

Part 3: Reference

PyTorch Loss Functions: The Ultimate Guide