Part 0: Loss Function
- The loss function should return high values for bad predictions and low values for good predictions.
- In real applications, the loss function is often a combination of several sub-loss functions, and there are two ways to handle them. One is to combine (for example, sum) the sub-losses into a single overall loss function; the other is to keep the sub-losses separate, as the following example shows:
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)

y0 = torch.mul(a, b)  # y0 = (x+w) * (w+1), dy0/dw = 5
y1 = torch.add(a, b)  # y1 = (x+w) + (w+1), dy1/dw = 2

loss = torch.cat([y0, y1], dim=0)  # [y0, y1]
grad_tensors = torch.tensor([1., 1.])  # one weight per sub-loss

loss.backward(gradient=grad_tensors)  # w.grad = 1*5 + 1*2 = 7
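The first option, combining the sub-losses into one scalar, can be sketched as follows. This is a minimal illustration that reuses the tensors from the example above; the names `total_loss`, `grad_w`, and `grad_x` are introduced here and are not part of the original code. With equal weights of 1 it should give the same gradients as passing `[1., 1.]` to `backward`.

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)      # a = w + x
b = torch.add(w, 1)      # b = w + 1

y0 = torch.mul(a, b)     # first sub-loss
y1 = torch.add(a, b)     # second sub-loss

# Combine the sub-losses into one scalar with equal weights of 1.
total_loss = (y0 + y1).sum()
total_loss.backward()

grad_w = w.grad.item()   # dy0/dw + dy1/dw
grad_x = x.grad.item()   # dy0/dx + dy1/dx
print(grad_w, grad_x)
```

Unequal weights would simply become coefficients in the sum, which plays the same role as the entries of `grad_tensors`.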
Part 1: Binary Cross Entropy
1.1 Definition
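Binary cross entropy is defined as
$$H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\left(1 - p(y_i)\right) \right]$$
where y is the label (1 for green points and 0 for red points), p(y) is the predicted probability of a point being green, and the sum runs over all N points.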
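As a minimal sketch, the usual definition of binary cross entropy (the mean negative log-likelihood of the labels) can be written in plain Python. The function name `binary_cross_entropy` and the sample labels and probabilities are hypothetical values chosen only for illustration.

```python
import math

def binary_cross_entropy(labels, probs):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all points."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 1]          # hypothetical labels
probs = [0.9, 0.2, 0.6]     # hypothetical predicted probabilities of class 1
bce = binary_cross_entropy(labels, probs)
print(round(bce, 4))
```

Note that a probability of exactly 0 or 1 would make `math.log` fail, which is why library implementations typically clip the predictions.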
The log loss function profile looks like:
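A rough numerical sketch of that profile: for a point whose true label is 1, the per-point loss is -log(p), where p is the predicted probability of that label. The probability values below are arbitrary sample points, not data from this post.

```python
import math

# Loss contributed by a single point whose true label is 1,
# as a function of the predicted probability p of that label.
probabilities = [0.01, 0.1, 0.5, 0.9, 0.99]
losses = [-math.log(p) for p in probabilities]

for p, loss in zip(probabilities, losses):
    print(f"p = {p:4.2f}  ->  -log(p) = {loss:.3f}")
```

The loss grows without bound as p approaches 0 and falls to 0 as p approaches 1, so confident wrong predictions are penalized most heavily.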
1.2 Relation with entropy and cross entropy
If we, somewhat miraculously, match p(y) to q(y) perfectly, the computed values for both cross-entropy and entropy will match as well.
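The difference between cross-entropy and entropy is the Kullback-Leibler divergence:
$$D_{KL}(q \| p) = H_p(q) - H(q) = \sum_{c} q(y_c) \left[ \log q(y_c) - \log p(y_c) \right]$$
where H(q) is the entropy of the true distribution q(y) and H_p(q) is the cross-entropy of q(y) relative to the predicted distribution p(y).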
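The relation between the three quantities can be checked numerically. This is a small sketch with two hypothetical distributions `q` (true) and `p` (predicted); the numbers are made up for illustration.

```python
import math

q = [0.5, 0.5]   # hypothetical true distribution
p = [0.8, 0.2]   # hypothetical predicted distribution

entropy = -sum(qi * math.log(qi) for qi in q)
cross_entropy = -sum(qi * math.log(pi) for qi, pi in zip(q, p))
kl_divergence = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

# cross_entropy - entropy should equal kl_divergence
print(entropy, cross_entropy, kl_divergence)
```

If `p` is set equal to `q`, the cross-entropy equals the entropy and the divergence becomes zero.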
1.3 Implementation
Below we compute binary cross entropy with both scikit-learn and PyTorch.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
import numpy as np

x = np.array([-2.2, -1.4, -.8, .2, .4, .8, 1.2, 2.2, 2.9, 4.6])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

logr = LogisticRegression(solver='lbfgs')
logr.fit(x.reshape(-1, 1), y)

y_pred = logr.predict_proba(x.reshape(-1, 1))[:, 1].ravel()
loss = log_loss(y, y_pred)

print('x = {}'.format(x))
print('y = {}'.format(y))
print('p(y) = {}'.format(np.round(y_pred, 2)))
print('Log Loss / Cross Entropy = {:.4f}'.format(loss))

import torch.nn as nn
import torch
loss = nn.BCELoss()
a = torch.from_numpy(y_pred)
b = torch.from_numpy(y)
output = loss(a, b)
print('Log Loss from pytorch = {:.4f}'.format(output))
A few words about BCELoss in PyTorch: it accepts targets that are floating-point values anywhere in the range [0, 1]. In contrast, log_loss in scikit-learn expects the target to be either 0 or 1. Here is an example:
import numpy as np
import torch
import torch.nn as nn

loss = nn.BCELoss()
y = np.array([0.9])  # soft target in [0, 1]
y_pred = y  # prediction equal to the target
a = torch.from_numpy(y_pred)
b = torch.from_numpy(y)
output = loss(a, b)
print(output)  # approximately 0.3251
The value is computed as follows:
-0.9*np.log(0.9)-0.1*np.log(0.1)
I use the following code to show the difference between BCELoss and BCEWithLogitsLoss. BCEWithLogitsLoss takes raw logits and applies the sigmoid internally, whereas BCELoss expects probabilities, so the sigmoid must be applied first:
target = torch.ones([10, 64], dtype=torch.float32) # 64 classes, batch size = 10
output = torch.full([10, 64], 2.0) # A prediction (logit)
pos_weight = torch.ones([64]) # All weights are equal to 1
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
print(criterion(output, target))
m = nn.Sigmoid()
print(nn.BCELoss()(m(output), target))
m2 = torch.sigmoid
print(nn.BCELoss()(m2(output), target))
The outputs are the same.
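The shared value can also be derived by hand. With a target of 1, the per-element loss is -log(sigmoid(z)) = log(1 + exp(-z)) for a logit z. The sketch below checks this against both losses; the names `logit`, `expected`, `with_logits`, and `with_sigmoid` are introduced here for illustration.

```python
import math
import torch
import torch.nn as nn

logit = 2.0
# With target 1, the loss per element is -log(sigmoid(logit)) = log(1 + exp(-logit)).
expected = math.log(1.0 + math.exp(-logit))

target = torch.ones([10, 64], dtype=torch.float32)
output = torch.full([10, 64], logit)

with_logits = nn.BCEWithLogitsLoss()(output, target).item()
with_sigmoid = nn.BCELoss()(torch.sigmoid(output), target).item()
print(expected, with_logits, with_sigmoid)
```

Because every element has the same logit and target, the mean over the batch equals the single per-element value.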
Part 2: Cross Entropy
2.1 Basic idea
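To calculate how similar two vectors are, calculate their dot product! This is also the basic idea of collaborative filtering. Cross entropy applies the same idea to the ground truth and the log of the prediction:
$$D(S, L) = -\sum_i L_i \log(S_i)$$
S(y) is the output of your softmax function. L is the ground truth!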
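The dot-product view can be sketched in a few lines of plain Python: cross entropy is the negative dot product of the ground truth with the log of the softmax output. The `softmax` helper and the sample `scores` are hypothetical and only serve the illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # hypothetical raw scores y
L = [1.0, 0.0, 0.0]        # one-hot ground truth

S = softmax(scores)
log_S = [math.log(s) for s in S]

# Negative dot product between L and log(S).
cross_entropy = -sum(l * ls for l, ls in zip(L, log_S))
print(cross_entropy)
```

With a one-hot L, only the term for the true class survives, so the loss reduces to -log of the probability assigned to that class.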
2.2 Implementation
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossEntropyLoss2d(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.NLLLoss accepts (n, c, h, w) inputs; nn.NLLLoss2d is deprecated
        self.nll_loss = nn.NLLLoss()

    def forward(self, inputs, targets):
        # inputs: (n, c, h, w), targets: (n, h, w)
        return self.nll_loss(F.log_softmax(inputs, dim=1), targets)


if __name__ == "__main__":
    x = torch.tensor([[[1., 2., 1.],
                       [2., 3., 1.],
                       [0., 1., 1.]],
                      [[0., 1., 3.],
                       [2., 3., 1.],
                       [0., 0., 1.]],
                      [[0., 1., 3.],
                       [2., 3., 1.],
                       [0., 0., 1.]]])
    x = x.view(1, 3, 3, 3)
    print(x.shape)

    # Way 1: nn.LogSoftmax followed by nn.NLLLoss
    soft = nn.LogSoftmax(dim=1)(x)
    print(soft.shape)

    y = torch.tensor([[2, 0, 1], [0, 0, 1], [1, 1, 1]])
    y = y.view(1, 3, 3)
    loss = nn.NLLLoss()
    out = loss(soft, y)
    print(out)

    # Way 2: the CrossEntropyLoss2d module defined above
    a_fun = CrossEntropyLoss2d()
    print(a_fun(x, y))
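In the above code, we implement cross entropy in two ways:
- F.log_softmax (from import torch.nn.functional as F) is combined with an NLL loss inside the CrossEntropyLoss2d module.
- The nn.LogSoftmax module is followed directly by an NLL loss module.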
Another example that uses cross entropy for classification is in cross_entory.py.
PyTorch also provides nn.CrossEntropyLoss, which combines log_softmax and nll_loss in a single module; see this example in my GitHub.
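The equivalence can be checked with a short sketch. The batch of `logits` and the `targets` below are hypothetical random values, not taken from the linked example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # hypothetical batch: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])    # hypothetical class indices

# One step: CrossEntropyLoss works directly on raw logits.
combined = nn.CrossEntropyLoss()(logits, targets)

# Two steps: log_softmax over the class dimension, then nll_loss.
two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(combined.item(), two_step.item())
```

Since nn.CrossEntropyLoss applies log_softmax itself, it should be fed raw scores rather than probabilities.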