My Fastai Course Note (6): Other Computer Vision Problems

ifeelfree
Nov 8, 2020


This part is based on Fastbook.

1. Multi-label classification

Multi-label classification refers to the problem of identifying the categories of objects in images that may not contain exactly one type of object.

Because images with zero matches or more than one match are common in practice, multi-label classifiers are probably more widely applicable than single-label classifiers.
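
A small illustrative sketch (the class names and vector below are made up, not taken from the dataset used later): the target for a multi-label image is a multi-hot vector over the whole vocabulary rather than a single class index.

import torch

# Hypothetical vocabulary of four classes
vocab = ['bicycle', 'car', 'dog', 'person']

# An image containing a car and a person gets a multi-hot target,
# not a single class index as in single-label classification
target = torch.tensor([0., 1., 0., 1.])

present = [vocab[i] for i in range(len(vocab)) if target[i] == 1]
print(present)  # ['car', 'person']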

2. Re-visit Dataset and DataLoader

As we have seen, PyTorch and fastai have two main classes for representing and accessing a training set or validation set:

  • Dataset: A collection that returns a tuple of your independent and dependent variable for a single item
  • DataLoader: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables

On top of these, fastai provides two classes for bringing your training and validation sets together:

  • Datasets: An object that contains a training Dataset and a validation Dataset
  • DataLoaders: An object that contains a training DataLoader and a validation DataLoader
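
A minimal sketch of the distinction, using plain PyTorch tensors rather than the image data below: indexing a Dataset returns one item, while iterating a DataLoader yields mini-batches.

import torch
from torch.utils.data import TensorDataset, DataLoader

# A toy Dataset: indexing returns a single (x, y) tuple
xs = torch.randn(100, 3)
ys = torch.randint(0, 2, (100,))
dset = TensorDataset(xs, ys)
print(dset[0])             # (tensor of shape [3], scalar label)

# A DataLoader streams mini-batches of those items
dl = DataLoader(dset, batch_size=16, shuffle=True)
xb, yb = next(iter(dl))
print(xb.shape, yb.shape)  # torch.Size([16, 3]) torch.Size([16])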

When we create a DataBlock, we build up gradually, step by step, and use the notebook to check our data along the way. This is a great way to make sure that you maintain momentum as you are coding, and that you keep an eye out for any problems. It’s easy to debug, because you know that if a problem arises, it is in the line of code you just typed!

dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]

Running this gives the following output:

(Path('/root/.fastai/data/pascal_2007/train/007159.jpg'), ['car', 'person'])

However, get_x above returns only the file name and get_y returns the raw label string. For training we need the full image path and the individual labels, so we pass proper functions to the DataBlock instead:

def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')

def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train, valid

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x=get_x, get_y=get_y,
                   splitter=splitter,
                   item_tfms=RandomResizedCrop(128))
dsets = dblock.datasets(df)
dsets.train[0]
  • ImageBlock: reads the image from the path returned by get_x
  • MultiCategoryBlock: turns the list of labels into a one-hot encoded target vector
  • splitter: splits the rows into training and validation sets using the is_valid column
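
To see what MultiCategoryBlock actually produces, we can decode one training item back into class names and then build the DataLoaders used for training. This assumes the usual from fastai.vision.all import * setup; the decoded label in the comment is just an example.

# Inspect the one-hot target of the first training item
x, y = dsets.train[0]
print(y)                        # TensorMultiCategory of 0s and 1s over the vocab

# Map the 1s back to class names
idxs = torch.where(y == 1.)[0]
print(dsets.train.vocab[idxs])  # e.g. ['dog'] for this image

# Build mini-batch DataLoaders from the same DataBlock for training
dls = dblock.dataloaders(df)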

3. Binary cross entropy

def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()

This is equivalent to:

loss_func = nn.BCEWithLogitsLoss() 
loss = loss_func(activs, y)
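
A quick sanity check on random activations and multi-hot targets (the shapes are arbitrary) shows that the hand-written function and the built-in loss agree.

import torch
import torch.nn as nn

torch.manual_seed(0)
activs  = torch.randn(4, 6)                    # raw activations (logits): 4 images, 6 classes
targets = torch.randint(0, 2, (4, 6)).float()  # multi-hot targets

print(binary_cross_entropy(activs, targets))   # hand-written version from above
print(nn.BCEWithLogitsLoss()(activs, targets)) # same value, up to floating-point error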

4. Regression

The main difference between multi-class classification, multi-label classification and regression is the loss function (a short sketch follows the list below):

  • nn.CrossEntropyLoss for single-label classification
  • nn.BCEWithLogitsLoss for multi-label classification
  • nn.MSELoss for regression
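
A sketch of where this choice plugs in, assuming the multi-label dls built earlier and fastai's star import; fastai would infer the same BCE loss automatically from a MultiCategoryBlock, so passing it explicitly is optional.

learn = cnn_learner(dls, resnet18,
                    loss_func=nn.BCEWithLogitsLoss(),
                    metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3)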

5. y_range in a regression model

def sigmoid_range(x, lo, hi):
    return x.sigmoid() * (hi-lo) + lo

This is often used as the last layer in a regression model so that the outputs are constrained to the range (lo, hi).
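
In fastai, passing y_range to the learner wraps the model head in this sigmoid_range. For example, in a point-regression setup where the target coordinates are rescaled to [-1, 1] (a sketch assuming a regression DataLoaders dls built with a PointBlock target, not the multi-label one above):

learn = cnn_learner(dls, resnet18, y_range=(-1, 1))
learn.loss_func   # fastai picks an MSE-based loss for the regression target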
