My Fastai Course Note (7): Training a State-of-the-Art Model

ifeelfree
Nov 8, 2020

This note is based on Fastbook.

1. Normalization

With Fastai, input images can be normalized in the DataBlock:

batch_tfms = [Normalize.from_stats(*imagenet_stats)]

When using a pre-trained model, Fastai adds the appropriate normalization automatically, using the statistics the model was pre-trained with.
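To verify that the data really is normalized, you can inspect one batch; a quick sketch, assuming dls is a DataLoaders built with the Normalize transform above (per-channel means should be close to 0 and standard deviations close to 1):

# Grab one batch and check its per-channel statistics.
x, y = dls.one_batch()
print(x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3]))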

2. Progressive resizing

Start training with small images and finish training with large images. Completing the training with large images makes the final accuracy much higher. This approach is called progressive resizing.

def get_dls(bs, size):
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(size=size, min_scale=0.75),
                                   Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(path, bs=bs)
# step 1: use small images for training
dls = get_dls(128, 128)
learn = Learner(dls, xresnet50(), loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)
# step 2: use large images
learn.dls = get_dls(64, 224)
learn.fine_tune(5, 1e-3)

Note that progressive resizing may actually hurt performance in transfer learning: if the pre-trained model was trained on a similar task with similar-sized images, its weights need little change, and the extra resizing stage can degrade them.
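If you do combine progressive resizing with transfer learning, the same two-stage pattern can be applied to a pre-trained model; a hedged sketch, assuming cnn_learner and a pre-trained resnet50 rather than the from-scratch xresnet50 above:

# Sketch: progressive resizing on top of a pre-trained backbone.
# Keep the caveat above in mind: if the backbone was pre-trained on
# similar-sized images, the extra resizing stage may not help.
learn = cnn_learner(get_dls(128, 128), resnet50, metrics=accuracy)
learn.fine_tune(4, 3e-3)       # train at 128x128 first
learn.dls = get_dls(64, 224)   # switch to larger images
learn.fine_tune(5, 1e-3)       # continue training at 224x224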

3. Test time augmentation

Random cropping, used as a training-time augmentation, can be problematic for validation images, where only the central area is cropped and informative regions near the edges may be lost. One solution to this problem is test time augmentation (TTA):

During inference or validation, create multiple versions of each image using data augmentation, then take the average or maximum of the predictions for each augmented version of the image.

preds, targs = learn.tta()
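To check whether TTA helps, you can compute the metric directly on the averaged predictions (accuracy here is Fastai's metric function, applied to the tensors returned above):

# Accuracy of the TTA predictions on the validation set.
print(accuracy(preds, targs).item())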

4. Mixup

model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
                metrics=accuracy, cbs=MixUp)
learn.fit_one_cycle(5, 3e-3)
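Conceptually, Mixup blends pairs of training images and their one-hot labels with a random weight and trains on the blend. A minimal sketch of the idea, not Fastai's MixUp callback (x1, x2, y1, y2 are hypothetical image and one-hot label tensors):

import torch

# Sketch of the mixup combination (not Fastai's implementation):
# draw a blending weight from a Beta distribution and mix both the
# inputs and the one-hot targets with it.
alpha = 0.4
lam = torch.distributions.Beta(alpha, alpha).sample()
x_mixed = lam * x1 + (1 - lam) * x2
y_mixed = lam * y1 + (1 - lam) * y2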

Compared to the other augmentation approaches we’ve seen, Mixup requires far more epochs of training to reach better accuracy.

Mixup is used in the leading results for training runs of more than 80 epochs; for shorter runs, Mixup is generally not used.

5. Label smoothing

Instead of using hard targets of exactly 1 for the correct class and 0 for all the others, we could replace all our 1s with a number a bit less than 1 and our 0s with a number a bit more than 0, and then train. This is called label smoothing. By encouraging the model to be less confident, label smoothing makes your training more robust, even if there is mislabeled data, and results in a model that generalizes better.
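Concretely, with a smoothing parameter eps and N classes, the target for the correct class becomes 1 - eps + eps/N and the target for every other class becomes eps/N. Below is a minimal sketch of how such smoothed targets could be built (smooth_targets is a hypothetical helper, not a Fastai function):

import torch

# Hypothetical helper: build smoothed targets from hard integer labels.
# The correct class gets 1 - eps + eps/N, every other class gets eps/N.
def smooth_targets(labels, n_classes, eps=0.1):
    targets = torch.full((len(labels), n_classes), eps / n_classes)
    targets[torch.arange(len(labels)), labels] = 1 - eps + eps / n_classes
    return targets

In Fastai, this is handled by the LabelSmoothingCrossEntropy loss used below: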

model = xresnet50()
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(),
                metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)
