1. Project Definition
1.1 Project Overview
The purpose of this project to demonstrate how we can identify dog breeds from a given picture. If a person happens to be in the picture, we also want to know what kind of dogs shares similarity with this person. The following picture shows the one example of the project’s output:
1.2 Problem Statement
However, in order to identify the dog breed, we have several challenges to conquer:
- How can we tell there is a person in the picture?
- How can we tell there is a dog in the picture?
- If we know it is dog, how can we identify its category?
These questions are related to three important research topics in the project, namely, person detection, dog detection and dog breed classification. In this project, we will disclose the technical details regard to them. On top of it, we will also explain how we combine these components into a functional application.
1.3 Metrics
We use three metrics in our project, and they are:
- Person Detection Accuracy (PDA): it is the percentage of the first 100 images in the dog and human face datasets with a detected human face. For dogs data set, we expect the detection rate should be 0 while for human face datasets we expect the detection rate is 100.
- Dog Detection Accuracy (DDA): the way how we evaluate the dog detection accuracy is similar to what we did with person detection. We use the same human data set and dog data set, and the percentage of the first 100 images in the dog and human data sets with detected dogs is used as the evaluation metric. For human data set, ideally the detection rate should be 0 while the detection rate for dog data set should be 100.
- Dog Classification Accuracy (DCA): the evaluation metric we use for dog breed classification is defined as:
accuracy = sum(estimated_dog_breed==groundtruth_dog_breed)/test_dataset_number
2. Project Analysis
2.1 Data Exploration
The dog data set contains 8351 images, including 133 dog categories. An overview of the images within this data set is shown below:
We randomly separate the data set into three groups: training (6680), validation (835) and test (836):
2.2 Data Visualization
Thanks to OpenCV, we can have a good visualization of the images when faces can be detected:
3. Project Methodology
3.1 Data Preprocessing
Person Detection
For face detection (or person detection), we need to transform RGB color image to gray-scale image, and this is the only data preprocessing method we need.
Dog Breed Classification
Before the data can be used for dog breed classification neural network training, we need process the data so that they can become the inputs of the neural network. In our experiment, the following processing is implemented:
- Image size normalization: resize the input image so that its output dimension becomes [224, 224, 3], where 224 refers to the width and height of the input image and 3 refers to the bands of the image (RGB).
- Image pixel value normalization: we divide the pixel value by 255.0 so that the pixel value range is between 0.0 and 1.0
3.2 Implementation
Person Detection
The basic idea behind Person Detector is that if we can detect a person’s face in the image, there is a large chance that we can say that we have detected the person. In that sense, Person Detector is equal to Face Detector.
In this project we use one face detector implemented in OpenCV, which implements Haar feature-based cascade classifiers. OpenCV provides many pre-trained face detectors, stored as XML files on github.
Dog Detection
We use a pre-trained ResNet-50 model to detect dogs in images, and the weights of the model that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.
ResNet-50 is a Deep Neural Network architecture, and in total it has 25,636,712 parameters, among which 25,583,592 parameters are trainable. If you want to visualize it, please check netscope.
ResNet-50 outputs a probability vector of size 1000, and we use argmax
function to identify the integer that corresponds to the model’s predicted object class. After checking this dictionary we know if the identified integer is between 151 and 268, we are sure that dogs have been detected.
Dog Breed Classification
Before we use well-known Neural Network architecture with transfer learning, let’s try our first Neural Network for dog breed classification. Our Neural Network architecture is inspired by VGG-16, and it is a shallow Neural Network. We follow the following rules when designing the network:
- When the image size reduced by 2, we will increase the feature channel by 2. By doing so, we will make sure there will no information loss during the training.
- We also use Dropout layer to regularize the Neural Network.
- We use softmax layer as the last layer because this is multi-class classification problem, and we will use categorical cross-entropy as the loss function.
Application pipeline
we design the following processing chain that combines person detection, dog detection and dog breed classification:
- When we are given a picture, we first try Person Detector module, which will tell whether the given picture contains a person.
- In parallel, we also try the same picture on Dog Detector module, and the purpose is to tell whether the given picture contains a dog.
- If we are sure either Person or Dog exists, then we will continue with Dog Breed Classifier module, which tell us the dog category that the person/dog belongs to. We escape the cases where an object pass the examination of Person Detector module and Dog Detector module as this is a confusing case.
3.3 Refinement
Transfer learning with VGG16 for dog breed classification
Our self-designed network for dog breed classification does not work well. There are many reasons for it: the architecture is shallow, the training epoch is small, hyper-parameters can be refined…. Among these reasons, we training data set is relatively small compared to the large amount of model parameters to estimated is a contributing factor.
We can use data augmentation to artificially increase the training samples, which is a good direction to go. In our experiment, we decide to use another technique: transfer learning.
The basic idea behind transfer learning is to use well-trained neural network as feature extractor and only retrain the last layer (or a few last layers) for the target application. Transfer learning has physiological connection with our brains, and deep learning practitioners often use it. However, we cannot use our self-defined neural network in the transfer learning context and this is because we cannot find someone has already trained our self-defined network with a lot of images. Instead, we have to use well-designed network in the literature.
Among all the well-know neural network architectures, we select VGG16 due to its simplicity and popularity. You can check its visualization on netscope website. The trained parameters of VGG16 is part of Keras library, and we can use the following code to obtain the feature extractor:
def extract_VGG16(tensor):
from keras.applications.vgg16 import VGG16, preprocess_input
return VGG16(weights='imagenet', include_top=False).predict(preprocess_input(tensor))
Transfer learning with Inception 3 for dog breed classification
Encouraged by our interesting result with transfer learning, we decide to go further, and use Inception 3 neural network as our backbone network for transfer learning.
Inception 3 is a far more advanced neural network architecture compared to VGG-16, and it is deeper with many more parameters to train.
4. Results
4.1 Model Evaluation and Validation
Person Detection
We use Haar features for face detection, and these features are just like our convolutional kernel. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
With the data base we have, for human data set, we achieved 100% accuracy while for dog data set, we achieved 12% accuracy. It seems that OpenCV face detector has relatively large false positive, but the performance is acceptable for the current experiment.
Dog Detection
Dog detection relies on pre-trained ResNet-50, and the architecture of the network is summarized as follows:
Model: "resnet50"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 230, 230, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
conv1_conv (Conv2D) (None, 112, 112, 64) 9472 conv1_pad[0][0]
__________________________________________________________________________________________________
conv1_bn (BatchNormalization) (None, 112, 112, 64) 256 conv1_conv[0][0]
__________________________________________________________________________________________________
conv1_relu (Activation) (None, 112, 112, 64) 0 conv1_bn[0][0]
__________________________________________________________________________________________________
pool1_pad (ZeroPadding2D) (None, 114, 114, 64) 0 conv1_relu[0][0]
__________________________________________________________________________________________________
pool1_pool (MaxPooling2D) (None, 56, 56, 64) 0 pool1_pad[0][0]
__________________________________________________________________________________________________
conv2_block1_1_conv (Conv2D) (None, 56, 56, 64) 4160 pool1_pool[0][0]
__________________________________________________________________________________________________
conv2_block1_1_bn (BatchNormali (None, 56, 56, 64) 256 conv2_block1_1_conv[0][0]
__________________________________________________________________________________________________
conv2_block1_1_relu (Activation (None, 56, 56, 64) 0 conv2_block1_1_bn[0][0]
__________________________________________________________________________________________________
conv2_block1_2_conv (Conv2D) (None, 56, 56, 64) 36928 conv2_block1_1_relu[0][0]
__________________________________________________________________________________________________
conv2_block1_2_bn (BatchNormali (None, 56, 56, 64) 256 conv2_block1_2_conv[0][0]
__________________________________________________________________________________________________
conv2_block1_2_relu (Activation (None, 56, 56, 64) 0 conv2_block1_2_bn[0][0]
__________________________________________________________________________________________________
conv2_block1_0_conv (Conv2D) (None, 56, 56, 256) 16640 pool1_pool[0][0]
__________________________________________________________________________________________________
conv2_block1_3_conv (Conv2D) (None, 56, 56, 256) 16640 conv2_block1_2_relu[0][0]
__________________________________________________________________________________________________
conv2_block1_0_bn (BatchNormali (None, 56, 56, 256) 1024 conv2_block1_0_conv[0][0]
__________________________________________________________________________________________________
conv2_block1_3_bn (BatchNormali (None, 56, 56, 256) 1024 conv2_block1_3_conv[0][0]
__________________________________________________________________________________________________
conv2_block1_add (Add) (None, 56, 56, 256) 0 conv2_block1_0_bn[0][0]
conv2_block1_3_bn[0][0]
__________________________________________________________________________________________________
conv2_block1_out (Activation) (None, 56, 56, 256) 0 conv2_block1_add[0][0]
__________________________________________________________________________________________________
conv2_block2_1_conv (Conv2D) (None, 56, 56, 64) 16448 conv2_block1_out[0][0]
__________________________________________________________________________________________________
conv2_block2_1_bn (BatchNormali (None, 56, 56, 64) 256 conv2_block2_1_conv[0][0]
__________________________________________________________________________________________________
conv2_block2_1_relu (Activation (None, 56, 56, 64) 0 conv2_block2_1_bn[0][0]
__________________________________________________________________________________________________
conv2_block2_2_conv (Conv2D) (None, 56, 56, 64) 36928 conv2_block2_1_relu[0][0]
__________________________________________________________________________________________________
conv2_block2_2_bn (BatchNormali (None, 56, 56, 64) 256 conv2_block2_2_conv[0][0]
__________________________________________________________________________________________________
conv2_block2_2_relu (Activation (None, 56, 56, 64) 0 conv2_block2_2_bn[0][0]
__________________________________________________________________________________________________
conv2_block2_3_conv (Conv2D) (None, 56, 56, 256) 16640 conv2_block2_2_relu[0][0]
__________________________________________________________________________________________________
conv2_block2_3_bn (BatchNormali (None, 56, 56, 256) 1024 conv2_block2_3_conv[0][0]
__________________________________________________________________________________________________
conv2_block2_add (Add) (None, 56, 56, 256) 0 conv2_block1_out[0][0]
conv2_block2_3_bn[0][0]
__________________________________________________________________________________________________
conv2_block2_out (Activation) (None, 56, 56, 256) 0 conv2_block2_add[0][0]
__________________________________________________________________________________________________
conv2_block3_1_conv (Conv2D) (None, 56, 56, 64) 16448 conv2_block2_out[0][0]
__________________________________________________________________________________________________
conv2_block3_1_bn (BatchNormali (None, 56, 56, 64) 256 conv2_block3_1_conv[0][0]
__________________________________________________________________________________________________
conv2_block3_1_relu (Activation (None, 56, 56, 64) 0 conv2_block3_1_bn[0][0]
__________________________________________________________________________________________________
conv2_block3_2_conv (Conv2D) (None, 56, 56, 64) 36928 conv2_block3_1_relu[0][0]
__________________________________________________________________________________________________
conv2_block3_2_bn (BatchNormali (None, 56, 56, 64) 256 conv2_block3_2_conv[0][0]
__________________________________________________________________________________________________
conv2_block3_2_relu (Activation (None, 56, 56, 64) 0 conv2_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv2_block3_3_conv (Conv2D) (None, 56, 56, 256) 16640 conv2_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv2_block3_3_bn (BatchNormali (None, 56, 56, 256) 1024 conv2_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv2_block3_add (Add) (None, 56, 56, 256) 0 conv2_block2_out[0][0]
conv2_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv2_block3_out (Activation) (None, 56, 56, 256) 0 conv2_block3_add[0][0]
__________________________________________________________________________________________________
conv3_block1_1_conv (Conv2D) (None, 28, 28, 128) 32896 conv2_block3_out[0][0]
__________________________________________________________________________________________________
conv3_block1_1_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block1_1_conv[0][0]
__________________________________________________________________________________________________
conv3_block1_1_relu (Activation (None, 28, 28, 128) 0 conv3_block1_1_bn[0][0]
__________________________________________________________________________________________________
conv3_block1_2_conv (Conv2D) (None, 28, 28, 128) 147584 conv3_block1_1_relu[0][0]
__________________________________________________________________________________________________
conv3_block1_2_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block1_2_conv[0][0]
__________________________________________________________________________________________________
conv3_block1_2_relu (Activation (None, 28, 28, 128) 0 conv3_block1_2_bn[0][0]
__________________________________________________________________________________________________
conv3_block1_0_conv (Conv2D) (None, 28, 28, 512) 131584 conv2_block3_out[0][0]
__________________________________________________________________________________________________
conv3_block1_3_conv (Conv2D) (None, 28, 28, 512) 66048 conv3_block1_2_relu[0][0]
__________________________________________________________________________________________________
conv3_block1_0_bn (BatchNormali (None, 28, 28, 512) 2048 conv3_block1_0_conv[0][0]
__________________________________________________________________________________________________
conv3_block1_3_bn (BatchNormali (None, 28, 28, 512) 2048 conv3_block1_3_conv[0][0]
__________________________________________________________________________________________________
conv3_block1_add (Add) (None, 28, 28, 512) 0 conv3_block1_0_bn[0][0]
conv3_block1_3_bn[0][0]
__________________________________________________________________________________________________
conv3_block1_out (Activation) (None, 28, 28, 512) 0 conv3_block1_add[0][0]
__________________________________________________________________________________________________
conv3_block2_1_conv (Conv2D) (None, 28, 28, 128) 65664 conv3_block1_out[0][0]
__________________________________________________________________________________________________
conv3_block2_1_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block2_1_conv[0][0]
__________________________________________________________________________________________________
conv3_block2_1_relu (Activation (None, 28, 28, 128) 0 conv3_block2_1_bn[0][0]
__________________________________________________________________________________________________
conv3_block2_2_conv (Conv2D) (None, 28, 28, 128) 147584 conv3_block2_1_relu[0][0]
__________________________________________________________________________________________________
conv3_block2_2_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block2_2_conv[0][0]
__________________________________________________________________________________________________
conv3_block2_2_relu (Activation (None, 28, 28, 128) 0 conv3_block2_2_bn[0][0]
__________________________________________________________________________________________________
conv3_block2_3_conv (Conv2D) (None, 28, 28, 512) 66048 conv3_block2_2_relu[0][0]
__________________________________________________________________________________________________
conv3_block2_3_bn (BatchNormali (None, 28, 28, 512) 2048 conv3_block2_3_conv[0][0]
__________________________________________________________________________________________________
conv3_block2_add (Add) (None, 28, 28, 512) 0 conv3_block1_out[0][0]
conv3_block2_3_bn[0][0]
__________________________________________________________________________________________________
conv3_block2_out (Activation) (None, 28, 28, 512) 0 conv3_block2_add[0][0]
__________________________________________________________________________________________________
conv3_block3_1_conv (Conv2D) (None, 28, 28, 128) 65664 conv3_block2_out[0][0]
__________________________________________________________________________________________________
conv3_block3_1_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block3_1_conv[0][0]
__________________________________________________________________________________________________
conv3_block3_1_relu (Activation (None, 28, 28, 128) 0 conv3_block3_1_bn[0][0]
__________________________________________________________________________________________________
conv3_block3_2_conv (Conv2D) (None, 28, 28, 128) 147584 conv3_block3_1_relu[0][0]
__________________________________________________________________________________________________
conv3_block3_2_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block3_2_conv[0][0]
__________________________________________________________________________________________________
conv3_block3_2_relu (Activation (None, 28, 28, 128) 0 conv3_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv3_block3_3_conv (Conv2D) (None, 28, 28, 512) 66048 conv3_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv3_block3_3_bn (BatchNormali (None, 28, 28, 512) 2048 conv3_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv3_block3_add (Add) (None, 28, 28, 512) 0 conv3_block2_out[0][0]
conv3_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv3_block3_out (Activation) (None, 28, 28, 512) 0 conv3_block3_add[0][0]
__________________________________________________________________________________________________
conv3_block4_1_conv (Conv2D) (None, 28, 28, 128) 65664 conv3_block3_out[0][0]
__________________________________________________________________________________________________
conv3_block4_1_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block4_1_conv[0][0]
__________________________________________________________________________________________________
conv3_block4_1_relu (Activation (None, 28, 28, 128) 0 conv3_block4_1_bn[0][0]
__________________________________________________________________________________________________
conv3_block4_2_conv (Conv2D) (None, 28, 28, 128) 147584 conv3_block4_1_relu[0][0]
__________________________________________________________________________________________________
conv3_block4_2_bn (BatchNormali (None, 28, 28, 128) 512 conv3_block4_2_conv[0][0]
__________________________________________________________________________________________________
conv3_block4_2_relu (Activation (None, 28, 28, 128) 0 conv3_block4_2_bn[0][0]
__________________________________________________________________________________________________
conv3_block4_3_conv (Conv2D) (None, 28, 28, 512) 66048 conv3_block4_2_relu[0][0]
__________________________________________________________________________________________________
conv3_block4_3_bn (BatchNormali (None, 28, 28, 512) 2048 conv3_block4_3_conv[0][0]
__________________________________________________________________________________________________
conv3_block4_add (Add) (None, 28, 28, 512) 0 conv3_block3_out[0][0]
conv3_block4_3_bn[0][0]
__________________________________________________________________________________________________
conv3_block4_out (Activation) (None, 28, 28, 512) 0 conv3_block4_add[0][0]
__________________________________________________________________________________________________
conv4_block1_1_conv (Conv2D) (None, 14, 14, 256) 131328 conv3_block4_out[0][0]
__________________________________________________________________________________________________
conv4_block1_1_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block1_1_conv[0][0]
__________________________________________________________________________________________________
conv4_block1_1_relu (Activation (None, 14, 14, 256) 0 conv4_block1_1_bn[0][0]
__________________________________________________________________________________________________
conv4_block1_2_conv (Conv2D) (None, 14, 14, 256) 590080 conv4_block1_1_relu[0][0]
__________________________________________________________________________________________________
conv4_block1_2_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block1_2_conv[0][0]
__________________________________________________________________________________________________
conv4_block1_2_relu (Activation (None, 14, 14, 256) 0 conv4_block1_2_bn[0][0]
__________________________________________________________________________________________________
conv4_block1_0_conv (Conv2D) (None, 14, 14, 1024) 525312 conv3_block4_out[0][0]
__________________________________________________________________________________________________
conv4_block1_3_conv (Conv2D) (None, 14, 14, 1024) 263168 conv4_block1_2_relu[0][0]
__________________________________________________________________________________________________
conv4_block1_0_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block1_0_conv[0][0]
__________________________________________________________________________________________________
conv4_block1_3_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block1_3_conv[0][0]
__________________________________________________________________________________________________
conv4_block1_add (Add) (None, 14, 14, 1024) 0 conv4_block1_0_bn[0][0]
conv4_block1_3_bn[0][0]
__________________________________________________________________________________________________
conv4_block1_out (Activation) (None, 14, 14, 1024) 0 conv4_block1_add[0][0]
__________________________________________________________________________________________________
conv4_block2_1_conv (Conv2D) (None, 14, 14, 256) 262400 conv4_block1_out[0][0]
__________________________________________________________________________________________________
conv4_block2_1_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block2_1_conv[0][0]
__________________________________________________________________________________________________
conv4_block2_1_relu (Activation (None, 14, 14, 256) 0 conv4_block2_1_bn[0][0]
__________________________________________________________________________________________________
conv4_block2_2_conv (Conv2D) (None, 14, 14, 256) 590080 conv4_block2_1_relu[0][0]
__________________________________________________________________________________________________
conv4_block2_2_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block2_2_conv[0][0]
__________________________________________________________________________________________________
conv4_block2_2_relu (Activation (None, 14, 14, 256) 0 conv4_block2_2_bn[0][0]
__________________________________________________________________________________________________
conv4_block2_3_conv (Conv2D) (None, 14, 14, 1024) 263168 conv4_block2_2_relu[0][0]
__________________________________________________________________________________________________
conv4_block2_3_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block2_3_conv[0][0]
__________________________________________________________________________________________________
conv4_block2_add (Add) (None, 14, 14, 1024) 0 conv4_block1_out[0][0]
conv4_block2_3_bn[0][0]
__________________________________________________________________________________________________
conv4_block2_out (Activation) (None, 14, 14, 1024) 0 conv4_block2_add[0][0]
__________________________________________________________________________________________________
conv4_block3_1_conv (Conv2D) (None, 14, 14, 256) 262400 conv4_block2_out[0][0]
__________________________________________________________________________________________________
conv4_block3_1_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block3_1_conv[0][0]
__________________________________________________________________________________________________
conv4_block3_1_relu (Activation (None, 14, 14, 256) 0 conv4_block3_1_bn[0][0]
__________________________________________________________________________________________________
conv4_block3_2_conv (Conv2D) (None, 14, 14, 256) 590080 conv4_block3_1_relu[0][0]
__________________________________________________________________________________________________
conv4_block3_2_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block3_2_conv[0][0]
__________________________________________________________________________________________________
conv4_block3_2_relu (Activation (None, 14, 14, 256) 0 conv4_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv4_block3_3_conv (Conv2D) (None, 14, 14, 1024) 263168 conv4_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv4_block3_3_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv4_block3_add (Add) (None, 14, 14, 1024) 0 conv4_block2_out[0][0]
conv4_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv4_block3_out (Activation) (None, 14, 14, 1024) 0 conv4_block3_add[0][0]
__________________________________________________________________________________________________
conv4_block4_1_conv (Conv2D) (None, 14, 14, 256) 262400 conv4_block3_out[0][0]
__________________________________________________________________________________________________
conv4_block4_1_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block4_1_conv[0][0]
__________________________________________________________________________________________________
conv4_block4_1_relu (Activation (None, 14, 14, 256) 0 conv4_block4_1_bn[0][0]
__________________________________________________________________________________________________
conv4_block4_2_conv (Conv2D) (None, 14, 14, 256) 590080 conv4_block4_1_relu[0][0]
__________________________________________________________________________________________________
conv4_block4_2_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block4_2_conv[0][0]
__________________________________________________________________________________________________
conv4_block4_2_relu (Activation (None, 14, 14, 256) 0 conv4_block4_2_bn[0][0]
__________________________________________________________________________________________________
conv4_block4_3_conv (Conv2D) (None, 14, 14, 1024) 263168 conv4_block4_2_relu[0][0]
__________________________________________________________________________________________________
conv4_block4_3_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block4_3_conv[0][0]
__________________________________________________________________________________________________
conv4_block4_add (Add) (None, 14, 14, 1024) 0 conv4_block3_out[0][0]
conv4_block4_3_bn[0][0]
__________________________________________________________________________________________________
conv4_block4_out (Activation) (None, 14, 14, 1024) 0 conv4_block4_add[0][0]
__________________________________________________________________________________________________
conv4_block5_1_conv (Conv2D) (None, 14, 14, 256) 262400 conv4_block4_out[0][0]
__________________________________________________________________________________________________
conv4_block5_1_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block5_1_conv[0][0]
__________________________________________________________________________________________________
conv4_block5_1_relu (Activation (None, 14, 14, 256) 0 conv4_block5_1_bn[0][0]
__________________________________________________________________________________________________
conv4_block5_2_conv (Conv2D) (None, 14, 14, 256) 590080 conv4_block5_1_relu[0][0]
__________________________________________________________________________________________________
conv4_block5_2_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block5_2_conv[0][0]
__________________________________________________________________________________________________
conv4_block5_2_relu (Activation (None, 14, 14, 256) 0 conv4_block5_2_bn[0][0]
__________________________________________________________________________________________________
conv4_block5_3_conv (Conv2D) (None, 14, 14, 1024) 263168 conv4_block5_2_relu[0][0]
__________________________________________________________________________________________________
conv4_block5_3_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block5_3_conv[0][0]
__________________________________________________________________________________________________
conv4_block5_add (Add) (None, 14, 14, 1024) 0 conv4_block4_out[0][0]
conv4_block5_3_bn[0][0]
__________________________________________________________________________________________________
conv4_block5_out (Activation) (None, 14, 14, 1024) 0 conv4_block5_add[0][0]
__________________________________________________________________________________________________
conv4_block6_1_conv (Conv2D) (None, 14, 14, 256) 262400 conv4_block5_out[0][0]
__________________________________________________________________________________________________
conv4_block6_1_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block6_1_conv[0][0]
__________________________________________________________________________________________________
conv4_block6_1_relu (Activation (None, 14, 14, 256) 0 conv4_block6_1_bn[0][0]
__________________________________________________________________________________________________
conv4_block6_2_conv (Conv2D) (None, 14, 14, 256) 590080 conv4_block6_1_relu[0][0]
__________________________________________________________________________________________________
conv4_block6_2_bn (BatchNormali (None, 14, 14, 256) 1024 conv4_block6_2_conv[0][0]
__________________________________________________________________________________________________
conv4_block6_2_relu (Activation (None, 14, 14, 256) 0 conv4_block6_2_bn[0][0]
__________________________________________________________________________________________________
conv4_block6_3_conv (Conv2D) (None, 14, 14, 1024) 263168 conv4_block6_2_relu[0][0]
__________________________________________________________________________________________________
conv4_block6_3_bn (BatchNormali (None, 14, 14, 1024) 4096 conv4_block6_3_conv[0][0]
__________________________________________________________________________________________________
conv4_block6_add (Add) (None, 14, 14, 1024) 0 conv4_block5_out[0][0]
conv4_block6_3_bn[0][0]
__________________________________________________________________________________________________
conv4_block6_out (Activation) (None, 14, 14, 1024) 0 conv4_block6_add[0][0]
__________________________________________________________________________________________________
conv5_block1_1_conv (Conv2D) (None, 7, 7, 512) 524800 conv4_block6_out[0][0]
__________________________________________________________________________________________________
conv5_block1_1_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block1_1_conv[0][0]
__________________________________________________________________________________________________
conv5_block1_1_relu (Activation (None, 7, 7, 512) 0 conv5_block1_1_bn[0][0]
__________________________________________________________________________________________________
conv5_block1_2_conv (Conv2D) (None, 7, 7, 512) 2359808 conv5_block1_1_relu[0][0]
__________________________________________________________________________________________________
conv5_block1_2_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block1_2_conv[0][0]
__________________________________________________________________________________________________
conv5_block1_2_relu (Activation (None, 7, 7, 512) 0 conv5_block1_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block1_0_conv (Conv2D) (None, 7, 7, 2048) 2099200 conv4_block6_out[0][0]
__________________________________________________________________________________________________
conv5_block1_3_conv (Conv2D) (None, 7, 7, 2048) 1050624 conv5_block1_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block1_0_bn (BatchNormali (None, 7, 7, 2048) 8192 conv5_block1_0_conv[0][0]
__________________________________________________________________________________________________
conv5_block1_3_bn (BatchNormali (None, 7, 7, 2048) 8192 conv5_block1_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block1_add (Add) (None, 7, 7, 2048) 0 conv5_block1_0_bn[0][0]
conv5_block1_3_bn[0][0]
__________________________________________________________________________________________________
conv5_block1_out (Activation) (None, 7, 7, 2048) 0 conv5_block1_add[0][0]
__________________________________________________________________________________________________
conv5_block2_1_conv (Conv2D) (None, 7, 7, 512) 1049088 conv5_block1_out[0][0]
__________________________________________________________________________________________________
conv5_block2_1_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block2_1_conv[0][0]
__________________________________________________________________________________________________
conv5_block2_1_relu (Activation (None, 7, 7, 512) 0 conv5_block2_1_bn[0][0]
__________________________________________________________________________________________________
conv5_block2_2_conv (Conv2D) (None, 7, 7, 512) 2359808 conv5_block2_1_relu[0][0]
__________________________________________________________________________________________________
conv5_block2_2_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block2_2_conv[0][0]
__________________________________________________________________________________________________
conv5_block2_2_relu (Activation (None, 7, 7, 512) 0 conv5_block2_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block2_3_conv (Conv2D) (None, 7, 7, 2048) 1050624 conv5_block2_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block2_3_bn (BatchNormali (None, 7, 7, 2048) 8192 conv5_block2_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block2_add (Add) (None, 7, 7, 2048) 0 conv5_block1_out[0][0]
conv5_block2_3_bn[0][0]
__________________________________________________________________________________________________
conv5_block2_out (Activation) (None, 7, 7, 2048) 0 conv5_block2_add[0][0]
__________________________________________________________________________________________________
conv5_block3_1_conv (Conv2D) (None, 7, 7, 512) 1049088 conv5_block2_out[0][0]
__________________________________________________________________________________________________
conv5_block3_1_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block3_1_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_1_relu (Activation (None, 7, 7, 512) 0 conv5_block3_1_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_2_conv (Conv2D) (None, 7, 7, 512) 2359808 conv5_block3_1_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_2_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block3_2_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_2_relu (Activation (None, 7, 7, 512) 0 conv5_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_3_conv (Conv2D) (None, 7, 7, 2048) 1050624 conv5_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_3_bn (BatchNormali (None, 7, 7, 2048) 8192 conv5_block3_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_add (Add) (None, 7, 7, 2048) 0 conv5_block2_out[0][0]
conv5_block3_3_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_out (Activation) (None, 7, 7, 2048) 0 conv5_block3_add[0][0]
__________________________________________________________________________________________________
avg_pool (GlobalAveragePooling2 (None, 2048) 0 conv5_block3_out[0][0]
__________________________________________________________________________________________________
predictions (Dense) (None, 1000) 2049000 avg_pool[0][0]
==================================================================================================
Total params: 25,636,712
Trainable params: 25,583,592
Non-trainable params: 53,120
In our experiment, we found the dog detection rate in dog data set reaches 100% percent; the dog detection rate in human data set reaches 1% percent. The result is very good for our application.
Dog Breed Classifier
In our experiment we designed and refined three Neural Networks, and their architectures are as follows:
(1) Self-defined Neural Network
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 112, 112, 32) 4640
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 56, 56, 32) 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 56, 56, 64) 18496
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 28, 28, 64) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 50176) 0
_________________________________________________________________
dense_4 (Dense) (None, 300) 15053100
_________________________________________________________________
dropout_2 (Dropout) (None, 300) 0
_________________________________________________________________
dense_5 (Dense) (None, 133) 40033
=================================================================
Total params: 15,116,717
Trainable params: 15,116,717
Non-trainable params: 0
(2) Transfer learning with VGG16
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
global_average_pooling2d (Gl (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 133) 68229
=================================================================
Total params: 68,229
Trainable params: 68,229
Non-trainable params: 0
(3) Transfer learning with Inception 3
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
global_average_pooling2d (Gl (None, 2048) 0
_________________________________________________________________
dense (Dense) (None, 133) 272517
=================================================================
Total params: 272,517
Trainable params: 272,517
Non-trainable params: 0
In the following table compares their performance:
Table Dog Breed Classification Accuracy
+--------------+-----------+
| Architecture | Accuracy |
+--------------+-----------+
| Self-defined | 7.5359% |
| VGG-16 | 69.4976% |
| Inception 3 | 79.6651% |
+--------------+-----------+
From the table, we can clearly see the value of transfer learning, and Inception 3 Neural Network with transfer learning obtains the best result.
4.2 Justification
Based on our evaluation, it seems to us that person detection, dog detection and dog breed classification reaches acceptable performance for our purpose.
However, we have to admit that the person detection method has a limitation: it must require person with clear face. It may happen a person is captured by the camera without clear faces. In this situation, the method we just discussed will fail. A way to solve this problem is to use Deep Learning Neural Network, where we train people’s image with both clear faces and unclear/no faces. After that, we use the trained Neural Network to perform human detection.
5. Conclusion
5.1 Summary
In this project we set up a dog breed classification program for a given RGB picture, and it has shown good results:
- It can reject pictures if no dog or people appear in the picture.
- If the picture contains dogs, we can classify it into right dog category.
- If the picture contains people, we can find the kind of dog the person looks like.
For person detection, we use a traditional machine learning algorithm using OpenCV; for dog breed classification, we use Inception 3 neural network with transfer learning technique enabled. We verify that we can obtain descent results with the adopted techniques.
5.2 Improvement
However, like many computer vision projects, there are always exceptions that fails our system. Particular, the person detection module has a strong assumption that clear face must appear in the image. For our current application, it is a reasonable assumption, but it cannot be regarded as a good person detection method. An alternative is to use Deep Neural Network as we did with dog breed classification.
The dog breed classifier still has room for improvement:
- Image augmentation will help
- Hyper-parameter searching will help
- More data will help
- More advanced architecture will help
- ….
6. Deliverable
6.1 Application
- The application is demonstrated in dog_app.ipynb
- The best trained model for dog breed classification purpose can be found in Inception 3 dog classifier model
6.2 Gihub