Training a Computer Vision 101

Image Classification using Convolutional Neural Network and Transfer Learning

Computer Vision is a field of computer science that focuses on replicating parts of the complexity of human vision, enabling computers to identify and process objects in images and videos the way humans do. On a certain level, Computer Vision is all about pattern recognition. One way to train a computer to understand visual data is to feed it images, lots of them! Thousands or millions, if possible, that are diverse and labeled, so that we can apply machine learning techniques and algorithms that allow the computer to hunt down the patterns and features that relate to those labels.

Image Classification Algorithms came into existence to narrow the gap between computer vision and human vision. A Convolutional Neural Network consists of a feature extraction (or mapping) module that extracts important features such as edges and textures, and a classification module that classifies the image based on the extracted features.

In this post, you will learn the steps to train a Convolutional Neural Network as an image classifier and explore the application of Transfer Learning from a pre-trained neural network.

After reading this post, you will know:

  • The steps needed to build and train a Convolutional Neural Network
  • How Transfer Learning lets you use pre-trained models directly, as feature-extraction preprocessing, or integrated into entirely new models

To build a Convolutional Neural Network, let’s first follow the basic steps of building a Machine Learning model. Our sample dataset comes from tensorflow and consists of 3677 images of flowers in 5 classes: sunflower, rose, tulip, dandelion, and daisy.

The flower dataset
  1. The first step in building a Machine Learning model is to have a clear goal in mind. Since our dataset provides images of 5 types of flowers, let’s build a model that classifies the flower in a given image.
  2. See the images above? Now let’s go to the next step, which is Exploring our Dataset. What do you notice from a first look at the images above? Do any of the following stand out?
  • Close-Ups and Zoom-Outs
  • Color Scheme
  • Flower’s life cycle
  • Focus
  • Frame Positioning
  • Lighting
  • Photo View
  • Pixel Sizes
  • Presence of Objects
Image count per class in the flower dataset

Understanding the dataset is crucial in setting up your Machine Learning model. When training a computer, it’s ideal to have thousands, if not millions, of images, but we are given only hundreds of images per class. That makes it very important to understand the images we have so we can plan how to diversify our dataset with data augmentation techniques.
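A quick way to get the image count per class is to count the files in each class folder. This is a minimal sketch, assuming the dataset is laid out as one sub-directory per class; the helper name `count_images_per_class` and the list of file extensions are our own choices, not part of the dataset.

```python
from pathlib import Path


def count_images_per_class(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: image_count} for a folder with one sub-directory per class."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip loose files such as a license or README
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in extensions
        )
    return counts
```

Calling `count_images_per_class("flower_photos")` (the folder name is an assumption) returns a dictionary you can plot or print to spot imbalanced classes.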

3. After Exploratory Data Analysis, we can start building our input pipeline. This involves splitting our data into train data and unseen data. With a very limited amount of data and an imbalanced image count across classes, we can take an equal number of images from each class for our train dataset and assign the rest to the unseen dataset. The train dataset is then split again into the training and validation data for our Machine Learning model. The unseen data are the images we use to check whether our model works on new images it has never seen.
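The balanced split described above can be sketched in plain Python. This is only an illustration, assuming each image is represented as a `(path, label)` pair; the function name `balanced_split` and its `per_class` and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_split(labeled_paths, per_class, seed=42):
    """Take `per_class` images of every class for training; the rest become unseen data."""
    by_class = defaultdict(list)
    for path, label in labeled_paths:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, unseen = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        if per_class > len(paths):
            raise ValueError(f"class {label!r} has only {len(paths)} images")
        rng.shuffle(paths)
        train += [(p, label) for p in paths[:per_class]]
        unseen += [(p, label) for p in paths[per_class:]]
    return train, unseen
```

Every class contributes the same number of training images, so the smallest class sets the upper limit for `per_class`.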

Image sizes in the flower dataset

Having decided on our input, the train dataset, we now preprocess those images using keras and split them into the model’s training and validation data. Deep learning neural network models learn a mapping from input variables to an output variable, and they expect those inputs to share one shape. The figure above shows that image sizes vary across our dataset, so we have to rescale our images to a common size during preprocessing.
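To see what this preprocessing does to a single image, here is a plain-Python sketch that treats a grayscale image as a list of pixel rows. It is an illustration under that assumption, not the keras implementation; in keras the same two steps are commonly handled by the `image_size` argument of `tf.keras.utils.image_dataset_from_directory` and a `tf.keras.layers.Rescaling(1./255)` layer.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D image (list of rows) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def rescale_pixels(image, max_value=255.0):
    """Map pixel values from [0, max_value] to [0, 1]."""
    return [[px / max_value for px in row] for row in image]


def preprocess(image, out_h, out_w):
    """Bring every image to the same size and value range."""
    return rescale_pixels(resize_nearest(image, out_h, out_w))
```

After this step every image has the same height and width and values between 0 and 1, whatever its original size was.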

The Structure of a Basic Convolutional Neural Network

4. Now that the input data is ready, we can build a base model. For this dataset, we can start with the most common Convolutional Layer: Conv2D. A Convolutional Layer applies filters to the image so the machine can enhance the features in the image and learn from them. Since each filter produces its own feature map from a single image, we can add a Pooling Layer after the Conv2D layer to help avoid overfitting by providing an abstracted form of the representation. Pooling also increases the computational efficiency of the machine because it shrinks the feature maps, which reduces the computation and the number of parameters to learn in the layers that follow.

After the Convolutional Layers and Pooling Layers have built our feature maps, we can pass them on to the Neural Network that performs the classification.
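To make the two layer types concrete, here is a minimal sketch of what a single filter and a pooling step compute, written in plain Python on a tiny grayscale image. It is an illustration only: a real Conv2D layer learns many filters, adds a bias and an activation, and runs on optimized tensors. The filter values below are hand-picked, not learned.

```python
def conv2d(image, kernel):
    """Slide one filter over a 2D image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, vertical_edge)  # 4x4, large only where the edge is
pooled = max_pool2d(feature_map)            # 2x2 summary of the feature map
```

The feature map lights up only in the column where the edge sits, and pooling keeps that signal while cutting the 4x4 map down to 2x2.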

5. Since we have very little data, Data Augmentation techniques are a helpful way to diversify and expand our dataset by adding slightly modified copies of already existing data. Below are some of the data augmentation techniques that you can apply using keras.
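The idea behind augmentation can be shown without any library. The sketch below treats an image as a list of pixel rows and produces flipped and rotated copies; the helper names are our own. In keras, comparable random transformations are available as preprocessing layers such as `RandomFlip`, `RandomRotation`, and `RandomZoom`.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn to the right."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three slightly modified copies."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]
```

Each copy keeps the label of the original, so one labeled flower photo becomes four training examples.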

6. Now let’s train the model! Since it’s time consuming to go through every image in the dataset and adjust for each of its features by hand, it’s best practice to train a base model first, so we can fine-tune it and have a baseline to compare its performance against.

Performance of the Base model compared to the Fine-tuned model.

Fine-tuning takes a model that has already been trained for a particular task and tweaks it to perform a second, similar task. From this point on, you can think of each of the layer parameters as a fine-tuning knob. You can iterate over steps 4–6 as your fine-tuning steps and retrain the model until you get an acceptable score.
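The iterate-until-acceptable loop can be sketched as a small search over the knobs. This is a hypothetical helper, not part of keras: `train_and_score` stands in for whatever function builds, trains, and validates your model and returns its validation score, and the knob names in the usage example are placeholders.

```python
from itertools import product


def tune(train_and_score, grid, target_score):
    """Try knob combinations until one reaches the target score or the grid is exhausted."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)  # one full build-train-validate cycle
        if score > best_score:
            best_params, best_score = params, score
        if best_score >= target_score:
            break  # acceptable score reached, stop iterating
    return best_params, best_score
```

For example, `tune(train_and_score, {"filters": [16, 32], "dropout": [0.2, 0.5]}, target_score=0.8)` would retrain the model for each combination and stop as soon as one scores 0.8 or better.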

Now let’s explore Transfer Learning by using ResNet50. ResNet50 is a Convolutional Neural Network that is 50 layers deep. A pretrained version of the network, trained on more than a million images from the ImageNet database across 1,000 classes, can be loaded directly. Below are the steps for a basic transfer learning model that uses ResNet50 with a Convolutional Neural Network.

  1. What we want is to reuse this pretrained model while replacing its input and output layers to match our dataset and classes. We can do this by ‘freezing’ all layers except the last layer.
Loading ResNet50 and freezing all layers except the last one.
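A minimal sketch of this step, assuming TensorFlow’s bundled Keras API (`tf.keras.applications.ResNet50`). The function names, the 224x224 input shape, and the choice of `include_top=False` (which drops ResNet50’s own ImageNet classifier) are our assumptions.

```python
import tensorflow as tf


def load_frozen_resnet50(weights="imagenet", input_shape=(224, 224, 3)):
    """Load ResNet50 without its classifier and freeze every layer except the last."""
    base = tf.keras.applications.ResNet50(
        include_top=False,  # assumption: we attach our own 5-class output later
        weights=weights,    # "imagenet" downloads pretrained weights; None skips that
        input_shape=input_shape,
    )
    for layer in base.layers[:-1]:
        layer.trainable = False  # frozen: weights stay as pretrained
    base.layers[-1].trainable = True
    return base


def print_trainable_flags(model):
    """Print each layer name with its trainable flag."""
    for layer in model.layers:
        print(layer.name, layer.trainable)
```

Calling `print_trainable_flags(load_frozen_resnet50())` lists every layer with its flag. Note that the flag only matters for layers that carry weights, so it is worth checking which layers actually have trainable weights in the configuration you load.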

The code above should print all the layers from ResNet50, and it should show False for all of them except the last one.

2. Since ResNet50 is 50 layers deep, we’ll need to use Normalization and Regularization techniques to avoid overfitting.

Connecting Transfer Learning from ResNet50 with Regularizers and Normalizers
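One possible shape for such a classification head, sketched with `tf.keras`. Everything specific here is an assumption for illustration: the 256-unit layer, the L2 strength, the dropout rate, and the `(7, 7, 2048)` feature shape, which is what ResNet50 produces for 224x224 inputs when loaded without its top.

```python
import tensorflow as tf


def build_classifier_head(num_classes=5, feature_shape=(7, 7, 2048),
                          l2_strength=1e-4, dropout_rate=0.5):
    """Small head with normalization and regularization on top of ResNet50 features."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=feature_shape),
        tf.keras.layers.GlobalAveragePooling2D(),  # collapse each feature map to one value
        tf.keras.layers.BatchNormalization(),      # normalization
        tf.keras.layers.Dense(
            256,
            activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(l2_strength),  # weight penalty
        ),
        tf.keras.layers.Dropout(dropout_rate),     # randomly drop units during training
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

The head can then be stacked on the frozen base, for example with `tf.keras.Sequential([base, head])`, before compiling and fitting.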

3. Now, we can fit and train the model!

Here are the results in comparison to our Convolutional Neural Network above:

Fine-tuned Convolutional Neural Network vs Transfer Learning Performance

4. Now, let’s visualize how our model is classifying an unseen image.
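Before plotting, it helps to turn the model’s raw output into labels and list the images it gets wrong. This sketch assumes the model returns one probability per class and that the classes are ordered alphabetically (keras sorts class folders that way when it infers labels from directories); the helper names are hypothetical.

```python
CLASS_NAMES = ["daisy", "dandelion", "rose", "sunflower", "tulip"]  # assumed order


def predicted_label(probabilities, class_names=CLASS_NAMES):
    """Return the most likely class and its probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


def misclassified(true_labels, batch_probabilities, class_names=CLASS_NAMES):
    """List (index, true label, predicted label, confidence) for every wrong prediction."""
    errors = []
    for index, (truth, probs) in enumerate(zip(true_labels, batch_probabilities)):
        guess, confidence = predicted_label(probs, class_names)
        if guess != truth:
            errors.append((index, truth, guess, confidence))
    return errors
```

Feeding it the true labels of the unseen images and the output of `model.predict` (converted to lists) gives the indices of the images worth looking at more closely.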

As we can see above, the model still misclassifies the flower in the last image. Even though our Transfer Learning model has a high accuracy rate of over 90%, it still has a high rate of misclassification when we give it unseen data. Considering that our dataset is small compared to the hundreds of thousands of images desirable for training an image classification model, it is understandable that image classifier models need human oversight and a more diverse dataset. However, the many correct classifications still show the effectiveness of deep learning algorithms.


