A comparison between DenseNet and VGG16 (Part 2: Transfer learning and fine-tuning the models)
This is the second and last part of a blog series that compares the performance of DenseNet and VGG16. The comparison focuses on a specific image classification task using the DeFungi image dataset.
Here is the link to the first part of the series. In the first part, we went through the following steps:
- Explored the data set and inspected its balance
- Created train, dev, and test sets
- Extracted features using DenseNet
- Used PCA to visualize the extracted features
- Applied data augmentation
In this part, we are going to finalize our comparison using the following steps:
- Use transfer learning with VGG16 and DenseNet121
- Fine-tune the best model
- Analyze the results and draw conclusions
Transfer learning using DenseNet
At this step, we add a “top” to the feature-extraction model from the previous post (see this post). The top pools the extracted features and passes them to a fully connected layer that predicts their classes. I built the whole model with the Keras functional API. Two new layers are used in this model:
- Dropout: during training, random neurons are “dropped out” (switched off) with a certain probability (the dropout rate). This keeps the neurons in the next layers from relying too much on specific neurons in the previous layer and improves the model’s generalization. You can check this video from Andrew Ng about dropout layers.
- Global Average Pooling: computes the average of each feature map (channel) over all spatial positions, so an H x W x C tensor becomes a vector of length C. It is commonly used instead of a Flatten layer and the large dense layers that would follow it, which greatly reduces the number of parameters in the top.
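To make these two layers concrete, here is a minimal pure-Python sketch (no TensorFlow) of what they compute. It stores a feature map as nested lists of shape height x width x channels. It also implements "inverted" dropout, the variant Keras uses: surviving values are scaled by 1/(1 - rate) during training, so nothing has to be rescaled at inference. The helper names are hypothetical. The last lines compare the size of a Dense(5) head placed after Flatten with one placed after global average pooling, assuming the 7 x 7 x 1024 feature map that DenseNet121 produces for 224 x 224 inputs.

```python
import random


def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map: nested list of shape [height][width][channels].
    Returns a list with one value per channel.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


def inverted_dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` during training.

    Survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged;
    at inference time the input is returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# A tiny 2 x 2 feature map with 3 channels.
fmap = [[[1.0, 2.0, 0.0], [3.0, 2.0, 4.0]],
        [[5.0, 2.0, 8.0], [7.0, 2.0, 0.0]]]
pooled = global_average_pool(fmap)  # one average per channel: [4.0, 2.0, 3.0]

rng = random.Random(0)
dropped = inverted_dropout(pooled, rate=0.5, training=True, rng=rng)
unchanged = inverted_dropout(pooled, rate=0.5, training=False, rng=rng)

# Head size for a 7 x 7 x 1024 feature map and 5 classes.
flatten_head = dense_params(7 * 7 * 1024, 5)  # 250,885 parameters
gap_head = dense_params(1024, 5)              # 5,125 parameters
print(pooled, dropped, unchanged, flatten_head, gap_head)
```

The pooled head is roughly 49 times smaller. This is one reason global average pooling is a popular choice when the training set is small.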
# imports (data_augmentation is the augmentation pipeline built in Part 1)
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import DenseNet121

# creating layers
densenet_base_model = DenseNet121(weights='imagenet', include_top=False)
densenet_base_model.trainable = False  # freeze the pretrained weights
global_average_layer = layers.GlobalAveragePooling2D()
# building the model
inputs = tf.keras.Input(shape=(224, 224, 3))
augmented = data_augmentation(inputs)
features_extracted = densenet_base_model(augmented)
avg_pooling = global_average_layer(features_extracted)
dropout = layers.Dropout(0.5)(avg_pooling)
outputs = layers.Dense(5, activation='softmax')(dropout)  # one unit per class
model_densenet = tf.keras.Model(inputs, outputs)
Next, I compile the model with the “sparse_categorical_crossentropy” loss, because the labels are integer-encoded rather than one-hot encoded.
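Both losses compute the same number. They differ only in the label format they expect. Below is a minimal pure-Python sketch for a single sample. The softmax probabilities are made up for illustration, and the Keras versions additionally handle batches and options such as from_logits.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """The same loss when the target is an integer class index."""
    return -math.log(probs[label])


def to_one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# Softmax output for one image over the 5 classes (made-up numbers).
probs = [0.10, 0.60, 0.15, 0.10, 0.05]
label = 1  # integer-encoded true class

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(to_one_hot(label, 5), probs)
print(round(sparse_loss, 4), round(dense_loss, 4))  # both equal -ln(0.6)
```

Only the probability assigned to the true class contributes to the loss, so the integer label is all the loss needs. This is why one-hot encoding can be skipped.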
Tricky point: the default learning rate of the Adam optimizer is 1e-3 (0.001), but with that value the model did not converge, so I reduced it to 1e-4.
# compiling the model using Adam optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
model_densenet.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
EPOCHS = 20
Before fitting the data, I create two callbacks for the model:
- early_stopping: to avoid overfitting the training data, it stops training when the validation loss has not improved for 5 consecutive epochs and restores the weights of the best epoch.
- lr_plateau: if the validation loss does not improve for 4 consecutive epochs, it multiplies the learning rate by 0.5. The learning rate never goes below 1e-7.
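The sketch below shows how the two callbacks interact. It is a simplified pure-Python re-implementation of their decision logic, not Keras's actual source, and it ignores options such as cooldown and baseline. It assumes Keras's default thresholds: min_delta=0 for EarlyStopping and min_delta=1e-4 for ReduceLROnPlateau. The plateau patience (4) is shorter than the early-stopping patience (5), so after a stall the model gets one epoch at the halved learning rate before training stops.

```python
import math


def simulate_callbacks(val_losses, lr=1e-4, es_patience=5,
                       plateau_patience=4, factor=0.5, min_lr=1e-7,
                       plateau_min_delta=1e-4):
    """Replay a validation-loss history through simplified versions of
    EarlyStopping(patience=5) and ReduceLROnPlateau(patience=4, factor=0.5).

    Returns (learning rate used in each epoch, epoch at which training
    stopped or None, epoch whose weights restore_best_weights would keep).
    """
    es_best, es_wait, best_epoch = math.inf, 0, None
    plateau_best, plateau_wait = math.inf, 0
    lrs = []
    for epoch, loss in enumerate(val_losses, start=1):
        lrs.append(lr)  # learning rate used during this epoch
        # EarlyStopping: any decrease counts as an improvement (min_delta=0).
        if loss < es_best:
            es_best, es_wait, best_epoch = loss, 0, epoch
        else:
            es_wait += 1
        # ReduceLROnPlateau: must beat the best value by min_delta.
        if loss < plateau_best - plateau_min_delta:
            plateau_best, plateau_wait = loss, 0
        else:
            plateau_wait += 1
            if plateau_wait >= plateau_patience:
                lr = max(lr * factor, min_lr)  # takes effect next epoch
                plateau_wait = 0
        if es_wait >= es_patience:
            return lrs, epoch, best_epoch
    return lrs, None, best_epoch


# Validation losses from the DenseNet training log below: no stall is long
# enough to trigger either callback, matching the constant lr in the log.
densenet_val_loss = [1.9962, 1.5661, 1.5497, 1.4724, 1.4113, 1.3498, 1.3512,
                     1.2421, 1.1883, 1.2048, 1.1541, 1.1161, 1.1398, 1.0999,
                     1.0513, 1.0798, 1.0373, 1.0192, 0.9980, 0.9824]
lrs, stopped, best = simulate_callbacks(densenet_val_loss)
print(set(lrs), stopped, best)

# A made-up history that stalls after epoch 2.
lrs2, stopped2, best2 = simulate_callbacks([1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99])
print(lrs2[-1], stopped2, best2)
```

In the made-up history, the learning rate is halved after epoch 6. Training stops after epoch 7, and restore_best_weights brings back the weights from epoch 2.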
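# creating callbacks
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
# stop after 5 epochs without val_loss improvement and keep the best weights
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# halve the learning rate after 4 epochs without val_loss improvement (never below 1e-7)
lr_plateau = ReduceLROnPlateau(monitor='val_loss', patience=4, factor=0.5, min_lr=1e-7)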
#fitting and storing model's data through the learning process
history_densenet = model_densenet.fit(train_ds, validation_data = val_ds,
epochs = EPOCHS,
callbacks = [early_stopping,lr_plateau])
Epoch 1/20
157/157 [==============================] - 30s 136ms/step - loss: 7.7975 - accuracy: 0.2184 - val_loss: 1.9962 - val_accuracy: 0.2647 - lr: 1.0000e-04
Epoch 2/20
157/157 [==============================] - 17s 109ms/step - loss: 5.8382 - accuracy: 0.2618 - val_loss: 1.5661 - val_accuracy: 0.4305 - lr: 1.0000e-04
Epoch 3/20
157/157 [==============================] - 17s 110ms/step - loss: 5.2994 - accuracy: 0.2830 - val_loss: 1.5497 - val_accuracy: 0.4661 - lr: 1.0000e-04
Epoch 4/20
157/157 [==============================] - 17s 111ms/step - loss: 4.5244 - accuracy: 0.3212 - val_loss: 1.4724 - val_accuracy: 0.4661 - lr: 1.0000e-04
Epoch 5/20
157/157 [==============================] - 18s 112ms/step - loss: 4.0421 - accuracy: 0.3356 - val_loss: 1.4113 - val_accuracy: 0.4928 - lr: 1.0000e-04
Epoch 6/20
157/157 [==============================] - 18s 113ms/step - loss: 3.5536 - accuracy: 0.3534 - val_loss: 1.3498 - val_accuracy: 0.5072 - lr: 1.0000e-04
Epoch 7/20
157/157 [==============================] - 17s 111ms/step - loss: 3.3167 - accuracy: 0.3614 - val_loss: 1.3512 - val_accuracy: 0.4816 - lr: 1.0000e-04
Epoch 8/20
157/157 [==============================] - 18s 112ms/step - loss: 3.0312 - accuracy: 0.3788 - val_loss: 1.2421 - val_accuracy: 0.5106 - lr: 1.0000e-04
Epoch 9/20
157/157 [==============================] - 18s 112ms/step - loss: 2.7309 - accuracy: 0.3830 - val_loss: 1.1883 - val_accuracy: 0.5284 - lr: 1.0000e-04
Epoch 10/20
157/157 [==============================] - 17s 111ms/step - loss: 2.5237 - accuracy: 0.3974 - val_loss: 1.2048 - val_accuracy: 0.5273 - lr: 1.0000e-04
Epoch 11/20
157/157 [==============================] - 18s 112ms/step - loss: 2.3720 - accuracy: 0.3928 - val_loss: 1.1541 - val_accuracy: 0.5373 - lr: 1.0000e-04
Epoch 12/20
157/157 [==============================] - 18s 112ms/step - loss: 2.1901 - accuracy: 0.4102 - val_loss: 1.1161 - val_accuracy: 0.5495 - lr: 1.0000e-04
Epoch 13/20
157/157 [==============================] - 17s 111ms/step - loss: 2.0632 - accuracy: 0.4196 - val_loss: 1.1398 - val_accuracy: 0.5406 - lr: 1.0000e-04
Epoch 14/20
157/157 [==============================] - 18s 112ms/step - loss: 1.9824 - accuracy: 0.4170 - val_loss: 1.0999 - val_accuracy: 0.5406 - lr: 1.0000e-04
Epoch 15/20
157/157 [==============================] - 18s 112ms/step - loss: 1.8620 - accuracy: 0.4356 - val_loss: 1.0513 - val_accuracy: 0.5717 - lr: 1.0000e-04
Epoch 16/20
157/157 [==============================] - 17s 111ms/step - loss: 1.7604 - accuracy: 0.4432 - val_loss: 1.0798 - val_accuracy: 0.5617 - lr: 1.0000e-04
Epoch 17/20
157/157 [==============================] - 18s 112ms/step - loss: 1.6790 - accuracy: 0.4548 - val_loss: 1.0373 - val_accuracy: 0.6029 - lr: 1.0000e-04
Epoch 18/20
157/157 [==============================] - 18s 112ms/step - loss: 1.6163 - accuracy: 0.4706 - val_loss: 1.0192 - val_accuracy: 0.6174 - lr: 1.0000e-04
Epoch 19/20
157/157 [==============================] - 18s 112ms/step - loss: 1.5657 - accuracy: 0.4596 - val_loss: 0.9980 - val_accuracy: 0.6140 - lr: 1.0000e-04
Epoch 20/20
157/157 [==============================] - 18s 112ms/step - loss: 1.5131 - accuracy: 0.4860 - val_loss: 0.9824 - val_accuracy: 0.6229 - lr: 1.0000e-04
I create a dictionary to store the results:
model_comparison = {}
model_comparison['densenet'] = pd.DataFrame(history_densenet.history)
Plotting the training and validation curves
Here, I define a function that plots a model's training and validation loss and accuracy, to better understand how it behaves during training.
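import matplotlib.pyplot as plt
import numpy as np

def plot_performance(model_name, epochs=EPOCHS):
    """Plot the training/validation loss and accuracy stored in model_comparison."""
    dataframe = model_comparison[model_name]
    plt.figure(figsize=(12, 3))
    # left panel: loss curves
    plt.subplot(1, 2, 1)
    plt.plot(dataframe[["val_loss", "loss"]])
    # row 0 holds epoch 1, so tick positions 0, 5, ... are labelled 1, 6, ...
    plt.xticks(np.arange(0, epochs, 5), np.arange(1, epochs + 1, 5))
    plt.title("Loss")
    plt.legend(["val_loss", "loss"])
    # right panel: accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(dataframe[["val_accuracy", "accuracy"]])
    plt.legend(["val_accuracy", "accuracy"], loc="lower right")
    plt.xticks(np.arange(0, epochs, 5), np.arange(1, epochs + 1, 5))
    plt.title("Accuracy")
    plt.show()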
# showing model's performance
plot_performance('densenet')
Transfer learning using VGG16
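VGG16 consists of 13 convolutional layers and 3 fully connected layers (16 weight layers in total, hence the name). Only the convolutional part is reused here, because the model is loaded with include_top=False. The architecture is characterized by the repeated use of 3x3 convolutional filters. Each filter has a small receptive field, but stacking them enlarges the effective receptive field: two stacked 3x3 layers cover a 5x5 region, and three cover 7x7. Compared with a single large filter, this uses fewer parameters and adds more non-linearities, which allows for a deeper network. In the picture below, you can see the structure of the VGG16 model: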
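The benefit of stacking small filters can be checked with a little arithmetic. Each extra stride-1 3x3 layer grows the receptive field by 2 pixels. Three stacked 3x3 layers therefore cover the same 7x7 region as a single 7x7 layer with about half the weights: 27C² versus 49C² for C input and output channels, ignoring biases. Here is a pure-Python sketch, where C = 512 is the width of VGG16's last two blocks:

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels_in, channels_out):
    """Weights of one convolutional layer, ignoring biases."""
    return kernel_size * kernel_size * channels_in * channels_out


channels = 512  # width of VGG16's block4 and block5 convolutions
three_3x3 = 3 * conv_weights(3, channels, channels)  # 27 * C * C
one_7x7 = conv_weights(7, channels, channels)        # 49 * C * C

print(stacked_receptive_field(2))  # 5: two 3x3 layers see a 5x5 region
print(stacked_receptive_field(3))  # 7: three 3x3 layers see a 7x7 region
print(three_3x3, one_7x7)          # 7077888 12845056
```

The stacked version needs about 55% of the weights of one 7x7 layer and inserts two extra ReLU non-linearities along the way.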
from tensorflow.keras.applications import VGG16

# loading the base model
vgg16_base_model = VGG16(input_shape=(224,224,3), include_top=False, weights='imagenet')
vgg16_base_model.trainable = False # freezing base_model's layers
global_average_layer = layers.GlobalAveragePooling2D()
# building the model with the same top as the DenseNet model
inputs = tf.keras.Input(shape=(224, 224, 3))
augmented = data_augmentation(inputs)
features_extracted = vgg16_base_model(augmented)
avg_pooling = global_average_layer(features_extracted)
dropout = tf.keras.layers.Dropout(0.5)(avg_pooling)
outputs = layers.Dense(5, activation='softmax')(dropout)
model_vgg16 = tf.keras.Model(inputs, outputs)
model_vgg16.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 [==============================] - 0s 0us/step
Model: "model_1"
_________________________________________________________________
 Layer (type)                     Output Shape              Param #
=================================================================
 input_5 (InputLayer)             [(None, 224, 224, 3)]     0
 sequential (Sequential)          (None, 224, 224, 3)       0
 vgg16 (Functional)               (None, 7, 7, 512)         14714688
 global_average_pooling2d_1       (None, 512)               0
 (GlobalAveragePooling2D)
 dropout_1 (Dropout)              (None, 512)               0
 dense_1 (Dense)                  (None, 5)                 2565
=================================================================
Total params: 14717253 (56.14 MB)
Trainable params: 2565 (10.02 KB)
Non-trainable params: 14714688 (56.13 MB)
_________________________________________________________________
# compiling the model just like before
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
model_vgg16.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy", metrics=['accuracy'])
# training VGG16 with the same callbacks and learning rate as DenseNet
# (train_set / validation_set: the training and validation splits created in Part 1)
history_vgg16 = model_vgg16.fit(
    train_set, epochs=EPOCHS, validation_data=validation_set,
    callbacks=[early_stopping, lr_plateau])
Epoch 1/20
157/157 [==============================] - 33s 172ms/step - loss: 4.5763 - accuracy: 0.2048 - val_loss: 2.3839 - val_accuracy: 0.2937 - lr: 1.0000e-04
Epoch 2/20
157/157 [==============================] - 25s 156ms/step - loss: 3.7497 - accuracy: 0.2332 - val_loss: 1.9313 - val_accuracy: 0.3593 - lr: 1.0000e-04
Epoch 3/20
157/157 [==============================] - 24s 155ms/step - loss: 3.2277 - accuracy: 0.2792 - val_loss: 1.5969 - val_accuracy: 0.4305 - lr: 1.0000e-04
Epoch 4/20
157/157 [==============================] - 25s 155ms/step - loss: 2.8560 - accuracy: 0.3086 - val_loss: 1.4047 - val_accuracy: 0.4805 - lr: 1.0000e-04
Epoch 5/20
157/157 [==============================] - 24s 155ms/step - loss: 2.5932 - accuracy: 0.3364 - val_loss: 1.2908 - val_accuracy: 0.5184 - lr: 1.0000e-04
Epoch 6/20
157/157 [==============================] - 24s 155ms/step - loss: 2.3110 - accuracy: 0.3694 - val_loss: 1.1900 - val_accuracy: 0.5662 - lr: 1.0000e-04
Epoch 7/20
157/157 [==============================] - 24s 155ms/step - loss: 2.1896 - accuracy: 0.3806 - val_loss: 1.1396 - val_accuracy: 0.5795 - lr: 1.0000e-04
Epoch 8/20
157/157 [==============================] - 24s 155ms/step - loss: 1.9857 - accuracy: 0.4144 - val_loss: 1.1039 - val_accuracy: 0.5907 - lr: 1.0000e-04
Epoch 9/20
157/157 [==============================] - 25s 155ms/step - loss: 1.8629 - accuracy: 0.4236 - val_loss: 1.0641 - val_accuracy: 0.5973 - lr: 1.0000e-04
Epoch 10/20
157/157 [==============================] - 25s 156ms/step - loss: 1.7573 - accuracy: 0.4322 - val_loss: 1.0459 - val_accuracy: 0.6073 - lr: 1.0000e-04
Epoch 11/20
157/157 [==============================] - 24s 155ms/step - loss: 1.6662 - accuracy: 0.4494 - val_loss: 1.0409 - val_accuracy: 0.6096 - lr: 1.0000e-04
Epoch 12/20
157/157 [==============================] - 24s 155ms/step - loss: 1.6013 - accuracy: 0.4652 - val_loss: 1.0314 - val_accuracy: 0.6129 - lr: 1.0000e-04
Epoch 13/20
157/157 [==============================] - 24s 155ms/step - loss: 1.5416 - accuracy: 0.4748 - val_loss: 1.0007 - val_accuracy: 0.6174 - lr: 1.0000e-04
Epoch 14/20
157/157 [==============================] - 24s 155ms/step - loss: 1.4830 - accuracy: 0.4808 - val_loss: 1.0057 - val_accuracy: 0.6229 - lr: 1.0000e-04
Epoch 15/20
157/157 [==============================] - 24s 155ms/step - loss: 1.4268 - accuracy: 0.4956 - val_loss: 0.9918 - val_accuracy: 0.6318 - lr: 1.0000e-04
Epoch 16/20
157/157 [==============================] - 24s 155ms/step - loss: 1.4033 - accuracy: 0.4996 - val_loss: 0.9840 - val_accuracy: 0.6329 - lr: 1.0000e-04
Epoch 17/20
157/157 [==============================] - 24s 155ms/step - loss: 1.3541 - accuracy: 0.5006 - val_loss: 0.9849 - val_accuracy: 0.6318 - lr: 1.0000e-04
Epoch 18/20
157/157 [==============================] - 24s 155ms/step - loss: 1.3029 - accuracy: 0.5196 - val_loss: 0.9591 - val_accuracy: 0.6396 - lr: 1.0000e-04
Epoch 19/20
157/157 [==============================] - 25s 155ms/step - loss: 1.3073 - accuracy: 0.5200 - val_loss: 0.9588 - val_accuracy: 0.6407 - lr: 1.0000e-04
Epoch 20/20
157/157 [==============================] - 24s 155ms/step - loss: 1.2559 - accuracy: 0.5338 - val_loss: 0.9517 - val_accuracy: 0.6352 - lr: 1.0000e-04
model_comparison["vgg16"] = pd.DataFrame(history_vgg16.history)
plot_performance("vgg16")
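The loss and accuracy curves of the VGG16 model are smoother than those of DenseNet. In both models, the validation loss stays below the training loss because dropout and data augmentation are active only during training.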
Comparing models
In this part, I compare the performance of both models on the test set. The model with higher performance is chosen for fine-tuning and a more in-depth inspection of results:
vgg16_results = model_vgg16.evaluate(test_ds)
densenet_results = model_densenet.evaluate(test_ds)
29/29 [==============================] - 5s 165ms/step - loss: 0.9320 - accuracy: 0.6353
29/29 [==============================] - 4s 138ms/step - loss: 0.9854 - accuracy: 0.6053
print("VGG16 model test loss and accuracy score(in order):", vgg16_results[0],vgg16_results[1] )
print("DenseNet model test loss and accuracy score(in order):", densenet_results[0],densenet_results[1])
VGG16 model test loss and accuracy score(in order): 0.9319728016853333 0.635254979133606
DenseNet model test loss and accuracy score(in order): 0.9854319095611572 0.6053215265274048
Accuracy is not a comprehensive measure for an imbalanced dataset with 5 classes, but I use it as an early indicator to choose between the two models. As its smoother training curves suggested, VGG16 achieved a higher accuracy (63.5% vs 60.5%) and a lower loss (0.93 vs 0.99) than DenseNet.
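Because the classes are imbalanced, a useful reference point for these accuracy scores is a trivial classifier that always predicts the most frequent class. The sketch below computes that baseline from the per-class test counts (the support column) in the classification report at the end of this post:

```python
# Number of test images per class (the "support" column of the report).
support = {"H1": 437, "H2": 233, "H3": 82, "H5": 80, "H6": 70}

total = sum(support.values())                   # 902 test images
majority_class = max(support, key=support.get)  # the most frequent class
baseline_accuracy = support[majority_class] / total

print(majority_class, total, round(baseline_accuracy, 3))  # H1 902 0.484
```

Always predicting H1 would score about 48.4%. Both models beat this baseline, by roughly 12 (DenseNet) to 15 (VGG16) percentage points.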
Fine-tuning the VGG16
In most cases, fine-tuning means unfreezing some of the last layers of the base model (the pretrained model used for transfer learning) and training them together with the new top, so that the model becomes more specific to our task. This can lead to a higher performance score.
Unfreeze the top layers of the model
# how many layers are in the base model?
print("Number of layers in the base model: ", len(vgg16_base_model.layers))
vgg16_layers = [layer.name for layer in vgg16_base_model.layers]
print(vgg16_layers)
Number of layers in the base model: 19
['input_4', 'block1_conv1', 'block1_conv2', 'block1_pool', 'block2_conv1', 'block2_conv2', 'block2_pool', 'block3_conv1', 'block3_conv2', 'block3_conv3', 'block3_pool', 'block4_conv1', 'block4_conv2', 'block4_conv3', 'block4_pool', 'block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
There are 19 layers in this model. The last 4 layers (‘block5_conv1’, ‘block5_conv2’, ‘block5_conv3’, ‘block5_pool’) form the final convolutional block, which starts right after the ‘block4_pool’ pooling layer. I select this block for fine-tuning:
# switching on all layers to be trainable
vgg16_base_model.trainable = True
# switching off (freezing) all layers except the last 4 layers
fine_tune_last = 4
for layer in vgg16_base_model.layers[:-fine_tune_last]:
    layer.trainable = False
Tricky point: we are now training a larger part of the model, so it can overfit the training data quickly. A lower learning rate is recommended, and I use 1e-5 instead of 1e-4. The model also has to be compiled again for the new trainable settings to take effect. The other compile settings are the same as before:
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
model_vgg16.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy", metrics=['accuracy'])
model_vgg16.summary()
Model: "model_1"
_________________________________________________________________
 Layer (type)                     Output Shape              Param #
=================================================================
 input_5 (InputLayer)             [(None, 224, 224, 3)]     0
 sequential (Sequential)          (None, 224, 224, 3)       0
 vgg16 (Functional)               (None, 7, 7, 512)         14714688
 global_average_pooling2d_1       (None, 512)               0
 (GlobalAveragePooling2D)
 dropout_1 (Dropout)              (None, 512)               0
 dense_1 (Dense)                  (None, 5)                 2565
=================================================================
Total params: 14717253 (56.14 MB)
Trainable params: 7081989 (27.02 MB)
Non-trainable params: 7635264 (29.13 MB)
_________________________________________________________________
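The trainable and non-trainable counts in this summary can be reproduced by hand. The pure-Python sketch below assumes the standard VGG16 convolution widths: 13 layers of 3x3 filters across five blocks. It uses the layer names printed earlier. Pooling and input layers have no parameters, so unfreezing the last 4 layers makes block5's three convolutions and the Dense(5) head trainable, while blocks 1 to 4 stay frozen.

```python
# Layer names of the VGG16 base model, as printed earlier in this post.
layer_names = ["input_4",
               "block1_conv1", "block1_conv2", "block1_pool",
               "block2_conv1", "block2_conv2", "block2_pool",
               "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
               "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
               "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool"]
frozen, unfrozen = layer_names[:-4], layer_names[-4:]  # same slice as the loop above

# Output channels of each convolution, grouped by block.
vgg16_blocks = {
    "block1": [64, 64],
    "block2": [128, 128],
    "block3": [256, 256, 256],
    "block4": [512, 512, 512],
    "block5": [512, 512, 512],
}

params_per_block = {}
c_in = 3  # RGB input
for name, widths in vgg16_blocks.items():
    block_params = 0
    for c_out in widths:
        block_params += 3 * 3 * c_in * c_out + c_out  # 3x3 kernel weights + biases
        c_in = c_out
    params_per_block[name] = block_params

head_params = 512 * 5 + 5  # Dense(5) on the 512 pooled features
base_total = sum(params_per_block.values())
trainable = params_per_block["block5"] + head_params
non_trainable = base_total - params_per_block["block5"]

print(unfrozen)
print(base_total, trainable, non_trainable)  # 14714688 7081989 7635264
```

Almost half of VGG16's convolutional parameters (about 7.08 million) sit in the last block alone. This is why unfreezing just 4 layers is already a substantial change and calls for the lower learning rate.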
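I fine-tune for another 10 epochs (fine_tune_epochs), continuing from the last epoch of the previous run. Because initial_epoch=history_vgg16.epoch[-1] is the 0-based index of the last completed epoch (19), Keras actually runs 11 epochs, shown as Epoch 20 to Epoch 30 in the log.
Tricky point: when fine-tuning on small datasets, we usually use few epochs to avoid overfitting.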
fine_tune_epochs = 10
total_epochs = EPOCHS + fine_tune_epochs
history_fine = model_vgg16.fit(train_set,
epochs=total_epochs,
initial_epoch=history_vgg16.epoch[-1],
validation_data=validation_set)
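The epoch bookkeeping can be checked with a short pure-Python sketch that mirrors how Keras numbers epochs: 0-based internally and 1-based in the progress log. It also shows an alternative. Passing initial_epoch=len(history_vgg16.epoch) would run exactly fine_tune_epochs epochs.

```python
EPOCHS = 20
fine_tune_epochs = 10
total_epochs = EPOCHS + fine_tune_epochs  # 30

# After the first fit, history_vgg16.epoch holds the 0-based indices 0..19.
first_run_epochs = list(range(EPOCHS))
initial_epoch = first_run_epochs[-1]  # 19, like history_vgg16.epoch[-1]

# fit() runs indices initial_epoch .. total_epochs - 1 and
# prints each one 1-based as "Epoch i/total_epochs".
epoch_indices = list(range(initial_epoch, total_epochs))
printed = [f"Epoch {i + 1}/{total_epochs}" for i in epoch_indices]
print(len(epoch_indices), printed[0], printed[-1])  # 11 Epoch 20/30 Epoch 30/30

# Starting from len(history_vgg16.epoch) instead runs exactly fine_tune_epochs.
exact_indices = list(range(len(first_run_epochs), total_epochs))
print(len(exact_indices))  # 10
```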
Epoch 20/30
157/157 [==============================] - 32s 178ms/step - loss: 1.0849 - accuracy: 0.5742 - val_loss: 0.8869 - val_accuracy: 0.6630
Epoch 21/30
157/157 [==============================] - 29s 176ms/step - loss: 0.9637 - accuracy: 0.6252 - val_loss: 0.7979 - val_accuracy: 0.6830
Epoch 22/30
157/157 [==============================] - 28s 175ms/step - loss: 0.8655 - accuracy: 0.6494 - val_loss: 0.7542 - val_accuracy: 0.6963
Epoch 23/30
157/157 [==============================] - 27s 174ms/step - loss: 0.8052 - accuracy: 0.6730 - val_loss: 0.7138 - val_accuracy: 0.7063
Epoch 24/30
157/157 [==============================] - 28s 175ms/step - loss: 0.7499 - accuracy: 0.7022 - val_loss: 0.6835 - val_accuracy: 0.6986
Epoch 25/30
157/157 [==============================] - 28s 175ms/step - loss: 0.7151 - accuracy: 0.7142 - val_loss: 0.6479 - val_accuracy: 0.7219
Epoch 26/30
157/157 [==============================] - 28s 175ms/step - loss: 0.7014 - accuracy: 0.7168 - val_loss: 0.6268 - val_accuracy: 0.7353
Epoch 27/30
157/157 [==============================] - 28s 175ms/step - loss: 0.6616 - accuracy: 0.7424 - val_loss: 0.6160 - val_accuracy: 0.7497
Epoch 28/30
157/157 [==============================] - 28s 179ms/step - loss: 0.6327 - accuracy: 0.7456 - val_loss: 0.6157 - val_accuracy: 0.7397
Epoch 29/30
157/157 [==============================] - 28s 175ms/step - loss: 0.6357 - accuracy: 0.7406 - val_loss: 0.5891 - val_accuracy: 0.7542
Epoch 30/30
157/157 [==============================] - 28s 175ms/step - loss: 0.6163 - accuracy: 0.7568 - val_loss: 0.5918 - val_accuracy: 0.7486
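fine_tune_df = pd.DataFrame(history_fine.history)
# stack the transfer-learning and fine-tuning histories into one table
model_comparison['vgg16_fineTuned'] = pd.concat([model_comparison['vgg16'], fine_tune_df], ignore_index=True)
# pass the real number of rows so the x-axis ticks cover every epoch
plot_performance('vgg16_fineTuned', epochs=len(model_comparison['vgg16_fineTuned']))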
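Fine-tuning reduced the training loss quickly (from 1.26 to 0.62), while the validation loss decreased more slowly (from 0.95 to 0.59). It also improved validation accuracy from about 64% (best epoch before fine-tuning: 64.1%) to about 75% (best fine-tuning epoch: 75.4%).
Finally, let's evaluate the fine-tuned model on the test data and compute the classification report.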
Measuring the final results
predictions = model_vgg16.predict(test_ds)
vgg16_fineTuned_results = model_vgg16.evaluate(test_ds)
29/29 [==============================] - 4s 123ms/step
29/29 [==============================] - 3s 119ms/step - loss: 0.5808 - accuracy: 0.7749
print("the VGG16 tuned model test loss and accuracy score(in order):", vgg16_fineTuned_results[0],vgg16_fineTuned_results[1] )
the VGG16 tuned model test loss and accuracy score(in order): 0.5807520151138306 0.774944543838501
The model generalized well to the test set. It reached 77.5% accuracy with a loss of 0.58, much lower than the loss values in the first epochs of training.
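from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

# true labels from the test dataset
# (assumes test_ds is not shuffled, so labels come out in the same order as `predictions`)
y_true = np.concatenate([label for pic, label in test_ds], axis=0)
# final predicted labels: the class with the highest softmax probability
y_pred = np.argmax(predictions, axis=1)
# showing the confusion matrix of the fine-tuned model (class_names comes from Part 1)
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=class_names)
disp.plot()
plt.show()
print(classification_report(y_true, y_pred, target_names=class_names, digits=3))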
              precision    recall  f1-score   support
          H1      0.773     0.895     0.829       437
          H2      0.694     0.468     0.559       233
          H3      0.698     0.817     0.753        82
          H5      0.900     0.900     0.900        80
          H6      0.952     0.857     0.902        70
    accuracy                          0.775       902
   macro avg      0.803     0.787     0.789       902
weighted avg      0.771     0.775     0.764       902
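The two averaged rows at the bottom are derived from the per-class rows. The macro average is a plain mean over the five classes, while the weighted average weights each class by its support. The pure-Python sketch below uses the already rounded numbers from the report and reproduces these rows to three decimals. It also shows that weighted recall equals accuracy in single-label classification, and it ranks the classes by F1 score.

```python
# Per-class (precision, recall, f1, support) copied from the report above.
report = {
    "H1": (0.773, 0.895, 0.829, 437),
    "H2": (0.694, 0.468, 0.559, 233),
    "H3": (0.698, 0.817, 0.753, 82),
    "H5": (0.900, 0.900, 0.900, 80),
    "H6": (0.952, 0.857, 0.902, 70),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(metric_index):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[metric_index] for row in report.values()) / len(report)


def weighted_avg(metric_index):
    """Mean weighted by support: large classes dominate."""
    total = sum(row[3] for row in report.values())
    return sum(row[metric_index] * row[3] for row in report.values()) / total


macro_f1 = macro_avg(2)
weighted_f1 = weighted_avg(2)
weighted_recall = weighted_avg(1)  # sum of true positives / all samples = accuracy
ranking = sorted(report, key=lambda c: report[c][2], reverse=True)

print(round(f1(0.694, 0.468), 3))  # 0.559, H2's f1-score
print(round(macro_f1, 3), round(weighted_f1, 3), round(weighted_recall, 3))
print(ranking)
```

H2 is both a large class (233 images) and the weakest one, which pulls the weighted F1 (0.764) below the macro F1 (0.789).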
Results
- The VGG16 model performed better (measured by test set accuracy) than the DenseNet121 model.
- The model achieved its highest precision for the H6 class and its lowest for the H2 class.
- If we use the F1 score as a measure of per-class performance, the classes rank as follows: H6, H5, H1, H3, and H2. For H2, the F1 score (0.559) was only slightly above 0.5.
- Fine-tuning increased the best validation accuracy from 64.1% to 75.4%.
- The fine-tuned model reached an overall accuracy of 77.5% on the test set.
Future actions
- We can train the fine-tuned VGG16 model for more epochs, such as 50 or more. I trained the initial models for only 20 epochs because of limited computational power.
- We can also add a more advanced top to the base model. The top layers I used are the simplest possible ones, chosen for demonstration purposes. In a real-world project, there is a lot of room to experiment with different heads for the model.
- We can also test other base models like InceptionV3, MobileNet, and … and compare the results.
You can find the full notebook of this experiment on my GitHub, where there are other experiments regarding NLP, Computer Vision, R, and …
Follow me on LinkedIn and GitHub.