A comparison between DenseNet and VGG16 (Part 1: PCA and DenseNet)
Introduction
This series of blog posts aims to compare the performance of DenseNet and VGG16 for a specific image classification task using the DeFungi image dataset.
About the dataset
This dataset comes with 5000 quality images for training. The images are from superficial fungal infections caused by yeasts, molds, or dermatophyte fungi. The images have been manually labeled into five classes and curated with assistance from a subject matter expert. The images have been cropped with automated algorithms to produce the final dataset.
You can find the dataset's reference article, DeFungi: Direct Mycological Examination of Microscopic Fungi Images. In the dataset directory, there are 5 sub-directories representing the following fungi types, each corresponding to a class:
- H1: Candida albicans
- H2: Aspergillus niger
- H3: Trichophyton rubrum
- H5: Trichophyton mentagrophytes
- H6: Epidermophyton floccosum
Let’s read the data and take a closer look at it, answering these questions:
- How many examples do we have for training and testing?
- How big are the images?
- What’s the percentage of each category we observe?
# imports used throughout this post
import pathlib
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.utils import image_dataset_from_directory
from tensorflow.keras.applications import DenseNet121
from sklearn import decomposition

# path for each partition of the data
data_directory = pathlib.Path("/kaggle/input/microscopic-fungi-images/")
# creating a pathlib object for each partition
train_subdir = data_directory.joinpath("train")
validation_subdir = data_directory.joinpath("valid")
test_subdir = data_directory.joinpath("test")
training_images = [file for subdir in train_subdir.glob('*') for file in subdir.glob('*')]
validation_images = [file for subdir in validation_subdir.glob('*') for file in subdir.glob('*')]
test_images = [file for subdir in test_subdir.glob('*') for file in subdir.glob('*')]
print(f"Number of training examples: {len(training_images)}",
      f"\nNumber of validation examples: {len(validation_images)}",
      f"\nNumber of test examples: {len(test_images)}")
Result:
Number of training examples: 5000
Number of validation examples: 899
Number of test examples: 902
Before comparing the models, we need to check whether our training data is balanced. If the dataset is imbalanced, it is highly recommended to create synthetic images for the minority classes so that they end up with as many observations as the majority class.
# counting the number of observations per training label
labels = []
image_count = []
for subdir in train_subdir.iterdir():
    count = 0
    label = subdir.name
    labels.append(label)
    for image in subdir.glob("*.jpg"):
        count += 1
    image_count.append(count)

print(f"there are {len(labels)} labels and {sum(image_count)} images in the data set\n")
print(labels, "-->", image_count)
Result:
['H3', 'H6', 'H5', 'H1', 'H2'] --> [1000, 1000, 1000, 1000, 1000]
As we can see, the training dataset is completely balanced, making our training process much easier. There are one thousand images per label. As the last step, let’s find the dimensions of an exemplar picture.
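A quick way to check an exemplar picture's dimensions is to open one of the collected file paths; this is a minimal sketch, assuming Pillow is installed:
# checking the dimensions of one exemplar picture
from PIL import Image

sample_image = Image.open(training_images[0])
print("Width x height:", sample_image.size)
print("Colour mode:", sample_image.mode)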
Creating train-dev-test set
Next, we create our train, dev (validation), and test sets with a batch size of 32, resizing every image to 224×224:
batch_size = 32
target_size = (224, 224)

# creating the training, validation and test sets
train_set = image_dataset_from_directory(
    train_subdir,
    image_size=target_size,
    batch_size=batch_size,
)
validation_set = image_dataset_from_directory(
    validation_subdir,
    image_size=target_size,
    batch_size=batch_size,
)
test_set = image_dataset_from_directory(
    test_subdir,
    image_size=target_size,
    batch_size=batch_size,
)
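image_dataset_from_directory infers the labels from the sub-directory names. A quick way to check which classes it found, and the integer index assigned to each, is the dataset's class_names attribute:
# label names inferred from the sub-directory names, in label-index order
class_names = train_set.class_names
print(class_names)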
Visualizing the pictures
At this step, we visualize nine exemplar pictures:
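A minimal sketch for plotting such a 3×3 grid from the training set (the particular nine images you get depend on the batch the dataset yields) is:
# plotting a 3x3 grid of training images with their class names
plt.figure(figsize=(8, 8))
for images, labels in train_set.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(train_set.class_names[int(labels[i])])
        plt.axis("off")
plt.show()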
Since these are microscopic images of fungi, they are not easy to distinguish without domain knowledge. To my eye, the H5 class looks like rounded bubbles, while H1 and H2 are more stretched and thinner.
Configuring the datasets for better performance
- It is recommended to cache the datasets using the cache() method before feeding the data to the model. It stores the elements of a dataset in memory or on disk and helps avoid redundant data loading during training.
- The prefetch() method overlaps data loading with model execution: it prepares the next batches in the background while the model is training on the current batch, which reduces data-loading latency and keeps the GPU or CPU busy.
AUTOTUNE = tf.data.AUTOTUNE #dynamically tuning the parallelism based on available system resources
train_ds = train_set.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = validation_set.cache().prefetch(buffer_size=AUTOTUNE)
test_ds = test_set.cache().prefetch(buffer_size=AUTOTUNE)
Comparing DenseNet and VGG16
DenseNet-121 is a convolutional neural network (CNN) architecture that belongs to the family of Densely Connected Convolutional Networks (DenseNets). The key characteristic of DenseNet architectures is their dense connectivity pattern, where each layer receives input from all preceding layers. This design mitigates the vanishing-gradient problem in deep CNNs by reusing features across layers. We load DenseNet-121 with ImageNet weights, drop its classification head (include_top=False), and freeze it so it acts purely as a feature extractor:
# creating the base model
densenet_base_model = DenseNet121(weights='imagenet', include_top=False, input_shape=(224,224,3))
densenet_base_model.trainable = False
# getting the result of applying densenet model
extracted_features = densenet_base_model.predict(train_set)
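One caveat worth noting: image_dataset_from_directory shuffles the training set by default, so the order in which batches are yielded can differ between separate passes over the dataset. If exact alignment between extracted features and labels matters (it will for the PCA plots below), a drop-in alternative sketch is to collect both in a single pass:
# collect features and labels in the same pass so their order stays aligned
# (a sketch; the original code above predicts on train_set directly)
feature_batches, label_batches = [], []
for images, labels in train_set:
    feature_batches.append(densenet_base_model(images, training=False).numpy())
    label_batches.append(labels.numpy())

extracted_features = np.concatenate(feature_batches, axis=0)
training_labels = np.concatenate(label_batches, axis=0)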
Using PCA to visualize the extracted features
The output (extracted_features) of the DenseNet base model is a 4D array of shape (number of samples, height, width, channels):
# finding the shapes of the model output
n_training, x, y, z = extracted_features.shape
print("Number of training samples:", n_training)
print("Height of each feature map:", x)
print("Width of each feature map:", y)
print("Number of channels in each feature map:", z)
Result:
Number of training samples: 5000
Height of each feature map: 7
Width of each feature map: 7
Number of channels in each feature map: 1024
As you may recall, we have 5000 training pictures. The DenseNet121 base model transforms each one from shape (224, 224, 3) into (7, 7, 1024): a reduction in the height and width of the pictures and an increase in the number of channels.
To visualize the data, we need to reduce the number of dimensions of each observation from 7×7×1024 = 50,176 down to 2 (for a 2D plot). To do this, we use PCA (principal component analysis), a dimensionality reduction technique used in machine learning and statistics. The primary goal of PCA is to transform high-dimensional data into a lower-dimensional representation while retaining as much of the original variability as possible.
# reduce the dimensionality to 2 principal components
pca_2 = decomposition.PCA(n_components=2)

# reshaping the extracted_features array into a 2D array of shape (n_samples, 7*7*1024)
X = extracted_features.reshape((n_training, -1))
pca_2.fit(X)
transformed_data = pca_2.transform(X)

print("the shape of the transformed data: ", transformed_data.shape)
print("the percentage of variance explained by each of the selected components ",
      np.round(pca_2.explained_variance_ratio_, 3) * 100)
Result:
the shape of the transformed data: (5000, 2)
the percentage of variance explained by each of the selected components [28.2 15.700001]
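The first two components capture only about 44% of the variance, so the 2D view is a rough summary rather than a faithful picture of the feature space. If you want a sense of how quickly the variance accumulates, a small sketch (the 50-component cut-off is an arbitrary choice) is:
# a sketch: cumulative variance explained by the first 50 principal components
pca_50 = decomposition.PCA(n_components=50)
pca_50.fit(X)
cumulative = np.cumsum(pca_50.explained_variance_ratio_)
print(f"Variance explained by the first 50 components: {cumulative[-1] * 100:.1f}%")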
We have 5000 rows (training observations), and each of these training examples (each belonging to a specific class) now has only two dimensions (previously, they had 7×7×1024 = 50,176 dimensions). Now we can visualize them using a scatter plot:
first_component = transformed_data[:, 0]
second_component = transformed_data[:, 1]

# labels of the training images, in the order the dataset yields them
# (for exact alignment with extracted_features, the dataset order must not change between passes)
training_labels = np.concatenate([label for pic, label in train_set], axis=0)
class_names = train_set.class_names  # label names inferred from the sub-directory names

# creating a separate scatter plot for each class
plt.subplots(figsize=(10, 10))
for i, class_name in enumerate(class_names):
    plt.subplot(3, 2, i + 1)
    plt.scatter(first_component[training_labels == i][:100],  # showing 100 examples of each class
                second_component[training_labels == i][:100],
                label=class_name, alpha=0.4)
    plt.title(class_name)
plt.tight_layout()
plt.suptitle("PCA Projection", y=1.02, fontsize=13)
plt.show()
By looking at the PCA projection of the DenseNet features, we are checking whether the classes can be told apart by simple human-eye inspection of a 2D scatter plot. H2, H3, H5, and H6 show similar behavior, while H1 is the most distinctive class for this model.
Data Augmentation
In data augmentation, we create synthetic data from real data to improve the generalization and robustness of our model. We feed the model different versions of the same image so that it does not become sensitive to a particular orientation, size, or any other pattern that is unrelated to the class. In effect, we are vaccinating our model against different versions of the same virus!
I apply random rotation, zoom, flips, translation (shifting the image), and changes in contrast and brightness.
data_augmentation = tf.keras.Sequential([
    layers.RandomRotation(factor=0.2),  # random rotation (up to 20% of a full turn)
    layers.RandomZoom(height_factor=0.2, width_factor=0.2),  # random zoom
    layers.RandomFlip(mode="horizontal"),  # random horizontal flip
    layers.RandomTranslation(height_factor=0.1, width_factor=0.1),  # random translation
    layers.RandomContrast(factor=0.2),  # random contrast adjustment
    layers.RandomBrightness(factor=0.2),  # random brightness adjustment
])
# showing the effect of image augmentation
plt.figure(figsize=(5, 5))
for image, _ in train_set.take(1):
    first_image = image[2]  # selecting one image from the batch as an example
    for i in range(9):  # apply the augmentation layer 9 times
        ax = plt.subplot(3, 3, i + 1)
        augmented_image = data_augmentation(tf.expand_dims(first_image, 0))
        plt.imshow(augmented_image[0] / 255)
        plt.axis('off')
plt.suptitle('Nine augmented versions of the same picture', fontsize=13)
plt.show()
In the picture above, you can see that some pictures are flipped, some have more contrast, and one is brighter than the others. This process helps prevent the model from overfitting to the training data.
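The full classification models are built in the next post; purely as a sketch of where this block usually sits (the actual classification head used there may differ), the augmentation layers are typically placed at the front of the model so they are active only while training:
# a sketch of how the augmentation block typically sits in front of a frozen base model
num_classes = 5
model = tf.keras.Sequential([
    data_augmentation,              # active only during training
    densenet_base_model,            # frozen DenseNet121 feature extractor
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),
])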
In the next blog post, we use DenseNet and VGG16 to classify the data and compare their performances.