TensorFlow: Evaluating the Saved Bird Species Prediction Model
02 May 2022 / 19 minutes to read / Elena Daehnhardt
Introduction
In my previous post “TensorFlow: Transfer Learning (Fine-Tuning) in Image Classification”,
I described building a convolutional neural network based on EfficientNetB0 (initially trained on
the ImageNet dataset), which underwent feature extraction and fine-tuning steps using
the 400 Bird Species Dataset at Kaggle.
This was an exciting experiment since the ImageNet dataset contains only 40 bird species,
while the Kaggle dataset has 400 bird species. Despite such differences in the underlying data,
the final model reached 98.5% accuracy on the test set.
In this blog post, I am going to load this model, saved in my deep learning repository, and
evaluate its performance in detail to determine which birds are not well predicted.
Getting Data and Code
Using Helper Functions
I have shared my helpers.py Python script, which contains some useful functions for data preprocessing, model creation, and evaluation. You can use this file as you like, change it, and share your ideas with me :) I will discuss the parts of the code that are useful for analysing the fitted bird species prediction model.
# Getting helper functions
!wget https://raw.githubusercontent.com/edaehn/deep_learning_notebooks/main/helpers.py
--2022-05-02 10:47:37-- https://raw.githubusercontent.com/edaehn/deep_learning_notebooks/main/helpers.py Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 33925 (33K) [text/plain] Saving to: 'helpers.py' helpers.py 100%[===================>] 33.13K --.-KB/s in 0.002s 2022-05-02 10:47:38 (14.4 MB/s) - ‘helpers.py’ saved [33925/33925]
# Import files library from google.colab
from google.colab import files
# Import all functions from the helpers.py
from helpers import *
Downloading the Birds Species Dataset from Kaggle
Before getting the dataset, you need to upload your kaggle.json into the Colab file system.
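Before running the setup commands, it can help to confirm that the uploaded file is usable. The following is a minimal sketch, assuming the usual kaggle.json layout with "username" and "key" fields; the function name check_kaggle_credentials is hypothetical and is not part of helpers.py.

```python
import json
from pathlib import Path


def check_kaggle_credentials(path="kaggle.json"):
    """Return True if the file exists and holds non-empty 'username' and 'key' fields."""
    credentials_file = Path(path)
    if not credentials_file.is_file():
        return False
    try:
        credentials = json.loads(credentials_file.read_text())
    except json.JSONDecodeError:
        # The file exists but is not valid JSON
        return False
    if not isinstance(credentials, dict):
        return False
    # Assumed layout: the token file stores the account name and the API key
    return all(credentials.get(field) for field in ("username", "key"))
```

Calling check_kaggle_credentials("kaggle.json") before the setup cell gives an early warning if the upload was skipped or the file is incomplete.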
# Setup to download Kaggle datasets into a Colab instance
! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
Requirement already satisfied: kaggle in /usr/local/lib/python3.7/dist-packages (1.5.12) Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle) (6.1.2) Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.15.0) Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle) (4.64.0) Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.23.0) Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.24.3) Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.8.2) Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle) (2021.10.8) Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify->kaggle) (1.3) Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (3.0.4) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (2.10)
The “Requirement already satisfied” messages appear because the kaggle library was already installed in my session. In a fresh environment, these commands install the kaggle package. Next, we can get the dataset directly from Kaggle.
! kaggle datasets download gpiosenka/100-bird-species/birds -p /content/sample_data/birds --unzip
Downloading 100-bird-species.zip to /content/sample_data/birds 100% 1.49G/1.49G [00:21<00:00, 60.5MB/s] 100% 1.49G/1.49G [00:21<00:00, 75.6MB/s]
Getting the Trained Model
I created a fine-tuned bird species prediction model in my previous set of experiments. This model is saved in my GitHub repository, and we will reuse it here.
# Getting saved fine-tuned EfficientNetB0 model
!wget https://github.com/edaehn/deep_learning_notebooks/raw/main/models/model_4_bird_species_prediction.zip
--2022-05-02 10:48:38-- https://github.com/edaehn/deep_learning_notebooks/blob/main/models/model_4_bird_species_prediction.zip Resolving github.com (github.com)... 140.82.113.4 Connecting to github.com (github.com)|140.82.113.4|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: ‘model_4_bird_species_prediction.zip’ model_4_bird_specie [ <=> ] 123.66K --.-KB/s in 0.08s 2022-05-02 10:48:38 (1.59 MB/s) - ‘model_4_bird_species_prediction.zip’ saved [126623]
Let’s unzip the trained model. The model is unzipped into the “model_4” directory.
# Unzipping saved model
unzip_file("/content/model_4_bird_species_prediction.zip")
True
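The body of unzip_file() is not shown in this post. As a rough sketch of what such a helper might do, assuming it wraps Python's standard zipfile module and returns a success flag, consider the hypothetical unzip_archive() below. It also checks that the download is a real zip archive, which is useful when a web page has been saved under a .zip name by mistake.

```python
import zipfile


def unzip_archive(filename, target_dir="."):
    """Extract a zip archive into target_dir; return True on success, False otherwise."""
    if not zipfile.is_zipfile(filename):
        # Covers a missing file and a download that is not a real zip archive
        return False
    with zipfile.ZipFile(filename, "r") as archive:
        archive.extractall(target_dir)
    return True
```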
Checking that the Dataset is Loaded Correctly
The function “walk_directory” (helpers.py) shows the number of directories and files in the “sample_data/birds” directory.
# Define the directory wherein the dataset is stored
dataset_path = "sample_data/birds"
# Show file numbers in the directory "sample_data/birds"
walk_directory(dataset_path)
There are 4 directories and '5' files in sample_data/birds.
There are 400 directories and '0' files in sample_data/birds/train.
There are 0 directories and '146' files in sample_data/birds/train/AFRICAN EMERALD CUCKOO.
There are 0 directories and '160' files in sample_data/birds/train/CANARY.
There are 0 directories and '197' files in sample_data/birds/train/RED BEARDED BEE EATER.
There are 0 directories and '154' files in sample_data/birds/train/SCARLET CROWNED FRUIT DOVE.
There are 0 directories and '201' files in sample_data/birds/train/VIOLET GREEN SWALLOW.
There are 0 directories and '130' files in sample_data/birds/train/GOULDIAN FINCH.
.....
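A directory summary like the one above can be produced with os.walk. This is only a sketch of the idea, not the code from helpers.py; the function names count_directory_contents and print_directory_summary are hypothetical.

```python
import os


def count_directory_contents(root):
    """Return a list of (path, number_of_subdirectories, number_of_files) tuples."""
    summary = []
    for dirpath, dirnames, filenames in os.walk(root):
        summary.append((dirpath, len(dirnames), len(filenames)))
    return summary


def print_directory_summary(root):
    """Print one line per directory found under root."""
    for dirpath, n_dirs, n_files in count_directory_contents(root):
        print(f"There are {n_dirs} directories and {n_files} files in {dirpath}.")
```

Returning the counts as data, and printing them in a separate function, makes it easy to check programmatically that every class folder contains at least one image.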
The function show_five_birds() draws five random birds from the dataset.
show_five_birds(dataset_path=dataset_path)
sample_data/birds/train/ALEXANDRINE PARAKEET ['015.jpg'] Image shape: (224, 224, 3)
sample_data/birds/train/CHESTNET BELLIED EUPHONIA ['056.jpg'] Image shape: (224, 224, 3)
sample_data/birds/train/ANDEAN SISKIN ['111.jpg'] Image shape: (224, 224, 3)
sample_data/birds/train/EMERALD TANAGER ['129.jpg'] Image shape: (224, 224, 3)
sample_data/birds/train/MOURNING DOVE ['032.jpg'] Image shape: (224, 224, 3)
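The selection part of such a function can be sketched without any plotting. The example below assumes the dataset layout shown above (one folder per species inside the split folder); pick_random_images is a hypothetical name, and the real show_five_birds() in helpers.py may work differently.

```python
import os
import random


def pick_random_images(dataset_path, split="train", n=5, seed=None):
    """Return up to n (class_directory, image_filename) pairs from distinct random classes."""
    rng = random.Random(seed)
    split_dir = os.path.join(dataset_path, split)
    # Every sub-directory of the split folder is treated as one bird species
    class_names = sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )
    picks = []
    for class_name in rng.sample(class_names, min(n, len(class_names))):
        class_dir = os.path.join(split_dir, class_name)
        # Assumes the class folder is not empty
        picks.append((class_dir, rng.choice(sorted(os.listdir(class_dir)))))
    return picks
```

Passing a seed makes the selection reproducible, which is convenient when the same sample images should appear on every run.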
Getting Training and Test Data
# Getting training and test datasets
train_data, test_data = get_image_data(dataset_path=dataset_path, IMG_SIZE = (224, 224))
Found 58388 files belonging to 400 classes.
Found 2000 files belonging to 400 classes.
Loading and Evaluating the Trained Model
With Keras’ load_model(), we load in the previously trained model.
# Load unzipped model
loaded_model = tf.keras.models.load_model("model_4")
Next, the loaded model is evaluated on the test dataset. It reaches an accuracy of 0.985, which is pretty good! However, it is always worth checking the poorly predicted samples. Knowing the wrong predictions can give us ideas on how to improve the model. For instance, we could add more images of those birds to their respective training folders.
# Evaluate on the full test dataset
loaded_model.evaluate(test_data)
63/63 [==============================] - 20s 142ms/step - loss: 0.0537 - accuracy: 0.9845 [0.053718529641628265, 0.984499990940094]
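The numbers in this output can be cross-checked with a little arithmetic. The sketch below uses the figures reported above and assumes a batch size of 32, the Keras default for image_dataset_from_directory; the batch size actually used by get_image_data() is not shown in this post.

```python
import math

test_images = 2000   # "Found 2000 files belonging to 400 classes"
accuracy = 0.9845    # reported by loaded_model.evaluate(test_data)
batch_size = 32      # assumption: the Keras default batch size

# Number of test images the model gets wrong
misclassified = round(test_images * (1 - accuracy))
# Number of evaluation steps shown in the progress bar
batches = math.ceil(test_images / batch_size)

print(misclassified)  # 31
print(batches)        # 63
```

So an accuracy of 0.9845 corresponds to 31 mispredicted images out of 2000, and the 63 steps in the progress bar are consistent with batches of 32 images.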
The Wrongest Bird Predictions
It is interesting and helpful to look at the mispredicted test samples. This can give us insights into how the model works and what could still be improved. My initial thought was that incorrectly predicted birds are probably similar (for instance, in color or shape) to the bird species they are wrongly assigned to. Let’s check it out using the following steps:
- load the test dataset;
- use the model to predict bird species probabilities;
- get the classes (bird species) corresponding to the highest prediction probabilities;
- create a Pandas dataframe storing paths to the test bird images, their actual class labels, predicted class labels, and prediction probabilities;
- copy only the incorrectly predicted bird images into a new dataframe, and sort it in descending order of prediction probability;
- show images of test birds (left side) and images of their predicted species (right side).
To implement these steps, I have created two functions (see helpers.py): show_wrongly_predicted_images(), which builds Pandas DataFrames from the test dataset and the trained model, and show_one_wrongly_predicted(), which shows two bird species side by side for each test sample (the last step).
import os
import random

import pandas as pd
import tensorflow as tf
from sklearn.metrics import accuracy_score


def show_wrongly_predicted_images(model, dataset_directory="sample_data/birds", top_wrong_predictions_number_to_show=False):
    # 1. Load the test dataset (no shuffling, so predictions stay aligned with the file order)
    test_data = tf.keras.preprocessing.image_dataset_from_directory(
        directory=dataset_directory + "/test",
        label_mode="categorical",
        image_size=(224, 224),
        shuffle=False
    )
    class_names = test_data.class_names

    # 2. Use model for predictions
    prediction_probabilities = model.predict(test_data, verbose=1)
    # Check the predictions we have got
    # print(f"Number of test rows: {len(test_data)}, \
    # number of predictions: {len(prediction_probabilities)}, \
    # shape of predictions: {prediction_probabilities.shape}, \
    # the first prediction: {prediction_probabilities[0]}")

    # Getting indices of the predicted classes
    prediction_classes_index = prediction_probabilities.argmax(axis=1)

    # Get the true class indices from our test_data BatchDataset
    test_labels = []
    for images, labels in test_data.unbatch():
        test_labels.append(labels.numpy().argmax())
    sklearn_accuracy = accuracy_score(y_true=test_labels,
                                      y_pred=prediction_classes_index)

    # 3. Finding where our model is most wrong
    # Find all files in the test dataset
    filepaths = []
    for filepath in tf.data.Dataset.list_files(dataset_directory + "/test/*/*.jpg",
                                               shuffle=False):
        filepaths.append(filepath.numpy())

    # Create a dataframe
    predictions_df = pd.DataFrame({"images_path": filepaths,
                                   "y_true": test_labels,
                                   "y_predicted": prediction_classes_index,
                                   "prediction_confidence": prediction_probabilities.max(axis=1),
                                   "true_classname": [class_names[i] for i in test_labels],
                                   "predicted_classname": [class_names[i] for i in prediction_classes_index]})

    # See which birds predicted correctly/incorrectly
    predictions_df["correct_prediction"] = predictions_df["y_true"] == predictions_df["y_predicted"]

    # Sort the dataframe to find the most wrongly predicted classes
    top_wrong = predictions_df[predictions_df["correct_prediction"] == False].sort_values("prediction_confidence",
                                                                                          ascending=False)

    # 4. Plot top top_wrong_predictions_number_to_show number of predictions
    # zip() returns an iterator, so convert it to a list before slicing
    top = list(zip(top_wrong["images_path"], top_wrong["true_classname"],
                   top_wrong["predicted_classname"], top_wrong["prediction_confidence"]))
    print(f"Wrongly predicted {len(top_wrong)} out of {len(predictions_df)}")
    if top_wrong_predictions_number_to_show:
        top = top[:top_wrong_predictions_number_to_show]
    for filename1, label1, label2, prob in top:
        # Pick a random training image of the predicted species for comparison
        predicted_class_dir = os.path.join(dataset_directory, "train", label2)
        filename2 = os.path.join(predicted_class_dir, random.choice(os.listdir(predicted_class_dir)))
        # print(f"{filename1}: {filename2}")
        show_one_wrongly_predicted(filename1, filename2, label1, label2 + f" (prob={prob:.2f})")
    return sklearn_accuracy
import matplotlib.pyplot as plt
import tensorflow as tf


def show_one_wrongly_predicted(filename1, filename2, label1, label2):
    """
    Loads two images from their full-path filenames and shows them in one plot, with titles corresponding to their
    class labels.
    :param filename1: full path to the first image, the test image we are predicting.
    :param filename2: full path to the second image, an example of the predicted class.
    :param label1: true class label.
    :param label2: predicted class label.
    :return: None
    """
    # Read and decode both images as RGB
    img1 = tf.io.read_file(filename1)
    img1 = tf.image.decode_image(img1, channels=3)
    img2 = tf.io.read_file(filename2)
    img2 = tf.image.decode_image(img2, channels=3)

    # Left: the test image with its true label; right: an example of the predicted species
    figure, ax = plt.subplots(1, 2)
    ax[0].imshow(img1)
    ax[0].set_title(label1)
    ax[0].set_axis_off()
    ax[1].imshow(img2)
    ax[1].set_title(label2)
    ax[1].set_axis_off()
    plt.axis(False)
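The ranking logic of these steps can be checked on toy data without TensorFlow or Pandas. The sketch below is plain Python under stated assumptions: rank_wrong_predictions is a hypothetical name, the three class names are borrowed from the dataset, and the probabilities are made up for illustration.

```python
def rank_wrong_predictions(probabilities, true_labels, class_names):
    """Return misclassified samples sorted by descending prediction confidence."""
    rows = []
    for index, (probs, true_index) in enumerate(zip(probabilities, true_labels)):
        # Index of the highest probability, the equivalent of argmax
        predicted_index = max(range(len(probs)), key=probs.__getitem__)
        if predicted_index != true_index:
            rows.append({
                "sample": index,
                "true_classname": class_names[true_index],
                "predicted_classname": class_names[predicted_index],
                "prediction_confidence": probs[predicted_index],
            })
    return sorted(rows, key=lambda row: row["prediction_confidence"], reverse=True)


# Toy data: three classes and four samples with made-up probabilities
class_names = ["CANARY", "GOULDIAN FINCH", "MOURNING DOVE"]
probabilities = [
    [0.90, 0.05, 0.05],  # predicted CANARY, correct
    [0.10, 0.30, 0.60],  # predicted MOURNING DOVE, wrong
    [0.91, 0.06, 0.03],  # predicted CANARY, wrong
    [0.20, 0.70, 0.10],  # predicted GOULDIAN FINCH, correct
]
true_labels = [0, 1, 2, 1]

wrong = rank_wrong_predictions(probabilities, true_labels, class_names)
for row in wrong:
    print(row)
```

Here the sample with index 2 comes first, because the model was most confident (0.91) about a wrong answer. Such confident mistakes are usually the most informative ones to inspect.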
# Show top wrongly predicted birds
show_wrongly_predicted_images(loaded_model)
As we see from the images of wrongly predicted bird species, all of them are indeed alike in color and shape. Moreover, some species are so closely related that you need to be an ornithologist, or research the bird species, to know the small differences between the predicted and the actual species. For instance, Avadavat was predicted as a Strawberry Finch with a probability of 91%. From the Wikipedia article about the Red Avadavat, we learn that this bird is also called the Strawberry Finch, belongs to the family Estrildidae, originates from India, and “is popular as a cage bird”.
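Besides looking at the images, the confusions can also be counted per species pair. This is a small sketch, assuming we have two equally long lists of true and predicted class names (for example, the true_classname and predicted_classname columns, if the dataframe were returned by the function); most_confused_pairs is a hypothetical helper.

```python
from collections import Counter


def most_confused_pairs(true_classnames, predicted_classnames, top_n=10):
    """Count (true, predicted) pairs of misclassified samples, most frequent first."""
    pairs = Counter(
        (true_name, predicted_name)
        for true_name, predicted_name in zip(true_classnames, predicted_classnames)
        if true_name != predicted_name
    )
    return pairs.most_common(top_n)
```

A pair that appears several times points to two species the model systematically mixes up, which is a good candidate for adding more training images.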
At the end of this article, I am asking you, my dear readers: please do not keep your pet birds in cages all the time. Birds need to be happy and fly, even if only in a well-ventilated room or an appropriately sized aviary. Otherwise, birds get depressed, suffer psychological trauma, and can even develop a weakened heart due to obesity and lack of exercise. Do not imprison birds or other animals; we all deserve to be happy and free! In return, your pet bird will become a loving and cheerful friend.
Predicting a Bird Downloaded from Web
As a bonus section, I will try predicting a bird species with an image of a red avadavat downloaded from the BlogSpot website. Will it be well predicted?
!wget http://2.bp.blogspot.com/-EB4avRIsLQ8/Tv25pjMDi3I/AAAAAAAAB9s/Io8ybYRjjFM/s1600/Red+avadavat+Amandava+amandava.jpg
--2022-05-03 12:21:25-- http://2.bp.blogspot.com/-EB4avRIsLQ8/Tv25pjMDi3I/AAAAAAAAB9s/Io8ybYRjjFM/s1600/Red+avadavat+Amandava+amandava.jpg Resolving 2.bp.blogspot.com (2.bp.blogspot.com)... 74.125.124.132, 2607:f8b0:4001:c14::84 Connecting to 2.bp.blogspot.com (2.bp.blogspot.com)|74.125.124.132|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 98904 (97K) [image/jpeg] Saving to: ‘Red+avadavat+Amandava+amandava.jpg’ Red+avadavat+Amanda 100%[===================>] 96.59K --.-KB/s in 0s 2022-05-03 12:21:25 (195 MB/s) - ‘Red+avadavat+Amandava+amandava.jpg’ saved [98904/98904]
filename = "/content/Red+avadavat+Amandava+amandava.jpg"
predict_and_plot(loaded_model, filename, train_data.class_names,
                 known_label=False, rescale=False)
As we see, the Avadavat bird image was assigned the correct species name with a prediction probability of 1. The function predict_and_plot() and all the code are available in my GitHub repository with deep learning experiments.
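A displayed probability of 1 may be a rounded value, so it can be informative to look at the runner-up classes as well. The sketch below assumes we have the vector of class probabilities for a single image (for example, one row of the model's prediction output) and the list of class names; top_k_predictions is a hypothetical helper and is not part of predict_and_plot().

```python
def top_k_predictions(probabilities, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs, highest first."""
    ranked = sorted(zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

If the second-best class has a non-negligible probability, the model is less certain than the rounded top score suggests.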
Conclusion
In this post, I have described the process of in-depth model evaluation. I reused the previously created EfficientNetB0 model, which was fine-tuned on the 400 Bird Species Kaggle dataset. As a result, I found out which bird species are not well predicted. Thanks for reading, and good luck with coding!
References
1. TensorFlow Developer Certificate in 2022: Zero to Mastery
2. Birds 400 - Species Image Classification
4. Wikipedia article about Red Avadavat
Did you like this post? Please let me know if you have any comments or suggestions.
About Elena: Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.