Loading and Evaluating a Saved TensorFlow Bird Species Model
Model evaluation is the process of measuring a trained model’s performance on unseen test data and identifying which classes it predicts incorrectly. In my previous post “TensorFlow: Transfer Learning (Fine-Tuning) in Image Classification”, I described building a convolutional neural network based on EfficientNetB0 (initially trained on the ImageNet dataset), which underwent feature extraction and fine-tuning steps using the 400 Bird Species Dataset at Kaggle. This was an instructive experiment because the ImageNet dataset contains only 40 bird species, while the Kaggle dataset has 400 bird species. Despite this difference in the underlying data, the final model reached 98.5% accuracy on the test set. This post loads the saved model from my deep learning repository and evaluates its performance in detail to determine which birds are not well predicted.
Getting the Dataset, Helper Functions, and Saved Model
Using Helper Functions
I have shared my helpers.py Python script contains some useful functions for data preprocessing, model creation, and evaluation. You can use this file as you like, change it and share with me your ideas :) I will discuss the code parts that are useful in analysing the fitted bird species prediction model.
# Getting helper functions
!wget https://raw.githubusercontent.com/edaehn/deep_learning_notebooks/main/helpers.py
--2022-05-02 10:47:37-- https://raw.githubusercontent.com/edaehn/deep_learning_notebooks/main/helpers.py Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 33925 (33K) [text/plain] Saving to: 'helpers.py' helpers.py 100%[===================>] 33.13K --.-KB/s in 0.002s 2022-05-02 10:47:38 (14.4 MB/s) - ‘helpers.py’ saved [33925/33925]
# Import files library from google.colab
from google.colab import files
# Import all functions from the helpers.py
from helpers import *
Downloading the Birds Species Dataset from Kaggle
Before getting the dataset, you need to upload your kaggle.json into the Colab file system.
# Setup to download Kaggle datasets into a Colab instance
! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
Requirement already satisfied: kaggle in /usr/local/lib/python3.7/dist-packages (1.5.12) Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle) (6.1.2) Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.15.0) Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle) (4.64.0) Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.23.0) Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.24.3) Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.8.2) Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle) (2021.10.8) Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify->kaggle) (1.3) Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (3.0.4) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (2.10)
You see here “Requirement already satisfied” messages. I have already installed the kaggle library. You will need to run these commands for installing the kaggle package. Next, we can get the dataset directly from Kaggle.
! kaggle datasets download gpiosenka/100-bird-species/birds -p /content/sample_data/birds --unzip
Downloading 100-bird-species.zip to /content/sample_data/birds 100% 1.49G/1.49G [00:21<00:00, 60.5MB/s] 100% 1.49G/1.49G [00:21<00:00, 75.6MB/s]
Getting the Trained Model
I have created a fine-tuned bird species predictive model in my previous set of experiments. This model is saved in my GitHub repository, and we further reuse it.
# Getting saved fine-tuned EffecientNetB0 model
!wget https://github.com/edaehn/deep_learning_notebooks/raw/main/models/model_4_bird_species_prediction.zip
--2022-05-02 10:48:38-- https://github.com/edaehn/deep_learning_notebooks/blob/main/models/model_4_bird_species_prediction.zip Resolving github.com (github.com)... 140.82.113.4 Connecting to github.com (github.com)|140.82.113.4|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: ‘model_4_bird_species_prediction.zip’ model_4_bird_specie [ <=> ] 123.66K --.-KB/s in 0.08s 2022-05-02 10:48:38 (1.59 MB/s) - ‘model_4_bird_species_prediction.zip’ saved [126623]
Let’s unzip the trained model. The model is unzipped into the “model_4” directory.
# Unzipping saved model
unzip_file("/content/model_4_bird_species_prediction.zip")
True
Verifying the Bird Species Dataset with walk_directory()
The function “walk_directory” (helpers.py) shows the number of directories and files in the “sample_data/birds” directory.
# Define the directory wherein the dataset is stored
dataset_path = "sample_data/birds"
# Show file numbers in the directory "sample_data/birds"
walk_directory(dataset_path)
There are 4 directories and '5" files in sample_data/birds. There are 400 directories and '0" files in sample_data/birds/train. There are 0 directories and '146'' files in sample_data/birds/train/AFRICAN EMERALD CUCKOO. There are 0 directories and '160'' files in sample_data/birds/train/CANARY. There are 0 directories and '197" files in sample_data/birds/train/RED BEARDED BEE EATER. There are 0 directories and '154'' files in sample_data/birds/train/SCARLET CROWNED FRUIT DOVE. There are 0 directories and '201" files in sample_data/birds/train/VIOLET GREEN SWALLOW. There are 0 directories and '130'' files in sample_data/birds/train/GOULDIAN FINCH. .....
The function show_five_birds() draws five random birds from the dataset.
show_five_birds(dataset_path=dataset_path)
sample_data/birds/train/ALEXANDRINE PARAKEET ['015.jpg'] Image shape: (224, 224, 3) sample_data/birds/train/CHESTNET BELLIED EUPHONIA ['056.jpg'] Image shape: (224, 224, 3) sample_data/birds/train/ANDEAN SISKIN ['111.jpg'] Image shape: (224, 224, 3) sample_data/birds/train/EMERALD TANAGER ['129.jpg'] Image shape: (224, 224, 3) sample_data/birds/train/MOURNING DOVE ['032.jpg'] Image shape: (224, 224, 3)
Figure 1. Five Random Birds from the Training Dataset
Getting Training and Test Data
# Getting training and test datasets
train_data, test_data = get_image_data(dataset_path=dataset_path, IMG_SIZE = (224, 224))
Found 58388 files belonging to 400 classes. Found 2000 files belonging to 400 classes.
Loading the Saved Model with tf.keras.models.load_model()
Keras load_model() loads the previously trained and saved model from disk.
# Load unzipped model
loaded_model = tf.keras.models.load_model("model_4")
Next, the loaded model is evaluated with the test dataset. The reloaded model reached an accuracy of 0.9845, confirming that a SavedModel keeps its performance after reloading. It is still useful to inspect the not well-predicted samples. Knowing the wrong predictions could give us ideas on how to improve our model. For instance, we could add more bird samples in their respective training folders.
# Evaluate on the full test dataset
loaded_model.evaluate(test_data)
63/63 [==============================] - 20s 142ms/step - loss: 0.0537 - accuracy: 0.9845 [0.053718529641628265, 0.984499990940094]
Finding Misclassified Bird Species by Prediction Confidence
Identifying the mispredicted test samples is helpful for understanding model failure modes. This could give us insights into how the model works and what could still be improved. My initial thought on this problem was that possibly, incorrectly predicted birds are somehow similar (for instance, in color or shape) with the bird species they are wrongly assigned to. Let’s check it out using the following steps:
- load the test dataset;
- use the model for predicting bird species probabilities;
- get the classes (bird species) corresponding with the highest prediction probabilities;
- create a Pandas dataframe storing image paths to the test bird images, their actual class labels,
- predicted class labels, prediction probabilities;
- get only incorrectly predicted bird images into a new dataframe, and sort it out in the descending order of prediction probability;
- show images of test birds (left side) and images of their predictions (right side).
To realise these steps, I have created two functions (see helpers.py), show_wrongly_predicted_images() for building up Pandas DataFrames using the test dataset and the trained model, and show_one_wrongly_predicted() for showing two bird species side by side for each test sample (step 7).
def show_wrongly_predicted_images(model, dataset_directory="sample_data/birds", top_wrong_predictions_number_to_show=False):
test_data = tf.keras.preprocessing.image_dataset_from_directory(
directory=dataset_directory + "/test",
label_mode="categorical",
image_size=(224, 224),
shuffle=False
)
class_names = test_data.class_names
# 2. Use model for predictions
prediction_probabilities = model.predict(test_data, verbose=1)
# Check the predictions we have got
# print(f"Number of test rows: {len(test_data)}, \
# number of predictions: {len(prediction_probabilities)}, \
# shape of predcitions: {prediction_probabilities.shape}, \
# the first prediction: {prediction_probabilities[0]}")
# Getting indices of the predicted classes
prediction_classes_index = prediction_probabilities.argmax(axis=1)
# Get indices of our test_data BatchDataset
test_labels = []
for images, labels in test_data.unbatch():
test_labels.append(labels.numpy().argmax())
sklearn_accuracy = accuracy_score(y_true=test_labels,
y_pred=prediction_classes_index)
# 3. Finding where our model is most wrong
# Find all files in the test dataset
filepaths = []
for filepath in test_data.list_files(dataset_directory + "/test/*/*.jpg",
shuffle=False):
filepaths.append(filepath.numpy())
# Create a dataframe
predictions_df = pd.DataFrame({"images_path": filepaths,
"y_true": test_labels,
"y_predicted": prediction_classes_index,
"prediction_confidence": prediction_probabilities.max(axis=1),
"true_classname": [class_names[i] for i in test_labels],
"predicted_classname": [class_names[i] for i in prediction_classes_index]})
# See which birds predicted correctly/incorrectly
predictions_df["correct_prediction"] = predictions_df["y_true"] == predictions_df["y_predicted"]
# Sort out the dataframe to find the most wrongly predicted classes
top_wrong = predictions_df[predictions_df["correct_prediction"] == False].sort_values("prediction_confidence",
ascending=False)
# 4. Plot top top_wrong_predictions_number_to_show number of predictions
top = zip(top_wrong["images_path"], top_wrong["true_classname"], top_wrong["predicted_classname"], top_wrong["prediction_confidence"])
print(f"Wrongly predicted {len(top_wrong)} out of {len(predictions_df)}")
if top_wrong_predictions_number_to_show:
top = top[:top_wrong_predictions_number_to_show]
for filename1, label1, label2, prob in top:
filename2 = "/content/sample_data/birds/train/"+ label2 + "/" + random.sample(os.listdir("/content/sample_data/birds/train/" + label2), 1)[0]
# print(f"{filename1}: {filename2}")
show_one_wrongly_predicted(filename1, filename2, label1, label2+f" (prob={prob:.2f})")
return sklearn_accuracy
def show_one_wrongly_predicted(filename1, filename2, label1, label2):
"""
Loads two images from their full-path filenames and show them in one plot with own titles corresponding to their
class labels.
:param filename1: full-path filename to the first image, the test image we are predicting.
:param filename2: full-path to the second image relating to the predicted class.
:param label1: true class label
:param label2: predicted class label.
:return:
"""
img1 = tf.io.read_file(filename1)
img1 = tf.image.decode_image(img1, channels=3)
img2 = tf.io.read_file(filename2)
img2 = tf.image.decode_image(img2, channels=3)
figure, ax = plt.subplots(1, 2);
ax.ravel()[0].imshow(img1);
ax.ravel()[0].set_title(label1);
ax.ravel()[0].set_axis_off();
ax.ravel()[1].imshow(img2);
ax.ravel()[1].set_title(label2);
ax.ravel()[1].set_axis_off();
plt.axis(False);
# Show top wrongly predicted birds
show_wrongly_predicted_images(loaded_model)
Figure 2. Wrongly Predicted Bird Species
As we see from the images of wrongly predicted bird species, all of the are indeed alike in color and shape. Moreover, some species are very close to the bird families that you need to be an ornitologist or research the bird species to know the little differences between both species, predicted and the actual. For instance, Avadavat was predcited as a strawberry Finch with probability of 91%. In Wikipedia article about Red Avadavat we can learn that this bird is a Strawberry Finch belonging to the family of Estrildidae originating from India and is “is popular as a cage bird”.
At the end of this article, I am asking you, my dear readers, please, do not keep your pet birds in cages all the time. Birds need to be happy and fly, even only in a well-ventilated room or a proprietary-sized aviary. Otherwise, birds get depressed, suffer psychological trauma, and even a weakened heart due to obesity and lack of training. Do not imprison birds or other animals, and we all deserve to be happy and free! In return, your pet bird will become a loving and cheerful friend.
Predicting a Bird Species from a Web-Downloaded Image
As a bonus section, I will try predicting a bird species with an image of a red avadavat downloaded from the BlogSpot website. Will it be well predicted?
!wget http://2.bp.blogspot.com/-EB4avRIsLQ8/Tv25pjMDi3I/AAAAAAAAB9s/Io8ybYRjjFM/s1600/Red+avadavat+Amandava+amandava.jpg
--2022-05-03 12:21:25-- http://2.bp.blogspot.com/-EB4avRIsLQ8/Tv25pjMDi3I/AAAAAAAAB9s/Io8ybYRjjFM/s1600/Red+avadavat+Amandava+amandava.jpg Resolving 2.bp.blogspot.com (2.bp.blogspot.com)... 74.125.124.132, 2607:f8b0:4001:c14::84 Connecting to 2.bp.blogspot.com (2.bp.blogspot.com)|74.125.124.132|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 98904 (97K) [image/jpeg] Saving to: ‘Red+avadavat+Amandava+amandava.jpg’ Red+avadavat+Amanda 100%[===================>] 96.59K --.-KB/s in 0s 2022-05-03 12:21:25 (195 MB/s) - ‘Red+avadavat+Amandava+amandava.jpg’ saved [98904/98904]
filename="/content/Red+avadavat+Amandava+amandava.jpg"
predict_and_plot(loaded_model, filename, train_data.class_names, \
known_label=False, rescale=False)
Figure 3. An Avadavat Prediction
As we see, the Avadavat bird image was assigned a correct specie name with the prediction probability=1. The function predict_and_plot() and all the code is available in my GitHub repository with deep learning experiments.
Conclusion: In-Depth Evaluation of a Saved Image Classifier
A saved Keras model reloaded with load_model() retains its full test accuracy, and analysing its highest-confidence wrong predictions reveals which classes share confusable visual features. This post described in-depth model evaluation of the previously created EfficientNetB0 model fine-tuned on the 400 Bird Species Kaggle dataset. The analysis identified which bird species are not well predicted — mostly species that are visually similar in colour and shape. Thanks for reading, and good luck with your coding!
Saved Model Evaluation FAQ
How do you load a saved model in TensorFlow Keras?
Use tf.keras.models.load_model("model_directory"). If the model is stored as a zip archive, unzip it first, then pass the resulting SavedModel directory path. A reloaded model keeps the same weights and architecture, so it maintains its original test accuracy.
How do you find which classes a classifier predicts incorrectly?
Run model.predict() on the test set, take argmax of the prediction probabilities, and compare predicted labels to true labels in a pandas DataFrame. Filter rows where y_true != y_predicted, then sort by prediction_confidence descending to surface the most confident wrong predictions.
Why does an image classifier confuse similar bird species?
Convolutional models classify by visual features such as colour and shape. When two species share those features — for example a Red Avadavat and a Strawberry Finch — the model assigns high-confidence predictions to the wrong class. Adding more training images of the confused species can reduce these errors.
How do you preprocess an image before predicting with a saved model?
Read the file with tf.io.read_file, decode it with tf.image.decode_image, resize to the model’s input shape (224x224 for EfficientNetB0), and rescale to [0, 1] only if the model has no built-in rescaling layer. EfficientNetB0 already normalises internally, so set rescale=False.
References
1. TensorFlow Developer Certificate in 2022: Zero to Mastery
2. Birds 400 - Species Image Classification
4. Wikipedia article about Red Avadavat
Did you like this post? Please let me know if you have any comments or suggestions.
Posts that might be interesting for youRelated Reading
Enjoyed this? Get more like it.
Weekly notes on AI tools, Python, and what I'm actually building — plus a free copy of Fantastic AI: The 2026 Toolkit.