Elena's AI Blog

TensorFlow: Evaluating the Saved Bird Species Prediction Model

02 May 2022 (updated: 01 Jun 2026) / 21 minutes to read

Elena Daehnhardt




TL;DR:
  • Load saved models with tf.keras.models.load_model(). Use confusion matrices to identify misclassified classes. Evaluate on test set—saved models maintain performance after reloading.

Previous: Part 13 — TensorFlow: Transfer Learning (Fine-Tuning) in Image Classification

Next: Part 15 — Data exploration and analysis with Python Pandas

Loading and Evaluating a Saved TensorFlow Bird Species Model

Model evaluation is the process of measuring a trained model’s performance on unseen test data and identifying which classes it predicts incorrectly. In my previous post “TensorFlow: Transfer Learning (Fine-Tuning) in Image Classification”, I described building a convolutional neural network based on EfficientNetB0 (initially trained on the ImageNet dataset), which underwent feature extraction and fine-tuning steps using the 400 Bird Species Dataset at Kaggle. This was an instructive experiment because the ImageNet dataset contains only 40 bird species, while the Kaggle dataset has 400 bird species. Despite this difference in the underlying data, the final model reached 98.5% accuracy on the test set. This post loads the saved model from my deep learning repository and evaluates its performance in detail to determine which birds are not well predicted.

Getting the Dataset, Helper Functions, and Saved Model

Using Helper Functions

I have shared my helpers.py Python script, which contains some useful functions for data preprocessing, model creation, and evaluation. You can use this file as you like, change it, and share your ideas with me :) I will discuss the code parts that are useful for analysing the fitted bird species prediction model.

# Getting helper functions
!wget https://raw.githubusercontent.com/edaehn/deep_learning_notebooks/main/helpers.py
--2022-05-02 10:47:37--  https://raw.githubusercontent.com/edaehn/deep_learning_notebooks/main/helpers.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33925 (33K) [text/plain]
Saving to: 'helpers.py'

helpers.py          100%[===================>]  33.13K  --.-KB/s    in 0.002s  

2022-05-02 10:47:38 (14.4 MB/s) - ‘helpers.py’ saved [33925/33925]
# Import files library from google.colab
from google.colab import files

# Import all functions from the helpers.py
from helpers import *

Downloading the Birds Species Dataset from Kaggle

Before getting the dataset, you need to upload your kaggle.json into the Colab file system.

# Setup to download Kaggle datasets into a Colab instance
! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
Requirement already satisfied: kaggle in /usr/local/lib/python3.7/dist-packages (1.5.12)
Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle) (6.1.2)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.15.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle) (4.64.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.23.0)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle) (1.24.3)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle) (2.8.2)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle) (2021.10.8)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle) (2.10)

The “Requirement already satisfied” messages appear because I had already installed the kaggle library. On a fresh Colab instance, these commands install the kaggle package. Next, we can get the dataset directly from Kaggle.

! kaggle datasets download gpiosenka/100-bird-species/birds -p /content/sample_data/birds --unzip
Downloading 100-bird-species.zip to /content/sample_data/birds
100% 1.49G/1.49G [00:21<00:00, 60.5MB/s]
100% 1.49G/1.49G [00:21<00:00, 75.6MB/s]

Getting the Trained Model

I created a fine-tuned bird species prediction model in my previous set of experiments. This model is saved in my GitHub repository, and we reuse it here.

# Getting saved fine-tuned EfficientNetB0 model
!wget https://github.com/edaehn/deep_learning_notebooks/raw/main/models/model_4_bird_species_prediction.zip
--2022-05-02 10:48:38--  https://github.com/edaehn/deep_learning_notebooks/blob/main/models/model_4_bird_species_prediction.zip
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘model_4_bird_species_prediction.zip’

model_4_bird_specie     [ <=>                ] 123.66K  --.-KB/s    in 0.08s   

2022-05-02 10:48:38 (1.59 MB/s) - ‘model_4_bird_species_prediction.zip’ saved [126623]

Let’s unzip the trained model. The model is unzipped into the “model_4” directory.

# Unzipping saved model
unzip_file("/content/model_4_bird_species_prediction.zip")
True
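
The wget log above reports [text/html] for the download, so it can be worth confirming that the file really is a zip archive before unpacking it. The sketch below uses only the standard library. The name safe_unzip is hypothetical; it is not the unzip_file() helper from helpers.py, whose implementation is not shown in this post.

```python
import os
import tempfile
import zipfile


def safe_unzip(archive_path, target_dir="."):
    """Extract archive_path into target_dir; return True on success, False otherwise."""
    # is_zipfile inspects the file contents, so an HTML page saved
    # under a .zip name is rejected here instead of failing later
    if not os.path.isfile(archive_path) or not zipfile.is_zipfile(archive_path):
        return False
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return True


# Self-contained demo using throwaway files (not the real model archive)
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("model_dir/saved_model.pb", "placeholder")

fake_zip = os.path.join(demo_dir, "fake.zip")
with open(fake_zip, "w") as handle:
    handle.write("<html>not a zip</html>")

unzipped_ok = safe_unzip(demo_zip, demo_dir)
fake_rejected = not safe_unzip(fake_zip, demo_dir)
print(unzipped_ok, fake_rejected)
```

With the real download, the call might look like safe_unzip("/content/model_4_bird_species_prediction.zip", "/content"); a False result would suggest downloading the archive again.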

Verifying the Bird Species Dataset with walk_directory()

The function “walk_directory” (helpers.py) shows the number of directories and files in the “sample_data/birds” directory.

# Define the directory wherein the dataset is stored
dataset_path = "sample_data/birds"

# Show file numbers in the directory "sample_data/birds"
walk_directory(dataset_path)
There are 4 directories and '5' files in sample_data/birds.
There are 400 directories and '0' files in sample_data/birds/train.
There are 0 directories and '146' files in sample_data/birds/train/AFRICAN EMERALD CUCKOO.
There are 0 directories and '160' files in sample_data/birds/train/CANARY.
There are 0 directories and '197' files in sample_data/birds/train/RED BEARDED BEE EATER.
There are 0 directories and '154' files in sample_data/birds/train/SCARLET CROWNED FRUIT DOVE.
There are 0 directories and '201' files in sample_data/birds/train/VIOLET GREEN SWALLOW.
There are 0 directories and '130' files in sample_data/birds/train/GOULDIAN FINCH.
.....
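
A directory summary like the one above can be produced with os.walk. The following is a minimal sketch, assuming a train/<species>/<image> layout; count_directory_contents is a hypothetical name and not necessarily how walk_directory() in helpers.py is written.

```python
import os
import tempfile


def count_directory_contents(root):
    """Return a list of (path, n_subdirectories, n_files) for root and everything below it."""
    summary = []
    for dirpath, dirnames, filenames in os.walk(root):
        summary.append((dirpath, len(dirnames), len(filenames)))
    return summary


# Demo on a tiny throwaway tree that mimics train/<species>/<image>
root = tempfile.mkdtemp()
for species, n_images in [("CANARY", 3), ("GOULDIAN FINCH", 2)]:
    species_dir = os.path.join(root, "train", species)
    os.makedirs(species_dir)
    for i in range(n_images):
        # Empty placeholder files stand in for real images
        open(os.path.join(species_dir, f"{i:03d}.jpg"), "w").close()

for path, n_dirs, n_files in count_directory_contents(root):
    print(f"There are {n_dirs} directories and {n_files} files in {path}.")

# Same information keyed by path relative to the root, for easy lookups
counts = {os.path.relpath(p, root): (d, f) for p, d, f in count_directory_contents(root)}
```

Classes with unusually few files in such a summary are natural candidates for collecting more training images.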

The function show_five_birds() draws five random birds from the dataset.

show_five_birds(dataset_path=dataset_path)
sample_data/birds/train/ALEXANDRINE PARAKEET
['015.jpg']
Image shape: (224, 224, 3)
sample_data/birds/train/CHESTNET BELLIED EUPHONIA
['056.jpg']
Image shape: (224, 224, 3)
sample_data/birds/train/ANDEAN SISKIN
['111.jpg']
Image shape: (224, 224, 3)
sample_data/birds/train/EMERALD TANAGER
['129.jpg']
Image shape: (224, 224, 3)
sample_data/birds/train/MOURNING DOVE
['032.jpg']
Image shape: (224, 224, 3)

Figure 1. Five Random Birds from the Training Dataset

Getting Training and Test Data

# Getting training and test datasets
train_data, test_data = get_image_data(dataset_path=dataset_path, IMG_SIZE = (224, 224))
Found 58388 files belonging to 400 classes.
Found 2000 files belonging to 400 classes.

Loading the Saved Model with tf.keras.models.load_model()

Keras load_model() loads the previously trained and saved model from disk.

# Load unzipped model
loaded_model = tf.keras.models.load_model("model_4")

Next, the loaded model is evaluated with the test dataset. The reloaded model reached an accuracy of 0.9845, confirming that a SavedModel keeps its performance after reloading. It is still useful to inspect the poorly predicted samples. Knowing the wrong predictions could give us ideas on how to improve our model. For instance, we could add more bird samples to their respective training folders.

# Evaluate on the full test dataset
loaded_model.evaluate(test_data)
63/63 [==============================] - 20s 142ms/step - loss: 0.0537 - accuracy: 0.9845
[0.053718529641628265, 0.984499990940094]
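
The figures in this output can be cross-checked with a little arithmetic. The sketch below uses the 2000 test images and the 0.9845 accuracy reported above, and assumes a batch size of 32 (the Keras default for image_dataset_from_directory; the batch size used inside get_image_data() is not shown in this post).

```python
import math

test_images = 2000   # "Found 2000 files belonging to 400 classes."
accuracy = 0.9845    # reported by loaded_model.evaluate(test_data)
batch_size = 32      # assumption: Keras default, not confirmed by helpers.py

# Number of evaluation steps = number of batches, counting the last partial batch
steps = math.ceil(test_images / batch_size)

# Accuracy is correct / total, so the counts follow directly
correct = round(test_images * accuracy)
wrong = test_images - correct

print(steps, correct, wrong)  # prints: 63 1969 31
```

Under that assumption, the 63 steps match the progress bar, and the accuracy implies 31 misclassified test images out of 2000.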

Finding Misclassified Bird Species by Prediction Confidence

Identifying the mispredicted test samples is helpful for understanding model failure modes. This could give us insights into how the model works and what could still be improved. My initial thought on this problem was that incorrectly predicted birds are possibly similar (for instance, in color or shape) to the bird species they are wrongly assigned to. Let’s check it out using the following steps:

  1. load the test dataset;
  2. use the model for predicting bird species probabilities;
  3. get the classes (bird species) corresponding with the highest prediction probabilities;
  4. create a Pandas dataframe storing image paths to the test bird images, their actual class labels, predicted class labels, and prediction probabilities;
  5. get only incorrectly predicted bird images into a new dataframe, and sort it in descending order of prediction probability;
  6. show images of test birds (left side) and images of their predictions (right side).

To realise these steps, I have created two functions (see helpers.py): show_wrongly_predicted_images() builds the Pandas DataFrame from the test dataset and the trained model, and show_one_wrongly_predicted() shows two bird species side by side for each test sample (step 6).

def show_wrongly_predicted_images(model, dataset_directory="sample_data/birds", top_wrong_predictions_number_to_show=False):

    # 1. Load the test dataset; shuffle=False keeps labels aligned with predictions
    test_data = tf.keras.preprocessing.image_dataset_from_directory(
        directory=dataset_directory + "/test",
        label_mode="categorical",
        image_size=(224, 224),
        shuffle=False
    )

    class_names = test_data.class_names

    # 2. Use model for predictions
    prediction_probabilities = model.predict(test_data, verbose=1)

    # Check the predictions we have got
    # print(f"Number of test rows: {len(test_data)}, \
            # number of predictions: {len(prediction_probabilities)}, \
            # shape of predictions: {prediction_probabilities.shape}, \
            # the first prediction: {prediction_probabilities[0]}")

    # Getting indices of the predicted classes
    prediction_classes_index = prediction_probabilities.argmax(axis=1)

    # Get indices of our test_data BatchDataset
    test_labels = []
    for images, labels in test_data.unbatch():
        test_labels.append(labels.numpy().argmax())

    sklearn_accuracy = accuracy_score(y_true=test_labels,
                                      y_pred=prediction_classes_index)

    # 3. Finding where our model is most wrong

    # Find all files in the test dataset
    # (list_files is a static method, so it is called on tf.data.Dataset directly)
    filepaths = []
    for filepath in tf.data.Dataset.list_files(dataset_directory + "/test/*/*.jpg",
                                               shuffle=False):
        filepaths.append(filepath.numpy())

    # Create a dataframe
    predictions_df = pd.DataFrame({"images_path": filepaths,
                                   "y_true": test_labels,
                                   "y_predicted": prediction_classes_index,
                                   "prediction_confidence": prediction_probabilities.max(axis=1),
                                   "true_classname": [class_names[i] for i in test_labels],
                                   "predicted_classname": [class_names[i] for i in prediction_classes_index]})


    # See which birds predicted correctly/incorrectly
    predictions_df["correct_prediction"] = predictions_df["y_true"] == predictions_df["y_predicted"]

    # Sort out the dataframe to find the most wrongly predicted classes
    top_wrong = predictions_df[predictions_df["correct_prediction"] == False].sort_values("prediction_confidence",
                                                                                              ascending=False)

    # 4. Plot top top_wrong_predictions_number_to_show number of predictions

    print(f"Wrongly predicted {len(top_wrong)} out of {len(predictions_df)}")
    if top_wrong_predictions_number_to_show:
        # Trim the DataFrame before zipping: a zip object cannot be sliced
        top_wrong = top_wrong.head(top_wrong_predictions_number_to_show)
    top = zip(top_wrong["images_path"], top_wrong["true_classname"], top_wrong["predicted_classname"], top_wrong["prediction_confidence"])

    for filename1, label1, label2, prob in top:
        # Pick a random training image of the predicted class for comparison
        predicted_class_directory = os.path.join(dataset_directory, "train", label2)
        filename2 = os.path.join(predicted_class_directory, random.choice(os.listdir(predicted_class_directory)))
        # print(f"{filename1}: {filename2}")
        show_one_wrongly_predicted(filename1, filename2, label1, label2+f" (prob={prob:.2f})")

    return sklearn_accuracy

def show_one_wrongly_predicted(filename1, filename2, label1, label2):
    """
    Loads two images from their full-path filenames and show them in one plot with own titles corresponding to their
    class labels.
    :param filename1: full-path filename to the first image, the test image we are predicting.
    :param filename2: full-path to the second image relating to the predicted class.
    :param label1: true class label
    :param label2: predicted class label.
    :return:
    """
    img1 = tf.io.read_file(filename1)
    img1 = tf.image.decode_image(img1, channels=3)
    img2 = tf.io.read_file(filename2)
    img2 = tf.image.decode_image(img2, channels=3)
    figure, ax = plt.subplots(1, 2)
    ax.ravel()[0].imshow(img1)
    ax.ravel()[0].set_title(label1)
    ax.ravel()[0].set_axis_off()
    ax.ravel()[1].imshow(img2)
    ax.ravel()[1].set_title(label2)
    ax.ravel()[1].set_axis_off()
    plt.axis(False)

# Show top wrongly predicted birds
show_wrongly_predicted_images(loaded_model)

Figure 2. Wrongly Predicted Bird Species

As the images of wrongly predicted bird species show, all of them are indeed alike in color and shape. Moreover, some species are so closely related that you would need to be an ornithologist, or research the species, to spot the small differences between the predicted and the actual bird. For instance, an Avadavat was predicted as a Strawberry Finch with a probability of 91%. The Wikipedia article about the Red Avadavat explains that this bird is also called the Strawberry Finch, belongs to the family Estrildidae, originates from India, and “is popular as a cage bird”.

At the end of this article, I am asking you, my dear readers: please do not keep your pet birds in cages all the time. Birds need to be happy and to fly, even if only in a well-ventilated room or an appropriately sized aviary. Otherwise, birds get depressed, suffer psychological trauma, and can even develop a weakened heart due to obesity and lack of exercise. Do not imprison birds or other animals; we all deserve to be happy and free! In return, your pet bird will become a loving and cheerful friend.

Predicting a Bird Species from a Web-Downloaded Image

As a bonus section, I will try predicting a bird species with an image of a red avadavat downloaded from the BlogSpot website. Will it be well predicted?

!wget http://2.bp.blogspot.com/-EB4avRIsLQ8/Tv25pjMDi3I/AAAAAAAAB9s/Io8ybYRjjFM/s1600/Red+avadavat+Amandava+amandava.jpg
--2022-05-03 12:21:25--  http://2.bp.blogspot.com/-EB4avRIsLQ8/Tv25pjMDi3I/AAAAAAAAB9s/Io8ybYRjjFM/s1600/Red+avadavat+Amandava+amandava.jpg
Resolving 2.bp.blogspot.com (2.bp.blogspot.com)... 74.125.124.132, 2607:f8b0:4001:c14::84
Connecting to 2.bp.blogspot.com (2.bp.blogspot.com)|74.125.124.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 98904 (97K) [image/jpeg]
Saving to: ‘Red+avadavat+Amandava+amandava.jpg’

Red+avadavat+Amanda 100%[===================>]  96.59K  --.-KB/s    in 0s      

2022-05-03 12:21:25 (195 MB/s) - ‘Red+avadavat+Amandava+amandava.jpg’ saved [98904/98904]
filename="/content/Red+avadavat+Amandava+amandava.jpg"
predict_and_plot(loaded_model, filename, train_data.class_names, \
                 known_label=False, rescale=False)

Figure 3. An Avadavat Prediction

As we see, the Avadavat bird image was assigned the correct species name with a prediction probability of 1. The function predict_and_plot() and all the code are available in my GitHub repository with deep learning experiments.

Conclusion: In-Depth Evaluation of a Saved Image Classifier

A saved Keras model reloaded with load_model() retains its full test accuracy, and analysing its highest-confidence wrong predictions reveals which classes share confusable visual features. This post described in-depth model evaluation of the previously created EfficientNetB0 model fine-tuned on the 400 Bird Species Kaggle dataset. The analysis identified which bird species are not well predicted — mostly species that are visually similar in colour and shape. Thanks for reading, and good luck with your coding!

Saved Model Evaluation FAQ

How do you load a saved model in TensorFlow Keras?

Use tf.keras.models.load_model("model_directory"). If the model is stored as a zip archive, unzip it first, then pass the resulting SavedModel directory path. A reloaded model keeps the same weights and architecture, so it maintains its original test accuracy.

How do you find which classes a classifier predicts incorrectly?

Run model.predict() on the test set, take argmax of the prediction probabilities, and compare predicted labels to true labels in a pandas DataFrame. Filter rows where y_true != y_predicted, then sort by prediction_confidence descending to surface the most confident wrong predictions.
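
The same argmax, filter, and sort logic can be illustrated without TensorFlow or pandas. The sketch below is a toy example in plain Python; the class names are borrowed from the dataset, but the labels and probabilities are made up for illustration.

```python
# Toy data: 4 test images, 3 classes (labels and probabilities are invented)
class_names = ["CANARY", "GOULDIAN FINCH", "MOURNING DOVE"]
y_true = [0, 1, 2, 1]
prediction_probabilities = [
    [0.90, 0.05, 0.05],   # correct
    [0.80, 0.15, 0.05],   # wrong, confident
    [0.10, 0.30, 0.60],   # correct
    [0.20, 0.35, 0.45],   # wrong, less confident
]

rows = []
for true_index, probabilities in zip(y_true, prediction_probabilities):
    confidence = max(probabilities)
    predicted_index = probabilities.index(confidence)   # argmax
    rows.append({
        "true_classname": class_names[true_index],
        "predicted_classname": class_names[predicted_index],
        "prediction_confidence": confidence,
        "correct_prediction": true_index == predicted_index,
    })

# Keep only the mistakes and put the most confident ones first
top_wrong = sorted(
    (row for row in rows if not row["correct_prediction"]),
    key=lambda row: row["prediction_confidence"],
    reverse=True,
)

for row in top_wrong:
    print(row["true_classname"], "->", row["predicted_classname"], row["prediction_confidence"])
```

The most confident mistakes come first because they are usually the most informative: the model was sure and still wrong.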

Why does an image classifier confuse similar bird species?

Convolutional models classify by visual features such as colour and shape. When two species share those features — for example a Red Avadavat and a Strawberry Finch — the model assigns high-confidence predictions to the wrong class. Adding more training images of the confused species can reduce these errors.
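
To decide which species might benefit from more training images, the wrong predictions can be grouped into (true, predicted) pairs, which amounts to reading the off-diagonal cells of a confusion matrix. The sketch below uses hypothetical label pairs; in practice they could come from the true_classname and predicted_classname columns built earlier.

```python
from collections import Counter

# Hypothetical (true, predicted) label pairs, invented for illustration
pairs = [
    ("AVADAVAT", "STRAWBERRY FINCH"),
    ("AVADAVAT", "STRAWBERRY FINCH"),
    ("CANARY", "CANARY"),
    ("MOURNING DOVE", "CANARY"),
    ("STRAWBERRY FINCH", "STRAWBERRY FINCH"),
]

# Count only the off-diagonal entries of the confusion matrix
confusions = Counter((true, predicted) for true, predicted in pairs if true != predicted)

# Species whose images are most often misclassified are candidates for more data
needs_more_data = Counter()
for (true, _predicted), count in confusions.items():
    needs_more_data[true] += count

print(confusions.most_common(1))
print(needs_more_data.most_common())
```

A pair that dominates the counts points at two classes the model cannot tell apart, which is where extra or cleaner training images are most likely to help.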

How do you preprocess an image before predicting with a saved model?

Read the file with tf.io.read_file, decode it with tf.image.decode_image, resize to the model’s input shape (224x224 for EfficientNetB0), and rescale to [0, 1] only if the model has no built-in rescaling layer. EfficientNetB0 already normalises internally, so set rescale=False.

References

1. TensorFlow Developer Certificate in 2022: Zero to Mastery

2. Birds 400 - Species Image Classification

3. Wikipedia: ImageNet

4. Wikipedia article about Red Avadavat

Did you like this post? Please let me know if you have any comments or suggestions.


About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.

Citation
Elena Daehnhardt. (2022) 'TensorFlow: Evaluating the Saved Bird Species Prediction Model', daehnhardt.com, 02 May 2022. Available at: https://daehnhardt.com/blog/2022/05/02/tf-reusing-and-evaluating-saved-models/