Elena's AI Blog

TensorFlow: Evaluating the Regression Model

25 Jan 2022 (updated: 29 Dec 2025) / 11 minutes to read

Elena Daehnhardt




TL;DR:
  • Evaluate TensorFlow regression models with MAE and MSE on held-out test data. Compare several hyperparameter sets, computing the metrics from model.predict() or with model.evaluate(). Lower MAE/MSE means better predictions, and only the test set reveals true performance.

Previous: Part 8 — TensorFlow: Global and Operation-level Seeds

Next: Part 10 — TensorFlow: Multiclass Classification Model

Regression Model Evaluation in TensorFlow: MAE and MSE

Model evaluation is the process that measures how well a trained model predicts on data it has never seen. In the previous post, we created several simple regression models with TensorFlow’s Sequential API. Here we go in-depth on evaluating those models using a held-out testing dataset and the Mean Absolute Error (MAE) and Mean Squared Error (MSE) metrics.

Data Preparation: Train/Test Split with tf.range()

First, to make the results reproducible, we set a random seed (see my previous post if you are curious about seeds in TensorFlow). As in the previous post on regression in TensorFlow, we use the tf.range() function to generate a set of input values X and the corresponding outputs y:

import tensorflow as tf

# Set the global random seed
tf.random.set_seed(57)

# Generating data
X = tf.range(-100, 300, 4)
y = X + 7
X, y
(<tf.Tensor: shape=(100,), dtype=int32, numpy=
 array([-100,  -96,  -92,  -88,  -84,  -80,  -76,  -72,  -68,  -64,  -60,
         -56,  -52,  -48,  -44,  -40,  -36,  -32,  -28,  -24,  -20,  -16,
         -12,   -8,   -4,    0,    4,    8,   12,   16,   20,   24,   28,
          32,   36,   40,   44,   48,   52,   56,   60,   64,   68,   72,
          76,   80,   84,   88,   92,   96,  100,  104,  108,  112,  116,
         120,  124,  128,  132,  136,  140,  144,  148,  152,  156,  160,
         164,  168,  172,  176,  180,  184,  188,  192,  196,  200,  204,
         208,  212,  216,  220,  224,  228,  232,  236,  240,  244,  248,
         252,  256,  260,  264,  268,  272,  276,  280,  284,  288,  292,
         296], dtype=int32)>, <tf.Tensor: shape=(100,), dtype=int32, numpy=
 array([-93, -89, -85, -81, -77, -73, -69, -65, -61, -57, -53, -49, -45,
        -41, -37, -33, -29, -25, -21, -17, -13,  -9,  -5,  -1,   3,   7,
         11,  15,  19,  23,  27,  31,  35,  39,  43,  47,  51,  55,  59,
         63,  67,  71,  75,  79,  83,  87,  91,  95,  99, 103, 107, 111,
        115, 119, 123, 127, 131, 135, 139, 143, 147, 151, 155, 159, 163,
        167, 171, 175, 179, 183, 187, 191, 195, 199, 203, 207, 211, 215,
        219, 223, 227, 231, 235, 239, 243, 247, 251, 255, 259, 263, 267,
        271, 275, 279, 283, 287, 291, 295, 299, 303], dtype=int32)>)

We separate the data into training and testing datasets, used for model training and evaluation respectively, with a split_data() function:

# Split data into train and test sets
def split_data(X, y):
  X_train = X[:80]  # first 80% of the data
  y_train = y[:80]

  X_test = X[80:]  # last 20% of the data
  y_test = y[80:]

  return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = split_data(X, y)

# Add a feature axis so the inputs have shape (n_samples, 1)
X_train = tf.expand_dims(X_train, axis=-1)
X_test = tf.expand_dims(X_test, axis=-1)
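
The slicing above hard-codes the split at index 80. As a minimal sketch, assuming plain Python lists in place of tensors, the same ordered split can be parameterised by a fraction. The names split_by_fraction and train_fraction are hypothetical and do not appear in the original code.

```python
def split_by_fraction(X, y, train_fraction=0.8):
    """Split paired sequences in order: the first part trains, the rest tests."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    cut = int(len(X) * train_fraction)  # index where the test part starts
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Plain-Python stand-ins for tf.range(-100, 300, 4) and y = X + 7
X = list(range(-100, 300, 4))
y = [x + 7 for x in X]

X_train, X_test, y_train, y_test = split_by_fraction(X, y)
print(len(X_train), len(X_test))  # 80 20
print(X_train[-1], X_test[0])     # 216 220
```

Because the split is ordered, every test input (220 to 296) lies beyond the training range (-100 to 216), so the test set here measures how well a model extrapolates the linear trend.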

Calculating MAE and MSE Error Metrics

Mean Absolute Error (MAE) is a regression metric that averages the absolute differences between predicted and actual values, so each error contributes in proportion to its size. Mean Squared Error (MSE) is a regression metric that squares each error before averaging, which amplifies the impact of large errors and outliers. Using the predicted y_pred and the testing data y_test, we compute both with the TensorFlow metrics API (tf.metrics). MAE and MSE are the usual metrics for regression problems. You can also use the Huber loss available in TensorFlow (tf.keras.losses.Huber), which behaves like MSE for small errors and like MAE for large ones.

# Calculate MAE and MSE
def get_errors(y_test, y_pred):
  # Remove the extra dimension with tf.squeeze so y_pred matches the shape of y_test
  y_pred = tf.squeeze(y_pred)

  # Calculate the Mean Absolute Error (MAE)
  mae = tf.metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred).numpy()

  # Calculate the Mean Squared Error (MSE)
  mse = tf.metrics.mean_squared_error(y_true=y_test, y_pred=y_pred).numpy()

  return mae, mse
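
To see how the metrics treat the same errors differently, here is a small hand-written sketch in plain Python. It does not use TensorFlow. The three helper functions and the toy values are assumptions made for illustration, and the Huber helper follows the standard definition with a threshold delta of 1.0.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


# Toy values: three small errors (2, 2, 0) and one large error (10)
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 30.0, 50.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
hub = huber(y_true, y_pred)
print(mae, mse, hub)  # 3.5 27.0 3.125
```

The single error of 10 contributes 100 of the 108 squared-error units, which is why the MSE (27.0) is so much larger than the MAE (3.5) on the same predictions.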

Creating Sequential Models with Tunable Hyperparameters

In experimentation, we create, compile, fit and evaluate several models with defined hyperparameters and evaluation metrics to find a well-performing model. Each experimental model follows the same steps; only the hyperparameters change, as we look for the combination that gives the lowest error. We therefore write a function, create_model(), that takes the main hyperparameters: the number of neurons in the first dense layer (model creation) and the learning rate of the Adam optimiser (model compilation). The function returns a compiled model that is yet to be trained (fitted).

# Create and compile a model with defined neurons and learning rate
def create_model(neurons_number=3, learning_rate=0.1):
  # Create a model with neurons_number neurons in the first layer
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(neurons_number),
      tf.keras.layers.Dense(1)])

  # Compile the model with the MAE loss and the Adam optimiser
  model.compile(loss=tf.keras.losses.mae,
                optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                metrics=["mae", "mse"])
  return model

# We will evaluate four hyperparameter sets:
neurons = [3, 6, 3, 6]
epochs = [50, 100, 50, 100]
learning_rates = [0.1, 0.001, 0.001, 0.1]
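
The three lists are read position by position, so they define four configurations rather than every possible combination. A short sketch of that pairing follows; the names configs and full_grid are hypothetical, and the full grid is shown only for comparison.

```python
from itertools import product

neurons = [3, 6, 3, 6]
epochs = [50, 100, 50, 100]
learning_rates = [0.1, 0.001, 0.001, 0.1]

# zip() pairs the lists element by element: one tuple per experiment
configs = list(zip(neurons, epochs, learning_rates))
for n, e, lr in configs:
    print(f"neurons={n}, epochs={e}, learning_rate={lr}")

# For comparison: every combination of the distinct values
full_grid = list(product(sorted(set(neurons)),
                         sorted(set(epochs)),
                         sorted(set(learning_rates))))
print(len(configs), len(full_grid))  # 4 8
```

In these four sets, 3 neurons always go with 50 epochs and 6 neurons with 100 epochs, so the effect of the layer width cannot be separated from the effect of longer training. The full grid of eight runs would allow that.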

Evaluating Models with model.predict() and Error Metrics

With the prepared dataset, the defined hyperparameter sets, and the model creation function, we experiment with different models. We cycle over the hyperparameter sets and call the create_model() function to create Sequential models with each parameter combination, then evaluate each with model.predict() and the MAE/MSE metrics.

# We will store the evaluation results in a list
evaluation_results = []

# We zip the hyperparameters to create, fit and analyse four Sequential models
for n_neurons, epoch, rate in zip(neurons, epochs, learning_rates):
  # Create and fit the model
  model = create_model(neurons_number=n_neurons, learning_rate=rate)
  model.fit(X_train, y_train, epochs=epoch, verbose=0)

  # Predict the test set
  y_pred = model.predict(X_test)

  # Store results
  mae, mse = get_errors(y_test, y_pred)
  evaluation_results.append({"neurons": n_neurons, "learning_rate": rate,
                             "epochs": epoch, "mae": mae, "mse": mse})

# Show evaluation results in a Pandas DataFrame  
import pandas as pd
pd.DataFrame(evaluation_results)

The table below shows that the Sequential model with 3 neurons, trained for 50 epochs with an Adam optimiser learning rate of 0.1, has the lowest MAE and MSE values of the four models.

MAE and MSE results
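
Instead of reading the winner off the table, it can be picked programmatically from a list of result dictionaries such as evaluation_results. The sketch below uses made-up placeholder numbers and hypothetical model labels. They are not the results of this post and only show the mechanics.

```python
# Placeholder values only: these are NOT the results reported in this post
results = [
    {"model": "A", "mae": 2.0, "mse": 9.0},
    {"model": "B", "mae": 3.0, "mse": 10.0},
    {"model": "C", "mae": 1.5, "mse": 12.0},
    {"model": "D", "mae": 4.0, "mse": 20.0},
]

# min() with a key function returns the entry with the smallest metric
best_by_mae = min(results, key=lambda r: r["mae"])
best_by_mse = min(results, key=lambda r: r["mse"])

# Rank all entries by MAE, breaking ties with MSE
ranked = sorted(results, key=lambda r: (r["mae"], r["mse"]))

print(best_by_mae["model"], best_by_mse["model"])  # C A
print([r["model"] for r in ranked])                # ['C', 'A', 'B', 'D']
```

In the placeholder data the two metrics disagree about the best model, which can happen when one model makes a few large errors. When the same model wins on both metrics, the choice is straightforward.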

Conclusion: Selecting the Best Regression Model

In this post, we evaluated four regression models using TensorFlow, comparing the Sequential models with the MAE and MSE error metrics to find the best neural network architecture for the defined hyperparameters. Model evaluation on a held-out test set is the decisive step that distinguishes a model that generalises from one that has merely memorised its training data. For these experiments, the Sequential model with 3 neurons and an Adam optimiser learning rate of 0.1 produced the lowest MAE and MSE.

Regression Model Evaluation FAQ

What is the difference between MAE and MSE in regression?

Mean Absolute Error (MAE) averages the absolute differences between predictions and ground truth, so every error contributes proportionally. Mean Squared Error (MSE) squares each error before averaging, which penalises large outliers far more heavily. Use MAE when all errors matter equally; use MSE when large mistakes are especially costly. Both are computed in TensorFlow with tf.metrics.mean_absolute_error and tf.metrics.mean_squared_error.

How do you evaluate a regression model in TensorFlow?

Call model.evaluate(X_test, y_test) on a held-out test set after training, or compute metrics manually from model.predict(X_test). The test set must not be seen during training; reporting metrics on training data overstates performance. Lower MAE and MSE indicate better predictions.
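
A minimal sketch of the model.evaluate() route, assuming the same synthetic data and a model compiled with the "mae" and "mse" metrics. The five epochs are an arbitrary choice to keep the example fast, so the printed numbers are not comparable with the experiments in this post.

```python
import tensorflow as tf

tf.random.set_seed(57)

# Same synthetic data as in the post, created as floats with a feature axis
X = tf.expand_dims(tf.range(-100, 300, 4, dtype=tf.float32), axis=-1)
y = X + 7
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mae",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
              metrics=["mae", "mse"])
model.fit(X_train, y_train, epochs=5, verbose=0)

# evaluate() returns the loss followed by the compiled metrics, in order
loss, mae, mse = model.evaluate(X_test, y_test, verbose=0)
print(f"loss={loss:.3f}, MAE={mae:.3f}, MSE={mse:.3f}")
```

Since the loss is also MAE here, the first two values should be essentially the same. Passing return_dict=True to evaluate() returns the values keyed by name instead of as a list.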

Why use a separate test set instead of training-set metrics?

Training-set metrics measure memorisation, not generalisation. A model can fit training data almost perfectly while failing on unseen inputs. Splitting the data (here, 80% train / 20% test) and evaluating on the test partition reveals the model’s true predictive performance.
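
The gap between memorisation and generalisation can be shown with a deliberately naive sketch in plain Python. It assumes a "model" that is only a lookup table of training pairs and falls back to the mean training target for unseen inputs. The names memory, fallback and predict are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Plain-Python stand-ins for the data used in this post
X = list(range(-100, 300, 4))
y = [x + 7 for x in X]
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# A "model" that only memorises: a lookup table of the training pairs
memory = dict(zip(X_train, y_train))
fallback = sum(y_train) / len(y_train)  # mean training target for unseen inputs


def predict(xs):
    return [memory.get(x, fallback) for x in xs]


train_mae = mean_absolute_error(y_train, predict(X_train))
test_mae = mean_absolute_error(y_test, predict(X_test))
print(train_mae, test_mae)  # 0.0 200.0
```

The training MAE of 0.0 looks perfect, yet the test MAE of 200.0 shows that nothing about the relationship y = X + 7 was learned. Only the held-out partition exposes this.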

Did you like this post? Please let me know if you have any comments or suggestions.





About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.

Citation
Elena Daehnhardt. (2022) 'TensorFlow: Evaluating the Regression Model', daehnhardt.com, 25 January 2022. Available at: https://daehnhardt.com/blog/2022/01/25/tf-evaluation/