Regression Model Evaluation in TensorFlow: MAE and MSE
Model evaluation is the process that measures how well a trained model predicts on data it has never seen. In the previous post, we created several simple regression models with TensorFlow’s Sequential API. Here we go in-depth on evaluating those models using a held-out testing dataset and the Mean Absolute Error (MAE) and Mean Squared Error (MSE) metrics.
Data Preparation: Train/Test Split with tf.range()
First of all, to ensure the reproducibility of results, we set a random seed (please check my previous post if you are curious about seeds in TensorFlow). As in the previous post on regression in TensorFlow, we use the tf.range() function to generate the input values X and compute the outputs as y = X + 7:
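import tensorflow as tf

# Set a global random seed for reproducibility
tf.random.set_seed(57)
# Generate 100 inputs from -100 to 296 in steps of 4, and targets y = X + 7
X = tf.range(-100, 300, 4)
y = X + 7
X, y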
(<tf.Tensor: shape=(100,), dtype=int32, numpy=
array([-100, -96, -92, -88, -84, -80, -76, -72, -68, -64, -60,
-56, -52, -48, -44, -40, -36, -32, -28, -24, -20, -16,
-12, -8, -4, 0, 4, 8, 12, 16, 20, 24, 28,
32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72,
76, 80, 84, 88, 92, 96, 100, 104, 108, 112, 116,
120, 124, 128, 132, 136, 140, 144, 148, 152, 156, 160,
164, 168, 172, 176, 180, 184, 188, 192, 196, 200, 204,
208, 212, 216, 220, 224, 228, 232, 236, 240, 244, 248,
252, 256, 260, 264, 268, 272, 276, 280, 284, 288, 292,
296], dtype=int32)>, <tf.Tensor: shape=(100,), dtype=int32, numpy=
array([-93, -89, -85, -81, -77, -73, -69, -65, -61, -57, -53, -49, -45,
-41, -37, -33, -29, -25, -21, -17, -13, -9, -5, -1, 3, 7,
11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59,
63, 67, 71, 75, 79, 83, 87, 91, 95, 99, 103, 107, 111,
115, 119, 123, 127, 131, 135, 139, 143, 147, 151, 155, 159, 163,
167, 171, 175, 179, 183, 187, 191, 195, 199, 203, 207, 211, 215,
219, 223, 227, 231, 235, 239, 243, 247, 251, 255, 259, 263, 267,
271, 275, 279, 283, 287, 291, 295, 299, 303], dtype=int32)>)
We separate the data into a training set for fitting the models and a testing set for evaluating them, using the split_data() function:
# Split data into train and test sets (first 80 samples for training, last 20 for testing)
def split_data(X, y):
    X_train = X[:80]  # First 80% of the data
    y_train = y[:80]
    X_test = X[80:]  # Last 20% of the data
    y_test = y[80:]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = split_data(X, y)
# Add a feature axis so the inputs have shape (n_samples, 1), as Dense layers expect
X_train = tf.expand_dims(X_train, axis=-1)
X_test = tf.expand_dims(X_test, axis=-1)
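The split_data() function hard-codes index 80 and keeps the original order, so the test set holds the 20 largest inputs (220 to 296) and the models are tested on values outside the training range (-100 to 216). As a minimal sketch, assuming plain Python lists rather than tensors, a hypothetical split_by_fraction() helper makes the training fraction a parameter and can optionally shuffle with a fixed seed:

```python
import random


def split_by_fraction(X, y, train_fraction=0.8, shuffle=False, seed=57):
    """Split paired sequences into train/test parts by a fraction.

    Hypothetical helper that generalises the hard-coded index 80 in split_data().
    With shuffle=False it keeps the original order, exactly like split_data().
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    if shuffle:
        # Seeded shuffle so the split is reproducible between runs
        random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Same data as above, rebuilt with plain Python ranges
X_list = list(range(-100, 300, 4))
y_list = [x + 7 for x in X_list]
X_tr, X_te, y_tr, y_te = split_by_fraction(X_list, y_list)
print(len(X_tr), len(X_te))  # 80 20
print(X_te[0], X_te[-1])     # 220 296
```

With shuffle=False this reproduces the ordered split used in the post; with shuffle=True the test points are drawn from the whole range, so the evaluation mostly measures interpolation rather than extrapolation. Which one fits depends on whether predicting beyond the training range matters for your use case.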
Calculating MAE and MSE Error Metrics
Mean Absolute Error (MAE) is a regression metric that averages the absolute differences between predicted and actual values, so every error contributes in proportion to its size. Mean Squared Error (MSE) squares each error before averaging, which amplifies the impact of large errors such as outliers. Given the predictions y_pred and the test targets y_test, we compute both with the TensorFlow metrics API (tf.metrics). MAE and MSE are the usual metrics for regression problems; another option in TensorFlow is the Huber loss (tf.keras.losses.Huber), which is quadratic for small errors and linear for large ones.
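To make the outlier effect concrete, here is a small pure-Python sketch of MAE, MSE and Huber on made-up numbers (not the post's data). The Huber function follows the standard definition with delta=1.0, the default delta of tf.keras.losses.Huber; these helpers are illustrations, not TensorFlow's implementation.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_true - y_pred|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of (y_true - y_pred) ** 2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * err - 0.5 * delta ** 2
    return total / len(y_true)


y_true = [10, 20, 30, 40]
y_pred = [11, 19, 31, 60]  # three errors of size 1 and one outlier error of 20

print(mae(y_true, y_pred))    # 5.75
print(mse(y_true, y_pred))    # 100.75
print(huber(y_true, y_pred))  # 5.25
```

The single outlier contributes 20 of the 23 total absolute error, but 400 of the 403 total squared error, which is why MSE reacts so much more strongly. Huber stays close to MAE here because the outlier falls in its linear region.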
# Calculate MAE and MSE on the test set
def get_errors(y_test, y_pred):
    # model.predict() returns shape (n, 1); tf.squeeze drops the extra axis to match y_test's shape (n,)
    y_pred = tf.squeeze(y_pred)
    # Mean Absolute Error (MAE)
    mae = tf.metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred).numpy()
    # Mean Squared Error (MSE)
    mse = tf.metrics.mean_squared_error(y_true=y_test, y_pred=y_pred).numpy()
    return mae, mse
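The tf.squeeze call matters because model.predict() returns an array of shape (n, 1) while y_test has shape (n,); depending on the TensorFlow/Keras version, passing them unsqueezed may broadcast to an (n, n) comparison instead of pairing each prediction with its label. A plain-Python cross-check of get_errors() can catch such shape slips. The helpers below (squeeze_last_axis, get_errors_reference) are hypothetical names for this sketch, and the demo values are invented:

```python
def squeeze_last_axis(values):
    """Turn [[a], [b], ...] into [a, b, ...], mimicking tf.squeeze on an (n, 1) array."""
    return [v[0] if isinstance(v, (list, tuple)) and len(v) == 1 else v for v in values]


def get_errors_reference(y_test, y_pred):
    """Plain-Python cross-check of get_errors(): returns (mae, mse)."""
    y_pred = squeeze_last_axis(y_pred)
    if len(y_pred) != len(y_test):
        raise ValueError("y_test and y_pred must have the same length")
    errors = [t - p for t, p in zip(y_test, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse


# Invented predictions shaped like model.predict() output: one inner list per sample
y_test_demo = [227, 231, 235]
y_pred_demo = [[226.0], [233.0], [235.5]]
print(get_errors_reference(y_test_demo, y_pred_demo))
```

In the notebook, something like get_errors_reference(y_test.numpy().tolist(), y_pred.tolist()) should agree with get_errors(y_test, y_pred) up to float32 rounding, since TensorFlow computes in float32 and Python in float64.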
Creating Sequential Models with Tunable Hyperparameters
Experimentation means creating, compiling, fitting and evaluating several models with different hyperparameters, using the same evaluation metrics, to find a well-performing one. Every experimental model goes through the same steps and only the hyperparameters change, so that we can look for the combination with the lowest error. We therefore write a function create_model() that takes the main hyperparameters: the number of neurons in the first Dense layer (model creation step) and the learning rate of the Adam optimiser (model compilation step). The function returns a compiled model that has not yet been trained (fitted).
# Create and compile a model with the given number of neurons and learning rate
def create_model(neurons_number=3, learning_rate=0.1):
    # Two Dense layers: `neurons_number` units in the first, a single output unit in the second
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(neurons_number),
        tf.keras.layers.Dense(1)])
    # Compile with MAE as the loss and track MAE and MSE as metrics
    # (`learning_rate` replaces the deprecated `lr` argument of Adam)
    model.compile(loss="mae",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=["mae", "mse"])
    return model
# The four hyperparameter sets to evaluate (one set per list index)
neurons = [3, 6, 3, 6]
epochs = [50, 100, 50, 100]
learning_rates = [0.1, 0.001, 0.001, 0.1]
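The four sets are hand-picked, so they cover only half of the combinations their values allow. A minimal sketch of an exhaustive grid, assuming the same candidate values (the option list names are hypothetical), uses itertools.product; each tuple could then be passed to create_model() and fit() exactly as the four hand-picked sets are:

```python
from itertools import product

# Candidate values taken from the four hand-picked sets
neuron_options = [3, 6]
epoch_options = [50, 100]
learning_rate_options = [0.1, 0.001]

# Every combination of the three hyperparameters: 2 * 2 * 2 = 8 sets
grid = list(product(neuron_options, epoch_options, learning_rate_options))
for n_neurons, n_epochs, rate in grid:
    print(f"neurons={n_neurons}, epochs={n_epochs}, learning_rate={rate}")
```

The trade-off is cost: the full grid doubles the number of training runs from four to eight, and it grows multiplicatively with every extra candidate value.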
Evaluating Models with model.predict() and Error Metrics
With the prepared dataset, the defined hyperparameter sets, and the model creation function, we experiment with different models. We cycle over the hyperparameter sets and call the create_model() function to create Sequential models with each parameter combination, then evaluate each with model.predict() and the MAE/MSE metrics.
# Store the evaluation results of every run
evaluation_results = []
# Zip the hyperparameter lists to create, fit and evaluate four Sequential models
# (distinct loop variable names avoid overwriting the `neurons` and `epochs` lists)
for n_neurons, n_epochs, rate in zip(neurons, epochs, learning_rates):
    # Create and fit the model
    model = create_model(neurons_number=n_neurons, learning_rate=rate)
    model.fit(X_train, y_train, epochs=n_epochs, verbose=0)
    # Predict on the test set
    y_pred = model.predict(X_test)
    # Compute and store the error metrics
    mae, mse = get_errors(y_test, y_pred)
    evaluation_results.append({"neurons": n_neurons, "learning_rate": rate,
                               "epochs": n_epochs, "mae": mae, "mse": mse})

# Show the evaluation results in a pandas DataFrame
import pandas as pd
pd.DataFrame(evaluation_results)
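Reading the winner off the table by eye works for four runs; for larger experiments a small helper can pick it programmatically. This is a minimal sketch with a hypothetical best_run() function, assuming evaluation_results is a list of dicts with "mae" and "mse" keys as built in the loop above:

```python
def best_run(results, metric="mae"):
    """Return the result dict with the lowest value of `metric` (hypothetical helper)."""
    if not results:
        raise ValueError("results is empty")
    # min() keeps the first run if two runs tie on the metric
    return min(results, key=lambda run: run[metric])
```

Calling best_run(evaluation_results) or best_run(evaluation_results, "mse") returns the corresponding row as a dict. With pandas, the equivalent is df.loc[df["mae"].idxmin()] on the DataFrame built from evaluation_results. MAE and MSE can rank models differently, so the metric you select by is itself a modelling choice.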
The table below shows that the Sequential model with 3 neurons, trained for 50 epochs with an Adam learning rate of 0.1, has the lowest MAE and MSE of the four models.

Conclusion: Selecting the Best Regression Model
In this post, we evaluated four regression models built with TensorFlow's Sequential API, comparing their MAE and MSE on a held-out test set to find the best of the defined hyperparameter sets. Model evaluation on a held-out test set is the decisive step that distinguishes a model that generalises from one that has merely memorised its training data. For these experiments, the model with 3 neurons, trained for 50 epochs with an Adam learning rate of 0.1, produced the lowest MAE and MSE.
Regression Model Evaluation FAQ
What is the difference between MAE and MSE in regression?
Mean Absolute Error (MAE) averages the absolute differences between predictions and ground truth, so every error contributes proportionally. Mean Squared Error (MSE) squares each error before averaging, which penalises large outliers far more heavily. Use MAE when all errors matter equally; use MSE when large mistakes are especially costly. Both are computed in TensorFlow with tf.metrics.mean_absolute_error and tf.metrics.mean_squared_error.
How do you evaluate a regression model in TensorFlow?
Call model.evaluate(X_test, y_test) on a held-out test set after training; it returns the loss followed by the compiled metrics (here MAE and MSE). Alternatively, compute the metrics manually from model.predict(X_test). The test set must not be seen during training; reporting metrics on training data overstates performance. Lower MAE and MSE indicate better predictions.
Why use a separate test set instead of training-set metrics?
Training-set metrics measure memorisation, not generalisation. A model can fit training data almost perfectly while failing on unseen inputs. Splitting the data (here, 80% train / 20% test) and evaluating on the test partition reveals the model’s true predictive performance.
Did you like this post? Please let me know if you have any comments or suggestions.