In order to evaluate and compare the performance of regression models, various evaluation metrics can be employed, including the R2 score, MSE, RMSE, MAE, and MAPE.
In the following sections, you’ll find a detailed explanation of each metric. Additionally, leveraging the well-known “Diabetes” dataset from the LARS paper, a practical Python example will demonstrate the step-by-step computation of each metric, along with the corresponding scikit-learn methods. Finally, by fitting two regression models, you’ll learn how to use these measures to determine the better choice.
In all the following metric formulas, we define:
- $y_i$ as the true value of the dependent variable for observation $i$
- $\hat{y}_i$ as the predicted value for observation $i$
- $N$ as the number of observations in our dataset
- $p$ as the number of independent variables in our dataset
1. Loading the dataset and fitting the model
1.1. Dataset
We will use the Diabetes dataset from the “Least Angle Regression” paper: N=442 patients, p=10 predictors. Each row corresponds to one patient, and the last column is the response variable.
You can find the raw dataset at: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt
Let’s first import all the libraries that will be needed for this article:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import normalize, Normalizer
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import r2_score, explained_variance_score, mean_squared_error, mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error, median_absolute_error
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
Then we load the dataset:
df = pd.read_csv("https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt", sep="\t")
print(df.shape)
df.head()
We will also normalize data, to have the same order of magnitude among the independent variables:
#### Scaled data: X - mean
# Start by subtracting the mean with StandardScaler, without dividing by the std (with_std=False)
df_sc = StandardScaler(with_std=False).fit_transform(df)
df_sc = pd.DataFrame(data=df_sc, columns=df.columns)
#### Normalized data: divide each feature by its L2 norm
# With axis=0 each column (feature) is normalized, so no transpose is needed;
# the axis argument is available in the "normalize" function but not in the "Normalizer" class
df_norm = normalize(df_sc.iloc[:, :-1], axis=0, norm='l2')
# Alternatively, transpose the dataframe (since axis=1 is the default):
# df_norm = normalize(df_sc.iloc[:, :-1].T, norm='l2')
# (and don't forget to transpose the result back to the initial shape)
df_norm = pd.DataFrame(data=df_norm, columns=df.columns[:-1])
df_norm['Y'] = df_sc['Y']
print("Normalized data: Scaled data/L2 norm")
df_norm.head()
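Let’s quickly verify that each feature column now has a unit L2 norm:
# Sanity check: each normalized feature column should have an L2 norm of 1
print(np.sqrt((df_norm.iloc[:, :-1] ** 2).sum(axis=0)))  # all ~1.0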
1.2. Model to fit
To illustrate the computation of the different evaluation metrics for regression models, step by step, let’s take the predicted values given by the OLS model, “y_predict_ols”. In this example we will not split the dataset into train and test sets, as that is not the objective of this article.
features = df.columns[:-1]
X = df_norm[features]
y = df_norm['Y']
#### OLS: Ordinary Least Square
reg_ols = LinearRegression(fit_intercept=False)
reg_ols.fit(X,y)
#Predict values
y_predict_ols = reg_ols.predict(X)
2. Evaluation metrics for regression models
2.1. R2 score
Definition
The R2 score, or coefficient of determination, measures how much of the variance of the dependent variable is explained by the explanatory variables. It indicates whether the fitted model is a good one, and whether it could be used to predict unseen values.
The best possible value of R2 is 1, meaning that the model fits our dataset perfectly. A value of 0 means that the model is a constant that always predicts the average value of the dependent variable y, regardless of the input variables. R2 can also be negative, since a model can be arbitrarily worse than the constant one, as the quick illustration below shows.
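For a quick illustration, consider a hypothetical constant model applied to the y defined above:
# A constant model that always predicts the mean of y scores exactly 0
y_const = np.full_like(y, y.mean())
print(r2_score(y, y_const))  # 0.0
# Predicting any other constant is strictly worse, giving a negative R2
print(r2_score(y, y_const + 100))  # < 0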
With $\hat{y}_i$ the predicted value for observation $i$, and $\bar{y}$ the average value of the dependent variable:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

Let’s introduce another measure, the RSS (“Residual Sum of Squares”), which, as its name indicates, sums the squared residuals (the differences between the true and the predicted values):

$$RSS = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

We can then write the formula of the R2 as:

$$R^2 = 1 - \frac{RSS}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
By adding more and more independent variables to the dataset, the R2 score mechanically improves, which does not mean that those variables really improve the accuracy of the prediction. The adjusted R2 is therefore introduced to provide a more reliable measure, by taking into account how many independent variables are used:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{N - 1}{N - p - 1}$$
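To my knowledge, scikit-learn does not provide a built-in adjusted R2 metric, so here is a minimal sketch computing it from the regular r2_score, with N and p taken from the design matrix X defined in section 1.2:
# Adjusted R2 computed from the regular R2
# N = number of observations, p = number of predictors
N, p = X.shape
r2 = r2_score(y, y_predict_ols)
r2_adj = 1 - (1 - r2) * (N - 1) / (N - p - 1)
print("Adjusted R2:", r2_adj)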
Computed
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true'] - df_errors['yhat']
df_errors['(y_yhat)**2'] = df_errors['y_yhat']**2
mean_y = df_errors['y_true'].mean()
df_errors['(y_ymean)**2'] = (df_errors['y_true']-mean_y)**2
r2_recomp = 1-(np.sum(df_errors['(y_yhat)**2'])/np.sum(df_errors['(y_ymean)**2']))
r2_recomp
0.5177484222203499
sklearn
r2_sk_learn = r2_score(y, y_predict_ols)
r2_sk_learn
0.5177484222203499
Both
print("r2 recomputed:",r2_recomp, ", r2_sk_learn:",r2_sk_learn)
r2 recomputed: 0.5177484222203499 , r2_sk_learn: 0.5177484222203499
Let’s Visualize
plt.scatter(y_predict_ols,y)
plt.plot(y,y,'-r')
plt.annotate(r"R2={0}".format(round(r2_recomp,3)), xy=(200, 200),xytext=(-65, -10), textcoords='offset points',
fontsize=12)
plt.xlabel("y_predict_ols")
plt.ylabel("y_true")
plt.title("Regression: Predicted vs True y")
2.2. Explained Variance score
Definition
The explained variance score is computed as:

$$Explained\ Variance = 1 - \frac{Var(y - \hat{y})}{Var(y)}$$

With:

$$Var(y - \hat{y}) = \frac{1}{N}\sum_{i=1}^{N}\left((y_i - \hat{y}_i) - \overline{(y - \hat{y})}\right)^2$$

and:

$$Var(y) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2$$

When the residuals have zero mean, the explained variance score becomes equal to the R2 score.
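Indeed, when $\overline{(y - \hat{y})} = 0$:

$$Var(y - \hat{y}) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 = \frac{RSS}{N}$$

so that:

$$1 - \frac{Var(y - \hat{y})}{Var(y)} = 1 - \frac{RSS}{\sum_{i=1}^{N}(y_i - \bar{y})^2} = R^2$$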
Computed
N=y.shape[0]
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
mean_y = df_errors['y_true'].mean()
df_errors['(y_ymean)**2'] = (df_errors['y_true']-mean_y)**2
var_y = np.sum(df_errors['(y_ymean)**2'])/N
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
mean_yyhat = df_errors['y_yhat'].mean()
df_errors['(y_yhat_yyhatmean)**2'] = (df_errors['y_yhat']-mean_yyhat)**2
var_yyhat = np.sum(df_errors['(y_yhat_yyhatmean)**2'])/N
explained_variance_recomp = 1-var_yyhat/var_y
explained_variance_recomp
0.5177484222203499
sklearn
explained_variance_sklearn = explained_variance_score(y, y_predict_ols)
explained_variance_sklearn
0.5177484222203499
Both
print("Explained variance recomputed:",explained_variance_recomp,", Explained variance from sk_learn:",
explained_variance_sklearn)
Explained variance recomputed: 0.5177484222203499 , Explained variance from sk_learn: 0.5177484222203499
One can see that the Explained Variance and the R2 score are equal. That’s because the mean of the residuals is ~0 (the data were centered beforehand, so the OLS residuals have essentially zero mean):
mean_yyhat = df_errors['y_yhat'].mean()
mean_yyhat
-4.4810811672653376e-14
2.3. MSE: Mean Squared Error
Definition
Mean Squared Error computes the average squared error between the true values and the predicted ones:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

We can recall here the RSS (Residual Sum of Squares):

$$RSS = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

Then:

$$MSE = \frac{RSS}{N}$$
Since the errors are squared, the MSE gives more weight to the largest errors, and is therefore more sensitive to outliers, as the short illustration below shows.
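To see this, compare the MSE and the MAE on a small set of hypothetical residuals where a single large error dominates:
# Hypothetical residuals: three small errors and one large one
errors = np.array([1.0, 1.0, 1.0, 10.0])
print("MSE:", np.mean(errors ** 2))     # 25.75, dominated by the single large error
print("MAE:", np.mean(np.abs(errors)))  # 3.25, much less affected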
Computed
N=y.shape[0]
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['(y_yhat)**2'] = df_errors['y_yhat']**2
mse_recomp = np.sum(df_errors['(y_yhat)**2'])/N
mse_recomp
2859.69634758675
sklearn
mse_sklearn = mean_squared_error(y, y_predict_ols)
mse_sklearn
2859.69634758675
Both
print("MSE recomputed:",mse_recomp, ", MSE from sk_learn:",mse_sklearn)
MSE recomputed: 2859.69634758675 , MSE from sk_learn: 2859.69634758675
2.4. RMSE: Root Mean Squared Error
Definition
RMSE is the square root of the MSE:

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$$

It is expressed in the same unit as the dependent variable, which makes it easier to interpret than the MSE.
Computed
rmse_recomp = np.sqrt(mse_recomp)
rmse_recomp
53.47612876402657
sklearn
rmse_sklearn = mean_squared_error(y, y_predict_ols,squared=False)
rmse_sklearn
53.47612876402657
Both
print("RMSE recomputed:",rmse_recomp, ", RMSE from sk_learn:",rmse_sklearn)
RMSE recomputed: 53.47612876402657 , RMSE from sk_learn: 53.47612876402657
2.5. MAE: Mean Absolute Error
Definition
The MAE, or Mean Absolute Error, is more robust to outliers than the MSE:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|$$
Computed
N = y.shape[0]
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)'] = df_errors['y_yhat'].abs()
mae_recomp = np.sum(df_errors['abs(y_yhat)'])/N
mae_recomp
43.27745202531506
sklearn
mae_sklearn = mean_absolute_error(y, y_predict_ols)
mae_sklearn
43.27745202531506
Both
print("MAE recomputed:",mae_recomp, ", MAE sklearn:",mae_sklearn)
MAE recomputed: 43.27745202531506 , MAE sklearn: 43.27745202531506
2.6. MAPE: Mean Absolute Percentage Error
Definition
MAPE is also known as Mean Absolute Percentage Deviation (MAPD). This measure is sensitive to relative errors, and is insensitive to a global scaling of the target variable. It’s also more robust to outliers than the MAE:

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{|y_i - \hat{y}_i|}{\max(\epsilon, |y_i|)}$$

With $\epsilon$ an arbitrarily small but strictly positive value, used to avoid undefined results when a true value $y_i$ is 0.
The score can be large when the true $y_i$ is small, or when the absolute error is large. As shown below, scaling the targets and the predictions by the same factor leaves the MAPE unchanged.
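Let’s quickly check the scale invariance with the OLS predictions from above (it holds as long as no true value is exactly 0):
# MAPE before and after a global scaling of the target
scale = 1000.0
print(mean_absolute_percentage_error(y, y_predict_ols))
print(mean_absolute_percentage_error(scale * y, scale * y_predict_ols))  # same value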
Computed
N = y.shape[0]
epsilon = 0.0001
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)'] = df_errors['y_yhat'].abs()
df_errors['abs(y_true)'] = df_errors['y_true'].apply(lambda x: max(epsilon,np.abs(x)))
df_errors['(y_yhat)/y'] = df_errors['abs(y_yhat)']/df_errors['abs(y_true)']
mape_recomp = np.sum(df_errors['(y_yhat)/y'])/N
mape_recomp
3.2636897018293114
sklearn
mape_sklearn = mean_absolute_percentage_error(y, y_predict_ols)
mape_sklearn
3.2636897018293114
Both
print("MAPE recomputed:",mape_recomp, ", MAPE sklearn:",mape_sklearn)
MAPE recomputed: 3.2636897018293114 , MAPE sklearn: 3.2636897018293114
2.7. MedAE: Median Absolute Error
Definition
The MedAE metric computes the median of all the absolute residuals:

$$MedAE = \text{median}(|y_1 - \hat{y}_1|, \ldots, |y_N - \hat{y}_N|)$$

Since it relies on the median rather than the mean, this metric is robust to outliers, as the short illustration below shows.
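For instance, with hypothetical residuals where one value becomes extreme, the median barely moves:
# The median of the absolute errors ignores the extreme value
errors = np.array([1.0, 2.0, 3.0, 4.0])
print(np.median(errors))  # 2.5
errors_with_outlier = np.array([1.0, 2.0, 3.0, 400.0])
print(np.median(errors_with_outlier))  # still 2.5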
Computed
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)'] = df_errors['y_yhat'].abs()
medae_recomp = np.median(df_errors['abs(y_yhat)'])
medae_recomp
38.5247645673831
sklearn
medae_sklearn = median_absolute_error(y, y_predict_ols)
medae_sklearn
38.5247645673831
Both
print("MedAE recomputed:",medae_recomp, ", MedAE sklearn:",medae_sklearn)
MedAE recomputed: 38.5247645673831 , MedAE sklearn: 38.5247645673831
3. Evaluation metrics to compare among regression models
We will use two regression models to fit the data and evaluate their performance. As explained at the beginning, we will not split the dataset into train and test sets, as that is not the objective of this article.
features = df.columns[:-1]
X = df_norm[features]
y = df_norm['Y']
#### OLS: Ordinary Least Square
reg_ols = LinearRegression(fit_intercept=False)
reg_ols.fit(X,y)
#Predict values
y_predict_ols = reg_ols.predict(X)
#### LASSO
reg_lasso = Lasso(alpha=1, fit_intercept=False)  # No cross-validation to find the best alpha; that's not the objective here
reg_lasso.fit(X,y)
#Predict values
y_predict_lasso = reg_lasso.predict(X)
Now that we fit the models, let’s compute the different metrics for each of them:
predicted_values = [y_predict_ols,y_predict_lasso]
models = ['OLS','LASSO']
measures_list = []
for i, y_predict in enumerate(predicted_values):
    r2 = r2_score(y, y_predict)
    explained_variance = explained_variance_score(y, y_predict)
    mse = mean_squared_error(y, y_predict)
    rmse = mean_squared_error(y, y_predict, squared=False)
    mae = mean_absolute_error(y, y_predict)
    mape = mean_absolute_percentage_error(y, y_predict)
    medae = median_absolute_error(y, y_predict)
    measures_list.append([models[i], r2, explained_variance, mse, rmse, mae, mape, medae])
df_results = pd.DataFrame(data=measures_list, columns=['model','r2','explained_var','mse','rmse','mae','mape','medae'])
df_results
As shown in the results, the R2 is higher for OLS (0.52) than for the Lasso model (0.36), even if the value is not that high in absolute terms. At this stage, we can assume that OLS is a better model than the Lasso for our dataset.
Furthermore, the MSE and RMSE are lower for OLS than for the Lasso. Once again, OLS is the better fit for our dataset.
The MAE and MedAE also show better results for OLS than for the Lasso.
However, the MAPE is lower for the Lasso than for OLS. Since the MAPE measures relative errors, observations with small true values of y weigh heavily in this score, and the Lasso apparently makes relatively smaller errors on those observations. It could be interesting to study the outliers in the dataset, and to remove them if any.
Globally, the OLS model shows better metrics than the Lasso model. Thus, between these two models, OLS is the better fit for our dataset.
Summary
In this article, you have learned:
- The most important evaluation metrics used to assess the performance of regression models:
- R2 Score
- Explained Variance Score
- MSE: Mean Squared Error
- RMSE : Root Mean Squared Error
- MAE: Mean Absolute Error
- MAPE: Mean Absolute Percentage Error
- MedAE: Median Absolute Error
- Their mathematical formulas
- Their computation step by step using a practical example in Python
- Their computation using the scikit-learn library
- The application of these metrics to choose between 2 models: OLS and Lasso
I hope you enjoyed reading this article. Did you find this resource useful? Leave me a comment to give me your feedback 👇.