In order to evaluate and compare the performance of regression models, various evaluation metrics can be employed, including R2 score, MSE, RMSE, MAE, and MAPE

In the forthcoming sections, you’ll discover comprehensive explanations for each metric. Additionally, leveraging the well-known “Diabetes” dataset from the LARS paper, a practical Python example will demonstrate the step-by-step computation of each metric. You’ll also explore the utilization of scikit-learn methods. Finally, by employing two regression models, you’ll learn how to leverage these measures to determine the optimal choice

In all the following metrics formulas, we define:

- y
_{i}as the true dependent variable for the observation i - y
_{i}as the predicted value for the observation i - N as the number of observations in our dataset
- p as the number of independent variables in our dataset

## 1. Loading the dataset and fitting the model

### 1.1. Dataset

We will use the Diabetes data used in the “Least Angle Regression” paper: N=442 patients, p=10 predictors. One row per patient, and the last column is the response variable.

You can find the dataset (raw) in: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt

Let’s first import all the libraries will be needed for this article:

```
import pandas as pd
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import normalize, Normalizer
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import r2_score, explained_variance_score, mean_squared_error, mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error, median_absolute_error
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
```

Then we load the dataset:

```
df = pd.read_csv("https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt", sep="t")
print(df.shape)
df.head()
```

We will also normalize data, to have the same order of magnitude among the independent variables:

```
#### Scaled data: X-mean:
# Start by subtracting the mean with StandardScaler, without dividing by the std(with_std=False)
df_sc = StandardScaler(with_std=False).fit_transform(df)
df_sc = pd.DataFrame(data=df_sc, columns=df.columns)
#### Normalize data: divide each feature by its L2 norm
# If axis=0 no need to transpose, this available in "normalize" method but not in "Normalizer"
df_norm = normalize(df_sc.iloc[:,:-1],axis=0,norm='l2')
#or transpose the dataframe: (as axis=1 is the default value )
#df_norm=normalize(df_sc.iloc[:,:-1].T,norm='l2')
#(not to forget to transpose the results too, to go back to the initial shape)
df_norm = pd.DataFrame(data=df_norm, columns=df.columns[:-1])
df_norm['Y'] = df_sc['Y']
print("Normalized data: Scaled data/L2 norm")
df_norm.head()
```

### 1.2. Model to fit

To illustrate the computation of the different evaluation metrics for regression models, step by step, let’s take the predicted values given by the OLS model “y_predict_ols”. In this example we will not be splitting the dataset to train and test sets. It’s not the objective of this notebook.

```
features = df.columns[:-1]
X = df_norm[features]
y = df_norm['Y']
#### OLS: Ordinary Least Square
reg_ols = LinearRegression(fit_intercept=False)
reg_ols.fit(X,y)
#Predict values
y_predict_ols = reg_ols.predict(X)
```

## 2. Evaluation metrics for regression models:

### 2.1. R2 score

**Definition**

R2 score or the coefficient of determination measures how much the explanatory variables explain the variance of the dependent variable. It indicates if the fitted model is a good one, and if it could be used to predict the unseen values.

The best value of R^{2 } is 1, meaning that the model is a perfect fit of our dataset. It could be 0, meaning that the model is a constant and it will always predict the expected average value of the dependent variable y, regardless of the input variables. It could also be negative, the model could be arbitrarily worse.

With y_{i} the predicted value for the observation i, and y is the average value of the dependent variable:

Let’s introduce another measure RSS “Residual Sum of Square” which as its name indicates, compute the square of the residual error (difference between the real and the predicted value):

We can then, write the formula of the R^{2} as:

By adding more and more independent variables to the dataset, the R2 score will be mechanically improved, which does not mean that those variables are really improving the accuracy of the prediction. Therefore, the adjusted R2 is introduced to provide a more precise accuracy by taking into account how many independent variables are used:

**Computed**

```
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true'] - df_errors['yhat']
df_errors['(y_yhat)**2'] = df_errors['y_yhat']**2
mean_y = df_errors['y_true'].mean()
df_errors['(y_ymean)**2'] = (df_errors['y_true']-mean_y)**2
r2_recomp = 1-(np.sum(df_errors['(y_yhat)**2'])/np.sum(df_errors['(y_ymean)**2']))
r2_recomp
```

0.5177484222203499

**sklearn**

```
r2_sk_learn = r2_score(y, y_predict_ols)
r2_sk_learn
```

0.5177484222203499

**Both**

`print("r2 recomputed:",r2_recomp, ", r2_sk_learn:",r2_sk_learn)`

r2 recomputed: 0.5177484222203499 , r2_sk_learn: 0.5177484222203499

**Let’s Visualize**

```
plt.scatter(y_predict_ols,y)
plt.plot(y,y,'-r')
plt.annotate(r"R2={0}".format(round(r2_recomp,3)), xy=(200, 200),xytext=(-65, -10), textcoords='offset points',
fontsize=12)
plt.xlabel("y_predict_ols")
plt.ylabel("y_true")
plt.title("Regression: Predicted vs True y")
```

### 2.2. Explained Variance score

**Definition**

The explained variance score is computed as:

With:

and:

When the residuals have 0 mean, the explained variable score becomes equal to R^{2 } score:

**Computed**

```
N=y.shape[0]
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
mean_y = df_errors['y_true'].mean()
df_errors['(y_ymean)**2'] = (df_errors['y_true']-mean_y)**2
var_y = np.sum(df_errors['(y_ymean)**2'])/N
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
mean_yyhat = df_errors['y_yhat'].mean()
df_errors['(y_yhat_yyhatmean)**2'] = (df_errors['y_yhat']-mean_yyhat)**2
var_yyhat = np.sum(df_errors['(y_yhat_yyhatmean)**2'])/N
explained_variance_recomp = 1-var_yyhat/var_y
explained_variance_recomp
```

0.5177484222203499

**skl****earn**

```
explained_variance_sklearn = explained_variance_score(y, y_predict_ols)
explained_variance_sklearn
```

0.5177484222203499

**Both**

```
print("Explained variance recomputed:",explained_variance_recomp,", Explained variance from sk_learn:",
explained_variance_sklearn)
```

Explained variance recomputed: 0.5177484222203499 , Explained variance from sk_learn: 0.5177484222203499

One can see that the Explained Variance and the R2 score are equal. That’s because the mean of the residual is ~0:

```
mean_yyhat = df_errors['y_yhat'].mean()
mean_yyhat
```

-4.4810811672653376e-14

### 2.3. MSE: Mean Squared Error

**Definition**

Mean square Error compute the average squared error between the true value and the predicted one:

We can recall here that the RSS (Residual Sum of Square):

Then:

It gives more importance to the highest errors, thus it’s more sensitive to outliers.

**Computed**

```
N=y.shape[0]
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['(y_yhat)**2'] = df_errors['y_yhat']**2
mse_recomp = np.sum(df_errors['(y_yhat)**2'])/N
mse_recomp
```

2859.69634758675

**sklearn**

```
mse_sklearn = mean_squared_error(y, y_predict_ols)
mse_sklearn
```

2859.69634758675

**Both**

`print("MSE recomputed:",mse_recomp, ", MSE from sk_learn:",mse_sklearn)`

MSE recomputed: 2859.69634758675 , MSE from sk_learn: 2859.69634758675

### 2.4. RMSE: Root Mean Squared Error

**Definition**

RMSE is the root squared of MSE

**Computed**

```
rmse_recomp = np.sqrt(mse_recomp)
rmse_recomp
```

53.47612876402657

**sklearn**

```
rmse_sklearn = mean_squared_error(y, y_predict_ols,squared=False)
rmse_sklearn
```

53.47612876402657

**Both**

`print("RMSE recomputed:",rmse_recomp, ", RMSE from sk_learn:",rmse_sklearn) `

RMSE recomputed: 53.47612876402657 , RMSE from sk_learn: 53.47612876402657

### 2.5. MAE: Mean Absolute Error

**Definition**

MAE is the mean absolute error, is more robust to outliers than the MSE:

**Computed**

```
N = y.shape[0]
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)'] = df_errors['y_yhat'].abs()
mae_recomp = np.sum(df_errors['abs(y_yhat)'])/N
mae_recomp
```

43.27745202531506

**sklearn**

```
mae_sklearn = mean_absolute_error(y, y_predict_ols)
mae_sklearn
```

43.27745202531506

**Both**

`print("MAE recomputed:",mae_recomp, ", MAE sklearn:",mae_sklearn)`

MAE recomputed: 43.27745202531506 , MAE sklearn: 43.27745202531506

### 2.6. MAPE: Mean Absolute Percentage Error

**Defi****nit****ion**

MAPE also known as Mean Absolute Percentage Deviation (MAPD). This measure is sensitive to relative errors. It’s also insensitive to a global scaling of the target value.

It’s also more robust to outliers than MAE.

With 𝛆 a very small strict positive value, to avoid undefined value when dividing by a value of y=0.

The score could be high when the y is small, or when the difference of the absolute value is high.

**Computed**

```
N = y.shape[0]
epsilon = 0.0001
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)'] = df_errors['y_yhat'].abs()
df_errors['abs(y_true)'] = df_errors['y_true'].apply(lambda x: max(epsilon,np.abs(x)))
df_errors['(y_yhat)/y'] = df_errors['abs(y_yhat)']/df_errors['abs(y_true)']
mape_recomp = np.sum(df_errors['(y_yhat)/y'])/N
mape_recomp
```

3.2636897018293114

**sklearn**

```
mape_sklearn = mean_absolute_percentage_error(y, y_predict_ols)
mape_sklearn
```

3.2636897018293114

**Both**

`print("MAPE recomputed:",mape_recomp, ", MAPE sklearn:",mape_sklearn)`

MAPE recomputed: 3.2636897018293114 , MAPE sklearn: 3.2636897018293114

### 2.7. MedAE: Median Absolute Error

**Defi****nit****ion**

MedAE metric computes the median of all absolute residual values:

This metric does not take into consideration the outliers.

**Computed**

```
N = y.shape[0]
epsilon = 0.0001
df_errors = pd.DataFrame()
df_errors['y_true'] = y
df_errors['yhat'] = y_predict_ols
df_errors['y_yhat'] = df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)'] = df_errors['y_yhat'].abs()
medae_recomp = np.median(df_errors['abs(y_yhat)'])
medae_recomp
```

38.5247645673831

**sklearn**

```
medae_sklearn = median_absolute_error(y, y_predict_ols)
medae_sklearn
```

38.5247645673831

**Both**

` print("MedAE recomputed:",medae_recomp, ", MedAE sklearn:",medae_sklearn)`

MedAE recomputed: 38.5247645673831 , MedAE sklearn: 38.5247645673831

## 3. Evaluation metrics to compare among regression models

We will use 2 regression models, to fit the data and evaluate their performance. As explained at the beginning, we will not be splitting the dataset to train and test sets. It’s not the objective of this notebook.

```
features = df.columns[:-1]
X = df_norm[features]
y = df_norm['Y']
#### OLS: Ordinary Least Square
reg_ols = LinearRegression(fit_intercept=False)
reg_ols.fit(X,y)
#Predict values
y_predict_ols = reg_ols.predict(X)
####LASSO
reg_lasso = Lasso(alpha=1,fit_intercept=False) #Without cross-validation to find the best alpha, it's not the objective here
reg_lasso.fit(X,y)
#Predict values
y_predict_lasso = reg_lasso.predict(X)
```

Now that we fit the models, let’s compute the different metrics for each of them:

```
predicted_values = [y_predict_ols,y_predict_lasso]
models = ['OLS','LASSO']
measures_list = []
i=0
for y_predict in predicted_values:
r2 = r2_score(y, y_predict)
explained_variance = explained_variance_score(y, y_predict)
mse = mean_squared_error(y, y_predict)
rmse = mean_squared_error(y, y_predict,squared=False)
mae = mean_absolute_error(y, y_predict)
mape = mean_absolute_percentage_error(y, y_predict)
medae = median_absolute_error(y, y_predict)
measures_list.append([models[i],r2,explained_variance,mse,rmse,mae,mape,medae])
i=+1
df_results = pd.DataFrame(data=measures_list, columns=['model','r2','explained_var','mse','rmse','mae','mape','medae'])
df_results
```

As shown in the results, the R^{2 }is higher for the OLS (0.52) than the Lasso model (0.36) (even if the value in absolute terms is not that high). At this stage, we can assume that the OLS is a better model than the Lasso for our dataset.

Furthermore, the MSE and RMSE are lower for the OLS than the Lasso. Once again, OLS is a good fit for our dataset.

MAE and MedAE are also showing better results for OLS than Lasso.

However, MAPE is lower for the Lasso than the OLS, showing that there are some values of the true y that could be high, making the relative value of the residual lower. It could be interesting to study the outliers in the dataset, and remove them if any.

Globally, the OLS model is showing better metrics than the Lasso model. Thus, between these 2 models, OLS will be a good fit for our dataset.

## Summary

In this article, you have learned:

- The most imporant evaluation metrics used to assess the performance of regression models:
**R2 Score****Explained Variance Score****MSE**: Mean Squared Error**RMSE**: Root Mean Squared Error**MAE**: Mean Absolute Error**MAPE**: Mean Absolute Percentage Error**MedAE**: Median Absolute Error

- Their mathematical formulas
- Their computation step by step using a pratical example in Python
- Their computation using scikit-learn library
- The application of these metrics to choose between 2 models: OLS and Ridge

I hope you enjoyed reading me. Did you find this ressource useful? leave me a comment to give me your feedback👇.