
Very often, when the model residuals *do* present a pattern and are *not* randomly distributed, it is because the underlying relationship in the data is not linear but non-linear, so another modeling technique must be applied. In the next subsection, you will learn how to interpret regression models.
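A minimal sketch of this idea, using made-up data: if you fit a straight line to data that is truly quadratic, the residuals form a systematic U-shaped pattern instead of scattering randomly around zero.

```python
import numpy as np

# Made-up data with a truly quadratic relationship
x = np.arange(1, 6)
y = x ** 2

# Fit a straight line anyway (ordinary least squares, degree 1)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# The residual signs follow a U shape (+, -, -, -, +),
# a telltale pattern that the linear model is misspecified
print(residuals)
```

A residual plot of these values against *x* would make the curvature obvious, which is exactly why residual analysis is a standard diagnostic step.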

It is also good to know how to interpret a linear regression model. Sometimes, you use linear regression not necessarily to create a predictive model but to do a regression analysis. You can then use regression analysis to understand the relationship between the independent and dependent variables.

Looking back at the regression equation (*y = 1021.212 * x + 53.30*), you can see the two terms: the slope, usually denoted beta (*≈1021.21*), and the *y* intercept, usually denoted alpha (*53.30*). You can interpret this model as follows: *for each additional year of working experience, the expected salary increases by approximately $1,021.21*. Also, note that when “years of experience” is equal to 0, the expected salary is $53.30 (this is the point where the straight line crosses the *y* axis).
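This interpretation can be checked directly in code. The helper below is hypothetical (not from the book), hard-coding the slope and intercept from the equation above:

```python
# Hypothetical helper based on the fitted equation y = 1021.212 * x + 53.30
SLOPE = 1021.212      # change in salary per extra year of experience
INTERCEPT = 53.30     # expected salary at zero years of experience

def predict_salary(years_of_experience: float) -> float:
    return SLOPE * years_of_experience + INTERCEPT

# The intercept is the prediction at x = 0
print(predict_salary(0))                       # 53.3

# The slope is the change for one extra unit of x
print(predict_salary(3) - predict_salary(2))   # ~1021.21
```

The difference between any two consecutive years is always the slope, which is precisely the "average change per extra unit" framing used in regression analysis.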

From a broad perspective, your regression analysis should answer the following question: for each extra unit that is added to the independent variable (slope), what is the average change in the dependent variable?

At this point, you have a much better idea of regression models! There is just one other very important topic that you should be aware of, whether or not it comes up in the exam: the parsimony of your model.

You have already heard about parsimony in *Chapter 1, Machine Learning Fundamentals*. This is the ability to prioritize simple models over complex ones. Looking into regression models, you might have to use more than one feature to predict your outcome. This is also known as a multiple regression model.

When that is the case, the R and R squared coefficients tend to reward more complex models with more features. In other words, if you keep adding new features to a multiple regression model, you will come up with higher R and R squared coefficients. That is why you *cannot* anchor your decisions based *only* on those two metrics.

Another metric that you could use (apart from R, R squared, MSE, and RMSE) is known as **adjusted R squared**. This metric is penalized when you add extra features to the model that do not bring any real value. In *Table 6.6*, you can see when a model starts to lose parsimony.

| Number of features | R squared (%) | Adjusted R squared (%) |
| --- | --- | --- |
| 1 | 81 | 79 |
| 2 | 83 | 82 |
| 3 | 88 | 87 |
| 4 | 90 | 86 |
| 5 | 92 | 85 |

Table 6.6 – Comparing R squared and adjusted R squared

Here, you can conclude that maintaining three variables in the model is better than maintaining four or five. Adding a fourth or fifth variable to the model increases the R squared (as expected) but decreases the adjusted R squared.
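The penalty mechanism can be seen in the standard adjusted R squared formula, 1 − (1 − R²)(n − 1)/(n − p − 1), where *n* is the number of observations and *p* the number of features. The sample figures below are illustrative, not taken from the book's dataset:

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Standard adjusted R squared: penalizes features that add little value."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Illustrative numbers: with n = 30 observations, a weak 4th feature
# nudges R squared up, yet adjusted R squared goes down
print(round(adjusted_r2(0.88, 30, 3), 3))   # 0.866
print(round(adjusted_r2(0.882, 30, 4), 3))  # 0.863
```

Because the denominator *n − p − 1* shrinks as features are added, a new feature must raise R squared by more than the penalty to improve the adjusted score, which is exactly the parsimony behavior shown in Table 6.6.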

At this point, you should have a very good understanding of regression models. Now, let us check what AWS offers in terms of built-in algorithms for this class of models.