As you know, understanding evaluation metrics is essential for measuring your model’s performance and documenting your work. Likewise, when you want to optimize your current models, evaluation metrics play an important role in defining the baseline performance that you want to challenge.

The process of model optimization consists of finding the best configuration (that is, the best set of hyperparameter values) of the machine learning algorithm for a particular data distribution. You want hyperparameters that neither overfit nor underfit the training data.

You learned about overfitting and underfitting in *Chapter 1, Machine Learning Fundamentals*. In the same chapter, you also learned how to avoid these two types of modeling issues.

In this section, you will learn about some techniques that you can use to find the best configuration for a particular algorithm and dataset. You can combine these techniques of model optimization with other methods, such as cross-validation, to find the best set of hyperparameters for your model and avoid fitting issues.

*Important note*

*Always remember that you do not want to optimize your algorithm to the underlying training data but to the data distribution behind the training data, so that your model will work on the training data as well as the production data (the data that has never been exposed to your model during the training process). A machine learning model that works only on the training data is useless. That is why combining model-tuning techniques (such as the ones you will learn about next) with sampling techniques (such as cross-validation) makes all the difference when it comes to creating a good model.*

**Grid search** is probably the most popular method for model optimization. It consists of testing different combinations of the algorithm’s hyperparameter values and selecting the best one. Here, there are two important points that you need to pay attention to:

- How to define the best configuration of the model
- How many configurations should be tested

The best model is defined based on an evaluation metric. In other words, you first have to define which metric you are going to use to evaluate the model’s performance. Second, you have to define how you are going to evaluate the model. Usually, cross-validation is used, so that each configuration is scored on multiple data splits that were not used to train it.

The number of configurations is the most challenging part of working with grid search. Each hyperparameter of an algorithm may accept many or, sometimes, infinitely many values. Because an algorithm will usually have several hyperparameters, the cost is multiplicative rather than additive, and it grows exponentially with the number of hyperparameters: the number of unique combinations to test is *the number of values of hyperparameter a × the number of values of hyperparameter b × ... × the number of values of hyperparameter n*. *Table 7.1* shows how you could potentially set a grid search configuration for a decision tree model:

| Criterion | Max depth | Min samples leaf |
|---|---|---|
| Gini, Entropy | 2, 5, 10 | 10, 20, 30 |

Table 7.1 – Grid search configuration

In *Table 7.1*, there are three hyperparameters: **Criterion**, **Max depth**, and **Min samples leaf**. Each of these hyperparameters has a list of values for testing. That means by the end of the grid search process, you will have tested 18 configurations (2 * 3 * 3), of which only the best one will be selected. If each configuration is evaluated with k-fold cross-validation, that amounts to 18 * k training runs.
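As a minimal sketch, the grid in *Table 7.1* could be run with scikit-learn's `GridSearchCV`. The choice of library, the iris dataset, accuracy as the evaluation metric, and five folds are all assumptions made for illustration; the text does not prescribe any of them.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: the iris dataset stands in for your own training data.
X, y = load_iris(return_X_y=True)

# The grid from Table 7.1, written with scikit-learn's parameter names.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, 10],
    "min_samples_leaf": [10, 20, 30],
}

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",  # assumed metric that defines the "best" model
    cv=5,                # assumed 5-fold cross-validation per configuration
)
search.fit(X, y)

# One entry per tested configuration: 2 * 3 * 3 = 18
n_candidates = len(search.cv_results_["params"])
print(n_candidates)
print(search.best_params_)  # configuration with the best mean CV score
print(search.best_score_)   # mean cross-validated accuracy of that configuration
```

With five folds, this sketch fits 18 * 5 = 90 models during the search, plus one final refit of the best configuration on the full dataset, which is the default behavior of `GridSearchCV`.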

As you might have noticed, all the different combinations of those three hyperparameters will be tested. For example, consider the following:

- Criterion = Gini, Max depth = 2, Min samples leaf = 10
- Criterion = Gini, Max depth = 5, Min samples leaf = 10
- Criterion = Gini, Max depth = 10, Min samples leaf = 10
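The full set of combinations can be enumerated with the Python standard library alone. The following sketch assumes the grid from *Table 7.1*; the dictionary keys are illustrative names chosen to mirror the table headers.

```python
from itertools import product
from math import prod

# Grid from Table 7.1 (key names are illustrative).
grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, 10],
    "min_samples_leaf": [10, 20, 30],
}

# The number of configurations is the product of the list lengths.
total = prod(len(values) for values in grid.values())

# The Cartesian product of the value lists yields every configuration once.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(total)
for combo in combinations:
    print(combo)
```

Adding one more hyperparameter with, say, four candidate values would multiply the total by four, which is why grids are usually kept small.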

You might also have the following questions:

- Considering that a particular algorithm might have several hyperparameters, which ones should I tune?
- Considering that a particular hyperparameter might accept infinite values, which values should I test?

These are good questions, and grid search will not give you a direct answer to them. Instead, this is closer to an empirical process, where you keep testing configurations until you reach your target performance.