It is often desirable to find models with simpler forms, such as weights close to zero. To do this, a regularization function over the model parameters is added to the usual loss minimization problem (\( \alpha \) is a constant hyperparameter):

\[ \hat{f} = \underset{h \in \mathcal{H}}{\arg\min} \; \sum_{i} L(h(u_i), y_i) + \alpha \, \textup{reg}(\theta) \]

where:
| Symbol | Meaning |
| --- | --- |
| \( \hat{f} \) | The optimal model for a problem. |
| \( y \) | The ground truth of a sample; in supervised learning it is paired with the corresponding input. |
| \( \mathcal{H} \) | The set of possible models. |
| \( \theta \) | The model weights/parameters. |
| \( L \) | A loss function: it measures how wrong a model's prediction is compared to the ground truth. |
| \( \textup{reg} \) | A regularization function. |
| \( h \) | A model. |
| \( u \) | The input of a model. |
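To make the notation concrete, here is a minimal sketch of how this regularized objective could be evaluated, assuming a linear model, a squared-error loss, and an L2 penalty; the function names mirror the symbols above and are illustrative, not taken from the original text:

```python
import numpy as np

def h(theta, u):
    """The model h: here, a linear model applied to input u."""
    return u @ theta

def L(y_pred, y):
    """The loss L: squared error between a prediction and the ground truth y."""
    return (y_pred - y) ** 2

def reg(theta):
    """The regularization function reg: here, the squared L2 norm of theta."""
    return np.sum(theta ** 2)

def regularized_objective(theta, U, Y, alpha):
    """Sum of per-sample losses plus the alpha-weighted penalty."""
    data_loss = sum(L(h(theta, u), y) for u, y in zip(U, Y))
    return data_loss + alpha * reg(theta)
```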
The following example shows how this regularized formulation of the optimization target can result in models with simpler parameters (here, closer to zero):
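As a minimal sketch of such a comparison (assuming a linear model with squared-error loss and an L2 penalty, for which the regularized minimizer has the closed-form ridge solution; the data and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(50, 3))                          # sample inputs
true_theta = np.array([3.0, -2.0, 0.5])
Y = U @ true_theta + rng.normal(scale=0.5, size=50)   # noisy ground truth

def fit(U, Y, alpha):
    """Minimize the regularized objective in closed form:
    theta = (U^T U + alpha I)^{-1} U^T Y (ridge regression)."""
    n_features = U.shape[1]
    return np.linalg.solve(U.T @ U + alpha * np.eye(n_features), U.T @ Y)

theta_unregularized = fit(U, Y, alpha=0.0)
theta_regularized = fit(U, Y, alpha=50.0)

print("alpha = 0 :", theta_unregularized)
print("alpha = 50:", theta_regularized)
# The regularized weights are pulled toward zero relative to the
# unregularized solution; increasing alpha strengthens the shrinkage.
```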