Loss Minimization with Regularization

Description

It is often desirable to find simpler models, for example ones whose weights are close to zero. To achieve this, a regularization function over the model parameters is added to the usual loss minimization problem (\( \alpha \) is a constant hyperparameter):

\[\htmlClass{sdt-0000000002}{\hat{f}} = \argmin_{\htmlClass{sdt-0000000084}{h} \in \htmlClass{sdt-0000000039}{\mathcal{H}}} \left[ \frac{1}{N} \sum_{i=1}^{N} \htmlClass{sdt-0000000072}{L}\left( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i \right) + \alpha^2 \htmlClass{sdt-0000000076}{\textup{reg}}(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000084}{h}}) \right]\]
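
As a concrete illustration, the sketch below evaluates this objective in Python with NumPy. The linear model, squared-error loss, and L2 regularizer used here are assumptions chosen only to make the snippet runnable; they are not part of the definition above.

```python
import numpy as np

def l2_reg(theta):
    """Assumed L2 regularizer: sum of squared parameters."""
    return float(np.sum(np.asarray(theta) ** 2))

def regularized_objective(h, theta, inputs, targets, loss, reg, alpha_sq):
    """Mean loss over the samples plus the weighted regularization term,
    i.e. (1/N) * sum_i L(h(u_i), y_i) + alpha^2 * reg(theta)."""
    empirical_risk = np.mean([loss(h(u, theta), y) for u, y in zip(inputs, targets)])
    return empirical_risk + alpha_sq * reg(theta)

# Hypothetical model and loss, only for illustration.
def linear_model(u, theta):
    return float(np.dot(theta, u))

def squared_error(prediction, y):
    return (prediction - y) ** 2

# Example usage with made-up data:
inputs = [np.array([1.0, 2.0]), np.array([0.5, -1.0])]
targets = [3.0, 0.0]
theta = np.array([0.8, 1.1])
print(regularized_objective(linear_model, theta, inputs, targets,
                            squared_error, l2_reg, alpha_sq=0.25))
```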

Symbols Used:

\( \hat{f} \)

This symbol denotes the optimal model for a problem.

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\( \mathcal{H} \)

This is the symbol representing the set of possible models.

\( \theta \)

This is the symbol we use for model weights/parameters.

\( L \)

This is the symbol for a loss function. It is a function that quantifies how far a model's prediction is from where it should be.

\( \textup{reg} \)

This is the symbol used for representing a regularization function.

\( h \)

This symbol denotes a model in machine learning.

\( u \)

This symbol denotes the input of a model.

Example

The following example shows how this regularized formulation of the optimization target can favor models with simpler parameters (here, weights closer to zero); a short code check of the arithmetic follows the list:

  1. Consider a model \( \htmlClass{sdt-0000000084}{h}_1 \) with loss \( \htmlClass{sdt-0000000072}{L}_1 = 10 \) for the weights \( \htmlClass{sdt-0000000066}{\theta}_1 = (2, 4, 6) \).
  2. Consider another model \( \htmlClass{sdt-0000000084}{h}_2 \) with higher loss \( \htmlClass{sdt-0000000072}{L}_2 = 18 \) for the weights \( \htmlClass{sdt-0000000066}{\theta}_2 = (1, 2, 3) \).
  3. Consider the L2 regularizer, giving:
    \[ \htmlClass{sdt-0000000076}{\textup{reg}}(\htmlClass{sdt-0000000066}{\theta}_1) = 2^2 + 4^2 + 6^2 = 56 \\ \htmlClass{sdt-0000000076}{\textup{reg}}(\htmlClass{sdt-0000000066}{\theta}_2) = 1^2 + 2^2 + 3^2 = 14 \]
  4. Consider \( \alpha^2 = 0.25 \) as the hyperparameter controlling the effect of the regularization term. Then:
    \[ L_1 + \alpha^2 \htmlClass{sdt-0000000076}{\textup{reg}}(\htmlClass{sdt-0000000066}{\theta}_1) = 10 + 0.25 \cdot 56 = 10 + 14 = 24 \\ L_2 + \alpha^2 \htmlClass{sdt-0000000076}{\textup{reg}}(\htmlClass{sdt-0000000066}{\theta}_2) = 18 + 0.25 \cdot 14 = 18 + 3.5 = 21.5 \]
  5. The optimization process will therefore choose \( \htmlClass{sdt-0000000084}{h}_2 \) over \( \htmlClass{sdt-0000000084}{h}_1 \), since \( 21.5 < 24 \), even though the unregularized loss of \( \htmlClass{sdt-0000000084}{h}_1 \) is lower.
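
The arithmetic in steps 3 and 4 can be verified with a few lines of Python; the loss values, weight vectors, and \( \alpha^2 \) are taken directly from the example above.

```python
import numpy as np

def l2_reg(theta):
    # L2 regularizer: sum of squared weights.
    return float(np.sum(np.asarray(theta) ** 2))

alpha_sq = 0.25
loss_1, theta_1 = 10.0, (2.0, 4.0, 6.0)
loss_2, theta_2 = 18.0, (1.0, 2.0, 3.0)

obj_1 = loss_1 + alpha_sq * l2_reg(theta_1)  # 10 + 0.25 * 56 = 24.0
obj_2 = loss_2 + alpha_sq * l2_reg(theta_2)  # 18 + 0.25 * 14 = 21.5

print(obj_1, obj_2)   # 24.0 21.5
print(obj_2 < obj_1)  # True: h_2 wins under the regularized objective
```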
