Risk of Optimal Model

Description

This equation describes an empirical method for determining the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) for a given problem. Every candidate model \( \htmlClass{sdt-0000000084}{h} \) from the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) is evaluated on a sampled dataset, and the model with the minimum empirical risk is selected.

Symbols Used:

\( \hat{f} \)	This symbol denotes the optimal model for a problem.
\( y \)	This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.
\( \mathcal{H} \)	This is the symbol representing the set of possible models.
\( h \)	This symbol denotes a model in machine learning.
\( u \)	This symbol denotes the input of a model.

Derivation

Recall the definition of the empirical risk of a model:
\[\htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) = \frac{1}{N} \sum^{N}_{i=1} L (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i)\]
Now suppose that all our models \( \htmlClass{sdt-0000000084}{h} \) are drawn from a hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \):
The symbol \( \mathcal{H} \) denotes the set of possible models, often from a particular class like "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, \( \mathcal{H} \) indicates the space where an optimal model may be found.
Using the definition of the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \):
The symbol \(\hat{f}\) denotes the optimal model for a problem. It yields the lowest risk \( \htmlClass{sdt-0000000062}{R} \) for pairs of inputs and outputs. The goal of machine learning is to optimize \( \htmlClass{sdt-0000000084}{h} \) until it becomes \(\hat{f}\).

We observe that we need to take the model \( \htmlClass{sdt-0000000084}{h} \) with the lowest risk. This can be done using the argmin operator.
Therefore, we obtain
\[\htmlClass{sdt-0000000002}{\hat{f}} = \underset{h \in \htmlClass{sdt-0000000039}{\mathcal{H}}}{\operatorname{argmin}} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i) \]
as required.
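
The selection rule derived above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypothesis space is taken to be a small finite list of candidate functions (in practice it is often infinite and searched by an optimizer), the loss \( L \) is assumed to be the squared error, and the names `empirical_risk`, `select_model`, `h_identity`, `h_double` and `h_zero` are hypothetical, as is the toy dataset.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def squared_loss(prediction: float, target: float) -> float:
    # Assumed loss L; the text does not fix a particular loss function.
    return (prediction - target) ** 2


def empirical_risk(h: Model, inputs: Sequence[float], targets: Sequence[float],
                   loss: Callable[[float, float], float] = squared_loss) -> float:
    # R^emp(h) = (1/N) * sum_i L(h(u_i), y_i)
    n = len(inputs)
    return sum(loss(h(u), y) for u, y in zip(inputs, targets)) / n


def select_model(candidates: Sequence[Model], inputs: Sequence[float],
                 targets: Sequence[float]) -> Model:
    # argmin over a finite candidate set: return the h with the lowest empirical risk.
    return min(candidates, key=lambda h: empirical_risk(h, inputs, targets))


# Hypothetical candidate models standing in for the hypothesis space.
def h_identity(u: float) -> float:
    return u


def h_double(u: float) -> float:
    return 2.0 * u


def h_zero(u: float) -> float:
    return 0.0


# Toy dataset (made up for illustration): targets are exactly twice the inputs.
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [0.0, 2.0, 4.0, 6.0]

candidates = [h_identity, h_double, h_zero]
f_hat = select_model(candidates, inputs, targets)
```

On this toy data `h_double` fits every sample exactly, so it has the lowest empirical risk and is returned as `f_hat`.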

Example

Suppose we have the following models, with their empirical risks calculated on an arbitrary dataset of samples:

\[\begin{align*} \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_1) &= 3 \\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_2) &= 2.3 \\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_3) &= 6 \end{align*}\]
Using the equation described above, we observe that the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) is the model \( \htmlClass{sdt-0000000084}{h} \) with the lowest empirical risk.

Therefore, we obtain \( \htmlClass{sdt-0000000002}{\hat{f}} = \htmlClass{sdt-0000000084}{h}_2 \).
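
The same comparison can be written as a short Python check. It is a minimal sketch that assumes the three empirical risks from the example are already known; the dictionary keys `"h_1"`, `"h_2"` and `"h_3"` are just illustrative labels for the models.

```python
# Empirical risks from the example, keyed by an illustrative model label.
risks = {"h_1": 3.0, "h_2": 2.3, "h_3": 6.0}

# argmin over the candidates: the key whose risk value is smallest.
best_name = min(risks, key=risks.get)

print(best_name, risks[best_name])
```

Here `min` with `key=risks.get` plays the role of the argmin operator: it returns the model label rather than the risk value itself.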
