Risk of Optimal Model

Prerequisites

Empirical Risk of a Model | \(R^{emp}(h) = \frac{1}{N} \sum^{N}_{i=1} L (h(u_i), y_i)\)
Optimal Model | \( \hat{f} \)
Hypothesis Space | \( \mathcal{H} \)

Description

The risk of an optimal model gives an empirical method for determining the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) for a given problem. Every candidate model \( \htmlClass{sdt-0000000084}{h} \) from the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) is evaluated on a sampled dataset, and the model with the minimum empirical risk is selected.

\[\htmlClass{sdt-0000000002}{\hat{f}} = h_{opt} = \underset{h \in \htmlClass{sdt-0000000039}{\mathcal{H}}}{argmin} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i)\]

Symbols Used:

\( \hat{f} \)

This symbol denotes the optimal model for a problem.

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\( \mathcal{H} \)

This is the symbol representing the set of possible models.

\( h \)

This symbol denotes a model in machine learning.

\( u \)

This symbol denotes the input of a model.

Derivation

  1. Recall the definition of the empirical risk of a model
    \[\htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}) = \frac{1}{N} \sum^{N}_{i=1} L (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i)\]
  2. Now suppose that all our models \( \htmlClass{sdt-0000000084}{h} \) are drawn from a hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \):

    The symbol \( \mathcal{H} \) denotes the set of possible models, often from a particular class like "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, \( \mathcal{H} \) indicates the space where an optimal model may be found.

  3. Using the definition of the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \):

    The symbol \(\hat{f}\) denotes the optimal model for a problem. It yields the lowest risk \( \htmlClass{sdt-0000000062}{R} \) for pairs of inputs and outputs. The goal of machine learning is to optimize \( \htmlClass{sdt-0000000084}{h} \) until it becomes \(\hat{f}\).


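    We observe that we need to take the model \( \htmlClass{sdt-0000000084}{h} \) with the lowest empirical risk among all models in the hypothesis space. The argmin operator expresses exactly this: it returns the argument that minimizes the expression, not the minimum value itself.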
  4. Therefore, we obtain
    \[\htmlClass{sdt-0000000002}{\hat{f}} = \underset{h \in \htmlClass{sdt-0000000039}{\mathcal{H}}}{argmin} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (\htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i), \htmlClass{sdt-0000000037}{y}_i) \]
    as required.
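
The derivation above can be sketched in code. This is a minimal illustration, not a definitive implementation: it assumes a small finite set of candidate models (a real hypothesis space is usually infinite and is searched by an optimizer), and it assumes the squared error as the loss \( L \), which the source leaves unspecified. The names `squared_loss`, `empirical_risk`, `select_model`, and the toy candidates `h_a`, `h_b`, `h_c` are hypothetical.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def squared_loss(prediction: float, target: float) -> float:
    """Assumed loss L; the source does not fix a particular loss."""
    return (prediction - target) ** 2


def empirical_risk(h: Model, inputs: Sequence[float], targets: Sequence[float],
                   loss: Callable[[float, float], float] = squared_loss) -> float:
    """Mean loss of model h over the N samples (u_i, y_i)."""
    n = len(inputs)
    return sum(loss(h(u), y) for u, y in zip(inputs, targets)) / n


def select_model(candidates: Sequence[Model], inputs: Sequence[float],
                 targets: Sequence[float]) -> Model:
    """argmin over a finite candidate set: return the model with the lowest empirical risk."""
    return min(candidates, key=lambda h: empirical_risk(h, inputs, targets))


# Toy dataset generated by y = 2u (illustrative only).
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [0.0, 2.0, 4.0, 6.0]


def h_a(u: float) -> float:
    return u


def h_b(u: float) -> float:
    return 2 * u


def h_c(u: float) -> float:
    return 3 * u


candidates = [h_a, h_b, h_c]
risks = [empirical_risk(h, inputs, targets) for h in candidates]
best = select_model(candidates, inputs, targets)
print(risks)          # [3.5, 0.0, 3.5]
print(best.__name__)  # h_b
```

Here `min` with a `key` function plays the role of the argmin operator: it returns the model itself rather than its risk value.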

Example

Suppose we have the following models, with their empirical risk calculated on an arbitrary dataset of samples:

\[\begin{align*} \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_1) &= 3 \\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_2) &= 2.3 \\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_3) &= 6 \end{align*}\]
Using the equation described above, we observe that the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) is the model \( \htmlClass{sdt-0000000084}{h} \) with the lowest empirical risk. Since \( 2.3 < 3 < 6 \), the lowest risk belongs to \( \htmlClass{sdt-0000000084}{h}_2 \).

Therefore, we obtain \( \htmlClass{sdt-0000000002}{\hat{f}} = \htmlClass{sdt-0000000084}{h}_2 \).
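
The same selection can be reproduced in a few lines of Python. This is only a sketch: the dictionary keys `"h_1"`, `"h_2"`, `"h_3"` are placeholder labels for the three models, and the risk values are the ones given in the example.

```python
# Empirical risks of the three candidate models from the example.
empirical_risks = {"h_1": 3.0, "h_2": 2.3, "h_3": 6.0}

# argmin: the key (model) whose empirical risk is smallest.
best_name = min(empirical_risks, key=empirical_risks.get)
print(best_name)  # h_2
```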

References

  1. Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved April 14, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf