The risk of an optimal model describes an empirical method for determining the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) for a given problem. It does this by evaluating every candidate model \( \htmlClass{sdt-0000000084}{h} \) from the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) on a sampled dataset and selecting the model with the minimum empirical risk.
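In symbols, the selection rule can be sketched as follows. The second expression assumes the common definition of empirical risk as the average loss over \( N \) sampled input/ground-truth pairs \( (u_i, y_i) \). The loss function \( L \) and the sample count \( N \) are assumptions introduced for this sketch and are not defined elsewhere on this page.

\[ \hat{f} = \underset{h \in \mathcal{H}}{\operatorname{arg\,min}}\; R^{emp}(h), \qquad R^{emp}(h) = \frac{1}{N} \sum_{i=1}^{N} L\big(h(u_i), y_i\big) \]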
| Symbol | Description |
| --- | --- |
| \( \hat{f} \) | This symbol denotes the optimal model for a problem. |
| \( y \) | This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input. |
| \( \mathcal{H} \) | This is the symbol representing the set of possible models. |
| \( h \) | This symbol denotes a model in machine learning. |
| \( u \) | This symbol denotes the input of a model. |
The symbol \( \mathcal{H} \) denotes the set of possible models, often from a particular class like "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, \( \mathcal{H} \) indicates the space where an optimal model may be found.
The symbol \(\hat{f}\) denotes the optimal model for a problem. Among all models in \( \htmlClass{sdt-0000000039}{\mathcal{H}} \), it yields the lowest risk \( \htmlClass{sdt-0000000062}{R} \) over pairs of inputs and outputs. The goal of machine learning is to optimize \( \htmlClass{sdt-0000000084}{h} \) until it becomes \(\hat{f}\).
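A minimal Python sketch of this search, assuming a small finite hypothesis space. The candidates here are hypothetical linear models \( h_a(u) = a \cdot u \) for a grid of slopes \( a \), and mean squared error stands in as the loss. Both choices are illustrative assumptions, not part of the original method. Every candidate is scored on the same sampled dataset, and the one with the minimum empirical risk is returned.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]


def empirical_risk(h: Model, us: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of model h over the samples (u_i, y_i) (assumed loss)."""
    return sum((h(u) - y) ** 2 for u, y in zip(us, ys)) / len(us)


def select_optimal(hypotheses: Dict[str, Model],
                   us: Sequence[float],
                   ys: Sequence[float]) -> Tuple[str, float]:
    """Evaluate every candidate in the hypothesis space; return the one with minimum empirical risk."""
    risks = {name: empirical_risk(h, us, ys) for name, h in hypotheses.items()}
    best = min(risks, key=risks.get)
    return best, risks[best]


def make_linear(a: float) -> Model:
    # Factory function so each lambda captures its own slope a
    return lambda u: a * u


# Hypothetical finite hypothesis space: slopes 0.0, 0.5, ..., 4.0
H = {f"a={a}": make_linear(a) for a in [i * 0.5 for i in range(9)]}

# Hypothetical noise-free dataset generated by y = 2u
us = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

best, risk = select_optimal(H, us, ys)
print(best, risk)  # a=2.0 0.0
```

Because the search is exhaustive, the cost grows linearly with the size of \( \mathcal{H} \). Continuous or very large hypothesis spaces are usually searched with an optimizer rather than by enumeration.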
Suppose we have the following models, with their empirical risk computed on an arbitrary dataset of samples:
\[\begin{align*} \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_1) &= 3 \\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_2) &= 2.3 \\ \htmlClass{sdt-0000000062}{R}^{emp}(\htmlClass{sdt-0000000084}{h}_3) &= 6 \end{align*}\]
Applying the selection rule described above, we pick the model \( \htmlClass{sdt-0000000084}{h} \) with the lowest empirical risk. Since \( 2.3 < 3 < 6 \), that model is \( \htmlClass{sdt-0000000084}{h}_2 \).
Therefore, we obtain \( \htmlClass{sdt-0000000002}{\hat{f}} = \htmlClass{sdt-0000000084}{h}_2 \).
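The same comparison can be written as an argmin over a table of precomputed risks. This is a minimal sketch: the model names are illustrative string labels, and the risk values are the ones from the example.

```python
# Empirical risks of the three candidate models from the example
risks = {"h_1": 3.0, "h_2": 2.3, "h_3": 6.0}

# f_hat is the candidate whose empirical risk is smallest (the argmin)
f_hat = min(risks, key=risks.get)
print(f_hat)  # h_2
```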