The fundamental goal of supervised learning is to find the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) that minimizes the risk \( \htmlClass{sdt-0000000062}{R} \) on unseen testing data drawn from the joint distribution of the random variables \( \htmlClass{sdt-0000000013}{U} \) and \( \htmlClass{sdt-0000000021}{Y} \). However, the model's only source of knowledge is the training data, which comprises \(N\) samples: \(\htmlClass{sdt-0000000057}{S} = (\htmlClass{sdt-0000000103}{u}_i, \htmlClass{sdt-0000000037}{y}_i)_{i=1,...,N} \).
Because we lack access to the testing data, we optimize the model on the training data instead, aiming to minimize its empirical risk (the "training error"). The underlying hope is that a model with low empirical risk will also have low risk on the unseen testing data, that is, that it will generalize well.
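The idea of minimizing training error and hoping the result generalizes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method the text prescribes: it assumes a squared-error loss, a hypothesis space of straight lines \( h(u) = a u + b \), and a synthetic data-generating distribution (\( y = 2u + 1 \) plus Gaussian noise) chosen purely for the example. The helper names `empirical_risk`, `fit_line`, and `sample` are hypothetical. The line that minimizes empirical risk on \(S\) is found in closed form, and its risk is then estimated on a large fresh sample standing in for the unseen testing data.

```python
import random

def empirical_risk(h, data):
    """Mean squared error of model h over a list of (u, y) pairs."""
    return sum((h(u) - y) ** 2 for u, y in data) / len(data)

def fit_line(data):
    """Closed-form least squares for h(u) = a*u + b, i.e. the line that
    minimizes the empirical squared-error risk on `data`."""
    n = len(data)
    mean_u = sum(u for u, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((u - mean_u) * (y - mean_y) for u, y in data)
    var = sum((u - mean_u) ** 2 for u, _ in data)
    a = cov / var
    b = mean_y - a * mean_u
    return lambda u: a * u + b

def sample(n, rng):
    """Hypothetical data-generating distribution: y = 2u + 1 plus Gaussian noise."""
    pairs = []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        pairs.append((u, 2.0 * u + 1.0 + rng.gauss(0.0, 0.1)))
    return pairs

rng = random.Random(0)
S = sample(20, rng)        # training data, N = 20
test = sample(1000, rng)   # stand-in for unseen testing data from the same distribution

f_hat = fit_line(S)                     # empirical risk minimizer over lines
train_risk = empirical_risk(f_hat, S)   # training error
test_risk = empirical_risk(f_hat, test) # estimate of the true risk
print(f"training risk: {train_risk:.4f}, test risk: {test_risk:.4f}")
```

Because training and testing samples come from the same distribution here, the two risks end up close; the gap between them is what "generalizing well" is about.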
| Symbol | Description |
| --- | --- |
| \( y \) | This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input. |
| \( \mathcal{H} \) | This is the symbol representing the set of possible models. |
| \( S \) | This symbol describes the pairs of inputs and ground truths \((\htmlClass{sdt-0000000103}{u}_i, \htmlClass{sdt-0000000037}{y}_i)\) used to train a model. |
| \( h \) | This symbol denotes a model in machine learning. |
| \( u \) | This symbol denotes the input of a model. |
The symbol \(S\) denotes the set of input/ground-truth pairs \((\htmlClass{sdt-0000000103}{u}_i, \htmlClass{sdt-0000000037}{y}_i)_{i=1,...,N}\) used to train a model, where \(N\) is the total number of data points. \(S\) is also known as the training data, and the risk computed on these samples is known as the empirical risk.
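As a concrete illustration, \(S\) can be stored as a list of \((u_i, y_i)\) tuples and the empirical risk computed as the average loss of a model over those \(N\) pairs. The text does not fix a loss function, so the 0-1 classification loss below is an assumption, as are the four training pairs and the model `h`; the names `empirical_risk` and `zero_one_loss` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, int]  # one (u_i, y_i) pair: real-valued input, class label

def empirical_risk(h: Callable[[float], int],
                   S: Sequence[Sample],
                   loss: Callable[[int, int], float]) -> float:
    """Average loss of model h over the N training pairs (u_i, y_i) in S."""
    N = len(S)
    return sum(loss(h(u_i), y_i) for u_i, y_i in S) / N

def zero_one_loss(prediction: int, target: int) -> float:
    """1 for a misclassification, 0 for a correct prediction."""
    return 0.0 if prediction == target else 1.0

# Hypothetical training set with N = 4 input/ground-truth pairs.
S = [(0.5, 1), (-1.2, 0), (2.0, 1), (0.1, 0)]

def h(u: float) -> int:
    """A simple model: predict class 1 for positive inputs."""
    return 1 if u > 0 else 0

print(empirical_risk(h, S, zero_one_loss))  # 0.25 (only (0.1, 0) is misclassified)
```

Passing the loss as a parameter keeps the definition of empirical risk independent of the task; a regression problem would simply supply a squared-error loss instead.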
The symbol \(h\) denotes a model: a machine learning model that takes an input and produces an output.
The symbol \( \mathcal{H} \) denotes the set of possible models, typically drawn from a particular class such as "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, \( \mathcal{H} \) is the space it searches for an optimal model.
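The classes named above (polynomials, multi-layer perceptrons) are infinite, so the sketch below substitutes a deliberately tiny, finite \( \mathcal{H} \) to show what "searching the space" means. It assumes one-dimensional threshold classifiers \( h_t(u) = 1 \) if \( u \ge t \), else \( 0 \), a grid of nine thresholds, 0-1 loss, and a made-up training set; none of these choices come from the text, and the helper `make_threshold_model` is hypothetical. The learning algorithm here is plain exhaustive search for the model with the lowest empirical risk.

```python
def make_threshold_model(t):
    """Model h_t that predicts class 1 when the input is at least t."""
    return lambda u: 1 if u >= t else 0

# Hypothetical finite hypothesis space: thresholds from -2.0 to 2.0 in steps of 0.5.
thresholds = [-2.0 + 0.5 * k for k in range(9)]
H = {t: make_threshold_model(t) for t in thresholds}

def empirical_risk(h, S):
    """Fraction of training pairs that h misclassifies (0-1 loss)."""
    return sum(h(u) != y for u, y in S) / len(S)

# Hypothetical training data S of (u_i, y_i) pairs.
S = [(-1.7, 0), (-0.8, 0), (-0.2, 0), (0.3, 1), (0.9, 1), (1.6, 1)]

# Search H for the model with the lowest empirical risk.
best_t = min(H, key=lambda t: empirical_risk(H[t], S))
print(best_t, empirical_risk(H[best_t], S))  # 0.0 0.0
```

For continuous classes such as neural networks, the search over \( \mathcal{H} \) is usually done by gradient-based optimization rather than enumeration, but the objective of minimizing empirical risk over \( \mathcal{H} \) is the same.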