MSE Minimization

Description

Minimizing the Mean Squared Error Loss (MSE) means finding the model that is optimal with respect to the MSE Loss. This is the goal of many Machine Learning problems in which we are given input-output pairs of data and the output is continuous.

\[\htmlClass{sdt-0000000002}{\hat{f}} = \argmin_{\htmlClass{sdt-0000000084}{h} \in \htmlClass{sdt-0000000039}{\mathcal{H}}} \frac{1}{N} \sum_{i=1}^{N} \left( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i) - \htmlClass{sdt-0000000037}{y}_i \right)^2\]
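
In words: out of every candidate model \( \htmlClass{sdt-0000000084}{h} \) in the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \), we pick the one whose average squared prediction error over the \( N \) data points is smallest. As a concrete illustration, here is a minimal Python sketch of this minimization over a small, finite hypothesis space (the data, the candidate models, and names such as `mse` and `hypotheses` are illustrative assumptions, not part of the definition):

```python
import numpy as np

def mse(h, u, y):
    """MSE Loss of model h on inputs u and ground-truth outputs y."""
    return np.mean((h(u) - y) ** 2)

# Toy data: three input-output pairs.
u = np.array([-1.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 2.0])

# A small, finite hypothesis space of candidate models.
hypotheses = {
    "constant one": lambda u: np.ones_like(u),
    "identity":     lambda u: u,
    "quadratic":    lambda u: 0.5 * u + 1.5 * u ** 2,
}

# The optimal model is the hypothesis with the smallest MSE Loss.
best = min(hypotheses, key=lambda name: mse(hypotheses[name], u, y))
print(best)  # -> "quadratic", which fits this data exactly (MSE Loss = 0)
```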

Symbols Used:

\( \hat{f} \)

This symbol denotes the optimal model for a problem.

\( y \)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\( \mathcal{H} \)

This is the symbol representing the set of possible models.

\( h \)

This symbol denotes a model in machine learning.

\( u \)

This symbol denotes the input of a model.

Example

Consider the same example of quadratic polynomial fitting given on the Mean Squared Error Loss (MSE) page:

  1. Suppose we want to fit a function to the values \( \htmlClass{sdt-0000000037}{y} = (1, 0, 2) \), which we assume to be generated by a quadratic polynomial.
  2. We choose a model \( \htmlClass{sdt-0000000084}{h} \) in the form of a quadratic polynomial: \( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i) = a_0 + a_1 \htmlClass{sdt-0000000103}{u}_i + a_2 \htmlClass{sdt-0000000103}{u}_i^2 \) with unknown coefficients \( a_0, a_1, a_2 \).
  3. Now consider the inputs \( \htmlClass{sdt-0000000103}{u} = (-1, 0, 1) \).
    Different values of \( a_0, a_1, a_2 \) give different model predictions, and thus a different MSE Loss. Optimizing over all possible values of \( a_0, a_1, a_2 \) is the same as finding:
    \[ \htmlClass{sdt-0000000002}{\hat{f}} = \argmin_{\htmlClass{sdt-0000000084}{h} \in \htmlClass{sdt-0000000039}{\mathcal{H}}} \frac{1}{N} \sum_{i=1}^{N} \left( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i) - \htmlClass{sdt-0000000037}{y}_i \right)^2 \]
    where the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) is the set of all quadratic polynomials.
  4. Since the three pairs \( (\htmlClass{sdt-0000000103}{u}_i, \htmlClass{sdt-0000000037}{y}_i) \) uniquely determine a quadratic polynomial, namely \( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}) = \frac{1}{2} \htmlClass{sdt-0000000103}{u} + \frac{3}{2} \htmlClass{sdt-0000000103}{u}^2 \), the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) will be the quadratic polynomial with \( a_0 = 0, a_1 = \frac{1}{2}, a_2 = \frac{3}{2} \). It fits the data exactly, so it achieves an MSE Loss of zero (see the numerical check after this list).
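
This fit can be checked numerically. The following sketch (assuming NumPy; the variable names are illustrative) minimizes the MSE Loss over all quadratic polynomials with a least-squares solve and recovers the coefficients above:

```python
import numpy as np

u = np.array([-1.0, 0.0, 1.0])  # inputs
y = np.array([1.0, 0.0, 2.0])   # ground-truth outputs

# Design matrix whose columns are 1, u, u^2, so that
# U @ [a0, a1, a2] evaluates h(u) = a0 + a1*u + a2*u^2.
U = np.stack([np.ones_like(u), u, u ** 2], axis=1)

# Least squares minimizes the sum of squared errors, and therefore
# also the MSE Loss, over all quadratic polynomials.
coeffs, *_ = np.linalg.lstsq(U, y, rcond=None)
print(coeffs)  # -> [0.  0.5 1.5], i.e. a0 = 0, a1 = 1/2, a2 = 3/2

residual = U @ coeffs - y
print(np.mean(residual ** 2))  # -> 0.0: the optimal model fits exactly
```

Because this hypothesis space is linear in the coefficients, the MSE minimizer has a closed-form least-squares solution; for more general hypothesis spaces the minimization is typically carried out numerically, for example by gradient descent.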
