Description
Minimizing Mean Squared Error Loss (MSE) means finding the model that attains the lowest MSE Loss. This is the goal of many Machine Learning problems in which we are given input-output pairs of data with a continuous output.
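As a concrete reference point, the quantity being minimized can be computed directly; a minimal sketch (the function name `mse` is illustrative):

```python
import numpy as np

def mse(predictions, targets):
    """Mean Squared Error between model predictions and target values."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean((predictions - targets) ** 2)
```

Minimizing MSE then means searching over models for the one whose predictions make this quantity as small as possible.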
Example
Consider the same example of quadratic polynomial fitting given on the Mean Squared Error Loss (MSE) page:
- Assume we want to fit a function to the values \( \htmlClass{sdt-0000000037}{y} = (1, 0, 2) \), which we assume to be generated by a quadratic polynomial.
- We choose a model \( \htmlClass{sdt-0000000084}{h} \) in the form of a quadratic polynomial: \( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i) = a_0 + a_1 \htmlClass{sdt-0000000103}{u}_i + a_2 \htmlClass{sdt-0000000103}{u}_i^2 \) with unknown coefficients \( a_0, a_1, a_2 \).
- Now consider the inputs \( \htmlClass{sdt-0000000103}{u} = (-1, 0, 1) \).
Different values of \( a_0, a_1, a_2 \) give different model predictions, and hence a different MSE Loss. Optimizing over all possible values of \( a_0, a_1, a_2 \) is the same as finding:
\[ \htmlClass{sdt-0000000002}{\hat{f}} = \argmin_{\htmlClass{sdt-0000000084}{h} \in \htmlClass{sdt-0000000039}{\mathcal{H}}} \frac{1}{N} \sum_{i=1}^{N} \left( \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}_i) - \htmlClass{sdt-0000000037}{y}_i \right)^2 \]
where the hypothesis space \( \htmlClass{sdt-0000000039}{\mathcal{H}} \) is the set of all quadratic polynomials.
- Since the three points given by \( \htmlClass{sdt-0000000103}{u} = (-1, 0, 1) \) and \( \htmlClass{sdt-0000000037}{y} = (1, 0, 2) \) uniquely determine the quadratic polynomial \( y = \frac{1}{2}x + \frac{3}{2}x^2 \), the optimal model \( \htmlClass{sdt-0000000002}{\hat{f}} \) is this polynomial, with \( a_0 = 0, a_1 = \frac{1}{2}, a_2 = \frac{3}{2} \). It passes through all three data points, so it attains an MSE Loss of zero.
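The fit above can be reproduced numerically. A minimal sketch using NumPy's least-squares solver, which minimizes the sum of squared residuals (and therefore the MSE) over all coefficient choices; the variable names are illustrative:

```python
import numpy as np

u = np.array([-1.0, 0.0, 1.0])  # inputs
y = np.array([1.0, 0.0, 2.0])   # target values

# Design matrix for h(u) = a0 + a1*u + a2*u^2:
# each row is (1, u_i, u_i^2)
A = np.vstack([np.ones_like(u), u, u**2]).T

# Least-squares solution: the coefficients minimizing the MSE
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a1, a2 = coeffs  # expected: 0, 1/2, 3/2
```

With three data points and three unknown coefficients, the least-squares solution here interpolates the data exactly, matching the coefficients derived above.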