Approximation of Performance Landscape

Description

The performance landscape \( \htmlClass{sdt-0000000062}{R} \)(\( \htmlClass{sdt-0000000083}{\theta} \)) can be approximated locally with a second-order Taylor expansion. This makes it possible to estimate the model's risk in the neighborhood of a chosen parameter vector \( \htmlClass{sdt-0000000083}{\theta} \).

\[\htmlClass{sdt-0000000062}{R}(\htmlClass{sdt-0000000083}{\theta}) = \htmlClass{sdt-0000000080}{\sum}_{i=1}^D\htmlClass{sdt-0000000003}{x}_i\htmlClass{sdt-0000000083}{\theta}_i^2\]

Symbols Used:

\( x \)

This is a symbol for any generic variable. It can hold any value, such as an integer, a real number, a complex number, or a matrix.

\( R \)

This symbol denotes the risk of a model.

\( \sum \)

This is the summation symbol in mathematics; it represents the sum of a sequence of numbers.

\( \theta \)

This symbol represents the parameters of the model.

Derivation

Consider the performance landscape of \( \htmlClass{sdt-0000000062}{R} \)(\( \htmlClass{sdt-0000000083}{\theta} \)).

[Figure: Performance Landscape]

Notice that \( \htmlClass{sdt-0000000062}{R} \)(\( \htmlClass{sdt-0000000083}{\theta} \)) can be a highly complex function with high curvature. However, if we choose a single point \( \htmlClass{sdt-0000000083}{\theta} \) in the domain \( \htmlClass{sdt-0000000052}{\Theta} \), we can calculate the second derivatives of the risk with respect to each parameter, \[\frac{\partial^2 \htmlClass{sdt-0000000062}{R}}{\partial \htmlClass{sdt-0000000083}{\theta}_i^2},\] which can then be used to approximate the risk \( \htmlClass{sdt-0000000062}{R} \) locally. If we denote this approximation as \[\hat{\htmlClass{sdt-0000000062}{R}}(\htmlClass{sdt-0000000083}{\theta}),\] then the shape of the performance landscape near that point is approximately:

\[\hat{\htmlClass{sdt-0000000062}{R}}(\htmlClass{sdt-0000000083}{\theta}) = \htmlClass{sdt-0000000080}{\sum}_{i=1}^D \frac{1}{2}\frac{\partial^2 \htmlClass{sdt-0000000062}{R}}{\partial \htmlClass{sdt-0000000083}{\theta}_i^2}\htmlClass{sdt-0000000083}{\theta}_i^2\]

Note that this formulation assumes that we expand around the origin (all parameters equal to zero), so the second derivatives are evaluated there. The constant \( \htmlClass{sdt-0000000062}{R}(0) \) is dropped because it only shifts the landscape vertically, the first-order terms vanish when the origin is a critical point, and the mixed second derivatives are ignored, which is exact only when the Hessian is diagonal. If we choose a different point, the Taylor expansion has to be written in terms of the offsets from that point, which results in a more complex formula.
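As a minimal sketch of how such an approximation could be computed numerically, the snippet below estimates the unmixed second derivatives at the origin with central finite differences and builds a diagonal quadratic from them, using the standard second-order Taylor coefficient of one half per term. The helper names `second_derivatives_at_origin` and `quadratic_approximation`, the step size `h`, and the toy risk are all illustrative assumptions and do not come from the lecture notes.

```python
import math


def second_derivatives_at_origin(risk, dim, h=1e-3):
    """Estimate d^2 R / d theta_i^2 at the origin with central differences."""
    origin = [0.0] * dim
    r0 = risk(origin)
    derivs = []
    for i in range(dim):
        plus = origin.copy()
        plus[i] = h
        minus = origin.copy()
        minus[i] = -h
        derivs.append((risk(plus) - 2.0 * r0 + risk(minus)) / h**2)
    return derivs


def quadratic_approximation(risk, dim, h=1e-3):
    """Return R_hat(theta) = sum_i 0.5 * (d^2 R / d theta_i^2) * theta_i^2."""
    derivs = second_derivatives_at_origin(risk, dim, h)

    def r_hat(theta):
        return sum(0.5 * d * t * t for d, t in zip(derivs, theta))

    return r_hat


# Hypothetical toy risk (an assumption for illustration): it is zero at the
# origin, has zero gradient there, and has no mixed second derivatives.
def toy_risk(theta):
    return 1.0 - math.cos(theta[0]) + 2.0 * theta[1] ** 2


r_hat = quadratic_approximation(toy_risk, dim=2)
print(toy_risk([0.1, 0.1]), r_hat([0.1, 0.1]))
```

The toy risk was chosen so that the assumptions of the diagonal formula hold at the origin; for a risk with a nonzero gradient or strong mixed derivatives there, this sketch would be a poor local model.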

Example

Let's say that \[\htmlClass{sdt-0000000062}{R}(\htmlClass{sdt-0000000083}{\theta})=\htmlClass{sdt-0000000127}{\sin}^2(\htmlClass{sdt-0000000083}{\theta}_1)-\htmlClass{sdt-0000000124}{\cos}^2(\htmlClass{sdt-0000000083}{\theta}_2).\]

This function looks as follows:

[Figure: Risk function plot]

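At the origin, the gradient of this function is zero and both second derivatives, \( \partial^2 \htmlClass{sdt-0000000062}{R} / \partial \htmlClass{sdt-0000000083}{\theta}_1^2 \) and \( \partial^2 \htmlClass{sdt-0000000062}{R} / \partial \htmlClass{sdt-0000000083}{\theta}_2^2 \), are equal to 2, so the second-order Taylor expansion is \( -1 + \htmlClass{sdt-0000000083}{\theta}_1^2 + \htmlClass{sdt-0000000083}{\theta}_2^2 \). Dropping the constant \( \htmlClass{sdt-0000000062}{R}(0) = -1 \), which only shifts the landscape vertically, we can approximate the shape of this function with \[\hat{\htmlClass{sdt-0000000062}{R}}(\htmlClass{sdt-0000000083}{\theta}) = \htmlClass{sdt-0000000083}{\theta}_1^2 + \htmlClass{sdt-0000000083}{\theta}_2^2.\] If we plot this approximation, we see that, up to this vertical shift, the two functions behave similarly near the origin.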
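The claim about the example can be checked numerically with a short sketch. Because the example risk equals -1 at the origin while the quadratic equals 0 there, the snippet compares the change in risk relative to the origin with the quadratic. The function names `risk`, `risk_hat`, and `approximation_gap`, and the choice of test points along the diagonal, are assumptions made for illustration.

```python
import math


def risk(theta1, theta2):
    """Example risk: sin^2(theta_1) - cos^2(theta_2)."""
    return math.sin(theta1) ** 2 - math.cos(theta2) ** 2


def risk_hat(theta1, theta2):
    """Quadratic approximation around the origin: theta_1^2 + theta_2^2."""
    return theta1 ** 2 + theta2 ** 2


def approximation_gap(radius):
    """Gap between R(theta) - R(0) and R_hat(theta) at theta = (radius, radius)."""
    change = risk(radius, radius) - risk(0.0, 0.0)
    return abs(change - risk_hat(radius, radius))


# The gap should shrink quickly as the test point approaches the origin.
for radius in (1.0, 0.5, 0.1, 0.01):
    print(f"radius={radius:<5} gap={approximation_gap(radius):.2e}")
```

Far from the origin the quadratic keeps growing while the true risk stays bounded, so the approximation is only useful locally.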

[Figure: Taylor Approximation]

See this visualization on Desmos.

References

  1. Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved April 20, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf