Because the empirical risk is an average of per-sample losses, its gradient is the average of the per-sample loss gradients. This is also exactly how the gradient is computed in practice: during one sweep through the training set (an 'epoch'), we accumulate the gradient of the loss with respect to the parameters for each training sample and average the results.
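Concretely, in the notation defined in the table below (the sample count \( m \) and the per-sample inputs \( u_i \) are our additions, not symbols from the table):

\[
\nabla_\theta R(\theta) \;=\; \nabla_\theta \, \frac{1}{m} \sum_{i=1}^{m} L\big(\mathbf{y}_i\big) \;=\; \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta L\big(\mathbf{y}_i\big), \qquad \mathbf{y}_i = \mathcal{N}(u_i; \theta),
\]

where the target for sample \( i \) is left implicit in \( L \).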
| Symbol | Meaning |
| --- | --- |
| \( \mathcal{N} \) | A function approximator, typically a neural network. |
| \( i \) | An iterator: a variable that ranges over a sequence of elements (here, training samples). |
| \( R \) | The risk of a model. |
| \( \theta \) | The model weights/parameters. |
| \( \mathbf{y} \) | The output activation vector of a neural network. |
| \( L \) | A loss function: it quantifies how wrong the model's output is compared to the target. |
| \( \nabla \) | The gradient operator. |
| \( u \) | The input of a model. |
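To make the epoch-level aggregation concrete, here is a minimal Python sketch (our illustration, not code from this document): a linear map stands in for \( \mathcal{N} \), squared error stands in for \( L \), and hypothetical target vectors `targets` supply the "where it should be" part of the loss.

```python
import numpy as np

def per_sample_grad(theta, u_i, t_i):
    """Gradient of the squared-error loss for one training sample.

    Linear stand-in for the network: y = theta @ u_i, so
    L = 0.5 * ||y - t_i||^2 and its gradient w.r.t. theta
    is the outer product (y - t_i) u_i^T.
    """
    y = theta @ u_i                 # network output for this sample
    return np.outer(y - t_i, u_i)  # per-sample gradient w.r.t. theta

def empirical_risk_grad(theta, inputs, targets):
    """One epoch: sweep the training set, averaging per-sample gradients."""
    grad = np.zeros_like(theta)
    for u_i, t_i in zip(inputs, targets):
        grad += per_sample_grad(theta, u_i, t_i)
    return grad / len(inputs)       # mean of the per-sample gradients

# Usage: 100 samples, 3 input features, 2 outputs.
rng = np.random.default_rng(0)
theta = rng.normal(size=(2, 3))
inputs = rng.normal(size=(100, 3))
targets = rng.normal(size=(100, 2))
print(empirical_risk_grad(theta, inputs, targets).shape)  # (2, 3)
```

The explicit Python loop mirrors the sum in the equation above; real implementations typically vectorize it or process mini-batches, but the aggregation principle is the same.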