RNN Update Equation

Description

In contrast to Feedforward Neural Networks, Recurrent Neural Networks (RNNs) maintain an internal state. Given the current state, the next state can be computed using an update equation. This gives rise to a recurrence relation between states.

\[\htmlClass{sdt-0000000125}{\mathbf{h}}(\htmlClass{sdt-0000000117}{n}+1)=\htmlClass{sdt-0000000051}{\sigma}\left(\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000125}{\mathbf{h}}(\htmlClass{sdt-0000000117}{n})\right)\]

Symbols Used:

\( \sigma \)

This symbol represents the activation function. It maps real values to other real values in a non-linear way.

\( \mathbf{W} \)

This symbol represents the matrix containing the weights and biases of a layer in a neural network.

\( n \)

This symbol represents any given whole number, \( n \in \htmlClass{sdt-0000000014}{\mathbb{W}}\).

\( \mathbf{h} \)

This symbol represents the hidden state of a recurrent neural network.

Derivation

If \(\htmlClass{sdt-0000000125}{\mathbf{h}}\) is a \(d\)-dimensional vector, then \( \htmlClass{sdt-0000000059}{\mathbf{W}} \) must be a \(d \times d\) matrix (here \(d\) denotes the state dimension, not the time step \(\htmlClass{sdt-0000000117}{n}\)). This ensures that the matrix product is defined, and that the hidden state retains the same dimension over time. Often we use \(\tanh\) as the activation function \(\htmlClass{sdt-0000000051}{\sigma}\) to induce a non-linear relationship between states.
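The dimension argument can be checked with a short sketch. NumPy and the helper name `rnn_step` are illustrative choices, not part of the source; the sketch simply applies the update equation repeatedly and confirms the state keeps its dimension:

```python
import numpy as np

def rnn_step(W, h, sigma=np.tanh):
    """One application of the update equation: h(n+1) = sigma(W h(n))."""
    d = h.shape[0]
    assert W.shape == (d, d), "W must be square and match the state dimension"
    return sigma(W @ h)

# A 3-dimensional hidden state requires a 3x3 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
h = rng.standard_normal(3)

for _ in range(5):
    h = rnn_step(W, h)  # the state stays 3-dimensional at every step

print(h.shape)  # (3,)
```

Because \(\tanh\) maps into \((-1, 1)\), every entry of the state stays bounded no matter how many steps are taken.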

Example

  1. Suppose we have an initial state \(\htmlClass{sdt-0000000125}{\mathbf{h}}(0)=\begin{bmatrix}-1\\1\end{bmatrix}\), a weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}} =\begin{bmatrix}0&1\\1&0\end{bmatrix}\), and the activation function \(\htmlClass{sdt-0000000051}{\sigma} =\tanh\).
  2. The matrix product \(\htmlClass{sdt-0000000059}{\mathbf{W}}\htmlClass{sdt-0000000125}{\mathbf{h}}(0)\) can be computed with matrix multiplication\[\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000125}{\mathbf{h}}(0)=\begin{bmatrix}0&1\\1&0\end{bmatrix}\begin{bmatrix}-1\\1\end{bmatrix}=\begin{bmatrix}1\\-1\end{bmatrix}.\]
  3. By applying the activation function \(\htmlClass{sdt-0000000051}{\sigma}\) elementwise to the matrix product, we can compute the next state \[\begin{align*}\htmlClass{sdt-0000000125}{\mathbf{h}}(1)&=\htmlClass{sdt-0000000051}{\sigma}\left(\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000125}{\mathbf{h}}(0)\right)\\&=\tanh\left(\begin{bmatrix}1\\-1\end{bmatrix}\right)\\&=\begin{bmatrix}\tanh(1)\\\tanh(-1)\end{bmatrix}\\&\approx \begin{bmatrix}0.762\\-0.762\end{bmatrix}.\end{align*}\]
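The three steps above can be reproduced numerically. NumPy is used here as an illustration; the source gives only the mathematics:

```python
import numpy as np

# Initial state, weight matrix, and activation from the example.
h0 = np.array([-1.0, 1.0])
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# h(1) = sigma(W h(0)), with tanh applied elementwise.
h1 = np.tanh(W @ h0)
print(np.round(h1, 3))  # [ 0.762 -0.762]
```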

References

  1. Jaeger, H. (2024, April 26). Neural Networks (AI) (WBAI028-05): Lecture Notes, BSc program in Artificial Intelligence. Retrieved from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf