Description
Recurrent Neural Networks (RNNs) derive their power from the recursive nature of their operation. At each time step, an RNN updates its hidden state by combining the current input with its hidden state from the previous time step.
The expansion below highlights the inner workings of an RNN and illustrates how information from several time steps in the past still has a tangible effect on the current state of the network.
\[\begin{align*}
\htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \\
& = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 2) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \\
&= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} (\htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 3) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n-2))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \\
&= \cdots
\end{align*}\]
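The unrolled form above can be reproduced by simply iterating the one-step update. Below is a minimal NumPy sketch of the bias-free recurrence \(\mathcal{x}(n) = \sigma(\mathbf{W}\mathcal{x}(n-1) + \mathbf{W}^{in}u(n))\); the dimensions, the random weights, and the choice of \(\tanh\) as \(\sigma\) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Illustrative sketch of the bias-free RNN update x(n) = sigma(W x(n-1) + W_in u(n)).
# All sizes and weight values below are placeholders chosen for demonstration.
rng = np.random.default_rng(0)
state_dim, input_dim, steps = 4, 3, 5

W = rng.normal(scale=0.5, size=(state_dim, state_dim))     # recurrent weights W
W_in = rng.normal(scale=0.5, size=(state_dim, input_dim))  # input weights W^in
u = rng.normal(size=(steps, input_dim))                    # input sequence u(1), ..., u(N)

sigma = np.tanh  # a common (assumed) choice of activation

x = np.zeros(state_dim)  # initial state x(0)
for n in range(steps):
    # Each step folds the entire input history so far into the new state,
    # which is exactly what the recursive expansion above makes explicit.
    x = sigma(W @ x + W_in @ u[n])
print(x)
```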
Derivation
Consider the update equations for the activation state and output of an RNN:
\[\begin{align*}
\htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}) \\
\htmlClass{sdt-0000000068}{\mathbf{y}}(n) &= f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n))
\end{align*}\]
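Before simplifying, the full update and readout can be sketched directly. The following is a minimal, illustrative NumPy version; the dimensions and random weights are placeholders, and the output nonlinearity \(f\) is taken to be the identity here since the text leaves it generic.

```python
import numpy as np

# Sketch of one full update with bias b and a linear readout.
# f is taken as the identity purely for illustration.
rng = np.random.default_rng(1)
state_dim, input_dim, out_dim = 4, 3, 2

W = rng.normal(scale=0.5, size=(state_dim, state_dim))     # W
W_in = rng.normal(scale=0.5, size=(state_dim, input_dim))  # W^in
W_out = rng.normal(scale=0.5, size=(out_dim, state_dim))   # W^out
b = rng.normal(scale=0.1, size=state_dim)                  # bias b

def step(x_prev, u_n):
    """x(n) = sigma(W x(n-1) + W^in u(n) + b)."""
    return np.tanh(W @ x_prev + W_in @ u_n + b)

x = step(np.zeros(state_dim), rng.normal(size=input_dim))
y = W_out @ x  # y(n) = f(W^out x(n)) with f = identity
```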
- For the sake of simplicity, we will drop the bias. Thus, we obtain:
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n))\]
- We can now expand the definition of \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n - 1)\) by considering the activation and input from the previous time step:
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 2) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \]
- We can similarly break down \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n-2)\) further into the past:
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} (\htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 3) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n-2))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n))\]
- This process can be repeated until the expansion reaches the network's initial state and the very first input. Thus, we have recursively expressed the RNN update equation as a function of the entire input history, as required. A short numerical check of this expansion is sketched below.
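As a sanity check on the derivation, the following sketch (reusing the same illustrative conventions as above) verifies numerically that iterating the one-step update three times produces exactly the three-step expanded expression.

```python
import numpy as np

# Check that iterating the one-step update three times equals the expanded form
# sigma(W sigma(W sigma(W x(n-3) + W_in u(n-2)) + W_in u(n-1)) + W_in u(n)).
# Matrices and vectors are random placeholders for illustration.
rng = np.random.default_rng(2)
d_x, d_u = 4, 3
W = rng.normal(scale=0.5, size=(d_x, d_x))
W_in = rng.normal(scale=0.5, size=(d_x, d_u))
sigma = np.tanh

x_nm3 = rng.normal(size=d_x)                   # x(n-3)
u_nm2, u_nm1, u_n = rng.normal(size=(3, d_u))  # u(n-2), u(n-1), u(n)

# Iterate the recurrence step by step.
x = x_nm3
for u_k in (u_nm2, u_nm1, u_n):
    x = sigma(W @ x + W_in @ u_k)

# Expanded form from the derivation.
x_expanded = sigma(W @ sigma(W @ sigma(W @ x_nm3 + W_in @ u_nm2)
                             + W_in @ u_nm1) + W_in @ u_n)

assert np.allclose(x, x_expanded)  # the two expressions agree by construction
```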