Description
Recurrent Neural Networks (RNNs) derive their power from the recursive nature of their operation. At each time step, an RNN updates its hidden state by combining the current input with its hidden state from the previous time step.
The expansion below highlights the inner workings of an RNN and illustrates how information from several time steps in the past still has a tangible effect on the current state of the network.
\[\begin{align*}
\htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \\
& = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 2) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \\
&= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} (\htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 3) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n-2))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \\
&= \cdots
\end{align*}\]
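The unrolled form above can be reproduced by simply iterating the one-step update. Below is a minimal NumPy sketch of the bias-free recurrence \(\mathcal{x}(n) = \sigma(\mathbf{W}\mathcal{x}(n-1) + \mathbf{W}^{in}u(n))\); the dimensions, the random weights, and the choice of \(\tanh\) as \(\sigma\) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Illustrative sketch of the bias-free RNN update x(n) = sigma(W x(n-1) + W_in u(n)).
# All sizes and weight values below are placeholders chosen for demonstration.
rng = np.random.default_rng(0)
state_dim, input_dim, steps = 4, 3, 5

W = rng.normal(scale=0.5, size=(state_dim, state_dim))     # recurrent weights W
W_in = rng.normal(scale=0.5, size=(state_dim, input_dim))  # input weights W^in
u = rng.normal(size=(steps, input_dim))                    # input sequence u(1), ..., u(N)

sigma = np.tanh  # a common (assumed) choice of activation

x = np.zeros(state_dim)  # initial state x(0)
for n in range(steps):
    # Each step folds the entire input history so far into the new state,
    # which is exactly what the recursive expansion above makes explicit.
    x = sigma(W @ x + W_in @ u[n])
print(x)
```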
Derivation
Consider the update equations for the activation state and output of an RNN:
\[\begin{align*}
\htmlClass{sdt-0000000094}{\mathcal{x}}(n) &= \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n-1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n) + \htmlClass{sdt-0000000082}{\mathcal{b}}) \\
\htmlClass{sdt-0000000068}{\mathbf{y}}(n) &= f(\htmlClass{sdt-0000000059}{\mathbf{W}}^{out} \htmlClass{sdt-0000000094}{\mathcal{x}}(n))
\end{align*}\]
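Before simplifying, the full update and readout can be sketched directly. The following is a minimal, illustrative NumPy version; the dimensions and random weights are placeholders, and the output nonlinearity \(f\) is taken to be the identity here since the text leaves it generic.

```python
import numpy as np

# Sketch of one full update with bias b and a linear readout.
# f is taken as the identity purely for illustration.
rng = np.random.default_rng(1)
state_dim, input_dim, out_dim = 4, 3, 2

W = rng.normal(scale=0.5, size=(state_dim, state_dim))     # W
W_in = rng.normal(scale=0.5, size=(state_dim, input_dim))  # W^in
W_out = rng.normal(scale=0.5, size=(out_dim, state_dim))   # W^out
b = rng.normal(scale=0.1, size=state_dim)                  # bias b

def step(x_prev, u_n):
    """x(n) = sigma(W x(n-1) + W^in u(n) + b)."""
    return np.tanh(W @ x_prev + W_in @ u_n + b)

x = step(np.zeros(state_dim), rng.normal(size=input_dim))
y = W_out @ x  # y(n) = f(W^out x(n)) with f = identity
```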
- For the sake of simplicity, we will drop the bias. Thus, we obtain:
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 1) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n))\]
- We can now expand the definition of \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n - 1)\) by considering the activation and input from the previous time step:
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 2) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n)) \]
- We can similarly break down \(\htmlClass{sdt-0000000094}{\mathcal{x}}(n-2)\) further into the past:
\[\htmlClass{sdt-0000000094}{\mathcal{x}}(n) = \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} (\htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} ( \htmlClass{sdt-0000000079}{\sigma} (\htmlClass{sdt-0000000059}{\mathbf{W}} \htmlClass{sdt-0000000094}{\mathcal{x}}(n - 3) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n-2))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n - 1))) + \htmlClass{sdt-0000000059}{\mathbf{W}}^{in} \htmlClass{sdt-0000000103}{u}(n))\]
- This process can be repeated until the expansion reaches the network's initial state and the very first input. Thus, we have recursively expressed the RNN update equation as a function of the entire input history, as required. A short numerical check of this expansion is sketched below.
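As a sanity check on the derivation, the following sketch (reusing the same illustrative conventions as above) verifies numerically that iterating the one-step update three times produces exactly the three-step expanded expression.

```python
import numpy as np

# Check that iterating the one-step update three times equals the expanded form
# sigma(W sigma(W sigma(W x(n-3) + W_in u(n-2)) + W_in u(n-1)) + W_in u(n)).
# Matrices and vectors are random placeholders for illustration.
rng = np.random.default_rng(2)
d_x, d_u = 4, 3
W = rng.normal(scale=0.5, size=(d_x, d_x))
W_in = rng.normal(scale=0.5, size=(d_x, d_u))
sigma = np.tanh

x_nm3 = rng.normal(size=d_x)                   # x(n-3)
u_nm2, u_nm1, u_n = rng.normal(size=(3, d_u))  # u(n-2), u(n-1), u(n)

# Iterate the recurrence step by step.
x = x_nm3
for u_k in (u_nm2, u_nm1, u_n):
    x = sigma(W @ x + W_in @ u_k)

# Expanded form from the derivation.
x_expanded = sigma(W @ sigma(W @ sigma(W @ x_nm3 + W_in @ u_nm2)
                             + W_in @ u_nm1) + W_in @ u_n)

assert np.allclose(x, x_expanded)  # the two expressions agree by construction
```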