Input neuron of an LSTM

Prerequisites

Activation of a neuron | \(x^\kappa_i = \sigma(\sum_{j=1}^{L^{k - 1}}\theta_{i j}^{k} x_{j}^{k-1} + \theta_{i 0}^{k})\)

Description

This equation is the first part of the LSTM block. It transforms the external signal, which could be the input coming from the previous layer. It works exactly the same as a typical layer in a multi-layer perceptron (see Activation of a layer).

\[\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}^u[1;x^u])\]
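As an illustration, here is a minimal NumPy sketch of this update, assuming a logistic sigmoid for \( \sigma \) and arbitrary small sizes for the weight matrix and the input signal; all names and values below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here as one example choice for the activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 input neurons in the LSTM block, 4 external input signals.
rng = np.random.default_rng(0)
W_u = rng.normal(size=(3, 1 + 4))   # first column holds the biases
x_u = rng.normal(size=4)            # external input signal x^u at time n

# u(n+1) = sigma(W^u [1; x^u])
u_next = sigmoid(W_u @ np.concatenate(([1.0], x_u)))
print(u_next)   # one activation per input neuron
```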

Symbols Used:

\( u \)

This symbol represents the state of the input neuron to the LSTM.

\( \sigma \)

This symbol represents the activation function. It maps real values to other real values in a non-linear way.

\( \mathbf{W} \)

This symbol represents the matrix containing the weights and biases of a layer in a neural network.

\( n \)

This symbol represents any given whole number, \( n \in \htmlClass{sdt-0000000014}{\mathbb{W}}\).

Derivation

The derivation follows the same steps as the Activation of a layer.

First, recall the activation of a single neuron: \[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\] In this derivation, we denote the state of the input neuron as \( \htmlClass{sdt-0000000028}{u} \), rather than \( \htmlClass{sdt-0000000050}{x^\kappa} \). The state of the input neuron varies in time, so we denote the past state as \(\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n})\) and the to-be-calculated state as \(\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1)\).
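To make the single-neuron formula concrete, here is a minimal Python sketch that evaluates the sum explicitly for one neuron, using made-up weights, bias, and previous-layer activations (all values are hypothetical):

```python
import math

def sigmoid(z):
    """Example choice for the activation function sigma."""
    return 1.0 / (1.0 + math.exp(-z))

theta = [0.5, -0.3, 0.8]      # weights theta_{ij} of one neuron (hypothetical)
theta_0 = 0.1                 # bias theta_{i0}
x_prev = [1.0, 2.0, -1.0]     # activations x_j^{k-1} of the previous layer

# Explicit sum over j, exactly as in the formula above
pre_activation = theta_0 + sum(t_j * x_j for t_j, x_j in zip(theta, x_prev))
u = sigmoid(pre_activation)   # u(n+1) for this single input neuron
print(u)
```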

We can rewrite this in vector form, where \(\htmlClass{sdt-0000000066}{\theta}^u \in \mathbb{R}^{\htmlClass{sdt-0000000044}{L}^{u}}\) is the vector of weights and \(\htmlClass{sdt-0000000066}{\theta}_{0}^{u}\) is the bias.

\[\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000066}{\theta}^{u} \cdot x^{u} + \htmlClass{sdt-0000000066}{\theta}_{0}^u)\]
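As a sanity check, the following sketch (reusing the hypothetical numbers from the previous snippet) confirms that the dot-product form gives exactly the same pre-activation as the explicit sum:

```python
import numpy as np

theta_u = np.array([0.5, -0.3, 0.8])   # same hypothetical weight vector as above
theta_0 = 0.1                          # same hypothetical bias
x_u = np.array([1.0, 2.0, -1.0])       # same hypothetical previous activations

explicit_sum = theta_0 + sum(theta_u[j] * x_u[j] for j in range(len(x_u)))
dot_form = np.dot(theta_u, x_u) + theta_0

assert np.isclose(explicit_sum, dot_form)   # both give the same pre-activation
```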

The activations of all neurons in this layer are denoted as \(\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = (\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1)_1,...,\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1)_{\htmlClass{sdt-0000000044}{L}^{u}})\).

If we stack the weight vectors for each of the neurons together with the biases, we end up with the weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}}^u \).

Now, since matrix multiplication is essentially a series of vector dot products, we can combine these operations into a matrix form.

\[\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}^{u}[1;x^{u}]) \enspace .\]

The 1 is concatenated to the activation of the previous layer in order to accommodate the bias. Notice that if the first column of the weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}^{u}\) contains the biases, they will always be multiplied by this 1 and never by any of the actual activations, which is exactly the purpose of the bias.
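A short sketch with hypothetical numbers makes this explicit: placing the biases in the first column of the weight matrix and prepending a 1 to the activations reproduces each neuron's dot product plus bias:

```python
import numpy as np

rng = np.random.default_rng(1)
L_prev, L_u = 4, 3                       # hypothetical layer sizes
biases = rng.normal(size=(L_u, 1))       # theta_0^u for each neuron
weights = rng.normal(size=(L_u, L_prev)) # weight vectors theta^u stacked row-wise

W_u = np.hstack([biases, weights])       # biases become the first column
x_u = rng.normal(size=L_prev)

matrix_form = W_u @ np.concatenate(([1.0], x_u))   # W^u [1; x^u]
per_neuron = weights @ x_u + biases.ravel()        # theta^u . x^u + theta_0^u

assert np.allclose(matrix_form, per_neuron)   # the 1 only ever multiplies the biases
```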

Example

See Activation of a layer for an example calculation for a typical linear layer such as the one described on this page.
