Description

This equation represents the input gate of an LSTM. It transforms an external signal \(x^{\htmlClass{sdt-0000000036}{g^\text{input}}}\) in the same way that a layer of a typical multi-layer perceptron transforms its input. This signal consists of the outputs of other LSTM blocks and of other neurons in the neural network.

\[\htmlClass{sdt-0000000036}{g^\text{input}}(\htmlClass{sdt-0000000117}{n}+1) = \htmlClass{sdt-0000000079}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}^{\htmlClass{sdt-0000000036}{g^\text{input}}}[1;x^{\htmlClass{sdt-0000000036}{g^\text{input}}}])\]

Symbols Used:

\( g^\text{input} \)	This symbol represents the state of the input gate of the LSTM.
\( \mathbf{W} \)	This symbol represents the matrix containing the weights and biases of a layer in a neural network.
\( \sigma \)	This symbol represents the sigmoid function.
\( n \)	This symbol represents any given whole number, \( n \in \htmlClass{sdt-0000000014}{\mathbb{W}}\).
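The equation can be evaluated directly, as in the minimal plain-Python sketch below. It assumes that `W_in` (a hypothetical name for \(\mathbf{W}^{g^\text{input}}\)) is stored as a list of rows whose first column holds the biases, which is what prepending the constant 1 in \([1; x^{g^\text{input}}]\) implies. The weights and input values are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)); its output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def input_gate(W_in, x):
    """Compute g_input(n+1) = sigma(W_in [1; x]).

    W_in: list of rows, each of length 1 + len(x); column 0 holds the bias.
    x:    external signal x^{g_input} at step n (outputs of other blocks/neurons).
    Returns one gate value per row of W_in.
    """
    augmented = [1.0] + list(x)  # [1; x]: the leading 1 multiplies the bias column
    gate = []
    for row in W_in:
        if len(row) != len(augmented):
            raise ValueError("each row of W_in must have 1 + len(x) entries")
        z = sum(w * a for w, a in zip(row, augmented))  # pre-activation
        gate.append(sigmoid(z))
    return gate

# Hypothetical example: 3 external input signals, 2 gate units.
W_in = [
    [0.0, 1.0, -1.0, 0.5],   # bias 0.0
    [-2.0, 0.0, 0.0, 0.0],   # bias -2.0, ignores the inputs
]
x = [0.5, 0.5, 2.0]
print(input_gate(W_in, x))
```

With these numbers the first gate unit evaluates \(\sigma(1) \approx 0.731\) and the second \(\sigma(-2) \approx 0.119\). The bias is folded into the weight matrix rather than kept as a separate vector, matching the \([1; x]\) notation.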

Derivation

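Notice that the equation is analogous to the activation of a single layer \(\kappa\) of a multi-layer perceptron:

\[\htmlClass{sdt-0000000050}{x^\kappa} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\htmlClass{sdt-0000000015}{\kappa} - 1}])\]

Here \(\mathbf{W}^{g^\text{input}}\) plays the role of the layer's weight matrix \(\mathbf{W}\), and the external signal \(x^{g^\text{input}}\) plays the role of the previous layer's output \(x^{\kappa - 1}\).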

The derivation of this equation follows the same steps as the derivation of the activation of a layer. The only difference is that the activation function is always the sigmoid. It cannot be replaced by another activation function.
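The analogy can be made concrete in code. The sketch below writes a generic layer activation \(x^\kappa = \sigma(\mathbf{W}[1; x^{\kappa-1}])\) with a pluggable activation function, then obtains the input gate by fixing that function to the sigmoid. The names `layer_activation`, `act`, and `input_gate` are hypothetical, and the weights are illustrative only.

```python
import math
from typing import Callable, List, Sequence

def sigmoid(z: float) -> float:
    """Logistic sigmoid; output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_activation(W: Sequence[Sequence[float]], x_prev: Sequence[float],
                     act: Callable[[float], float]) -> List[float]:
    """Generic layer: x^kappa = act(W [1; x^(kappa-1)]), applied per output unit."""
    augmented = [1.0, *x_prev]  # prepend 1 so column 0 of W acts as the bias
    out = []
    for row in W:
        if len(row) != len(augmented):
            raise ValueError("each row of W must have 1 + len(x_prev) entries")
        out.append(act(sum(w * a for w, a in zip(row, augmented))))
    return out

def input_gate(W_in: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Input gate: the same computation with the activation fixed to the sigmoid."""
    return layer_activation(W_in, x, sigmoid)

W = [[0.1, 0.4, -0.3],
     [0.0, -1.0, 2.0]]
x = [1.0, 0.5]

# An ordinary layer may use any activation, e.g. tanh, whose outputs lie in (-1, 1) ...
hidden = layer_activation(W, x, math.tanh)
# ... whereas the gate always uses the sigmoid, so its outputs lie in (0, 1).
gate = input_gate(W, x)
print(hidden, gate)
```

The only line that distinguishes the gate from an ordinary layer is the choice of `act`. Everything else (bias folding, matrix-vector product) is shared.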

The task of this gate is to compute a vector of "weights", one for each part of the new input, that judge how important that part is. A weight near 1 marks the corresponding part as important, so it is stored in the memory cell. A weight near 0 marks it as unimportant, so it should largely be kept out of the memory cell. To act as such proportions, the weights must lie between 0 and 1. The sigmoid, whose range is \((0, 1)\), guarantees exactly this, which is why it is used as the activation function.
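The gating role described above can be illustrated numerically. This sketch assumes, as in the common LSTM formulation, that the gate multiplies a candidate input vector elementwise before that input is added to the memory cell. The candidate values and pre-activations are invented for illustration, and the function name `gate_input` is hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid; output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gate_input(gate, candidate):
    """Elementwise product: the fraction of each candidate component let into the cell."""
    if len(gate) != len(candidate):
        raise ValueError("gate and candidate must have the same length")
    return [g * c for g, c in zip(gate, candidate)]

pre_activations = [-6.0, 0.0, 6.0]            # hypothetical values of W [1; x]
gate = [sigmoid(z) for z in pre_activations]  # approx [0.0025, 0.5, 0.9975]
candidate = [2.0, 2.0, 2.0]                   # hypothetical new information

written = gate_input(gate, candidate)
for g, w in zip(gate, written):
    print(f"gate={g:.4f} -> contribution to memory cell {w:.4f}")
```

A strongly negative pre-activation almost blocks the input, and a strongly positive one lets almost all of it through. With an unbounded activation such as ReLU a "weight" could exceed 1, and with tanh it could be negative. In neither case could it be read as the fraction of the input to keep.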

References

Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved April 27, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf
