This equation is the first part of the LSTM block. It transforms the external signal, which could be the input from the previous layer. It works exactly the same way as a typical layer in a multi-layer perceptron (see Activation of a layer).
| Symbol | Description |
| --- | --- |
| \( u \) | This symbol represents the state of the input neuron to the LSTM. |
| \( \sigma \) | This symbol represents the activation function. It maps real values to other real values in a non-linear way. |
| \( \mathbf{W} \) | This symbol represents the matrix containing the weights and biases of a layer in a neural network. |
| \( n \) | This symbol represents any given whole number, \( n \in \htmlClass{sdt-0000000014}{\mathbb{W}}\). |
The derivation follows the same steps as the one for the Activation of a layer.
First, recall the activation of a single neuron \[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\] In this derivation, we denote the state of the input neuron as \( \htmlClass{sdt-0000000028}{u} \), rather than \( \htmlClass{sdt-0000000050}{x^\kappa} \). The state of the input neuron varies in time, so we denote the past state as \(\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n})\) and the to-be-calculated state as \(\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1)\).
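For a single neuron \(i\) of this layer, the same sum can be sketched in the new notation (an intermediate step written here for illustration; the inputs to the layer are denoted \(x_{j}^{u}\)):
\[u(n+1)_{i} = \sigma\Big(\sum_{j} \theta_{ij}^{u}\, x_{j}^{u} + \theta_{i 0}^{u}\Big) \enspace .\]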
We can rewrite this in vector form, where \(\htmlClass{sdt-0000000066}{\theta}^u \in \mathbb{R}^{\htmlClass{sdt-0000000044}{L}^{u}}\) is the vector of weights of a single neuron and \(\htmlClass{sdt-0000000066}{\theta}_{0}^{u}\) is its bias.
\[\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000066}{\theta}^{u} \cdot x^{u} + \htmlClass{sdt-0000000066}{\theta}_{0}^u)\]
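As a small numerical illustration (the numbers and the choice of the logistic sigmoid for \( \sigma \) are assumptions made only for this example): taking \(\theta^{u} = (0.5, -0.2)\), \(x^{u} = (1, 2)\) and \(\theta_{0}^{u} = 0.1\) gives
\[u(n+1) = \sigma(0.5 \cdot 1 + (-0.2) \cdot 2 + 0.1) = \sigma(0.2) \approx 0.55 \enspace .\]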
The activations of all neurons in this layer are denoted as \(\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = (\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1)_1,...,\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1)_{\htmlClass{sdt-0000000044}{L}^{u}})\).
If we stack the weight vectors of all the neurons, together with their biases, we end up with the weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}}^{u} \).
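One way to picture this stacking (a sketch; here \(\theta_{i}^{u}\) and \(\theta_{i 0}^{u}\) denote the weight vector and bias of neuron \(i\)): the biases occupy the first column and each row collects the weights of one neuron,
\[\mathbf{W}^{u} = \begin{pmatrix} \theta_{1 0}^{u} & (\theta_{1}^{u})^{\top} \\ \vdots & \vdots \\ \theta_{L^{u} 0}^{u} & (\theta_{L^{u}}^{u})^{\top} \end{pmatrix} \enspace .\]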
Now, since matrix multiplication is essentially a series of vector dot products, we can combine these per-neuron operations into matrix form.
\[\htmlClass{sdt-0000000028}{u}(\htmlClass{sdt-0000000117}{n}+1) = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000059}{\mathbf{W}}^{u}[1;x^{u}]) \enspace .\]
The 1 is concatenated to the activations of the previous layer in order to accommodate the bias. Notice that if the first column of the weight matrix \(\htmlClass{sdt-0000000059}{\mathbf{W}}^{u}\) holds the biases, they are always multiplied by this constant 1 and never by any of the actual activations, which is the whole point of the bias.
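To make the matrix form concrete, here is a minimal NumPy sketch, assuming the logistic sigmoid for \( \sigma \); the function and variable names (`input_activation`, `W_u`, `x_u`) and the sizes are illustrative choices, not notation from this page.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: one common choice for the activation function sigma.
    return 1.0 / (1.0 + np.exp(-z))

def input_activation(W_u, x_u):
    """Compute u(n+1) = sigma(W_u @ [1; x_u]).

    W_u: weight matrix of shape (neurons, 1 + inputs); its first column holds the biases.
    x_u: activations feeding into this layer at time n.
    """
    x_aug = np.concatenate(([1.0], x_u))  # prepend the constant 1 that picks up the bias column
    return sigmoid(W_u @ x_aug)

# Tiny example: a layer of 3 neurons receiving 2 inputs.
rng = np.random.default_rng(0)
W_u = rng.standard_normal((3, 1 + 2))
x_u = np.array([1.0, 2.0])
print(input_activation(W_u, x_u))  # 3 activations, each in (0, 1)
```

Storing the biases as the first column of `W_u` is exactly the trick described above: the prepended 1 guarantees they are added unconditionally, independent of the actual activations.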
See Activation of a layer for an example calculation for a typical linear layer, such as the one described on this page.