Activation of the output layer

Prerequisites

Activation of a neuron | \(x^\kappa_i = \sigma(\sum_{j=1}^{L^{k - 1}}\theta_{i j}^{k} x_{j}^{k-1} + \theta_{i 0}^{k})\)

Description

This equation calculates the final output of the multi-layer perceptron. It works in the same way as Activation of a layer, except that no activation function is applied.

This output denotes the result of the whole forward pass performed by the model \( \htmlClass{sdt-0000000084}{h} \). Formally,

\[\htmlClass{sdt-0000000068}{\mathbf{y}} = \htmlClass{sdt-0000000084}{h}(\htmlClass{sdt-0000000103}{u}).\]

\[\htmlClass{sdt-0000000068}{\mathbf{y}}=\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}}=\htmlClass{sdt-0000000059}{\mathbf{W}}^{\htmlClass{sdt-0000000015}{k}}[1;\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}-1}]\]
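
For comparison, any hidden layer \(\kappa\) (see Activation of a layer) applies the activation function \( \htmlClass{sdt-0000000051}{\sigma} \) to the same kind of affine map, whereas the output layer omits it:

\[\htmlClass{sdt-0000000094}{\mathcal{x}}^{\kappa} = \htmlClass{sdt-0000000051}{\sigma}\left(\htmlClass{sdt-0000000059}{\mathbf{W}}^{\kappa}[1;\htmlClass{sdt-0000000094}{\mathcal{x}}^{\kappa-1}]\right), \qquad \htmlClass{sdt-0000000068}{\mathbf{y}} = \htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}} = \htmlClass{sdt-0000000059}{\mathbf{W}}^{\htmlClass{sdt-0000000015}{k}}[1;\htmlClass{sdt-0000000094}{\mathcal{x}}^{\htmlClass{sdt-0000000015}{k}-1}].\]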

Symbols Used:

\( k \)

This symbol represents any given integer, \( k \in \htmlClass{sdt-0000000122}{\mathbb{Z}}\).

\( \mathbf{W} \)

This symbol represents the matrix containing the weights and biases of a layer in a neural network.

\( \mathbf{y} \)

This symbol represents the output activation vector of a neural network.

\( \mathcal{x} \)

This symbol represents the activations of a neural network layer in vector form.

Derivation

The derivation follows steps analogous to those of Activation of a layer.

First, recall the activation of a single neuron: \[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]

We can rewrite this in vector form, where \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{k} \in \mathbb{R}^{\htmlClass{sdt-0000000044}{L}^{k-1}}\) is the weight vector of the \(i\)-th neuron and \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}0}^{k}\) is its bias. Since no activation function is applied at the output layer, the \(i\)-th output is

\[\htmlClass{sdt-0000000068}{\mathbf{y}}_{\htmlClass{sdt-0000000018}{i}} = \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{k} \cdot x^{k - 1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}0}^{k},\]

where \(k\) is the index of the last layer, i.e. \(x^k=\htmlClass{sdt-0000000068}{\mathbf{y}}\).

The activations of all neurons in this last (\(k\)-th) layer are collected in the vector \(\htmlClass{sdt-0000000068}{\mathbf{y}} = (\htmlClass{sdt-0000000050}{x^\kappa}_1,...,\htmlClass{sdt-0000000050}{x^\kappa}_{\htmlClass{sdt-0000000044}{L}^{k}})\).

If we stack the weight vectors of all the neurons together with their biases, we end up with the weight matrix \( \htmlClass{sdt-0000000059}{\mathbf{W}} \).
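
Concretely, the \(i\)-th row of \( \htmlClass{sdt-0000000059}{\mathbf{W}}^{k} \) holds the bias \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}0}^{k}\) followed by the weight vector \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{k}\):

\[\htmlClass{sdt-0000000059}{\mathbf{W}}^{k} = \begin{bmatrix} \htmlClass{sdt-0000000066}{\theta}_{10}^{k} & \htmlClass{sdt-0000000066}{\theta}_{1}^{k} \\ \vdots & \vdots \\ \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000044}{L}^{k}0}^{k} & \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000044}{L}^{k}}^{k} \end{bmatrix} \in \mathbb{R}^{\htmlClass{sdt-0000000044}{L}^{k} \times (\htmlClass{sdt-0000000044}{L}^{k-1}+1)}.\]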

Now, since matrix multiplication is essentially a series of vector dot products, we can combine all of these per-neuron operations into a single matrix expression:

\[\htmlClass{sdt-0000000068}{\mathbf{y}} = \htmlClass{sdt-0000000059}{\mathbf{W}}^k[1;x^{k - 1}] \enspace .\]

The 1 is concatenated to the activations of the previous layer in order to accommodate the biases. Notice that since the first column of the weight matrix \(\mathbf{W}\) holds the biases, these entries are always multiplied by this 1 and never by any of the actual activations, which is exactly the role of a bias.

Notice that we do not have any activation function in this case.
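
Writing out the \(i\)-th component of this matrix-vector product recovers the single-neuron equation from the start of the derivation, only without \( \htmlClass{sdt-0000000051}{\sigma} \):

\[\htmlClass{sdt-0000000068}{\mathbf{y}}_{\htmlClass{sdt-0000000018}{i}} = \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}0}^{k} \cdot 1 + \htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1}.\]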

Example

Assume that the last, \(\kappa\)-th, layer has 3 inputs and 2 outputs.

Then we can define the weights for each neuron:

\[ \htmlClass{sdt-0000000066}{\theta}^\kappa_1 = \begin{bmatrix} 0.1 & 0.2 & 0.3\end{bmatrix} \\ \htmlClass{sdt-0000000066}{\theta}^\kappa_2 = \begin{bmatrix} 0.4 & 0.5 & 0.6\end{bmatrix} \]

We also define the bias of each neuron:

\[ \htmlClass{sdt-0000000066}{\theta}^\kappa_{01} = 0.7 \\ \htmlClass{sdt-0000000066}{\theta}^\kappa_{02} = 0.8 \]

Then, we can construct the weight matrix.

\[ \mathbf{W} = \begin{bmatrix} \htmlClass{sdt-0000000066}{\theta}^\kappa_{01} & \htmlClass{sdt-0000000066}{\theta}^\kappa_1 \\ \htmlClass{sdt-0000000066}{\theta}^\kappa_{02} & \htmlClass{sdt-0000000066}{\theta}^\kappa_2 \\ \end{bmatrix} \]

\[ \mathbf{W} = \begin{bmatrix} 0.7 & 0.1 & 0.2 & 0.3 \\ 0.8 & 0.4 & 0.5 & 0.6 \\ \end{bmatrix} \]

If we denote the output of the previous layer as

\[x^{\kappa - 1}= \begin{bmatrix} 0.9 \\ 1.0 \\ 1.1\end{bmatrix},\]

then

\[[1;x^{\kappa - 1}] = \begin{bmatrix} 1 \\ 0.9 \\ 1.0 \\ 1.1\end{bmatrix} \]

which finally allows us to calculate the output of our layer:

\[ \htmlClass{sdt-0000000068}{\mathbf{y}} = \htmlClass{sdt-0000000059}{\mathbf{W}}[1;x^{\kappa - 1}] \]

Substituting all the values, we obtain the result:

\[ \htmlClass{sdt-0000000068}{\mathbf{y}} = \begin{bmatrix} 0.7 & 0.1 & 0.2 & 0.3 \\ 0.8 & 0.4 & 0.5 & 0.6 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0.9 \\ 1.0 \\ 1.1\end{bmatrix} \]

\[ \htmlClass{sdt-0000000068}{\mathbf{y}} = \begin{bmatrix} 1.32 \\ 2.32 \\ \end{bmatrix} \]
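
As a quick programmatic check, below is a minimal NumPy sketch that reproduces the example above; the variable names W, x_prev, x_aug and y are illustrative only and not part of the source notation.

```python
import numpy as np

# Weight matrix with the biases in the first column, as constructed above.
W = np.array([[0.7, 0.1, 0.2, 0.3],
              [0.8, 0.4, 0.5, 0.6]])

# Activations of the previous layer.
x_prev = np.array([0.9, 1.0, 1.1])

# Prepend a 1 so that the bias column of W is picked up by the product.
x_aug = np.concatenate(([1.0], x_prev))

# Output layer: a plain affine map, no activation function applied.
y = W @ x_aug
print(y)  # [1.32 2.32]
```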
