This equation calculates the activation of the \(\htmlClass{sdt-0000000018}{i}\)-th neuron in the \(\htmlClass{sdt-0000000015}{k}\)-th layer of a multi-layer perceptron. This value is used during the forward pass through the neural network.
\( j \) | This is a secondary symbol for an iterator, a variable that changes value to refer to a series of elements.
\( k \) | This symbol represents any given integer, \( k \in \htmlClass{sdt-0000000122}{\mathbb{Z}}\). |
\( i \) | This is the symbol for an iterator, a variable that changes value to refer to a sequence of elements. |
\( x^\kappa_i \) | This symbol represents the activation of the \(i\)-th neuron in the \(\kappa\)-th layer of a multi-layer perceptron. |
\( L \) | This symbol represents the sizes of the layers in a neural network. |
\( \sigma \) | This symbol represents the activation function. It maps real values to other real values in a non-linear way. |
\( \theta \) | This is the symbol we use for model weights/parameters. |
\( \sum \) | This is the summation symbol in mathematics, it represents the sum of a sequence of numbers. |
The \(\htmlClass{sdt-0000000018}{i}\)-th neuron in the \(\htmlClass{sdt-0000000015}{k}\)-th layer has an associated weight vector \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}}\) and a bias \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}}\).
Let \(L^{\htmlClass{sdt-0000000015}{k} - 1}\) be the size of the previous layer. Then \((x_1^{\htmlClass{sdt-0000000015}{k} - 1},...,x_{L^{\htmlClass{sdt-0000000015}{k} - 1}}^{\htmlClass{sdt-0000000015}{k} - 1})\) are the activations of the neurons in the previous layer. Each value in the weight vector \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}}\) is linked to one of these inputs from the previous layer.
By multiplying each weight by its respective input and summing the results, we obtain
\[\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} \enspace.\]
Each neuron has an associated bias, which is added to the above result:
\[\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}}\enspace.\]
Finally, this value is passed through the activation function \(\htmlClass{sdt-0000000051}{\sigma}\), giving the neuron's activation:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k} - 1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
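For illustration, here is a minimal NumPy sketch of this computation for a single neuron; the names (`weights`, `bias`, `prev_activations`) and the default choice of `np.tanh` as the non-linearity are ours, not part of the equation above.

```python
import numpy as np

def neuron_activation(weights, bias, prev_activations, activation=np.tanh):
    """Activation of one neuron: sigma( sum_j theta_ij * x_j^(k-1) + theta_i0 )."""
    pre_activation = np.dot(weights, prev_activations) + bias  # weighted sum of inputs, plus bias
    return activation(pre_activation)                          # apply the non-linearity sigma
```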
Let's define the following values:
\[x^{\htmlClass{sdt-0000000015}{k}-1} = (0.1,0.2,0.3)\]
\[\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}} = (0.5, 1.0, 1.5)\]
\[\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}} = -1\]
Then the size of the previous layer is \(L^{\htmlClass{sdt-0000000015}{k}-1}=3\). In our calculations, we will use the Rectified Linear Unit (ReLU), \(\text{relu}(z) = \max(0, z)\), as the activation function \( \htmlClass{sdt-0000000051}{\sigma} \).
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
We substitute ReLU for the activation function:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
Next, we substitute the value of the bias:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} - 1)\]
Finally, we expand the sum and substitute the values of the inputs and weights:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 1}^{\htmlClass{sdt-0000000015}{k}} x_1^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 2}^{\htmlClass{sdt-0000000015}{k}} x_2^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 3}^{\htmlClass{sdt-0000000015}{k}} x_3^{\htmlClass{sdt-0000000015}{k}-1} - 1)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(0.5 \cdot 0.1 + 1.0 \cdot 0.2 + 1.5 \cdot 0.3 - 1)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(-0.3)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \max(0,-0.3)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = 0\]
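The same result can be checked with a few lines of plain Python; the variable names below are illustrative and taken from the example values defined above.

```python
# Values from the worked example
prev_activations = [0.1, 0.2, 0.3]  # x^(k-1)
weights = [0.5, 1.0, 1.5]           # theta_i^k
bias = -1.0                         # theta_i0^k

def relu(z):
    return max(0.0, z)

# Weighted sum of the previous layer's activations, plus the bias
pre_activation = sum(w * x for w, x in zip(weights, prev_activations)) + bias
print(pre_activation)        # approximately -0.3 (up to floating-point rounding)
print(relu(pre_activation))  # 0.0
```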