This equation calculates the activation of the \(\htmlClass{sdt-0000000018}{i}\)-th neuron in the \(\htmlClass{sdt-0000000015}{k}\)-th layer of a multi-layer perceptron. This value is used during the forward pass through the neural network.
\( j \) | This is a secondary symbol for an iterator, a variable that changes value to refer to a series of elements.
\( k \) | This symbol represents any given integer, \( k \in \htmlClass{sdt-0000000122}{\mathbb{Z}}\). |
\( i \) | This is the symbol for an iterator, a variable that changes value to refer to a sequence of elements. |
\( x^\kappa_i \) | This symbol represents the activation of the \(i\)-th neuron in the \(\kappa\)-th layer of a multi-layer perceptron. |
\( L \) | This symbol represents the sizes of the layers in a neural network. |
\( \sigma \) | This symbol represents the activation function. It maps real values to other real values in a non-linear way. |
\( \theta \) | This is the symbol we use for model weights/parameters. |
\( \sum \) | This is the summation symbol in mathematics, it represents the sum of a sequence of numbers. |
The \(\htmlClass{sdt-0000000018}{i}\)-th neuron in the \(\htmlClass{sdt-0000000015}{k}\)-th layer has an associated weight vector \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}}\) and a bias \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}}\).
Let \(L^{\htmlClass{sdt-0000000015}{k} - 1}\) be the size of the previous layer. Then \((x_1^{\htmlClass{sdt-0000000015}{k} - 1},...,x_{L^{\htmlClass{sdt-0000000015}{k} - 1}}^{\htmlClass{sdt-0000000015}{k} - 1})\) are the activations of the neurons in the previous layer. Each value in the weight vector \(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}}\) is linked to one of these inputs from the previous layer.
By multiplying each weight by its respective input and summing the results, we obtain
\[\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} \enspace.\]
Each neuron has an associated bias, which is added to the above result:
\[\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}}\enspace.\]
Finally, this value is passed through the activation function \(\htmlClass{sdt-0000000051}{\sigma}\), giving the neuron's activation:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k} - 1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
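For illustration, here is a minimal NumPy sketch of this computation for a single neuron; the names (`weights`, `bias`, `prev_activations`) and the default choice of `np.tanh` as the non-linearity are ours, not part of the equation above.

```python
import numpy as np

def neuron_activation(weights, bias, prev_activations, activation=np.tanh):
    """Activation of one neuron: sigma( sum_j theta_ij * x_j^(k-1) + theta_i0 )."""
    pre_activation = np.dot(weights, prev_activations) + bias  # weighted sum of inputs, plus bias
    return activation(pre_activation)                          # apply the non-linearity sigma
```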
Let's define the following values:
\[x^{\htmlClass{sdt-0000000015}{k}-1} = (0.1,0.2,0.3)\]
\[\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i}}^{\htmlClass{sdt-0000000015}{k}} = (0.5, 1.0, 1.5)\]
\[\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}} = -1\]
Then the size of the previous layer is \(L^{\htmlClass{sdt-0000000015}{k}-1}=3\). In our calculations, we will use the Rectified Linear Unit (ReLU), \(\text{relu}(z) = \max(0, z)\), as the activation function \( \htmlClass{sdt-0000000051}{\sigma} \).
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \htmlClass{sdt-0000000051}{\sigma}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
We substitute ReLU for the activation function:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 0}^{\htmlClass{sdt-0000000015}{k}})\]
Next, we substitute the value of the bias:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000011}{j}=1}^{\htmlClass{sdt-0000000044}{L}^{\htmlClass{sdt-0000000015}{k} - 1}}\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}} x_{\htmlClass{sdt-0000000011}{j}}^{\htmlClass{sdt-0000000015}{k}-1} - 1)\]
Finally, we expand the sum and substitute the values of the inputs and weights:
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(\htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 1}^{\htmlClass{sdt-0000000015}{k}} x_1^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 2}^{\htmlClass{sdt-0000000015}{k}} x_2^{\htmlClass{sdt-0000000015}{k}-1} + \htmlClass{sdt-0000000066}{\theta}_{\htmlClass{sdt-0000000018}{i} 3}^{\htmlClass{sdt-0000000015}{k}} x_3^{\htmlClass{sdt-0000000015}{k}-1} - 1)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(0.5 \cdot 0.1 + 1.0 \cdot 0.2 + 1.5 \cdot 0.3 - 1)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \text{relu}(-0.3)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = \max(0,-0.3)\]
\[\htmlClass{sdt-0000000030}{x^\kappa_i} = 0\]
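The same result can be checked with a few lines of plain Python; the variable names below are illustrative and taken from the example values defined above.

```python
# Values from the worked example
prev_activations = [0.1, 0.2, 0.3]  # x^(k-1)
weights = [0.5, 1.0, 1.5]           # theta_i^k
bias = -1.0                         # theta_i0^k

def relu(z):
    return max(0.0, z)

# Weighted sum of the previous layer's activations, plus the bias
pre_activation = sum(w * x for w, x in zip(weights, prev_activations)) + bias
print(pre_activation)        # approximately -0.3 (up to floating-point rounding)
print(relu(pre_activation))  # 0.0
```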