This equation calculates the weights of a Hopfield network in a non-iterative way: rather than learning the weights through repeated updates, it computes them directly from the training patterns.
\( T \) | This symbol represents the transpose of a matrix or vector. Taking the transpose of a matrix or vector essentially means flipping the matrix or vector over its diagonal. |
\( L \) | This symbol represents the size of a layer in a neural network; here, it is the number of neurons in the Hopfield network, i.e. the length of a training pattern. |
\( \mathbf{W} \) | This symbol represents the matrix containing the weights of a layer in a neural network; here, it is the weight matrix of the Hopfield network. |
\( \sum \) | This is the summation symbol; it represents the sum of a sequence of terms. |
\( I \) | This is the symbol for the identity matrix. It behaves such that, for any matrix \(A\), \(AI = A\) and \(IA = A\). |
\( \xi \) | This symbol represents a single training pattern. |
For intuition regarding the weight matrix and its relation to the training patterns, let's first consider the iterative weight update rule of a Hopfield network:
\[\htmlClass{sdt-0000000059}{\mathbf{W}}(\htmlClass{sdt-0000000117}{n}) = \htmlClass{sdt-0000000059}{\mathbf{W}}(\htmlClass{sdt-0000000117}{n} - 1) + \htmlClass{sdt-0000000106}{\mu} (\htmlClass{sdt-0000000111}{\xi}\htmlClass{sdt-0000000111}{\xi}^{\htmlClass{sdt-0000000022}{T}} - \htmlClass{sdt-0000000089}{I})\]
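To make the rule concrete, here is a minimal NumPy sketch of one such iterative update (an illustration added here, not part of the original derivation), assuming a learning rate \(\mu\) and a pattern given as a NumPy vector with \(\pm 1\) entries:

```python
import numpy as np

def hopfield_update(W, xi, mu=1.0):
    """One iterative Hebbian update: W(n) = W(n-1) + mu * (xi xi^T - I)."""
    xi = xi.reshape(-1, 1)                 # treat the pattern as an (L, 1) column vector
    L = xi.shape[0]
    return W + mu * (xi @ xi.T - np.eye(L))
```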
This time, instead of optimizing the weight matrix iteratively, we calculate it directly. First, remember that \(\htmlClass{sdt-0000000111}{\xi} \in \{1,-1\}^{\htmlClass{sdt-0000000044}{L}}\). The outer product of a training pattern with its transpose is an \(L\) by \(L\) matrix storing the correspondences between particular neurons. Additionally, we subtract the identity matrix \(I\) once per pattern, i.e. \(N\) times in total (because of the summation), which amounts to subtracting \(NI\). This gets rid of the "self-correspondences" between a neuron and itself, so the diagonal of the weight matrix is zero.
Intuitively, the entries in the weight matrix act as an average (notice the factor \(\frac{1}{L}\)) reflecting how often two neurons have the same sign. If two specific neurons, say \(\htmlClass{sdt-0000000018}{i}\) and \(\htmlClass{sdt-0000000011}{j}\), often have the same sign, their weight \(\htmlClass{sdt-0000000059}{\mathbf{W}}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}\) will be large. Conversely, if they have the same sign half the time and opposite signs the other half, their weight \(\htmlClass{sdt-0000000059}{\mathbf{W}}_{\htmlClass{sdt-0000000018}{i} \htmlClass{sdt-0000000011}{j}}\) will be zero.
By averaging these correspondences over all training patterns, we obtain a weight matrix that represents the statistical correlations between the neurons across the whole dataset.
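The same calculation can be written directly in code. The following NumPy sketch (again an illustration, using hypothetical names) computes the weight matrix in one shot from \(N\) patterns of size \(L\), exactly as in the equation:

```python
import numpy as np

def hopfield_weights(patterns):
    """Direct (non-iterative) Hopfield weights: W = (1/L) * (sum_i xi_i xi_i^T - N*I)."""
    X = np.asarray(patterns)               # shape (N, L), entries in {+1, -1}
    N, L = X.shape
    correlations = X.T @ X                 # equals the sum of outer products xi_i xi_i^T
    return (correlations - N * np.eye(L)) / L
```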
Let's say we have the following dataset:
\[ \xi_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} , \xi_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
Then, the size of the pattern is \(\htmlClass{sdt-0000000044}{L} = 2\) and the size of the dataset is \(N=2\).
Now, we can calculate the weight matrix:
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{\htmlClass{sdt-0000000044}{L}}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000018}{i}=1,...,N} \htmlClass{sdt-0000000111}{\xi}_{\htmlClass{sdt-0000000018}{i}} \htmlClass{sdt-0000000111}{\xi}_{\htmlClass{sdt-0000000018}{i}}^{T} - NI) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{2}(\htmlClass{sdt-0000000080}{\sum}_{\htmlClass{sdt-0000000018}{i}=1,...,2} \htmlClass{sdt-0000000111}{\xi}_{\htmlClass{sdt-0000000018}{i}} \htmlClass{sdt-0000000111}{\xi}_{\htmlClass{sdt-0000000018}{i}}^{T} - 2I) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{2}( \xi_1 \xi_1^T + \xi_2 \xi_2^T - 2I ) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{2}( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix} - \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{2}( \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} - \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{2}( \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} - \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \frac{1}{2}( \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} ) \]
\[ \htmlClass{sdt-0000000059}{\mathbf{W}} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]
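The weights vanish here because the two neurons agree in \(\xi_2\) but disagree in \(\xi_1\), so their correlations cancel out exactly. As a quick, self-contained check of the hand calculation (illustrative code only):

```python
import numpy as np

X = np.array([[1, -1],
              [1, 1]])                     # xi_1 and xi_2 stored as rows
N, L = X.shape                             # N = 2 patterns, L = 2 neurons
W = (X.T @ X - N * np.eye(L)) / L
print(W)                                   # [[0. 0.]
                                           #  [0. 0.]]
```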