Because of the simple nature of Hebbian learning, based only on the coincidence of pre- and post-synaptic activity, it may not be intuitively clear why this form of plasticity leads to meaningful learning. However, it can be shown that Hebbian plasticity does pick up the statistical properties of the input in a way that can be categorized as unsupervised learning. This can be mathematically shown in a simplified example. Let us work under the simplifying assumption of a single rate-based neuron of rate y(t), whose inputs have rates x_1(t) ... x_N(t). The response of the neuron y(t) is usually described as a linear combination of its input, \sum_i w_ix_i, followed by a
response function f: :y = f\left(\sum_{i=1}^N w_i x_i \right). As defined in the previous sections, Hebbian plasticity describes the evolution in time of the synaptic weights w_i: :\frac{dw_i}{dt} = \eta x_i y, where \eta is the learning rate. Assuming, for simplicity, an identity response function f(a)=a, we can write :\frac{dw_i}{dt} = \eta x_i \sum_{j=1}^N w_j x_j or in
matrix form: :\frac{d\mathbf{w}}{dt} = \eta \mathbf{x}\mathbf{x}^T\mathbf{w}. As in the previous chapter, if training proceeds by epochs, an average \langle \dots \rangle over the training set of \mathbf{x} (discrete or continuous in time) can be taken, assuming the weights change slowly relative to the presentation of the inputs: :\frac{d\mathbf{w}}{dt} = \langle \eta \mathbf{x}\mathbf{x}^T\mathbf{w} \rangle = \eta \langle \mathbf{x}\mathbf{x}^T\rangle\mathbf{w} = \eta C \mathbf{w}, where C = \langle \mathbf{x}\mathbf{x}^T \rangle is the
correlation matrix of the input under the additional assumption that \langle\mathbf{x}\rangle = 0 (i.e. the average of the inputs is zero). This is a system of N coupled linear differential equations. Since C is
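The averaging step above can be checked numerically. The following sketch (using NumPy, with an arbitrary zero-mean input distribution chosen purely for illustration) compares the sample average of the Hebbian update \eta x_i y with the drift \eta C \mathbf{w}:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
eta = 0.1

# Zero-mean inputs with some fixed correlation structure (illustrative).
A = rng.standard_normal((N, N))
x_samples = rng.standard_normal((100_000, N)) @ A.T  # each row is one x

w = rng.standard_normal(N)

# Sample average of the Hebbian update eta * x * y, with y = w . x
y = x_samples @ w
avg_update = eta * (x_samples * y[:, None]).mean(axis=0)

# Theoretical drift eta * C * w, with C the input correlation matrix
C = x_samples.T @ x_samples / len(x_samples)
np.allclose(avg_update, eta * C @ w)  # True: the two coincide
```

The equality holds term by term because averaging \eta \mathbf{x}(\mathbf{x}^T\mathbf{w}) over samples is the same computation as applying the sample correlation matrix to \mathbf{w}.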
symmetric, it is also
diagonalizable, and the solution can be found, by working in the basis of its eigenvectors, to be of the form :\mathbf{w}(t) = k_1e^{\eta\alpha_1 t}\mathbf{c}_1 + k_2e^{\eta\alpha_2 t}\mathbf{c}_2 + \dots + k_Ne^{\eta\alpha_N t}\mathbf{c}_N where the k_i are constants determined by the initial conditions, the \mathbf{c}_i are the eigenvectors of C, and the \alpha_i their corresponding eigenvalues. Since a correlation matrix is always a positive semi-definite matrix, its eigenvalues are all non-negative (and, for generic inputs, positive), so the above solution diverges exponentially in time. This is an intrinsic problem: this version of Hebb's rule is unstable, since in any network with a dominant signal the synaptic weights grow exponentially in magnitude. Intuitively, this is because whenever the presynaptic neuron excites the postsynaptic neuron, the weight between them is reinforced, causing an even stronger excitation in the future, and so on, in a self-reinforcing loop. One might think a solution is to limit the firing rate of the postsynaptic neuron by adding a non-linear, saturating response function f, but in fact, it can be shown that for
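The divergence can be seen in a short simulation (illustrative matrix and parameters; Euler integration stands in for the continuous dynamics), which also verifies the eigenbasis solution above:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
eta = 0.5
dt = 1e-3
steps = 2000  # integrate up to t = 2

# A symmetric positive semi-definite matrix standing in for C (illustrative)
A = rng.standard_normal((N, N))
C = A @ A.T / N

w0 = rng.standard_normal(N)

# Euler integration of dw/dt = eta * C w
w = w0.copy()
for _ in range(steps):
    w = w + dt * eta * (C @ w)

# Closed-form solution in the eigenbasis: w(t) = sum_i k_i e^(eta a_i t) c_i
alphas, vecs = np.linalg.eigh(C)   # eigenvalues a_i (ascending), eigenvectors c_i
k = vecs.T @ w0                    # coefficients k_i of the initial condition
w_closed = vecs @ (k * np.exp(eta * alphas * steps * dt))

# The weight norm has grown, and the simulation tracks the closed form
grew = np.linalg.norm(w) > np.linalg.norm(w0)
```

Every mode with \alpha_i > 0 is amplified by e^{\eta\alpha_i t}, so the norm of \mathbf{w} can only grow.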
any neuron model, Hebb's rule is unstable. Therefore, network models of neurons usually employ other learning theories such as
BCM theory,
Oja's rule, or the
generalized Hebbian algorithm. Regardless, even for the unstable solution above, one can see that, when sufficient time has passed, one of the terms dominates over the others, and :\mathbf{w}(t) \approx e^{\eta\alpha^* t}\mathbf{c}^*, where \alpha^* is the largest eigenvalue of C and \mathbf{c}^* its corresponding eigenvector (constant factors are omitted). At this time, the postsynaptic neuron performs the following operation: :y \approx e^{\eta\alpha^* t}\,\mathbf{c}^{*\top} \mathbf{x}. Because, again, \mathbf{c}^* is the eigenvector corresponding to the largest eigenvalue of the correlation matrix of the inputs x_i, this corresponds exactly to computing the first
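This claim can be sketched numerically (illustrative zero-mean inputs; Oja's rule, mentioned above, is used in place of the raw Hebbian update so that the weights stay bounded): a single neuron trained on correlated inputs aligns its weight vector with the leading eigenvector of C, i.e. the first principal component:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 3, 20_000
eta = 0.005

# Zero-mean inputs whose components have different variances (illustrative),
# so the first principal component is well separated from the rest.
xs = rng.standard_normal((T, N)) * np.array([3.0, 1.0, 0.5])

w = rng.standard_normal(N)
w /= np.linalg.norm(w)

# Oja's rule: the Hebbian term eta*y*x plus a decay -eta*y^2*w bounding ||w||
for x in xs:
    y = w @ x
    w += eta * y * (x - y * w)

# Leading eigenvector of the input correlation matrix
C = xs.T @ xs / T
c_top = np.linalg.eigh(C)[1][:, -1]

alignment = abs(w @ c_top) / np.linalg.norm(w)
# alignment should be close to 1: w points along the first principal component
```

Oja's rule keeps \|\mathbf{w}\| near 1 while preserving the direction selected by the dominant eigenvector, which is why it is the usual stabilized stand-in for the divergent solution above.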
principal component of the input. This mechanism can be extended to performing a full PCA (
principal component analysis) of the input by adding further postsynaptic neurons, provided the different neurons are prevented from all picking up the same principal component, for example by adding
lateral inhibition in the postsynaptic layer. We have thus connected Hebbian learning to PCA, which is an elementary form of unsupervised learning, in the sense that the network can pick up useful statistical aspects of the input, and "describe" them in a distilled way in its output.

==Hebbian learning and mirror neurons==