==Timeline==
• Circa 1800, Legendre (1805) and Gauss (1795) created the simplest feedforward network: a single weight layer with linear activation functions, trained by the least squares method to minimise mean squared error, a method now known as linear regression (see the sketch after this timeline). Legendre and Gauss used it to predict planetary movement from training data.
• In 1943, Warren McCulloch and Walter Pitts proposed the binary artificial neuron as a logical model of biological neural networks.
• In 1958, Frank Rosenblatt proposed the multilayered perceptron model, consisting of an input layer, a hidden layer with randomized weights that did not learn, and an output layer with learnable connections. R. D. Joseph (1960) mentions an even earlier perceptron-like device; Rosenblatt cited and adopted these ideas, also crediting work by H. D. Block and B. W. Knight. Unfortunately, these early efforts did not lead to a working learning algorithm for hidden units, i.e., deep learning.
• In 1965, Alexey Grigorevich Ivakhnenko and Valentin Lapa published the Group Method of Data Handling, the first working deep learning algorithm, a method to train arbitrarily deep neural networks. It is based on layer-by-layer training through regression analysis, with superfluous hidden units pruned using a separate validation set (a schematic sketch follows after this timeline). Since the activation functions of the nodes are Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units, or "gates." The method was used to train an eight-layer neural net in 1971.
• In 1967, Shun'ichi Amari reported the first multilayered neural network trained by stochastic gradient descent, which was able to classify non-linearly separable pattern classes. Amari's student Saito conducted the computer experiments, using a five-layered feedforward network with two learning layers.
• Paul Werbos applied backpropagation to neural networks in 1982 (his 1974 PhD thesis, reprinted in a 1994 book, did not yet describe the algorithm). In 1986, David E. Rumelhart et al. popularised backpropagation but did not cite the original work.
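As referenced in the first bullet above, the Legendre-Gauss network is simply linear regression: a single linear layer fitted in closed form by least squares. The following is a minimal illustrative sketch (not from the article; the synthetic data and parameter values are invented for demonstration):

```python
import numpy as np

# Minimal sketch: the "simplest feedforward network" is a single linear
# layer y = X @ w + b, trained in closed form by least squares, which
# minimises mean squared error. Data and constants are invented.

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))             # training inputs
true_w, true_b = np.array([1.5, -0.7]), 0.3           # hypothetical truth
y = X @ true_w + true_b + rng.normal(0.0, 0.05, 100)  # noisy targets

# Append a constant column so the bias is learned as one more weight,
# then solve min_w ||Xb @ w - y||^2.
Xb = np.hstack([X, np.ones((100, 1))])
w_hat, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print("learned weights:", w_hat[:2], "learned bias:", w_hat[2])
```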
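The Group Method of Data Handling bullet above describes layer-by-layer training through regression analysis, validation-based pruning, and polynomial units. The following is a schematic sketch of that idea only, not Ivakhnenko's exact procedure; the toy data, the pairwise quadratic units, and the `keep=4` pruning rule are assumptions made for illustration:

```python
import numpy as np
from itertools import combinations

def quad_features(a, b):
    """Feature map of one candidate unit: a quadratic (Kolmogorov-Gabor
    style) polynomial of two inputs, including a multiplicative term."""
    return np.stack([np.ones_like(a), a, b, a * b, a * a, b * b], axis=1)

def fit_layer(Z_tr, y_tr, Z_va, y_va, keep=4):
    """Fit a polynomial unit for every pair of inputs by least squares on
    the training split; prune all but the `keep` best by validation MSE."""
    candidates = []
    for i, j in combinations(range(Z_tr.shape[1]), 2):
        F_tr = quad_features(Z_tr[:, i], Z_tr[:, j])
        w, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)  # regression fit
        F_va = quad_features(Z_va[:, i], Z_va[:, j])
        err = np.mean((F_va @ w - y_va) ** 2)            # validation MSE
        candidates.append((err, F_tr @ w, F_va @ w))
    candidates.sort(key=lambda c: c[0])                   # keep the best
    best = candidates[:keep]
    return (np.stack([c[1] for c in best], axis=1),
            np.stack([c[2] for c in best], axis=1),
            best[0][0])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 4))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2])                   # toy target
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

Z_tr, Z_va = X_tr, X_va
for depth in range(3):                                    # grow the net
    Z_tr, Z_va, best_err = fit_layer(Z_tr, y_tr, Z_va, y_va)
    print(f"layer {depth + 1}: best validation MSE = {best_err:.4f}")
```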
==Linear regression==

==Perceptron==
If a threshold is used as the activation function, i.e. a step applied to the weighted sum of the inputs, the resulting linear threshold unit is called a perceptron. (Often the term is used to denote just one of these units.) Despite the limited computational power of a single unit with a linear threshold function, multiple parallel non-linear units are able to approximate any continuous function from a compact interval of the real numbers into the interval [−1, 1].

Figure: a small threshold network. Numbers in neurons represent their explicit thresholds; numbers annotating arrows represent the weights of the inputs. If the threshold of 2 is met, a value of 1 is used for the weight multiplication to the next layer; otherwise 0 is used. The bottom layer of inputs is not always considered a real neural network layer.

Perceptrons can be trained by a simple learning algorithm usually called the delta rule. It calculates the error between the calculated output and the sample output data, and uses this to adjust the weights, thus implementing a form of gradient descent.
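As a concrete illustration (a minimal sketch, not part of the article; the AND data, learning rate, and epoch count are chosen only for demonstration), the update in its perceptron form is Δw_i = η·(t − o)·x_i, where t is the target output, o the unit's output, and η the learning rate:

```python
import numpy as np

# Minimal perceptron trained with the delta rule in its perceptron form:
#   delta_w_i = eta * (target - output) * x_i
# Data: the linearly separable AND function; eta and epochs are arbitrary.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)  # AND targets

w = np.zeros(2)   # weights
b = 0.0           # bias (the threshold folded in as a learnable offset)
eta = 0.1         # learning rate

def step(z):
    """Linear threshold activation: fire (1) iff the weighted sum >= 0."""
    return 1.0 if z >= 0 else 0.0

for epoch in range(20):
    for x, target in zip(X, t):
        o = step(w @ x + b)           # compute the unit's output
        w += eta * (target - o) * x   # delta rule weight update
        b += eta * (target - o)       # same update for the bias

print("weights:", w, "bias:", b)
print("outputs:", [step(w @ x + b) for x in X])  # matches the AND targets
```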
==Multilayer perceptron==
A multilayer perceptron (MLP) is a misnomer for a modern feedforward artificial neural network consisting of fully connected neurons (hence the sometimes-used synonym fully connected network (FCN)), typically with a nonlinear activation function, organized in at least three layers. The name is considered a misnomer because such networks use continuous nonlinear activations rather than the original perceptron's threshold units. MLPs are notable for being able to distinguish data that is not linearly separable.
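As an illustration of that last point (a minimal sketch, not from the article; the weights are hand-picked rather than learned, and threshold units are used instead of the smooth nonlinearities of modern MLPs, since the point here is representational power, not training), the following three-layer network of fully connected units computes XOR, a classic function that is not linearly separable:

```python
import numpy as np

# A minimal input/hidden/output network computing XOR, which no single
# linear threshold unit can represent. Weights are hand-picked.

def step(z):
    return (z >= 0).astype(float)  # threshold activation

# Hidden layer: two units computing OR and NAND of the two inputs.
W1 = np.array([[1.0, 1.0],     # OR unit weights
               [-1.0, -1.0]])  # NAND unit weights
b1 = np.array([-0.5, 1.5])

# Output layer: AND of the two hidden units yields XOR.
W2 = np.array([1.0, 1.0])
b2 = -1.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = step(X @ W1.T + b1)   # fully connected layer 1
out = step(hidden @ W2 + b2)   # fully connected layer 2
print(out)                     # -> [0. 1. 1. 0.], i.e. XOR
```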
==Other feedforward networks==