The activation function of a neuron is chosen to have a number of properties which either enhance or simplify the network containing the neuron. Crucially, for instance, any multilayer perceptron using a linear activation function has an equivalent single-layer network; a non-linear function is therefore necessary to gain the advantages of a multi-layer network. Below, u refers in all cases to the weighted sum of all the inputs to the neuron, i.e. for n inputs,
: u = \sum_{i=1}^n w_i x_i ,
where w is a vector of synaptic weights and x is a vector of inputs.
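For illustration, the weighted sum u is simply a dot product; the minimal Python sketch below uses hypothetical example values for the weights and inputs.

```python
import numpy as np

# Hypothetical example values: three inputs and their synaptic weights.
x = np.array([0.5, -1.0, 2.0])   # input vector
w = np.array([0.8, 0.2, -0.4])   # synaptic weight vector

# Weighted sum u = sum_i w_i * x_i, i.e. the dot product of w and x.
u = np.dot(w, x)
print(u)  # approximately -0.6
```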
===Step function===
The output y of this activation function is binary, depending on whether the input meets a specified threshold, \theta (theta). The "signal" is sent, i.e. the output is set to 1, if the activation meets or exceeds the threshold:
: y = \begin{cases} 1 & \text{if } u \ge \theta \\ 0 & \text{if } u < \theta \end{cases}
This function is used in perceptrons, and appears in many other models. It performs a division of the space of inputs by a hyperplane. It is especially useful in the last layer of a network, intended for example to perform binary classification of the inputs.
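As an illustrative sketch (reusing the hypothetical weights and inputs from above, with the threshold \theta taken to be 0), a step-activated neuron can be written as follows.

```python
import numpy as np

def step_activation(u, theta=0.0):
    """Binary threshold activation: 1 if the weighted sum u meets or exceeds theta, else 0."""
    return 1 if u >= theta else 0

# Hypothetical perceptron-style use: binary classification of one input vector.
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, 0.2, -0.4])   # synaptic weights
u = np.dot(w, x)                 # weighted sum of the inputs
y = step_activation(u, theta=0.0)
print(y)  # 0, since u is about -0.6 and falls below the threshold
```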
===Linear combination===
In this case, the output unit is simply the weighted sum of its inputs plus a bias term. A number of such linear neurons perform a linear transformation of the input vector. This is usually more useful in the early layers of a network. A number of analysis tools exist based on linear models, such as harmonic analysis, and they can all be used in neural networks with this linear neuron. The bias term allows us to make affine transformations to the data.
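As a sketch, a layer of such linear neurons computes an affine map y = Wx + b; the weight matrix W and bias vector b below are hypothetical example values.

```python
import numpy as np

def linear_layer(x, W, b):
    """A layer of linear neurons: each output is a weighted sum of the inputs plus a bias term."""
    return W @ x + b

# Hypothetical example: two linear neurons over three inputs.
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.8, 0.2, -0.4],
              [0.1, -0.3, 0.5]])   # one row of synaptic weights per neuron
b = np.array([0.1, -0.2])          # bias terms make the map affine rather than purely linear
print(linear_layer(x, W, b))       # approximately [-0.5, 1.15]
```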
===Sigmoid===
A fairly simple non-linear function, the sigmoid function such as the logistic function also has an easily calculated derivative, which can be important when calculating the weight updates in the network. It thus makes the network more easily manipulable mathematically, and was attractive to early computer scientists who needed to minimize the computational load of their simulations. It was previously commonly seen in multilayer perceptrons. However, recent work has shown sigmoid neurons to be less effective than rectified linear neurons. The reason is that the gradients computed by the backpropagation algorithm tend to diminish towards zero as the error is propagated back through layers of sigmoidal neurons, making it difficult to optimize networks with many such layers.
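To illustrate the point about diminishing gradients, the sketch below (not part of the original text) implements the logistic sigmoid and its derivative; because the derivative never exceeds 0.25, backpropagation through many sigmoidal layers multiplies together factors that quickly drive the gradient towards zero.

```python
import numpy as np

def sigmoid(u):
    """Logistic sigmoid: squashes the weighted sum u into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_derivative(u):
    """Derivative of the logistic sigmoid, expressible in terms of the sigmoid itself."""
    s = sigmoid(u)
    return s * (1.0 - s)

# The derivative peaks at 0.25 (at u = 0), so each sigmoidal layer scales the
# backpropagated gradient by a factor of at most 0.25.
print(sigmoid_derivative(0.0))  # 0.25
print(0.25 ** 10)               # about 1e-6: best-case gradient factor after ten such layers
```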
===Rectifier===
In the context of artificial neural networks, the rectifier or rectified linear unit is an activation function defined as the positive part of its argument:
: f(x) = x^+ = \max(0, x),
where x is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering. This activation function was first introduced to a dynamical network by Hahnloser et al. in a 2000 paper in Nature, with strong biological motivations and mathematical justifications. In 2011 it was demonstrated for the first time to enable better training of deeper networks, compared to the activation functions widely used before then, i.e. the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. A commonly used variant is the leaky rectified linear unit, which allows a small positive gradient when the unit is not active:
: f(x) = \begin{cases} x & \text{if } x > 0, \\ ax & \text{otherwise}, \end{cases}
where x is the input to the neuron and a is a small positive constant (set to 0.01 in the original paper).
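For illustration, both the rectifier and its leaky variant can be implemented elementwise as below (a minimal sketch, with a = 0.01 following the value quoted above and hypothetical example inputs).

```python
import numpy as np

def relu(x):
    """Rectified linear unit: the positive part of the argument, max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    """Leaky ReLU: passes x through when positive, otherwise scales it by the small constant a."""
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])   # hypothetical example inputs
print(relu(x))        # [0, 0, 0, 1.5]
print(leaky_relu(x))  # [-0.02, -0.005, 0, 1.5]
```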
==Pseudocode algorithm==