== Arbitrary width ==

The first results concerned the arbitrary-width case. Ken-ichi Funahashi (May 1989) showed that backpropagation networks of the Rumelhart–Hinton–Williams type, with a class of sigmoidal activation functions, possess the universal approximation capability, and extended the result to multi-output mappings. Kurt Hornik, Maxwell Stinchcombe, and
Halbert White (July 1989) showed that multilayer
feed-forward networks with as few as one hidden layer are universal approximators, provided that the activation function satisfies certain conditions.
George Cybenko (December 1989) independently established a related result for sigmoid activation functions using functional-analytic methods. Hornik also showed in 1991 that it is not the specific choice of the activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential of being universal approximators. Moshe Leshno
et al. in 1993, and later Allan Pinkus in 1999, showed that the universal approximation property is equivalent to the activation function being nonpolynomial.
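A standard formal statement of this arbitrary-width case, consistent with the Leshno–Pinkus characterization: for a continuous activation function \sigma, the set of finite sums

\sum_{i=1}^{N} a_i \, \sigma(w_i^\top x + b_i), \qquad a_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n,

is dense in C(K) for every compact K \subset \mathbb{R}^n if and only if \sigma is not a polynomial.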
== Arbitrary depth ==

The arbitrary-depth case was also studied by a number of authors, such as Gustaf Gripenberg in 2003, Dmitry Yarotsky, Zhou Lu et al. in 2017, and Boris Hanin and Mark Sellke in 2018, who focused on neural networks with the ReLU activation function. In 2020, Patrick Kidger and Terry Lyons extended those results to neural networks with general activation functions, e.g., tanh or GELU.

One special case of arbitrary depth is that in which each composition component comes from a finite set of mappings. In 2024, Cai constructed a finite set of mappings, named a vocabulary, such that any continuous function can be approximated by composing a sequence from the vocabulary. This is similar to the concept of compositionality in linguistics: the idea that a finite vocabulary of basic elements can be combined via a grammar to express an infinite range of meanings.
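In rough form, the Kidger–Lyons result makes the deep, narrow regime quantitative: for a continuous nonaffine activation function \rho that is continuously differentiable at at least one point, with nonzero derivative at that point, networks of width d_x + d_y + 2 and arbitrary depth are dense in C(K; \mathbb{R}^{d_y}) for every compact K \subset \mathbb{R}^{d_x}.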
== Bounded depth and bounded width ==

The bounded-depth, bounded-width case was first studied by Maiorov and Pinkus in 1999. They showed that there exists an analytic sigmoidal activation function such that two-hidden-layer neural networks with a bounded number of units in the hidden layers are universal approximators. In 2018, Guliyev and Ismailov constructed a smooth sigmoidal activation function providing the universal approximation property for two-hidden-layer feedforward neural networks with fewer units in the hidden layers. In the same year, they also constructed single-hidden-layer networks with bounded width that are still universal approximators for univariate functions; however, this does not extend to multivariable functions. In 2022, Shen et al. obtained precise quantitative information on the depth and width required to approximate a target function with deep and wide ReLU neural networks.
== Quantitative bounds ==

The question of the minimal possible width for universality was first studied in 2021, when Park et al. obtained the minimum width required for the universal approximation of L^p functions using feed-forward neural networks with ReLU activations. Similar results, which can be applied directly to residual neural networks, were obtained in the same year by Paulo Tabuada and Bahman Gharesifard using control-theoretic arguments. In 2023, Cai obtained the optimal minimum-width bound for universal approximation. For the arbitrary-depth case, Leonie Papon and Anastasis Kratsios derived explicit depth estimates depending on the regularity of the target function and of the activation function.
== Kolmogorov network ==

The Kolmogorov–Arnold representation theorem is similar in spirit. Indeed, certain neural network families can directly apply the Kolmogorov–Arnold theorem to yield a universal approximation theorem. Robert Hecht-Nielsen showed that a three-layer neural network can approximate any continuous multivariate function. This was extended to the discontinuous case by Vugar Ismailov. In 2024, Ziming Liu and co-authors showed a practical application.
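The representation underlying these constructions writes any continuous function on [0,1]^n in terms of univariate functions and addition:

f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),

where the \Phi_q and \varphi_{q,p} are continuous univariate functions.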
== Reservoir computing and quantum reservoir computing ==

In reservoir computing, a sparse recurrent neural network with fixed weights, equipped with fading memory and the echo state property, is followed by a trainable output layer. Its universality has been demonstrated separately for networks of rate neurons and for networks of spiking neurons. In 2024, the framework was generalized and extended to quantum reservoirs, where the reservoir is based on qubits defined over Hilbert spaces.
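As an informal illustration of the fixed-reservoir, trainable-readout split, the sketch below implements a minimal echo state network in Python with NumPy. The reservoir size, connectivity, spectral radius, and the sine-prediction task are illustrative assumptions, not taken from the works cited above.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200  # illustrative sizes, not from the cited works

# Fixed random input and reservoir weights. The reservoir is kept sparse
# and rescaled to spectral radius 0.9 (< 1), for the echo state property.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W[rng.random((n_res, n_res)) > 0.1] = 0.0        # keep ~10% of connections
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()

def run_reservoir(u):
    """Drive the reservoir with input sequence u and collect its states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 2000))
X = run_reservoir(u[:-1])   # reservoir states
y = u[1:]                   # targets

# Only the linear readout is trained, here by ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
</syntaxhighlight>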
== Variants ==

Variants include discontinuous activation functions, certifiable networks, random neural networks, and alternative network architectures and topologies. The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. For input dimension d_x and output dimension d_y, the minimum width required for the universal approximation of L^p functions is exactly \max\{d_x + 1, d_y\} (for a ReLU network); more generally, this also holds if both ReLU and a threshold activation function are used.

In 2020, a universal approximation theorem result was established by Brüel-Gabrielsson, showing that graph representation with certain injective properties is sufficient for universal function approximation on bounded graphs and restricted universal function approximation on unbounded graphs, with an accompanying \mathcal O(\left|V\right| \cdot \left|E\right|)-runtime method that performed at the state of the art on a collection of benchmarks (where V and E are the sets of nodes and edges of the graph, respectively). There are also a variety of results between
non-Euclidean spaces and other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the
convolutional neural network (CNN) architecture,
radial basis functions, or neural networks with specific properties.

== Arbitrary-width case ==