KANs are based on the Kolmogorov–Arnold representation theorem, which is connected to Hilbert's 13th problem. Given x = (x_1, x_2, \dots, x_n) consisting of n variables, a multivariate continuous function f(x) can be represented as:

: f(x) = f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right) (1)

This formulation contains two nested summations. The outer sum \sum_{q=1}^{2n+1} aggregates 2n+1 terms, each involving a function \Phi_q : \mathbb{R} \to \mathbb{R}. The inner sum \sum_{p=1}^{n} computes n terms for each q, where each term \varphi_{q,p} : [0,1] \to \mathbb{R} is a continuous function of the single variable x_p. The inner functions \varphi_{q,p} are universal and independent of f, while the outer functions \Phi_q depend on the specific function f being represented. The representation (1) holds for all multivariate functions f: if f is continuous, then the outer functions \Phi_q are continuous; if f is discontinuous, then the corresponding \Phi_q are generally discontinuous, while the inner functions \varphi_{q,p} remain the same universal functions.

Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes an input x and generates the output as:

:\mathrm{KAN}(x) = (\Phi^{L-1} \circ \Phi^{L-2} \circ \cdots \circ \Phi^{1} \circ \Phi^{0})x (3)

Here, \Phi^{l} is the function matrix of the l-th KAN layer, i.e., a set of pre-activations. Let i denote a neuron of the l-th layer and j a neuron of the (l+1)-th layer. The activation function \varphi^{l}_{j,i} connects neuron (l, i) to neuron (l+1, j):

:\varphi^{l}_{j,i}, \quad l = 0,\dots,L-1, \; i = 1,\dots,n_l, \; j = 1,\dots,n_{l+1} (4)

where n_l is the number of nodes of the
l-th layer. Thus, the function matrix \Phi^{l} can be represented as an n_{l+1} \times n_l matrix of activation functions acting on x^{l}:

: x^{l+1} = \begin{pmatrix} \varphi^{l}_{1,1}(\cdot) & \varphi^{l}_{1,2}(\cdot) & \cdots & \varphi^{l}_{1,n_l}(\cdot) \\ \varphi^{l}_{2,1}(\cdot) & \varphi^{l}_{2,2}(\cdot) & \cdots & \varphi^{l}_{2,n_l}(\cdot) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi^{l}_{n_{l+1},1}(\cdot) & \varphi^{l}_{n_{l+1},2}(\cdot) & \cdots & \varphi^{l}_{n_{l+1},n_l}(\cdot) \end{pmatrix} x^{l}

== Implementations ==
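The structure of the representation (1) can be made concrete in code. The following sketch evaluates f(x) for one illustrative choice of inner and outer functions; these particular `phi` and `Phi` are hypothetical stand-ins chosen for the example, not the universal functions whose existence the theorem guarantees:

```python
import math

n = 2  # number of input variables

def phi(q, p, x):
    # Illustrative inner univariate functions phi_{q,p}(x_p).
    # (Stand-ins: the theorem's universal inner functions are not elementary.)
    return math.sin((q + 1) * x + p)

def Phi(q, s):
    # Illustrative outer univariate functions Phi_q(s).
    return math.tanh(s) / (q + 1)

def f(x):
    # Equation (1):
    # f(x) = sum_{q=1}^{2n+1} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) )
    return sum(
        Phi(q, sum(phi(q, p, x[p - 1]) for p in range(1, n + 1)))
        for q in range(1, 2 * n + 2)  # q = 1, ..., 2n+1
    )

val = f([0.3, 0.7])
print(val)
```

Note how the outer sum runs over exactly 2n+1 terms and each inner sum over n univariate evaluations, mirroring the two nested summations in (1).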
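A minimal sketch of the layer composition in equations (3)–(4), assuming NumPy and using fixed random sinusoids in place of the learnable univariate activations (in practice each \varphi^{l}_{j,i} is typically a trainable spline); this is an illustration of the structure, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_kan_layer(n_in, n_out):
    """One KAN layer: an n_out x n_in matrix of univariate functions
    phi_{j,i}, here fixed random sinusoids standing in for splines."""
    a = rng.standard_normal((n_out, n_in))  # amplitudes (illustrative)
    w = rng.standard_normal((n_out, n_in))  # frequencies (illustrative)
    def layer(x):
        # x^{l+1}[j] = sum_i phi_{j,i}(x^l[i]) -- the matrix of activations
        # acting on x^l, as in the display above equation-wise.
        return np.array([
            sum(a[j, i] * np.sin(w[j, i] * x[i]) for i in range(n_in))
            for j in range(n_out)
        ])
    return layer

def kan(layers, x):
    # Equation (3): compose the layer function matrices Phi^0, ..., Phi^{L-1}.
    for layer in layers:
        x = layer(x)
    return x

# A [3, 5, 1] network: n_0 = 3 inputs, n_1 = 5 hidden nodes, n_2 = 1 output.
net = [make_kan_layer(3, 5), make_kan_layer(5, 1)]
y = kan(net, np.array([0.1, 0.2, 0.3]))
print(y.shape)  # (1,)
```

Each layer is exactly the n_{l+1} \times n_l function matrix of equation (4): entry (j, i) is a univariate function applied to input component i, and outputs are summed over i.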