The artificial neuron used in the original paper is slightly different from the modern version. They considered neural networks that operate in discrete time steps t = 0, 1, \dots . The network contains a number of neurons. Let the state of neuron i at time t be N_i(t). The state of a neuron is either 0 or 1, standing for "not firing" and "firing". Each neuron also has a firing threshold \theta, such that it fires if the total input reaches or exceeds the threshold. Each neuron can connect to any other neuron (including itself) with positive synapses (excitatory) or negative synapses (inhibitory); that is, each neuron can connect to another neuron with a weight w taking an integer value. A peripheral afferent is a neuron with no incoming synapses. We can regard each neural network as a directed graph, with the nodes being the neurons and the directed edges being the synapses. A neural network has a circle, or a circuit, if there exists a directed cycle in the graph. Let w_{ij}(t) be the connection weight from neuron j to neuron i at time t. The next state of neuron i is N_i(t+1) = H \left( \sum_{j=1}^{n} w_{ij}(t) N_j(t) - \theta_i(t) \right), where H is the Heaviside step function (outputting 1 if the input is greater than or equal to 0, and 0 otherwise).
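To make the update rule concrete, the following is a minimal Python sketch of one time step of such a network; the weights, thresholds, and states are illustrative and not taken from the paper.

```python
import numpy as np

def step(W, theta, N):
    """One discrete time step of a McCulloch-Pitts network.

    W[i, j] is the integer weight of the synapse from neuron j to neuron i,
    theta[i] is the firing threshold of neuron i, and N is the 0/1 state
    vector at time t. A neuron fires at t+1 iff its total input reaches or
    exceeds its threshold (Heaviside convention H(x) = 1 for x >= 0).
    In a full simulation the peripheral afferents would be driven by the
    external input pattern rather than updated by this rule.
    """
    return (W @ N - theta >= 0).astype(int)

# Illustrative 3-neuron network (weights and thresholds are made up):
# neuron 2 fires iff both neurons 0 and 1 fired at the previous step.
W = np.array([[0, 0, 0],
              [0, 0, 0],
              [1, 1, 0]])
theta = np.array([1, 1, 2])

N = np.array([1, 1, 0])     # state at t = 0
print(step(W, theta, N))    # [0 0 1] -- neuron 2 fires at t = 1
```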
==Symbolic logic==
The paper used, as a
logical language for describing neural networks, "Language II" from
The Logical Syntax of Language by
Rudolf Carnap with some notations taken from
Principia Mathematica by
Alfred North Whitehead and
Bertrand Russell. Language II covers substantial parts of classical mathematics, including
real analysis and portions of set theory. To describe a neural network with peripheral afferents N_1, N_2, \dots, N_p and non-peripheral neurons N_{p+1}, N_{p+2}, \dots, N_n, they considered logical predicates of the form Pr(N_1, N_2, \dots, N_p, t), where Pr is a first-order logic predicate function (a function that outputs a boolean), N_1, \dots, N_p are predicates that take t as an argument, and t is the only free variable in the predicate. Intuitively speaking, N_1, \dots, N_p specify the binary input patterns going into the neural network over all time, and Pr(N_1, N_2, \dots, N_p, t) is a function that takes some binary input patterns and constructs an output binary pattern Pr(N_1, N_2, \dots, N_p, 0), Pr(N_1, N_2, \dots, N_p, 1), \dots . A logical sentence Pr(N_1, N_2, \dots, N_p, t) is realized by a neural network iff there exists a time-delay T \geq 0, a neuron i in the network, and an initial state for the non-peripheral neurons N_{p+1}(0), \dots, N_n(0), such that for every time t the truth-value of the logical sentence equals the state of neuron i at time t + T. That is, \forall t = 0, 1, 2, \dots, \quad Pr(N_1, N_2, \dots, N_p, t) = N_i(t + T)
==Equivalence==
In the paper, they considered some alternative definitions of artificial neural networks and showed them to be equivalent, that is, neural networks under one definition realize precisely the same logical sentences as neural networks under another definition.

They considered three forms of inhibition: relative inhibition, absolute inhibition, and extinction. The definition above is relative inhibition. By "absolute inhibition" they meant that if any negative synapse fires, then the neuron will not fire. By "extinction" they meant that if at time t any inhibitory synapse fires on a neuron i, then \theta_i(t + j) = \theta_i(0) + b_j for j = 1, 2, 3, \dots, until the next time an inhibitory synapse fires on i. It is required that b_j = 0 for all large j. Theorems 4 and 5 state that these are equivalent.

They considered three forms of excitation: spatial summation, temporal summation, and facilitation. The definition above is spatial summation (which they pictured as having multiple synapses placed close together, so that the effect of their firing sums up). By "temporal summation" they meant that the total incoming signal is \sum_{\tau = 0}^{T}\sum_{j=1}^{n} w_{ij}(t) N_j(t - \tau) for some T \geq 1. By "facilitation" they meant the same as extinction, except that b_j \leq 0. Theorem 6 states that these are equivalent.

They also considered neural networks that do not change and those that change by Hebbian learning. That is, they assumed that at t = 0 some excitatory synaptic connections are not active. If at any time t both N_i(t) = 1 and N_j(t) = 1, then any latent excitatory synapse between i and j becomes active. Theorem 7 states that these are equivalent.
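To make the distinction between the first two forms of inhibition concrete, here is a small sketch (the network is illustrative, not from the paper) of one update step under relative inhibition, where a negative weight merely subtracts from the summed input, versus absolute inhibition, where a single firing inhibitory synapse vetoes the neuron. Theorems 4 and 5 assert that the two conventions realize the same class of sentences via possibly different networks, not that the same weights behave identically.

```python
import numpy as np

def step_relative(W, theta, N):
    """Relative inhibition: inhibitory synapses just subtract from the total input."""
    return (W @ N - theta >= 0).astype(int)

def step_absolute(W, theta, N):
    """Absolute inhibition: a neuron cannot fire if any firing presynaptic
    neuron connects to it with a negative weight."""
    excitatory = np.where(W > 0, W, 0)
    vetoed = ((W < 0) & (N == 1)).any(axis=1)       # any active inhibitory synapse?
    fires = (excitatory @ N - theta >= 0).astype(int)
    fires[vetoed] = 0
    return fires

# Toy network (weights and thresholds are illustrative): neuron 2 gets +2 from
# neuron 0 and -1 from neuron 1, with threshold 1.
W = np.array([[0, 0, 0],
              [0, 0, 0],
              [2, -1, 0]])
theta = np.array([1, 1, 1])
N = np.array([1, 1, 0])

print(step_relative(W, theta, N))   # [0 0 1]: 2 - 1 = 1 >= 1, neuron 2 still fires
print(step_absolute(W, theta, N))   # [0 0 0]: the firing inhibitory synapse vetoes it
```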
==Logical expressivity==
They considered "temporal propositional expressions" (TPE), which are propositional formulas with one free variable t. For example, N_1(t) \vee N_2(t) \wedge \neg N_3(t) is such an expression. Theorems 1 and 2 together showed that neural nets without circles are equivalent to TPE. For neural nets with loops, they noted that "realizable Pr may involve reference to past events of an indefinite degree of remoteness". These then encode sentences like "There was some x such that x was a ψ", or (\exists x)(\psi x). Theorems 8 to 10 showed that neural nets with loops can encode all of first-order logic with equality and, conversely, any looped neural network is equivalent to a sentence in first-order logic with equality, thus showing that the two are equivalent in logical expressiveness. As a remark, they noted that a neural network, if furnished with a tape, scanners, and write-heads, is equivalent to a Turing machine, and conversely, every Turing machine is equivalent to some such neural network. Thus, these neural networks are equivalent to
Turing computability and Church's lambda-definability.
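As a rough illustration of why loops add expressive power (the specific network below is an assumption in the spirit of the paper, not a construction from it): a neuron with a self-excitatory synapse acts as a latch, so its state answers a question about an indefinitely remote past event of the form "was there some earlier time at which the afferent fired", i.e. (\exists x)(\psi x), which no circle-free net or TPE can express.

```python
import numpy as np

def step(W, theta, N):
    """One McCulloch-Pitts update: fire iff total input reaches the threshold."""
    return (W @ N - theta >= 0).astype(int)

# Neuron 1 has a self-loop of weight 1 and a weight-1 synapse from neuron 0,
# with threshold 1: once it fires it keeps firing, so it remembers
# "neuron 0 has fired at some earlier time" (an existential over past time).
W = np.array([[0, 0],
              [1, 1]])
theta = np.array([1, 1])

inputs = [0, 0, 1, 0, 0, 0]          # the afferent fires only at t = 2
state = np.array([inputs[0], 0])
history = []
for t in range(len(inputs)):
    state[0] = inputs[t]             # the peripheral afferent is driven externally
    state = step(W, theta, state)
    history.append(int(state[1]))
print(history)                       # [0, 0, 1, 1, 1, 1] -- stays on once the afferent has fired
```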
==Context==