== Preliminary definitions ==
Let R be a
finite set. A
predicate on R is a boolean function that takes in a subset of R and outputs either 0 or 1. In particular, a perceptron unit is a predicate. A predicate \psi has
support S \subset R iff for any X \subset R, we have \psi(X) = \psi(X \cap S). In words, if we know how \psi acts on subsets of S, then we know how it acts on all subsets of R. A predicate can have many different supports. The
support size of a predicate \psi is the minimal number of elements necessary in its support. For example, the constant-0 and constant-1 functions both are supported on the
empty set, thus they both have support size 0. A
perceptron (the kind studied by Minsky and Papert) over R is a function of the form \theta\left(\sum_i a_i \psi_i\right), where the \psi_i are predicates and the a_i are real numbers. If \Phi is a set of predicates, then L(\Phi) is the set of all perceptrons using only predicates in \Phi. The
order of a perceptron \theta\left(\sum_i a_i \psi_i\right) is the maximal support size of its component predicates \{\psi_i\}_i. The
order of a boolean function on R is the minimal order possible for a perceptron implementing the boolean function. A boolean function is
conjunctively local iff its order remains bounded as |R| increases to infinity. The
mask of A \subset R is the predicate 1_A defined by 1_A(X) = \begin{cases} 1 & \text{ if } A \subset X,\\ 0 & \text{ else.} \end{cases}
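The definitions above can be made concrete in a short sketch. The following Python snippet is illustrative only (the helper names subsets, mask, and perceptron are not from the book): it implements masks and perceptrons as plain functions, and checks that parity on a two-element R is computed by an order-2 perceptron over masks, via the expansion x_1 + x_2 - 2x_1x_2.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    elems = list(s)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def mask(A):
    """The mask 1_A: a predicate that is 1 iff A is contained in the input X."""
    A = frozenset(A)
    return lambda X: 1 if A <= frozenset(X) else 0

def perceptron(weighted_predicates):
    """theta(sum_i a_i * psi_i): outputs 1 iff the weighted sum is positive."""
    return lambda X: 1 if sum(a * psi(X) for a, psi in weighted_predicates) > 0 else 0

# Parity (XOR) on R = {1, 2} as an order-2 perceptron using only masks:
# the masks {1} and {2} have support size 1, the mask {1, 2} has support size 2.
parity2 = perceptron([(1, mask({1})), (1, mask({2})), (-2, mask({1, 2}))])

for X in subsets({1, 2}):
    assert parity2(X) == len(X) % 2   # 1 exactly on the odd-sized subsets
```

Order 2 suffices here because |R| = 2; the parity argument below shows that order |R| is in fact necessary in general.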
== Main theorems ==
{{Math theorem|name=Positive normal form theorem|math_statement=Every perceptron of order \leq k is equivalent to a perceptron of order \leq k whose predicates are all masks.}} {{Math proof|title=Proof|proof= Let the perceptron be \theta\left(\sum_i a_i \psi_i\right), where each \psi_i has support size at most k. We convert it into a linear sum of masks, each of size at most k. Let \psi_i be supported on a set A with |A| \leq k. Write \psi_i in disjunctive normal form, with one clause for each subset of A on which \psi_i returns 1; for each such subset, write one positive literal for each element in the subset, and one negative literal for each element of A not in it. For example, suppose \psi_i is supported on \{1,2\} and is 1 exactly on the odd-sized subsets; then we can write it as (x_1 \land \neg x_2) \lor (\neg x_1 \land x_2). Now convert this formula to an arithmetic expression and expand, yielding a linear sum of masks. For example, the formula above becomes x_1(1-x_2) + (1-x_1)x_2 = x_1 + x_2 - 2x_1x_2. Repeating this for each predicate in the perceptron and summing up, we obtain an equivalent perceptron using only masks. }} Let S_R be the
permutation group on the elements of R, and let G be a subgroup of S_R. We say that a predicate \psi is G-invariant iff \psi \circ g = \psi for all g \in G; that is, for any X \subset R, we have \psi(X) = \psi(g(X)). For example, the
parity function is S_R-invariant, since any permutation of the set preserves the size, and thus the parity, of any of its subsets. {{Math theorem|name=Group invariance theorem|math_statement=Suppose \Phi is closed under the action of G, that is, \psi \circ g \in \Phi for every \psi \in \Phi and g \in G. If a G-invariant predicate \psi is in L(\Phi), then \psi = \theta\left(\sum_j a_j \psi_j\right) for coefficients a_j that depend only on the G-equivalence class of \psi_j.}} {{Math proof|title=Proof|proof= The proof idea is to average over all elements of G. Enumerate the predicates in \Phi as \psi_1, \psi_2, \dots, and write g(j) for the index of the predicate such that \psi_{g(j)} = \psi_j \circ g, for any g \in G; that is, we have defined a group action on the set \Phi. Since \psi \in L(\Phi), there exist real numbers b_j such that \theta\left(\sum_j b_j \psi_j\right) = \psi. Define a_j := \sum_{g\in G} b_{g^{-1}(j)}; we claim \theta\left(\sum_j a_j \psi_j\right) = \psi is the desired perceptron. By the definition of G-invariance, if \psi(A) = 1, then \psi(g(A)) = 1 for all g \in G. That is, \sum_j b_j (\psi_j \circ g)(A) > 0 for every g \in G, and so, summing over all elements of G, we have \sum_j a_j \psi_j(A) = \sum_{g \in G} \sum_j b_j \psi_{g(j)}(A) > 0. Similarly for the case where \psi(A) = 0. }} {{Math theorem|math_statement=The parity function \psi_{parity} on R has order |R|.}} {{Math proof|title=Proof|proof= Let \psi_{parity} be the parity function, and let \Phi be the set of all masks of size \leq |R|-1. Clearly both \psi_{parity} and \Phi are invariant under all permutations of R. Suppose \psi_{parity} had order \leq |R|-1; then by the positive normal form theorem, \psi_{parity} \in L(\Phi). By the group invariance theorem, there exists a perceptron \theta\left(\sum_i a_i \psi_i\right) = \psi_{parity} such that a_i depends only on the S_R-equivalence class of the mask \psi_i, and thus only on the size of the mask \psi_i. That is, there exist real numbers b_0, b_1, \dots, b_{|R|-1} such that a_i = b_k whenever \psi_i is a mask of size k. Now we can explicitly calculate the perceptron on any subset X \subset R. Since X contains \binom{|X|}{k} subsets of size k, plugging into the perceptron's formula gives \psi_{parity}(X) = \theta\left(\sum_{k=0}^{|R|-1} b_k \binom{|X|}{k}\right). Now define the polynomial function p(x) := \sum_{k=0}^{|R|-1} b_k \binom{x}{k}, where \binom{x}{k} = \frac{x(x-1)\cdots(x-k+1)}{k!}. It has degree at most |R|-1.
Then, since \theta(p(|X|)) = \psi_{parity}(X) for each |X| = 0, 1, 2, \dots, |R|, we have p(0) - \epsilon > 0, \quad p(1) - \epsilon < 0, \quad p(2) - \epsilon > 0, \quad \cdots for a small enough positive \epsilon. Thus, the degree \leq |R|-1 polynomial p - \epsilon has at least |R| different roots, one in each of (0, 1), (1, 2), \dots, (|R|-1, |R|), a contradiction. }} Proof: omitted. {{Math theorem|math_statement=Let R_n be the rectangle of shape 5n \times (2n+12). Then as n \to \infty, the connectedness function on R_n has order growing at least as fast as \Omega(|R_n|^{1/2}).}} Proof sketch: By
reducing the parity function to the connectedness function, using circuit gadgets. The construction is in a similar style to the one showing that
Sokoban is NP-hard. == Reception and legacy ==