TC0 can be divided further, into a hierarchy of languages requiring up to 1 layer, 2 layers, etc. Let \mathsf{TC}^0_d be the class of languages decidable by a threshold circuit family of up to depth d:\mathsf{TC}^0_1 \subset \mathsf{TC}^0_2 \subset \cdots \subset \mathsf{TC}^0 = \bigcup_{d=1}^\infty \mathsf{TC}^0_d The hierarchy can be even more finely divided.
MAJ vs threshold The MAJ gate is sometimes called an
unweighted threshold gate. They are equivalent up to a uniform polynomial overhead. In detail: • A MAJ gate is a threshold gate where all the weights are 1, and threshold is 1/2 the fan-in. • A polynomial-size circuit containing threshold gates with polynomial integer weights and threshold can be converted to a polynomial-size circuit with the same depth. Specifically, the weights can be simulated by replicating the input circuits, and the threshold can be simulated by replicating constant True/False inputs. Furthermore, there is an explicit algorithm, by which, given a single n-input threshold gate with arbitrary (unbounded) integer weights and thresholds, it constructs a depth-2 circuit using \mathsf{poly}(n)-many AND, OR, NOT, and MAJ gates. Thus, any polynomial-size, depth-d threshold circuit can be simulated uniformly by a polynomial-size majority circuit of depth d+1. As a separation theorem, it is known that the n -input Boolean inner product function (IP), defined below, is computable by a majority circuit with 3 layers and O(n) gates, but is not computable by a threshold circuit with 2 layers and \mathsf{poly}(n) gates.\frac 12 n \log n - 2n + o(n) \leq \log_2 W(n) \leq \frac 12 n \log n - n + o(n)See for a literature review. Sometimes the class of polynomial-bounded weights and thresholds with depth d is denoted as \widehat{\mathsf{LT}}_d := \mathsf{TC}_d^0, and \mathsf{LT}_d denotes the class where the weight and thresholds are unbounded ("large weight threshold circuit"). This formalizes neural networks with real-valued activation functions. As previously stated, any polynomial-size, depth-d threshold circuit can be simulated uniformly by a polynomial-size majority circuit of depth d+1. Therefore, \mathsf{TC}_d^0 \subset \mathsf{LT}_d \subset \mathsf{TC}_{d+1}^0. It has been proven that \mathsf{TC}_2^0 \subsetneq \mathsf{LT}_2.
Probabilistic version Like how the P class has a probabilistic version
BPP, the \mathsf{TC}^0 has a probabilistic version \mathsf{RTC}^0. It is defined as the class of languages that can be polynomial-probabilistically decided. Sometimes, this class is also called \mathsf{BPTC}^0, in a closer analogy with BPP. In this definition, the probability that C_n(x, y) = +1 is at least \frac 23, and if x \not\in L, then the probability that C_n(x, y) = +1 is at most \frac 13. By the standard trick of sampling many times then taking the majority opinion, any d-layer \mathsf{RTC}^0 circuit can be converted to a (d+1)-layer \mathsf{BPTC}^0 circuit.
Hierarchy Analogous to how \mathsf{TC}^0_1 \subset \mathsf{TC}^0_2 \subset \cdots \subset \mathsf{TC}^0 = \bigcup_{d=1}^\infty \mathsf{TC}^0_d , \mathsf{RTC}^0 can also be divided into\mathsf{RTC}^0_1 \subset \mathsf{RTC}^0_2 \subset \cdots \subset \mathsf{RTC}^0 = \bigcup_{d=1}^\infty \mathsf{RTC}^0_dBy definition, \mathsf{TC}^0_d \subset \mathsf{RTC}^0_d. Furthermore, since \mathsf{RTC}^0_d \subset \mathsf{TC}^0_{d+1} , Let \oplus be defined as the
parity function, or the
XOR function. Then the following two separations are theorems: • If the weights of the bottom gates of a threshold circuit of depth 2 computing \mathrm{IP}_n do not exceed 2^{n / 3}, then the top gate must have fanin at least 2^{\Omega(n)}. It is an open question how many levels the hierarchy has. It is also an open question whether the hierarchy collapses, that is, \mathsf{TC}^0 = \mathsf{TC}^0_{3}. If the polynomial bound on the number of gates is relaxed, then \mathsf{TC}^0_3 is quite powerful. Specifically, any language in \mathsf{ACC}^0 can be decided by a circuit family in \mathsf{TC}^0_3 (using Majority gates), except that it uses a
quasi-polynomial number of gates (instead of polynomial). This result is optimal, in that there exists a function that is computable with 3 layers of \mathsf{AC}^0 , but requires at least an exponential number of gates for \mathsf{TC}^0_2 (using Majority gates). ==References==