This section formally defines the notions and terms relating to group testing.

• The input vector, \mathbf{x} = (x_1, x_2, \dots, x_n), is defined to be a binary vector of length n (that is, \mathbf{x} \in \{0,1\}^n), with the j-th item being called defective if and only if x_j = 1. Any non-defective item is called a 'good' item.

\mathbf{x} is intended to describe the (unknown) set of defective items. The key property of \mathbf{x} is that it is an implicit input: there is no direct knowledge of what the entries of \mathbf{x} are, other than that which can be inferred via some series of 'tests'. This leads on to the next definition.

• Let \mathbf{x} be an input vector. A set, S \subseteq \{ 1, 2, \dots, n \}, is called a test. When testing is noiseless, the result of a test is positive when there exists j \in S such that x_j = 1, and the result is negative otherwise.

Therefore, the goal of group testing is to come up with a method for choosing a 'short' series of tests that allow \mathbf{x} to be determined, either exactly or with a high degree of certainty.

• A group-testing algorithm is said to make an error if it incorrectly labels an item (that is, labels any defective item as non-defective or vice versa). This is not the same thing as the result of a group test being incorrect. An algorithm is called zero-error if the probability that it makes an error is zero.

• t(d, n) denotes the minimum number of tests required by any group-testing algorithm to always find d defectives among n items with zero probability of error. For the same quantity under the restriction that the algorithm is non-adaptive, the notation \bar{t}(d, n) is used.

• The Hamming weight of \mathbf{x}, denoted |\mathbf{x}|, is defined as the number of 1's in \mathbf{x}. Hence, the number of defective items is |\mathbf{x}|, and there are at most d defectives exactly when |\mathbf{x}| \leq d.
== General bounds ==
Since it is always possible to resort to individual testing by setting S_j = \{j\} for each 1 \leq j \leq n, it must be that \bar{t}(d, n) \leq n. Also, since any non-adaptive testing procedure can be written as an adaptive algorithm by simply performing all the tests without regard to their outcomes, t(d, n) \leq \bar{t}(d, n). Finally, when 0 \neq d \neq n, there is at least one item whose defectiveness must be determined (by at least one test), and so 1 \leq t(d, n). In summary (assuming 0 \neq d \neq n), 1 \leq t(d,n) \leq \bar{t}(d, n) \leq n .{{efn|In fact it is possible to do much better. For example, Li's s-stage algorithm gives an explicit construction where t \leq \frac{e}{\log_2 e} d \log_2{(n/d)}.}}
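The upper bound \bar{t}(d, n) \leq n corresponds to the trivial non-adaptive scheme of individual testing, which can be sketched in a few lines (names are hypothetical):

```python
def individual_testing(x):
    """Trivial non-adaptive scheme: run the test S_j = {j} for each item j.
    The n outcomes recover x exactly, so t-bar(d, n) <= n for any d."""
    return [int(x[j] == 1) for j in range(len(x))]

x = [0, 1, 0, 0, 1]                # hypothetical input with two defectives
print(individual_testing(x) == x)  # True: n tests always suffice
```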
== Information lower bound ==
A lower bound on the number of tests needed can be described using the notion of sample space, denoted \mathcal{S}, which is simply the set of possible placements of defectives. For any group testing problem with sample space \mathcal{S} and any group-testing algorithm, it can be shown that t \geq \lceil \log_2 |\mathcal{S}| \rceil, where t is the minimum number of tests required to identify all defectives with a zero probability of error. This is called the information lower bound.
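For instance, if all that is known is that there are at most d defectives, then \mathcal{S} consists of all subsets of at most d of the n items, so |\mathcal{S}| = \sum_{i=0}^{d} \binom{n}{i}, and the bound can be evaluated directly (a sketch; the specific numbers below are illustrative):

```python
from math import ceil, comb, log2

def information_lower_bound(n, d):
    """t >= ceil(log2 |S|), where the sample space S is taken to be
    the set of all subsets of at most d defectives among n items."""
    sample_space_size = sum(comb(n, i) for i in range(d + 1))
    return ceil(log2(sample_space_size))

print(information_lower_bound(100, 2))  # 13: far below the 100 tests of individual testing
```

This already shows why group testing can pay off: for n = 100 and d \leq 2, at least 13 tests are needed, but individual testing would use 100.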
== Representation of non-adaptive algorithms ==
Algorithms for non-adaptive group testing consist of two distinct phases. First, it is decided how many tests to perform and which items to include in each test. In the second phase, often called the decoding step, the results of each group test are analysed to determine which items are likely to be defective. The first phase is usually encoded in a matrix as follows.

• Suppose a non-adaptive group-testing procedure for n items consists of the tests S_1, S_2, \dots, S_t for some t \in \mathbb{N}_{\geq 0}. The testing matrix for this scheme is the t \times n binary matrix, M, where (M)_{ij} = 1 if and only if j \in S_i (and is zero otherwise).

Thus each column of M represents an item and each row represents a test, with a 1 in the (i,j)\textrm{-th} entry indicating that the i\textrm{-th} test included the j\textrm{-th} item and a 0 indicating otherwise. As well as the vector \mathbf{x} (of length n) that describes the unknown defective set, it is common to introduce the result vector, which describes the results of each test.

• Let t be the number of tests performed by a non-adaptive algorithm. The result vector, \mathbf{y} = (y_1, y_2, \dots, y_t), is a binary vector of length t (that is, \mathbf{y} \in \{0,1\}^t) such that y_i = 1 if and only if the result of the i\textrm{-th} test was positive (i.e. contained at least one defective).{{efn|Alternatively, \mathbf{y} can be defined by the equation \mathbf{y} := M\mathbf{x}, where multiplication is logical AND (\wedge) and addition is logical OR (\vee). Then \mathbf{y} has a 1 in position i if and only if (M)_{i,j} and x_j are both 1 for some j, that is, if and only if at least one defective item was included in the i\textrm{-th} test.}}

With these definitions, the non-adaptive problem can be reframed as follows: first a testing matrix, M, is chosen, after which the vector \mathbf{y} is returned. Then the problem is to analyse \mathbf{y} to find some estimate for \mathbf{x}. In the simplest noisy case, where there is a constant probability, q, that a group test will have an erroneous result, one considers a random binary vector, \mathbf{v}, where each entry has probability q of being 1 (and is 0 otherwise). The vector that is returned is then \hat{\mathbf{y}} = \mathbf{y} + \mathbf{v}, with the usual addition on (\mathbb{Z}/2\mathbb{Z})^t (equivalently, the element-wise XOR operation). A noisy algorithm must estimate \mathbf{x} using \hat{\mathbf{y}} (that is, without direct knowledge of \mathbf{y}).
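A minimal sketch of this setup (the testing matrix and defective set below are hypothetical examples): \mathbf{y} = M\mathbf{x} is computed with AND/OR arithmetic, noise is an XOR with \mathbf{v}, and a simple decoding rule eliminates every item that appears in a negative test.

```python
import random

# Hypothetical testing matrix M (t = 3 tests, n = 5 items):
# row i has a 1 in column j iff test i pools item j.
M = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
x = [0, 0, 1, 0, 0]  # item 2 is the only defective

t, n = len(M), len(x)

# y = Mx with logical AND for multiplication and logical OR for addition.
y = [int(any(M[i][j] and x[j] for j in range(n))) for i in range(t)]
print(y)  # [0, 1, 0]: only the second test pooled a defective

# Noisy case: each entry of y is flipped independently with probability q.
q = 0.1
v = [int(random.random() < q) for _ in range(t)]
y_hat = [yi ^ vi for yi, vi in zip(y, v)]  # element-wise XOR

# A simple decoding step on the noiseless y: declare defective every item
# that never appears in a negative test (this works for suitable matrices,
# as discussed below for disjunct matrices).
estimate = [int(all(y[i] for i in range(t) if M[i][j])) for j in range(n)]
print(estimate)  # [0, 0, 1, 0, 0]: recovers x
```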
== Bounds for non-adaptive algorithms ==
The matrix representation makes it possible to prove some bounds on non-adaptive group testing. The approach mirrors that of many deterministic designs, where d-separable matrices are considered, as defined below.

• A binary matrix, M, is called d-separable if every Boolean sum (logical OR) of any d of its columns is distinct. Similarly, M is called \bar{d}-separable if every Boolean sum of up to d of its columns is distinct. (This is not the same as M being k-separable for every k \leq d.)

When M is a testing matrix, the property of being d-separable (\bar{d}-separable) is equivalent to being able to distinguish between (up to) d defectives. However, it does not guarantee that this will be straightforward. A stronger property, called disjunctness, does.

• A binary matrix, M, is called d-disjunct if the Boolean sum of any d columns does not contain any other column. (In this context, a column A is said to contain a column B if for every index where B has a 1, A also has a 1.)

A useful property of d-disjunct testing matrices is that, with up to d defectives, every non-defective item will appear in at least one test whose outcome is negative. This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test.

Using the properties of d-separable and d-disjunct matrices, the following can be shown for the problem of identifying d defectives among n total items.

• The number of tests needed for an asymptotically small average probability of error scales as O(d\log_2 n).
• The number of tests needed for an asymptotically small maximum probability of error scales as O(d^2 \log_2 n).
• The number of tests needed for a zero probability of error scales as O \left(\frac{d^2 \log_2 n}{\log_2 d} \right).

== Generalised binary-splitting algorithm ==