Initial problem and solution overview The BitFunnel paper describes the "matching problem", which occurs when an algorithm must identify documents through the usage of keywords. The goal of the problem is to identify a set of matches given a corpus to search and a query of keyword terms to match against. This problem is commonly solved through
inverted indexes, where each searchable item is maintained with a
map of keywords. In contrast, BitFunnel represents each searchable item through a signature. A signature is a sequence of bits which describe a
Bloom filter of the searchable terms in a given searchable item. The bloom filter is constructed through hashing through several bit positions.
Theoretical implementation of bit-string signatures The signature of a document (D) can be described as the logical-or of its term signatures: \overrightarrow{S_D}=\bigcup_{t \in D} \overrightarrow{S_t} Similarly, a query for a document (Q) can be defined as a union: \overrightarrow{S_Q}=\bigcup_{t \in Q} \overrightarrow{S_t} Additionally, a document D is a member of the set
M' when the following condition is satisfied: \overrightarrow{S_Q} \cap \overrightarrow{S_D} = \overrightarrow{S_Q} This knowledge is then combined to produce a formula where
M' is identified by documents which match the query signature: M'= \left\{ D \in C \mid \overrightarrow{S_Q} \cap \overrightarrow{S_D} = \overrightarrow{S_Q}\right\} These steps and their proofs are discussed in the 2017 paper.
Pseudocode for bit-string signatures This algorithm is described in the 2017 paper. \begin{array}{l} M'=\emptyset \\ \texttt{for each}\ D \in C\ \texttt{do} \\ \qquad \texttt{if}\ \overrightarrow{S_D} \cap \overrightarrow{S_Q} = \overrightarrow{S_Q}\ \texttt{then} \\ \qquad \qquad M' = M' \cup \{ D \} \\ \qquad \texttt{end if} \\ \texttt{end for} \end{array} ==References==