The methods presented in the previous sections do not allow to obtain a priori fixed inclusion probabilities. Some applications require items' sampling probabilities to be according to weights associated with each item. For example, it might be required to sample queries in a search engine with weight as number of times they were performed so that the sample can be analyzed for overall impact on user experience. Let the weight of item be w_i, and the sum of all weights be . There are two ways to interpret weights assigned to each item in the set: • In each round, the probability of every
unselected item to be selected in that round is proportional to its weight relative to the weights of all unselected items. If is the current sample, then the probability of an item i \notin X to be selected in the current round is \textstyle w_i/(W - \sum_{j \in X}{w_j}). • The probability of each item to be included in the random sample is proportional to its relative weight, i.e., w_i / W. It is not achievable when k=n , since each item is then included with probability 1.
Algorithm A-Res The following algorithm was given by Efraimidis and Spirakis that uses interpretation 1: (* S is a stream of items to sample S.Current returns current item in stream S.Weight returns weight of current item in stream S.Next advances stream to next position The power operator is represented by ^ min-priority-queue supports: Count -> number of items in priority queue Minimum() -> returns minimum key value of all items Extract-Min() -> Remove the item with minimum key Insert(key, Item) -> Adds item with specified key *) ReservoirSample(S[1..?]) H := new min-priority-queue while S has data r := random() ^ (1/S.Weight) // random() produces a uniformly random number in (0,1) if H.Count H.Minimum H.Extract-Min() H.Insert(r, S.Current) end end S.Next end return items in H end This algorithm is identical to the algorithm given in
Reservoir Sampling with Random Sort except for the generation of the items' keys. The algorithm is equivalent to assigning each item a key r^{1/w_i} where is the random number and then selecting the items with the largest keys. Equivalently, a more
numerically stable formulation of this algorithm computes the keys as -\ln(r)/w_i and select the items with the
smallest keys.
Algorithm A-ExpJ The following algorithm is a more efficient version of
A-Res, also given by Efraimidis and Spirakis: and Tillé (2006). (* S has items to sample, R will contain the result S[i].Weight contains weight for each item *) WeightedReservoir-Chao(S[1..n], R[1..k]) WSum := 0 // fill the reservoir array for i := 1 to k R[i] := S[i] WSum := WSum + S[i].Weight end for i := k+1 to n WSum := WSum + S[i].Weight p := S[i].Weight / WSum // probability for this item j := random(); // uniformly random between 0 and 1 if j For each item, its relative weight is calculated and used to randomly decide if the item will be added into the reservoir. If the item is selected, then one of the existing items of the reservoir is uniformly selected and replaced with the new item. The trick here is that, if the probabilities of all items in the reservoir are already proportional to their weights, then by selecting uniformly which item to replace, the probabilities of all items remain proportional to their weight after the replacement. Note that Chao doesn't specify how to sample the first
k elements. He simple assumes we have some other way of picking them in proportion to their weight. Chao: "Assume that we have a sampling plan of fixed size with respect to
S_k at time
A; such that its first-order inclusion probability of
X_t is
π(k; i)".
Algorithm A-Chao with Jumps Similar to the other algorithms, it is possible to compute a random weight j and subtract items' probability mass values, skipping them while j > 0, reducing the number of random numbers that have to be generated. (* S has items to sample, R will contain the result S[i].Weight contains weight for each item *) WeightedReservoir-Chao(S[1..n], R[1..k]) WSum := 0 // fill the reservoir array for i := 1 to k R[i] := S[i] WSum := WSum + S[i].Weight end j := random() // uniformly random between 0 and 1 pNone := 1 // probability that no item has been selected so far (in this jump) for i := k+1 to n WSum := WSum + S[i].Weight p := S[i].Weight / WSum // probability for this item j -= p * pNone pNone := pNone * (1 - p) if j == Applications for Multi-Class Fairness ==