The discovery of the FDR was preceded and followed by many other types of error rates. These include:

• PCER (per-comparison error rate) is defined as: \mathrm{PCER} = E\left[\frac{V}{m}\right]. Testing each hypothesis individually at level \alpha guarantees that \mathrm{PCER} \le \alpha (this is testing without any correction for multiplicity).
• FWER (family-wise error rate) is defined as: \mathrm{FWER} = P(V \ge 1). There are numerous procedures that control the FWER.
• k\text{-FWER}, suggested by Lehmann and Romano, van der Laan et al., is defined as: k\text{-FWER} = P(V \ge k) \le q.
• FDX (the tail probability of the false discovery proportion) is defined as: P\left(\frac{V}{R} > q\right).
• k\text{-FDR} (also called the generalized FDR by Sarkar in 2007) is defined as: k\text{-FDR} = E\left(\frac{V}{R} I_{(V>k)}\right) \le q.
• Q', "the proportion of false discoveries among the discoveries", suggested by Sorić in 1989, is defined as: \mathrm{FDR}_{+1} = p\mathrm{FDR} = E\left[\frac{V}{R} \,\Big\vert\, R>0\right]. This error rate cannot be strictly controlled because it equals 1 when m = m_0. JD Storey promoted the use of the pFDR (a close relative of the FDR) and the q-value, which can be viewed as the proportion of false discoveries that we expect in an ordered table of results, up to the current line. Storey also promoted the idea (also mentioned by BH) that the actual number of null hypotheses, m_0, can be estimated from the shape of the probability distribution curve. For example, in a set of data where all null hypotheses are true, 50% of results will yield p-values between 0.5 and 1.0 (and the other 50% will yield p-values between 0.0 and 0.5). We can therefore estimate m_0 by counting the results with P > 0.5 and doubling that count, and this permits refinement of the pFDR calculation at any particular cut-off in the data set.
• W\text{-FDR} (weighted FDR). Associated with each hypothesis i is a weight w_i \ge 0; the weights capture importance or price. The W-FDR is defined as: W\text{-FDR} = E\left(\frac{\sum w_i V_i}{\sum w_i R_i}\right).
• FDCR (false discovery cost rate). Stemming from statistical process control: associated with each hypothesis i is a cost c_i and with the intersection hypothesis H_{00} a cost c_0. The motivation is that stopping a production process may incur a fixed cost. It is defined as: \mathrm{FDCR} = E\left(\frac{c_0 V_0 + \sum c_i V_i}{c_0 R_0 + \sum c_i R_i}\right).
• PFER (per-family error rate) is defined as: \mathrm{PFER} = E(V).
• FNR (false non-discovery rate), by Sarkar and by Genovese and Wasserman, is defined as: \mathrm{FNR} = E\left(\frac{T}{m - R}\right) = E\left(\frac{m - m_0 - (R - V)}{m - R}\right).
• \mathrm{FDR}(z) is defined as: \mathrm{FDR}(z) = \frac{p_0 F_0(z)}{F(z)}.
• \mathrm{fdr} (local FDR) is defined as: \mathrm{fdr} = \frac{p_0 f_0(z)}{f(z)}, evaluated in a local interval of z.
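Several of these quantities, together with Storey's m_0 estimate from the p-values above 0.5, can be illustrated with a short Monte Carlo sketch in Python (the simulation sizes and the Beta-distributed alternatives are illustrative assumptions, not part of any of the cited procedures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated multiple-testing problem: m hypotheses, of which m0 nulls are true.
m, m0 = 1000, 800
p_null = rng.uniform(size=m0)              # null p-values are Uniform(0, 1)
p_alt = rng.beta(0.1, 1.0, size=m - m0)    # alternatives concentrate near 0
p = np.concatenate([p_null, p_alt])

alpha = 0.05
reject = p <= alpha                        # uncorrected per-comparison testing
V = int(reject[:m0].sum())                 # false discoveries
R = int(reject.sum())                      # total discoveries

pcer_hat = V / m                           # single-run estimate of PCER = E[V/m]
fdp_hat = V / max(R, 1)                    # false discovery proportion V/R
pfer_hat = V                               # single-run estimate of PFER = E[V]

# Storey-style estimate of m0: null p-values are uniform, so roughly half
# of them exceed 0.5; doubling that count estimates m0.
m0_hat = 2 * int((p > 0.5).sum())
```

A single run gives realized proportions such as V/m and V/R; the error rates defined above are expectations of these quantities over repeated experiments.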
False coverage rate

The false coverage rate (FCR) is, in a sense, the FDR analog to the confidence interval. The FCR indicates the average rate of false coverage, namely, not covering the true parameters, among the selected intervals. The FCR gives simultaneous coverage at a 1-\alpha level for all of the parameters considered in the problem. Intervals with simultaneous coverage probability 1-q can control the FCR to be bounded by q. There are many FCR procedures, such as: Bonferroni-Selected–Bonferroni-Adjusted, and Adjusted BH-Selected CIs (Benjamini and Yekutieli, 2005).
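The BH-selected procedure can be sketched in Python as follows (a minimal sketch under stated assumptions: after selecting R of the m parameters with BH at level q, each selected parameter receives a marginal normal-theory interval at the adjusted level 1 - Rq/m; the function names and toy data are hypothetical):

```python
import numpy as np
from statistics import NormalDist

def bh_select(pvals, q):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg
    procedure at level q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= q * np.arange(1, m + 1) / m)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    return order[: passed[-1] + 1]          # reject the k smallest p-values

def fcr_adjusted_cis(estimates, ses, pvals, q=0.05):
    """BH-selected CIs with FCR control: each of the R selected parameters
    gets a marginal normal-theory CI at the adjusted level 1 - R*q/m."""
    m = len(pvals)
    sel = bh_select(pvals, q)
    R = len(sel)
    if R == 0:
        return sel, []
    z = NormalDist().inv_cdf(1 - R * q / (2 * m))   # two-sided adjusted level
    return sel, [(estimates[i] - z * ses[i], estimates[i] + z * ses[i])
                 for i in sel]

# Hypothetical toy data: 5 normal estimates with unit standard errors.
est = np.array([3.1, 2.8, 0.2, -0.1, 2.5])
se = np.ones(5)
pv = np.array([2 * (1 - NormalDist().cdf(abs(t))) for t in est / se])
sel, cis = fcr_adjusted_cis(est, se, pv, q=0.05)
```

Note that the interval level widens with the number of selections R, mirroring how BH's rejection threshold grows with R.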
Bayesian approaches

Connections have been made between the FDR and Bayesian approaches (including empirical Bayes methods), thresholding of wavelet coefficients and model selection, and generalizing the confidence interval into the false coverage statement rate (FCR).
Structural false discovery rate (sFDR)

The
structural false discovery rate (sFDR) is a generalization of the classical false discovery rate (FDR) introduced by D. Meskaldji and collaborators in 2018. The sFDR extends the FDR by replacing the linear denominator R in the expected ratio E[
V/
R] with a non-decreasing concave function
s(
R), yielding the criterion E[
V/
s(
R)]. This approach allows the control of false discoveries to adapt to the scale of testing, so that prudence increases faster than linearly as the number of rejections grows. When
s(
R) =
R, the classical FDR is recovered, while specific choices of s(R) can interpolate between FDR control and family-wise error control (
k-FWER). The sFDR provides a structural connection between classical, local, and generalized false discovery concepts, and has been extended to online and adaptive settings.
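The criterion can be made concrete with a minimal sketch (the concave choice s(R) = \sqrt{R} below is an illustrative assumption, not a recommendation from the sFDR literature):

```python
import math

def sfdr_ratio(V, R, s=lambda r: r):
    """Structural false discovery proportion V / s(R), where s is a
    non-decreasing concave function; s(R) = R gives the classical FDP."""
    return 0.0 if R == 0 else V / s(R)

# With the identity, the classical false discovery proportion is recovered.
assert sfdr_ratio(V=5, R=50) == 5 / 50

# An illustrative concave choice, s(R) = sqrt(R): the same counts yield a
# larger ratio, so control becomes more conservative as rejections grow.
assert sfdr_ratio(V=5, R=50, s=math.sqrt) > sfdr_ratio(V=5, R=50)
```

The sFDR criterion is the expectation of this ratio, E[V/s(R)], so concavity of s is what makes "prudence increase faster than linearly" in the number of rejections.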
Empirical false discovery rate (eFDR)

Conventional p-value adjustment methods, such as Bonferroni and Benjamini–Hochberg, often overcorrect the p-values when the input datasets are not independent but interconnected, which is often the case for biological data, e.g., functional enrichment analysis of differentially expressed genes. This overcorrection can result in missing biologically relevant terms with significant enrichment. The empirical false discovery rate addresses this problem with the so-called "plug-in" estimate of the false discovery rate (Algorithm 18.3 of Hastie et al.), which is implemented in the mulea R package. This method is an empirical, resampling-based approach to calculating the false discovery rate (FDR), which we abbreviate as eFDR.
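A rough Python sketch of this plug-in idea follows (illustrative only, not the mulea implementation; the function name and the reading of the resampled ranks as counts of resampled p-values at or below each observed p-value are assumptions):

```python
import numpy as np

def efdr(p_observed, p_resampled):
    """Plug-in (empirical) FDR estimate from observed and resampled p-values.

    p_observed  : length-J array, one p-value per ontology entry.
    p_resampled : (S, J) array of p-values from S random target sets drawn
                  from the background set.
    """
    p_observed = np.asarray(p_observed)
    p_resampled = np.asarray(p_resampled)
    # Observed rank R_j = #{i : p_i <= p_j} (includes j itself, so R_j >= 1).
    R = (p_observed[:, None] >= p_observed[None, :]).sum(axis=1)
    # Expected rank: average over resamples of the number of resampled
    # p-values at or below each observed p_j.
    counts = (p_resampled[:, :, None] <= p_observed[None, None, :]).sum(axis=1)
    R_bar = counts.mean(axis=0)
    return np.minimum(R_bar / R, 1.0)      # truncate at 1

# Hypothetical toy input: 2 ontology entries, 2 resampling steps.
result = efdr([0.01, 0.5], [[0.4, 0.9], [0.6, 0.7]])
```

An entry whose p-value is small relative to the resampled (null) p-values receives a small eFDR; an entry whose p-value is typical of the resamples receives an eFDR near 1.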
Description of the eFDR algorithm applied to functional enrichment analysis

For each ontology entry (j = 1, 2, \ldots, J) and the investigated target set (e.g., significantly differentially expressed genes), mulea calculates a p-value (p_j) based on the hypergeometric test. To assess the unbiased statistical significance of each ontology entry, the empirical false discovery rate (eFDR_j) is computed using a resampling-based approach. First, we determine the rank of each ontology entry's p-value relative to the p-values of all ontology entries; R_j denotes the rank of the p-value of the j^{th} ontology entry. Denoting the indicator function with Iverson brackets I(\cdot):

R_{j}=\sum_{i=1}^{J} I\!\left(p_{i}\le p_{j}\right),\; j=1,\ldots,J

To calculate the expected rank \left(\bar{R}_{j}\right) of the p-value of the j^{th} ontology entry, a resampling strategy is applied, with resampling steps indexed by (s=1,2,\ldots,S). In each resampling step, a simulated target set of the same size as the original target set is generated, with elements selected at random from the background set. The hypergeometric tests are then recalculated, yielding the ranks of the p-values \left(R_{j}^{s}\right) for each resampling step. Let \bar{R}_{j} be the mean of the R_{j}^{s} values over s:

\bar{R}_{j}=\frac{\displaystyle\sum_{s=1}^{S} R_{j}^{s}}{S}

The eFDR of the j^{th} ontology entry (eFDR_j) is calculated as the ratio of the expected rank \left(\bar{R}_{j}\right) to the actual rank \left(R_{j}\right). If the calculated ratio exceeds 1, it is truncated to 1:

eFDR_{j}=\min\!\left(\frac{\bar{R}_{j}}{R_{j}},\,1\right)

== Software implementations ==