For a given diagnostic test with n_{\text{sensitivity}} diseased subjects and n_{\text{specificity}} healthy subjects, the Youden Index can be equivalently expressed as the difference between the true positive rate (sensitivity) and the false positive rate (1 − specificity):
:\hat{J} = \hat{p}_{\text{sensitivity}} - \hat{p}_{\text{FPR}}
where \hat{p}_{\text{FPR}} = 1 - \hat{p}_{\text{specificity}}. Written this way,
J is a difference of two independent binomial proportions estimated from the diseased and healthy subgroups. Inference for
J (with a fixed, pre-specified threshold) therefore reduces directly to inference for the difference of two proportions, and the standard machinery of the
two-proportion Z-test applies. In particular, the
Wald (1 − α) confidence interval for the difference of two proportions gives:
:CI = \hat{J} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}_{\text{sensitivity}}(1-\hat{p}_{\text{sensitivity}})}{n_{\text{sensitivity}}} + \frac{\hat{p}_{\text{FPR}}(1-\hat{p}_{\text{FPR}})}{n_{\text{FPR}}}}
where z_{1-\alpha/2} is the critical value from the standard normal distribution (e.g., 1.96 for a 95% confidence interval). Since the false positive rate and the specificity are estimated from the same healthy subgroup, n_{\text{FPR}} = n_{\text{specificity}}, and the variance contribution from that subgroup is identical whether parameterized by FPR or specificity:
:\frac{\hat{p}_{\text{FPR}}(1-\hat{p}_{\text{FPR}})}{n_{\text{FPR}}} = \frac{(1-\hat{p}_{\text{specificity}})\,\hat{p}_{\text{specificity}}}{n_{\text{specificity}}}
This interval is therefore identical to the classical Wald interval expressed in terms of sensitivity and specificity. If the threshold is instead optimized to maximize
J, the variance estimate must account for the additional variability introduced by the threshold selection process. In such cases, the delta method or bootstrapping is needed to maintain the nominal coverage probability.
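
The fixed-threshold Wald interval described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name youden_wald_ci, its parameter names, and the example counts are assumptions introduced here, and the calculation is only appropriate when the threshold was specified in advance rather than chosen to maximize J on the same data.

```python
from math import sqrt
from statistics import NormalDist


def youden_wald_ci(tp, n_diseased, tn, n_healthy, alpha=0.05):
    """Wald (1 - alpha) CI for the Youden index at a fixed threshold.

    tp: true positives among the n_diseased diseased subjects
    tn: true negatives among the n_healthy healthy subjects
    (hypothetical parameter names, chosen for this sketch)
    """
    sens = tp / n_diseased   # estimated sensitivity
    spec = tn / n_healthy    # estimated specificity
    fpr = 1.0 - spec         # estimated false positive rate
    j_hat = sens - fpr       # point estimate of J

    # Standard error: square root of the sum of the two binomial variances
    se = sqrt(sens * (1 - sens) / n_diseased + fpr * (1 - fpr) / n_healthy)

    # Standard normal critical value z_{1 - alpha/2}
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return j_hat, j_hat - z * se, j_hat + z * se


# Hypothetical counts, used only for illustration
j_hat, lower, upper = youden_wald_ci(tp=80, n_diseased=100, tn=90, n_healthy=100)
print(f"J = {j_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With these illustrative counts (sensitivity 0.80, specificity 0.90), the standard error is 0.05 and the script prints J = 0.700, 95% CI = (0.602, 0.798). The sketch does not clip the bounds to [−1, 1].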
=== Alternative estimation methods ===
While the Wald interval is widely used, it may exhibit poor coverage probabilities or produce bounds outside the logical range of [−1, 1] when sample sizes are small or when proportions are near 0 or 1. Because
J is a difference of two independent proportions, any confidence-interval method developed for the
two-proportion Z-test can be applied directly. More robust methods include:
• Newcombe "square-and-add" method: The Newcombe method for the difference of proportions, which combines two Wilson score intervals, typically provides better coverage for small samples.
• Logit transformation: Applying a logit transformation ensures the confidence interval remains within the logical range of [−1, 1]. This is typically achieved by calculating the interval for the transformed components (sensitivity and the false positive rate) or by shifting the index to a [0, 1] scale before transformation, then back-transforming the resulting bounds.

== Other metrics ==