Youden's J statistic

Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a dichotomous diagnostic test. In meteorology, this statistic is referred to as Peirce Skill Score (PSS), Hanssen–Kuipers Discriminant (HKD), or True Skill Statistic (TSS).

Definition

Youden's J statistic is

:J = \text{sensitivity} + \text{specificity} - 1 = \text{recall}_1 + \text{recall}_0 - 1

where \text{recall}_1, the recall of the positive class, is the sensitivity and \text{recall}_0, the recall of the negative class, is the specificity. Thus the expanded formula is:

:J = \frac{\mathit{TP}}{\mathit{TP}+\mathit{FN}} + \frac{\mathit{TN}}{\mathit{TN}+\mathit{FP}} - 1 = \frac{\mathit{TP} \times \mathit{TN} - \mathit{FP} \times \mathit{FN}}{(\mathit{TP} + \mathit{FN})(\mathit{TN} + \mathit{FP})}

In this equation, TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.

The index was suggested by W. J. Youden in 1950 as a way of summarising the performance of a diagnostic test; however, the formula was earlier published in Science by C. S. Peirce in 1884.

Its value ranges from −1 through 1 (inclusive). The index is defined for all points of an ROC curve, and the maximum value of the index may be used as a criterion for selecting the optimum cut-off point when a diagnostic test gives a numeric rather than a dichotomous result. The index is represented graphically as the height above the chance line, and it is also equivalent, up to a linear rescaling, to the area under the curve subtended by a single operating point (J = 2\,\text{AUC} - 1 for that single-point curve). Because the ROC curve almost always forms a convex curve, the line of this maximum index value is likely to intersect the ROC curve at the point where the ROC curve is closest to the point in the top left corner (i.e. the point closest to no false positive or false negative results).
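As an illustration of the definition and of cut-off selection by maximising J, the following is a minimal Python sketch. The function names `youden_j` and `best_cutoff`, the convention that a score at or above the cut-off counts as a positive result, and all counts and scores are hypothetical choices made for this example.

```python
from typing import Sequence, Tuple


def youden_j(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return J = sensitivity + specificity - 1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try every observed score as a cut-off and return (cutoff, J) with maximal J.

    Assumed convention: a subject tests positive when score >= cutoff.
    Ties are resolved in favour of the lowest cut-off.
    """
    best = (float("nan"), -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        j = youden_j(tp, fn, tn, fp)
        if j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical dichotomous test: 80 TP, 20 FN, 90 TN, 10 FP.
j_fixed = youden_j(tp=80, fn=20, tn=90, fp=10)

# Hypothetical numeric test results (label 1 = diseased, 0 = healthy).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j_max = best_cutoff(scores, labels)
print(f"J at fixed threshold: {j_fixed:.3f}")
print(f"best cut-off: {cutoff}, J: {j_max:.3f}")
```

The exhaustive scan recomputes the confusion matrix at each candidate cut-off, which is quadratic in the number of subjects; it is written for readability rather than speed.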

Confidence interval

For a given diagnostic test with n_{\text{sensitivity}} diseased subjects and n_{\text{specificity}} healthy subjects, the Youden Index can be equivalently expressed as the difference between the true positive rate (sensitivity) and the false positive rate (1 − specificity):

:\hat{J} = \hat{p}_{\text{sensitivity}} - \hat{p}_{\text{FPR}}

where \hat{p}_{\text{FPR}} = 1 - \hat{p}_{\text{specificity}}. Written this way, J is a difference of two independent binomial proportions estimated from the diseased and healthy subgroups. Inference for J (with a fixed, pre-specified threshold) therefore reduces directly to inference for the difference of two proportions, and the standard machinery of the two-proportion Z-test applies. In particular, the Wald (1 − α) confidence interval for the difference of two proportions gives:

:CI = \hat{J} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}_{\text{sensitivity}}(1-\hat{p}_{\text{sensitivity}})}{n_{\text{sensitivity}}} + \frac{\hat{p}_{\text{FPR}}(1-\hat{p}_{\text{FPR}})}{n_{\text{FPR}}}}

where z_{1-\alpha/2} is the critical value from the standard normal distribution (e.g., 1.96 for a 95% confidence interval). Since the false positive rate and the specificity are estimated from the same healthy subgroup, n_{\text{FPR}} = n_{\text{specificity}}, and the variance contribution from that subgroup is identical whether parameterized by FPR or specificity:

:\frac{\hat{p}_{\text{FPR}}(1-\hat{p}_{\text{FPR}})}{n_{\text{FPR}}} = \frac{(1-\hat{p}_{\text{specificity}})\,\hat{p}_{\text{specificity}}}{n_{\text{specificity}}}

So this CI is equivalent to the classical Wald interval expressed in terms of sensitivity and specificity.

If the threshold is instead optimized to maximize J, the variance estimate must account for the additional variability of the threshold selection process. In such cases, the Delta method or bootstrapping is required to maintain the nominal coverage probability.
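The Wald interval for a fixed, pre-specified threshold can be sketched in a few lines of Python. The function name `youden_wald_ci` and the counts are hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist`; the sketch does not cover the optimized-threshold case.

```python
from math import sqrt
from statistics import NormalDist


def youden_wald_ci(tp: int, fn: int, tn: int, fp: int, alpha: float = 0.05):
    """Return (J_hat, lower, upper) for the Wald (1 - alpha) interval.

    Valid only when the threshold was fixed before looking at the data.
    """
    n_sens = tp + fn            # diseased subjects
    n_spec = tn + fp            # healthy subjects (same n for FPR and specificity)
    p_sens = tp / n_sens        # estimated sensitivity
    p_fpr = fp / n_spec         # estimated false positive rate = 1 - specificity
    j_hat = p_sens - p_fpr
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sqrt(p_sens * (1 - p_sens) / n_sens + p_fpr * (1 - p_fpr) / n_spec)
    return j_hat, j_hat - z * se, j_hat + z * se


# Hypothetical counts: 80 TP, 20 FN, 90 TN, 10 FP.
j_hat, lower, upper = youden_wald_ci(80, 20, 90, 10)
print(f"J = {j_hat:.3f}, 95% Wald CI = ({lower:.3f}, {upper:.3f})")
```

For these counts the standard error is sqrt(0.0016 + 0.0009) = 0.05, so the interval is roughly 0.7 ± 0.098.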
Alternative estimation methods

While the Wald interval is widely utilized, it may exhibit poor coverage probabilities or produce bounds outside the logical range of [−1, 1] when sample sizes are small or when proportions are near 0 or 1. Because J is a difference of two independent proportions, any confidence-interval method developed for the two-proportion Z-test can be applied directly. More robust methods include:

• Newcombe "square-and-add" method: The Newcombe method for the difference of proportions, which combines two Wilson score intervals, typically provides better coverage for small samples.

• Logit transformation: Applying a logit transformation ensures the confidence interval remains within the logical range of [−1, 1]. This is typically achieved by calculating the interval for the transformed components (sensitivity and the false positive rate) or by shifting the index to a [0, 1] scale before transformation, then back-transforming the resulting bounds.
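The square-and-add construction listed above can be sketched as follows, assuming the usual form of Newcombe's hybrid score interval: a Wilson interval is computed for the sensitivity and for the false positive rate, and the distances from each point estimate to its Wilson bounds are combined in quadrature. The function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, z: float):
    """Wilson score interval for a single binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def youden_newcombe_ci(tp: int, fn: int, tn: int, fp: int, alpha: float = 0.05):
    """Return (J_hat, lower, upper) using the square-and-add combination."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n1, n2 = tp + fn, tn + fp
    p1, p2 = tp / n1, fp / n2              # sensitivity and false positive rate
    l1, u1 = wilson_interval(tp, n1, z)
    l2, u2 = wilson_interval(fp, n2, z)
    j_hat = p1 - p2
    lower = j_hat - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = j_hat + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return j_hat, lower, upper


# Hypothetical counts: 80 TP, 20 FN, 90 TN, 10 FP.
j_hat, lower, upper = youden_newcombe_ci(80, 20, 90, 10)
print(f"J = {j_hat:.3f}, 95% Newcombe CI = ({lower:.3f}, {upper:.3f})")
```

Unlike the Wald interval, this one is not symmetric about the point estimate, and it does not collapse to zero width when an observed proportion is exactly 0 or 1.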

Other metrics

Youden's index is also known as deltaP'. It allows for several multiclass generalizations, one of which is (Bookmaker) Informedness. Informedness is the probability of an informed decision (as opposed to a random guess) and takes into account all predictions. However, a low Informedness value does not imply that the model is close to a random model, whereas this is the case for Youden's index in the binary case. A recent multiclass generalization of Youden's J preserves this property.

An unrelated but commonly used combination of basic statistics from information retrieval is the F-score, a (possibly weighted) harmonic mean of recall and precision, where recall = sensitivity = true positive rate. Specificity and precision, however, are entirely different measures. The F-score, like recall and precision, only considers the so-called positive predictions, with recall being the probability of predicting just the positive class, precision being the probability of a positive prediction being correct, and the F-score equating these probabilities under the effective assumption that the positive labels and the positive predictions should have the same distribution and prevalence.

When the true prevalences for the two positive variables are equal, as assumed in Fleiss kappa and F-score (that is, when the number of positive predictions matches the number of positive classes in the dichotomous, two-class case), the different kappa and correlation measures collapse to identity with Youden's J, and recall, precision and F-score are similarly identical with accuracy.
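The collapse of kappa and correlation onto J when predicted and actual positives are equally prevalent can be checked numerically. The sketch below uses a hypothetical confusion matrix with FP = FN, and it uses Cohen's kappa and the Matthews correlation coefficient as stand-ins for "the different kappa and correlation measures"; those two choices are assumptions of this example rather than something the text specifies.

```python
from math import isclose, sqrt

# Hypothetical counts with FP == FN, so predicted positives == actual positives.
tp, fn, fp, tn = 30, 10, 10, 50
n = tp + fn + fp + tn

recall = tp / (tp + fn)                      # sensitivity
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)
youden = recall + specificity - 1

# Matthews correlation coefficient (phi coefficient of the 2x2 table).
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cohen's kappa: observed agreement corrected for chance agreement.
observed = (tp + tn) / n
expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
kappa = (observed - expected) / (1 - expected)

print(f"recall={recall:.4f} precision={precision:.4f} F1={f1:.4f}")
print(f"J={youden:.4f} MCC={mcc:.4f} kappa={kappa:.4f}")
```

With these counts recall, precision and F1 coincide at 0.75, while J, the Matthews correlation and Cohen's kappa coincide at 1400/2400, about 0.583.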

Source: Wikipedia ↗
