The partial area under the ROC curve (pAUC) is a metric for the performance of a binary classifier.
== Basic concept ==
In the ROC space, where x = FPR (false positive rate) and y = ROC(x) = TPR (true positive rate), the area under the ROC curve is defined as

AUC = \int_{x=0}^{1} ROC(x) \, dx

The AUC is widely used, especially for comparing the performances of two (or more) binary classifiers: the classifier that achieves the highest AUC is deemed better. However, when comparing two classifiers C_a and C_b, three situations are possible:
• the ROC curve of C_a is never above the ROC curve of C_b;
• the ROC curve of C_a is never below the ROC curve of C_b;
• the classifiers' ROC curves cross each other.

There is general consensus that in case 1) classifier C_b is preferable and in case 2) classifier C_a is preferable. Instead, in case 3) there are regions of the ROC space where C_a is preferable and other regions where C_b is preferable. This observation led to evaluating the accuracy of classifications by computing performance metrics that consider only a specific region of interest (RoI) in the ROC space, rather than the whole space. These performance metrics are commonly known as "partial AUC" (pAUC): the pAUC is the area of the selected region of the ROC space that lies under the ROC curve.
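As an illustration, the AUC integral can be approximated on an empirical ROC curve with the trapezoidal rule. The ROC points below are made-up values for the example; this is a sketch, not a reference implementation:

```python
def auc_trapezoid(fpr, tpr):
    """Area under an empirical ROC curve via the trapezoidal rule.

    fpr, tpr: coordinates of the ROC points, sorted by increasing FPR,
    starting at (0, 0) and ending at (1, 1).
    """
    area = 0.0
    for i in range(len(fpr) - 1):
        # area of the trapezoid between two consecutive ROC points
        area += 0.5 * (tpr[i] + tpr[i + 1]) * (fpr[i + 1] - fpr[i])
    return area

# Hypothetical ROC points for a classifier
fpr = [0.0, 0.1, 0.3, 0.6, 1.0]
tpr = [0.0, 0.5, 0.7, 0.9, 1.0]
print(auc_trapezoid(fpr, tpr))  # ≈ 0.765
```

A perfect classifier, whose ROC curve goes through (0, 1), yields AUC = 1; the chance diagonal TPR = FPR yields AUC = 0.5.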
== Partial AUC obtained by constraining FPR ==
The idea of the partial AUC was originally proposed with the goal of restricting the evaluation of given ROC curves to the range of false positive rates that are considered interesting for diagnostic purposes. Thus, the partial AUC was computed as the area under the ROC curve in the vertical band of the ROC space where FPR is in the range [FPR_{low}, FPR_{high}]. The pAUC computed by constraining FPR makes it possible to compare classifiers in the FPR range of interest. Nonetheless, it has a few limitations:
• the RoI must be a vertical band of the ROC space;
• no criteria are given for identifying the RoI: it is expected that some expert is able to identify FPR_{low} and FPR_{high};
• when comparing two classifiers via the associated ROC curves, a relatively small change in the selection of the RoI may lead to different conclusions: for instance, if the ROC curves of C_a and C_b cross, considering the band where 0.1 \leq FPR \leq 0.3 may lead to the conclusion that C_b is better, while considering the band where 0.2 \leq FPR \leq 0.4 may lead to the conclusion that C_a is better.
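A minimal sketch of this computation, assuming the ROC curve is given as empirical (FPR, TPR) points and linearly interpolated between them; the curve values and band limits are illustrative:

```python
def interp_tpr(fpr, tpr, x):
    """Linearly interpolate TPR at a given FPR on the empirical curve."""
    for i in range(len(fpr) - 1):
        if fpr[i] <= x <= fpr[i + 1]:
            if fpr[i + 1] == fpr[i]:
                return max(tpr[i], tpr[i + 1])  # vertical segment
            t = (x - fpr[i]) / (fpr[i + 1] - fpr[i])
            return tpr[i] + t * (tpr[i + 1] - tpr[i])
    return tpr[-1]

def pauc_fpr_band(fpr, tpr, fpr_low, fpr_high):
    """Partial AUC in the vertical band fpr_low <= FPR <= fpr_high."""
    # curve points strictly inside the band, plus interpolated edge points
    xs = [fpr_low] + [x for x in fpr if fpr_low < x < fpr_high] + [fpr_high]
    ys = [interp_tpr(fpr, tpr, x) for x in xs]
    area = 0.0
    for i in range(len(xs) - 1):
        area += 0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
    return area

# Hypothetical ROC points
fpr = [0.0, 0.1, 0.3, 0.6, 1.0]
tpr = [0.0, 0.5, 0.7, 0.9, 1.0]
print(pauc_fpr_band(fpr, tpr, 0.1, 0.3))  # ≈ 0.12
```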
== Partial AUC obtained by constraining TPR ==
Another type of partial AUC is obtained by constraining the true positive rate, rather than the false positive rate. That is, the partial AUC is the area under the ROC curve and above the horizontal line TPR = TPR_{0}. In other words, the pAUC is computed in the portion of the ROC space where the true positive rate is greater than a given threshold TPR_{0} (no upper limit is used, since it would not make sense to limit the number of true positives). This proposal too has a few limitations:
• by limiting the true positive rate, a limit on the false positive rate is also implicitly set;
• no criteria are given for identifying the RoI: it is expected that experts can identify the minimum acceptable true positive rate TPR_{0};
• when comparing two classifiers via the associated ROC curves, a relatively small change in the selection of the RoI may lead to different conclusions: this happens when TPR_{0} is close to the point where the given ROC curves cross each other.
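Since the ROC curve is monotone, the region TPR >= TPR_{0} corresponds to FPR >= x_0, where x_0 is the FPR at which the curve first reaches TPR_{0}. A sketch under the same assumptions as before (empirical, linearly interpolated ROC points, made up for the example):

```python
def pauc_above_tpr(fpr, tpr, tpr0):
    """Area under the ROC curve and above the line TPR = tpr0.

    Builds the clipped curve max(ROC(x) - tpr0, 0) and integrates it
    with the trapezoidal rule.
    """
    xs, ys = [], []
    for i in range(len(fpr)):
        if tpr[i] >= tpr0:
            if not xs and i > 0 and tpr[i] > tpr0:
                # interpolate the crossing point where ROC(x0) = tpr0
                t = (tpr0 - tpr[i - 1]) / (tpr[i] - tpr[i - 1])
                xs.append(fpr[i - 1] + t * (fpr[i] - fpr[i - 1]))
                ys.append(0.0)
            xs.append(fpr[i])
            ys.append(tpr[i] - tpr0)
    area = 0.0
    for i in range(len(xs) - 1):
        area += 0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
    return area

# Hypothetical ROC points
fpr = [0.0, 0.1, 0.3, 0.6, 1.0]
tpr = [0.0, 0.5, 0.7, 0.9, 1.0]
print(pauc_above_tpr(fpr, tpr, 0.5))  # ≈ 0.29
```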
== Partial AUC obtained by constraining both FPR and TPR ==
A “two-way” pAUC was defined by constraining both the true positive and false positive rates. A minimum value TPR_0 is specified for TPR and a maximum value FPR_0 is set for FPR; thus, the RoI is the upper-left rectangle with vertices in points (FPR_0, TPR_0), (FPR_0, 1), (0, 1) and (0, TPR_0). The two-way pAUC is the area under the ROC curve that belongs to this rectangle. The two-way pAUC is clearly more flexible than the pAUC defined by constraining only FPR or only TPR; in fact, the latter two types of pAUC can be seen as special cases of the two-way pAUC. As with the pAUCs described above, when comparing two classifiers via the associated ROC curves, a relatively small change in the selection of the RoI may lead to different conclusions. This is a particularly delicate issue, since no criteria are given for identifying the RoI (as with the other types of pAUC, it is expected that experts can identify TPR_{0} and FPR_0).
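The two-way pAUC can be sketched by integrating, over FPR in [0, FPR_0], the part of the ROC curve that exceeds TPR_0. The numerical approach below (midpoint Riemann sum over a linearly interpolated empirical curve, with made-up ROC points) is one possible illustration, not the reference algorithm:

```python
def roc_interp(fpr, tpr, x):
    """Linear interpolation of the empirical ROC curve at FPR = x."""
    for i in range(len(fpr) - 1):
        if fpr[i] <= x <= fpr[i + 1]:
            if fpr[i + 1] == fpr[i]:
                return max(tpr[i], tpr[i + 1])  # vertical segment
            t = (x - fpr[i]) / (fpr[i + 1] - fpr[i])
            return tpr[i] + t * (tpr[i + 1] - tpr[i])
    return tpr[-1]

def two_way_pauc(fpr, tpr, fpr0, tpr0, steps=10_000):
    """Area under the ROC curve inside the rectangle FPR <= fpr0, TPR >= tpr0.

    Each slice along the FPR axis contributes only the part of ROC(x)
    that exceeds tpr0 (midpoint rule).
    """
    dx = fpr0 / steps
    area = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        area += max(roc_interp(fpr, tpr, x) - tpr0, 0.0) * dx
    return area

# Hypothetical ROC points
fpr = [0.0, 0.1, 0.3, 0.6, 1.0]
tpr = [0.0, 0.5, 0.7, 0.9, 1.0]
print(two_way_pauc(fpr, tpr, fpr0=0.3, tpr0=0.5))  # ≈ 0.02
```

Setting fpr0 = 1 recovers the TPR-constrained pAUC, and setting tpr0 = 0 recovers the FPR-constrained pAUC over [0, fpr0], which is why both can be seen as special cases.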
== Partial AUC obtained by applying objective constraints to the region of interest ==
A few objective and sound criteria for defining the RoI have been proposed. Specifically, the computation of the pAUC can be restricted to the region where:
• the considered classifiers are better (according to some performance metric of choice) than random classification;
• the considered classifiers achieve at least a minimum value of some performance metric of choice;
• the cost due to misclassifications by the considered classifiers is acceptable.

=== Defining the RoI based on the performance of the random classification ===
A possible way of defining the region where the pAUC is computed consists of excluding the regions of the ROC space that represent performances worse than that of random classification. Random classification evaluates a given item positive with probability \rho and negative with probability 1-\rho. In a dataset of n items, of which AP are actually positive, the best guess is obtained by setting \rho=\frac{AP}{n} (\rho is also known as the “prevalence” of the positives in the dataset). It was shown that random classification with \rho=\frac{AP}{n} achieves TPR=\rho, precision=\rho, and FPR=\rho, on average.

A performance metric that can be used for this purpose is the Phi coefficient (also known as the Matthews Correlation Coefficient). Phi measures how much better (or worse) a classification is with respect to the random classification, which is characterized by Phi = 0. Reference values for interpreting the magnitude of Phi were suggested by Cohen.

=== Defining the RoI based on the cost of misclassifications ===
The total cost of misclassifications is C = c_{FN} \cdot FN + c_{FP} \cdot FP, where c_{FN} and c_{FP} are the costs of a false negative and a false positive, respectively. The normalized cost is defined as NC=\frac{C}{n(c_{FN}+c_{FP})}. By setting \lambda=\frac{c_{FN}}{c_{FP}+c_{FN}}, we get

NC = \lambda \rho (1-TPR)+(1-\lambda)(1-\rho) FPR

The average NC obtained via random classification is NC_{rnd}=\frac{AP \cdot AN}{n^2}. Note that:
• \lambda=0 equates to using FPR for delimiting the RoI;
• \lambda=1 equates to using TPR for delimiting the RoI.
Therefore, choosing a performance metric equates to choosing a specific value of the relative cost of false positives with respect to false negatives.
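The identity NC_{rnd} = AP \cdot AN / n^2 can be checked numerically from the NC formula: random classification has TPR = FPR = \rho on average, so NC = \lambda\rho(1-\rho) + (1-\lambda)(1-\rho)\rho = \rho(1-\rho), independently of \lambda. The dataset sizes and cost split below are arbitrary illustrative values:

```python
def normalized_cost(tpr, fpr, rho, lam):
    """NC = lam*rho*(1 - TPR) + (1 - lam)*(1 - rho)*FPR,
    with rho the prevalence and lam = c_FN / (c_FP + c_FN)."""
    return lam * rho * (1 - tpr) + (1 - lam) * (1 - rho) * fpr

# Illustrative dataset: AP positives and AN negatives (made-up numbers)
AP, AN = 30, 70
n = AP + AN
rho = AP / n

# Random classification has TPR = FPR = rho on average, so its NC is
# rho*(1 - rho) = AP*AN/n^2, whatever the cost split lam happens to be.
nc_rnd = normalized_cost(tpr=rho, fpr=rho, rho=rho, lam=0.4)
print(nc_rnd, AP * AN / n ** 2)  # both ≈ 0.21
```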
In the ROC space, the slope of the line that represents constant normalized cost (hence, constant total cost) depends on \lambda or, equivalently, on the performance metric being used. It is common practice to select as the best classification the point of the ROC curve with the highest value of Youden's J = TPR - FPR. When the cost of misclassifications is taken into account, this practice corresponds to making a specific hypothesis on the relative cost of false positives and false negatives, which is rarely correct.
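The implicit cost hypothesis can be made explicit from the NC formula: maximizing J is equivalent to minimizing NC exactly when \lambda\rho = (1-\lambda)(1-\rho), i.e., \lambda = 1-\rho, in which case NC = \lambda\rho(1-J). A small sketch with made-up ROC points and prevalence:

```python
def youden_j(fpr, tpr):
    """Youden's J statistic for a single ROC point."""
    return tpr - fpr

def normalized_cost(tpr, fpr, rho, lam):
    """NC = lam*rho*(1 - TPR) + (1 - lam)*(1 - rho)*FPR."""
    return lam * rho * (1 - tpr) + (1 - lam) * (1 - rho) * fpr

# Hypothetical ROC points and prevalence
fpr = [0.0, 0.1, 0.3, 0.6, 1.0]
tpr = [0.0, 0.5, 0.75, 0.9, 1.0]
rho = 0.3

# Under the cost hypothesis lam = 1 - rho, NC = lam*rho*(1 - J),
# so the point with maximum J is also the point with minimum NC.
lam = 1 - rho
best_by_j = max(zip(fpr, tpr), key=lambda p: youden_j(p[0], p[1]))
best_by_nc = min(zip(fpr, tpr), key=lambda p: normalized_cost(p[1], p[0], rho, lam))
print(best_by_j, best_by_nc)  # both (0.3, 0.75)
```

For any other \lambda the two criteria may select different ROC points, which is why maximizing J silently fixes the relative cost of the two error types.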
== How to compute pAUC and RRA ==
Software libraries to compute pAUC and RRA are available for Python and R.