Dependency table for selected metrics ("true" means the metric depends on the given probability, "false" means it does not). Metrics that do not depend on a given conditional probability are prone to misrepresenting performance when that probability approaches 0.
=== Example 1: Rare disease detection test ===
Consider a medical test used to detect a rare disease. Suppose the population size is 100000 and 0.05% of the population is infected. Further suppose the following test performance: 95% of all positive individuals are classified correctly (<math>\mathrm{TPR}=0.95</math>) and 95% of all negative individuals are classified correctly (<math>\mathrm{TNR}=0.95</math>). In this case, due to the severe class imbalance and despite the high accuracy of the test (0.95), the probability that an individual classified as positive is in fact positive is very low:

:<math>P(+ \mid C^{+}) = 0.0095.</math>

We can observe how this low probability is reflected in some of the metrics:
* <math>\mathrm{P}_4 = 0.0370</math>
* <math>\mathrm{F}_1 = 0.0188</math>
* <math>\mathrm{J} = \mathbf{0.9000}</math> (informedness / Youden index)
* <math>\mathrm{MK} = 0.0095</math> (markedness)
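As a quick sanity check (a sketch, not part of the article), the metrics above can be recomputed directly from the stated rates; the helper name below is illustrative. Small differences from the quoted figures can arise because the quoted values round intermediate quantities such as <math>P(+ \mid C^{+})</math> before deriving the rest.

```python
# Illustrative sketch: recompute Example 1's metrics from TPR, TNR,
# prevalence and population size. Function name is hypothetical.

def confusion_metrics(tpr, tnr, prevalence, n):
    """Return (P4, F1, J, MK) for the given class-conditional rates."""
    pos = n * prevalence              # actual positives (infected)
    neg = n - pos                     # actual negatives (healthy)
    tp, fn = tpr * pos, (1 - tpr) * pos
    tn, fp = tnr * neg, (1 - tnr) * neg
    ppv = tp / (tp + fp)              # precision, P(+ | C+)
    npv = tn / (tn + fn)
    # P4 is the harmonic mean of TPR, TNR, PPV and NPV
    p4 = 4 / (1 / tpr + 1 / tnr + 1 / ppv + 1 / npv)
    f1 = 2 * tp / (2 * tp + fp + fn)
    j = tpr + tnr - 1                 # informedness (Youden index)
    mk = ppv + npv - 1                # markedness
    return p4, f1, j, mk

p4, f1, j, mk = confusion_metrics(tpr=0.95, tnr=0.95, prevalence=0.0005, n=100_000)
print(f"P4={p4:.4f}  F1={f1:.4f}  J={j:.4f}  MK={mk:.4f}")
```

With exact (unrounded) intermediates this yields P4 ≈ 0.0366, F1 ≈ 0.0186, J = 0.9000 and MK ≈ 0.0094: only J, which ignores the low positive predictive value, stays high. The same function applied to the rates in the next example reproduces its figures as well.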
=== Example 2: Image recognition — cats vs dogs ===
Consider the problem of training a neural network based image classifier with only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). The goal is thus to distinguish between cats and dogs. Suppose that the classifier overpredicts in favour of cats (the "positive" samples): 99.99% of cats are classified correctly, but only 1% of dogs are classified correctly. Further, suppose that the image dataset consists of 100000 images, 90% of which are pictures of cats and 10% pictures of dogs. In this situation, the probability that a picture containing a dog will be classified correctly is very low:

:<math>P(C^{-} \mid -) = 0.01.</math>

Not all of the metrics notice this low probability:
* <math>\mathrm{P}_4 = 0.0388</math>
* <math>\mathrm{F}_1 = \mathbf{0.9478}</math>
* <math>\mathrm{J} = 0.0099</math> (informedness / Youden index)
* <math>\mathrm{MK} = \mathbf{0.8183}</math> (markedness)

== See also ==