Medicine

In the practice of medicine, the differences between the applications of screening and testing are considerable.
Medical screening

Screening involves relatively cheap tests that are given to large populations, none of whom manifest any clinical indication of disease (e.g., Pap smears). Testing involves far more expensive, often invasive, procedures that are given only to those who manifest some clinical indication of disease, and is most often applied to confirm a suspected diagnosis. For example, most states in the US require newborns to be screened for phenylketonuria and hypothyroidism, among other congenital disorders.

• Hypothesis: "The newborns have phenylketonuria and hypothyroidism".
• Null hypothesis (H0): "The newborns do not have phenylketonuria and hypothyroidism".
• Type I error (false positive): the newborns do not in fact have phenylketonuria and hypothyroidism, but the screening data lead us to conclude that they do.
• Type II error (false negative): the newborns do in fact have phenylketonuria and hypothyroidism, but the screening data lead us to conclude that they do not.

Although they display a high rate of false positives, these screening tests are considered valuable because they greatly increase the likelihood of detecting the disorders at a far earlier stage. The simple blood tests used to screen possible blood donors for HIV and hepatitis have a significant rate of false positives; however, physicians use much more expensive and far more precise tests to determine whether a person is actually infected with either of these viruses.

Perhaps the most widely discussed false positives in medical screening come from the breast cancer screening procedure mammography. The US rate of false positive mammograms is up to 15%, the highest in the world. One consequence of this high false positive rate is that, in any 10-year period, half of the American women screened receive a false positive mammogram. False positive mammograms are costly, with over $100 million spent annually in the US on follow-up testing and treatment, and they cause women unneeded anxiety. As a result of the high false positive rate in the US, as many as 90–95% of women who get a positive mammogram do not have the condition. The lowest rate in the world is in the Netherlands, at 1%. The lowest rates are generally in Northern Europe, where mammography films are read twice and a high threshold for additional testing is set (the high threshold decreases the power of the test).

The ideal population screening test would be cheap, easy to administer, and produce zero false negatives, if possible. Such tests usually produce more false positives, which can subsequently be sorted out by more sophisticated (and expensive) testing.
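The arithmetic behind the "90–95% of positive mammograms are false positives" figure can be sketched as follows. The 15% false-positive rate is taken from the text; the prevalence (1%) and sensitivity (90%) are illustrative assumptions, not figures from the article:

```python
# Illustrative positive-predictive-value calculation for mammography.
# Only the 15% false-positive rate comes from the text; prevalence and
# sensitivity below are assumed values chosen for illustration.

prevalence = 0.01           # assumed fraction of screened women with the condition
sensitivity = 0.90          # assumed probability the test detects a true case
false_positive_rate = 0.15  # per-screen false-positive rate cited for the US

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * false_positive_rate

# Positive predictive value: P(condition present | positive mammogram)
ppv = true_positives / (true_positives + false_positives)

print(f"PPV = {ppv:.1%}")                        # about 5.7%
print(f"false share of positives = {1 - ppv:.1%}")  # about 94%, in the 90-95% range
```

With these assumed inputs roughly 94% of positive results are false, consistent with the 90–95% range quoted above.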
Medical testing

False negatives and false positives are significant issues in medical testing.

• Hypothesis: "The patients have the specific disease".
• Null hypothesis (H0): "The patients do not have the specific disease".
• Type I error (false positive): the patients do not in fact have the disease, but the test reports lead the physician to judge that they do.
• Type II error (false negative): the disease is actually present, but the test reports provide a falsely reassuring message to patients and physicians that it is absent.

False positives can also produce serious and counter-intuitive problems when the condition being searched for is rare, as in screening. If a test has a false positive rate of one in ten thousand, but only one in a million samples (or people) is a true positive, most of the positives detected by that test will be false. The probability that an observed positive result is a false positive may be calculated using Bayes' theorem.

False negatives produce serious and counter-intuitive problems, especially when the condition being searched for is common. If a test with a false negative rate of only 10% is used to test a population with a true occurrence rate of 70%, many of the negatives detected by the test will be false. This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. A common example is relying on cardiac stress tests to detect coronary atherosclerosis, even though cardiac stress tests are known to detect only limitations of coronary artery blood flow due to advanced stenosis.
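The Bayes'-theorem calculation mentioned above can be sketched with the rates quoted in the text (false positive rate of one in ten thousand, prevalence of one in a million), assuming for simplicity a perfectly sensitive test:

```python
# Bayes'-theorem sketch using the rates from the text.
# Assumption: sensitivity is 1.0 (the test never misses a true case),
# which keeps the example minimal.

prevalence = 1 / 1_000_000          # one in a million samples is a true positive
false_positive_rate = 1 / 10_000    # one in ten thousand negatives tests positive
sensitivity = 1.0                   # assumed

# Total probability of a positive result
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate

# P(truly positive | test positive), by Bayes' theorem
p_true_given_positive = prevalence * sensitivity / p_positive

print(f"P(truly positive | positive test) = {p_true_given_positive:.2%}")  # about 1%
```

Even with a seemingly tiny false positive rate, roughly 99% of the positives returned by such a test are false, because true cases are a hundred times rarer than false alarms.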
Biometrics

Biometric matching, such as for fingerprint recognition, facial recognition or iris recognition, is susceptible to type I and type II errors.

• Hypothesis: "The input does not identify someone in the searched list of people".
• Null hypothesis: "The input does identify someone in the searched list of people".
• Type I error (false reject rate): the person is in fact someone in the searched list, but the system concludes from the data that they are not.
• Type II error (false match rate): the person is not in fact someone in the searched list, but the system concludes from the data that they are someone being looked for.

The probability of type I errors is called the "false reject rate" (FRR) or false non-match rate (FNMR), while the probability of type II errors is called the "false accept rate" (FAR) or false match rate (FMR). If the system is designed to rarely match suspects, then the probability of type II errors can be called the "false alarm rate". On the other hand, if the system is used for validation (and acceptance is the norm), then the FAR is a measure of system security, while the FRR measures user inconvenience level.
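A minimal sketch of how the matching threshold trades FNMR against FMR, using invented similarity scores (the score lists below are illustrative, not from any real biometric system):

```python
# Illustrative FNMR/FMR trade-off for a score-threshold matcher.
# The score lists are made-up example data.

genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88]   # same-person comparisons
impostor_scores = [0.12, 0.35, 0.48, 0.07, 0.55, 0.29]  # different-person comparisons

def rates(threshold):
    # A comparison is declared a "match" when its score reaches the threshold.
    fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fnmr, fmr

for t in (0.3, 0.5, 0.7):
    fnmr, fmr = rates(t)
    print(f"threshold {t}: FNMR = {fnmr:.2f}, FMR = {fmr:.2f}")
```

Raising the threshold drives the false match rate toward zero at the cost of more false rejects, and vice versa, which is the security-versus-convenience trade-off described above.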
Law

In criminal legal proceedings, there is enormous emphasis on ensuring that if any error is committed it will be a type II error (letting an otherwise-culpable criminal defendant go free) rather than a type I error (punishing an innocent person for a crime he or she did not commit). This emphasis is the reason for high burdens of proof (guilt beyond a reasonable doubt), careful scrutiny of the prosecution's characterizations of inculpatory evidence or testimony, and skepticism toward, or exclusion of, evidence that may be more prejudicial than probative (the Rule 403 balancing test). Substantial scholarship dating back centuries discusses the grave consequences of miscarriages of justice in criminal proceedings, not only for the defendant but also for the perceived fairness of the entire judicial system and for community trust that allegations of criminal activity will be heard seriously but fairly. English jurist William Blackstone formulated Blackstone's ratio of 10:1 to describe the concept that a fair system might let ten guilty defendants walk free rather than see one innocent person jailed.

In recent years, legal scholarship and mainstream jurisprudence have adopted the type I and type II error taxonomy to provide a more rigorous vocabulary for discussing judicial mistakes and wrongful convictions. The U.S. Supreme Court used the type I vs. type II taxonomy in its discussion of errors in Ballew v. Georgia, and judges and law professors increasingly use this dichotomous designation rather than the more casual "wrongly convicted" or "erroneously acquitted", terms often used in earlier scholarship. Recent scholarship and judicial concern often center on the size and unanimity of juries as safeguards against type I error (incorrectly convicting an innocent defendant). Juries of fewer than twelve have been criticized by the U.S. Supreme Court as more likely to produce type I errors; meanwhile, some judges have also criticized the empaneling of more than the typical twelve jurors as problematic (Judge Anderson of the Wisconsin Court of Appeals wrote a notable dissent on this topic in 1993: "Absent a legislative pronouncement of public policy permitting a defendant to acquiesce to be tried by a jury greater than twelve, it is plain error to permit more than twelve jurors to deliberate."). As to unanimity, on April 20, 2020, the U.S. Supreme Court ruled that the Sixth Amendment requires a unanimous jury verdict to convict a defendant of a serious offense, citing concern about type I error as among the primary reasons for requiring unanimous verdicts in serious criminal matters.
Security screening

False positives are routinely found every day in airport security screening, which is ultimately a visual inspection system. The installed security alarms are intended to prevent weapons being brought onto aircraft; yet they are often set to such high sensitivity that they alarm many times a day for minor items, such as keys, belt buckles, loose change, mobile phones, and tacks in shoes.

• Hypothesis: "The item is a weapon".
• Null hypothesis: "The item is not a weapon".
• Type I error (false positive): the item is not in fact a weapon, but the system still sounds an alarm.
• Type II error (false negative): the item is in fact a weapon, but the system remains silent.

The ratio of false positives (identifying an innocent traveler as a terrorist) to true positives (detecting a would-be terrorist) is therefore very high; and because almost every alarm is a false positive, the positive predictive value of these screening tests is very low. The relative cost of false results determines how willing test designers are to let these events occur. Because the cost of a false negative in this scenario is extremely high (not detecting a bomb being brought onto a plane could result in hundreds of deaths), whilst the cost of a false positive is relatively low (a reasonably simple further inspection), the most appropriate test is one with low statistical specificity but high statistical sensitivity (one that allows a high rate of false positives in return for minimal false negatives).
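The low-specificity, high-sensitivity regime described above can be illustrated with hypothetical counts (all four numbers below are invented for illustration):

```python
# Hypothetical counts for a scanner tuned for high sensitivity.
# All numbers are invented to illustrate the trade-off in the text.

true_positives = 1        # weapons correctly flagged
false_negatives = 0       # weapons missed (kept at zero by design)
false_positives = 9_999   # harmless items (keys, buckles, ...) that alarm
true_negatives = 990_000  # harmless items passed without alarm

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
ppv = true_positives / (true_positives + false_positives)

print(f"sensitivity = {sensitivity:.3f}")             # 1.000: no weapon missed
print(f"specificity = {specificity:.3f}")
print(f"positive predictive value = {ppv:.5f}")       # almost every alarm is false
```

With these counts the system misses nothing, yet only 1 alarm in 10,000 is a true positive, matching the very low positive predictive value noted above.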
Computers

The notions of false positives and false negatives have wide currency in the realm of computers and computer applications, including computer security, spam filtering, malware, optical character recognition, and many others. For example, in the case of spam filtering:

• Hypothesis: "The message is spam".
• Null hypothesis: "The message is not spam".
• Type I error (false positive): spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery.
• Type II error (false negative): a spam email is not detected as spam and is classified as non-spam.

While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task. A low number of false negatives is an indicator of the efficiency of spam filtering.
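The two spam-filtering error types can be counted by comparing a filter's decisions against ground truth; a toy sketch with invented labels (no real filter or dataset is assumed):

```python
# Toy count of type I and type II errors for a spam filter.
# "ham" (legitimate mail) plays the role of the null hypothesis;
# both label lists are invented illustrative data.

truth     = ["spam", "spam", "ham",  "ham", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham",  "ham",  "spam", "ham", "spam", "ham", "ham"]

false_positives = sum(t == "ham" and p == "spam" for t, p in zip(truth, predicted))
false_negatives = sum(t == "spam" and p == "ham" for t, p in zip(truth, predicted))

# Type I error: legitimate mail blocked; type II error: spam delivered.
print(f"false positives (ham flagged as spam): {false_positives}")
print(f"false negatives (spam let through):    {false_negatives}")
```

In practice the two counts are weighted very differently: losing one legitimate message (a false positive) is usually considered far more costly than letting one spam message through.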