Not all classification models are naturally probabilistic, and some that are, notably naive Bayes classifiers, decision trees and boosting methods, produce distorted class probability distributions. In the case of decision trees, where Pr(y|x) is the proportion of training samples with label y in the leaf where x ends up, these distortions come about because learning algorithms such as C4.5 or CART explicitly aim to produce homogeneous leaves (giving probabilities close to zero or one, and thus high bias) while using few samples to estimate the relevant proportion (high variance).

Calibration can be assessed using a calibration plot (also called a reliability diagram). A calibration plot shows the observed proportion of items in each class for bands of predicted probability or score (such as a distorted probability distribution or the "signed distance to the hyperplane" in a support vector machine). Deviations from the identity function indicate a poorly calibrated classifier whose predicted probabilities or scores cannot be used as probabilities. In this case, one can use a method to turn these scores into properly calibrated class membership probabilities.

For the binary case, a common approach is to apply Platt scaling, which learns a logistic regression model on the scores. An alternative method using isotonic regression is generally superior to Platt's method when sufficient training data is available.

==Evaluating probabilistic classification==
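The recalibration methods described above can be sketched with scikit-learn's `CalibratedClassifierCV`, which supports both Platt scaling (`method="sigmoid"`) and isotonic regression (`method="isotonic"`). The dataset, base classifier, and split sizes below are illustrative assumptions, not part of the article:

```python
# Minimal sketch: recalibrating a boosted classifier, whose raw scores are
# typically distorted, with Platt scaling and isotonic regression.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative assumption).
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = GradientBoostingClassifier(random_state=0)

# Platt scaling: fit a logistic regression model on the classifier's scores.
platt = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X_train, y_train)
# Isotonic regression: fit a monotone step function to the scores;
# generally preferable only when enough training data is available.
iso = CalibratedClassifierCV(base, method="isotonic", cv=3).fit(X_train, y_train)

for name, model in [("platt", platt), ("isotonic", iso)]:
    prob = model.predict_proba(X_test)[:, 1]
    # Reliability-diagram data: observed fraction of positives per band of
    # predicted probability; deviations from the identity indicate miscalibration.
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    print(name, "Brier score:", brier_score_loss(y_test, prob))
```

Plotting `frac_pos` against `mean_pred` yields the calibration plot; the Brier score gives a single-number summary that rewards calibrated probabilities.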