Unlike generative modelling, which studies the joint probability P(x,y), discriminative modelling studies the conditional probability P(y|x); that is, it maps a given observation x (the observed variables) to a class label y (the target). For example, in object recognition, x is likely to be a vector of raw pixels (or features extracted from the raw pixels of the image). Within a probabilistic framework, this is done by modelling the conditional probability distribution P(y|x), which can be used to predict y from x. Note that there is still a distinction between the conditional model and the discriminative model, though more often both are simply categorised as discriminative models.
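As an illustration of modelling P(y|x) directly (a hypothetical example, not part of the original text), logistic regression fits a parametric form of the conditional probability by maximum likelihood. A minimal sketch in plain Python on invented one-dimensional toy data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # per-sample gradient of the log-likelihood
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

# Toy data: class 0 clusters near x = -2, class 1 near x = +2.
xs = [-2.5, -2.0, -1.5, 1.5, 2.0, 2.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(sigmoid(w * -2.0 + b))  # close to 0: P(y=1 | x=-2) is small
print(sigmoid(w * 2.0 + b))   # close to 1: P(y=1 | x=+2) is large
```

The model never represents P(x) or P(x,y); it only learns the boundary between the classes, which is exactly the discriminative trade-off described above.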
== Pure discriminative model vs. conditional model ==
A conditional model models the conditional probability distribution, while a traditional (pure) discriminative model aims directly at the input-to-label mapping, for instance by assigning an input the label of its most similar training samples.
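To make the distinction concrete (a hypothetical example, not from the original text): a nearest-neighbour classifier maps an input to the label of its most similar stored training sample without modelling any probability distribution at all, so it is discriminative but not a conditional model:

```python
def nearest_neighbour(train, x_new):
    """Return the label of the training sample closest to x_new.
    No probability distribution is modelled; the input is simply
    mapped to the label of the most similar stored example."""
    best_x, best_y = min(train, key=lambda pair: abs(pair[0] - x_new))
    return best_y

# Invented 1-D training set of (feature, label) pairs.
train = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
print(nearest_neighbour(train, 1.4))   # 1: closest sample is (1.0, 1)
print(nearest_neighbour(train, -0.6))  # 0: closest sample is (-1.0, 0)
```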
== Contrast with generative model ==
In statistical classification, the two main approaches are called the generative approach and the discriminative approach. These compute classifiers in different ways, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished:
• A generative model is a statistical model of the joint probability distribution P(X, Y) on a given observable variable X and target variable Y; a generative model can be used to "generate" random instances (outcomes) of an observation x.
• A discriminative model is a model of the conditional probability P(Y\mid X = x) of the target Y, given an observation x. It can be used to "discriminate" the value of the target variable Y, given an observation x.
• Classifiers computed without using a probability model are also referred to loosely as "discriminative".
The distinction between these last two classes is not consistently made. An alternative division defines these symmetrically as:
• a generative model is a model of the conditional probability of the observable X, given a target y, symbolically P(X\mid Y = y);
• a discriminative model is a model of the conditional probability of the target Y, given an observation x, symbolically P(Y\mid X = x).
Regardless of the precise definition, the terminology is constitutive: a generative model can be used to "generate" random instances (outcomes), either of an observation and target pair (x, y) or of an observation x given a target value y, while a discriminative model or discriminative classifier (without a model) can be used to "discriminate" the value of the target variable Y, given an observation x.
== Contrast in approaches ==
Suppose we are given m class labels (classification) and n feature variables, Y:\{y_1, y_2,\ldots,y_m\}, X:\{x_1,x_2,\ldots,x_n\}, as the training samples. A generative model models the joint probability P(x,y), where x is the input and y is the label, and predicts the most probable known label \widetilde{y}\in Y for the unknown variable \widetilde{x} using Bayes' theorem.

Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer parameters to estimate). On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of a discriminative versus a generative model.

Discriminative and generative models also differ in how they introduce the posterior probability. To minimize the expected loss, the misclassification rate should be minimized. In the discriminative approach, the posterior probability P(y|x) is inferred from a parametric model whose parameters are estimated from the training data; point estimates of the parameters are obtained by maximizing the likelihood, or a distribution over the parameters is computed. In the generative approach, since the model focuses on the joint probability, the class posterior probability P(y|x) is obtained via Bayes' theorem:
P(y|x) = \frac{p(x|y)p(y)}{\sum_{i}p(x|i)p(i)} = \frac{p(x|y)p(y)}{p(x)}.
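As a worked numeric illustration of the formula above (all values invented for the example), with two classes the posterior follows directly from the class-conditional likelihoods p(x|y) and the priors p(y):

```python
def posterior(likelihoods, priors, y):
    """P(y|x) = p(x|y)p(y) / sum_i p(x|i)p(i), per Bayes' theorem."""
    evidence = sum(likelihoods[i] * priors[i] for i in likelihoods)
    return likelihoods[y] * priors[y] / evidence

# Hypothetical class-conditional densities p(x|y) at some fixed x, and priors p(y).
likelihoods = {0: 0.05, 1: 0.20}
priors = {0: 0.7, 1: 0.3}
print(posterior(likelihoods, priors, 1))  # 0.06 / (0.035 + 0.06) ≈ 0.632
```

Note that the generative route requires modelling p(x|y) for every class, whereas the discriminative route would estimate P(y|x) in one step.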
== Advantages and disadvantages in application ==
In repeated experiments applying logistic regression and naive Bayes to binary classification tasks, discriminative learning attains a lower asymptotic error, while generative learning converges to its (higher) asymptotic error faster. However, in Ulusoy and Bishop's joint work, Comparison of Generative and Discriminative Techniques for Object Detection and Classification, they state that this holds only when the model is appropriate for the data (i.e., the data distribution is correctly modelled by the generative model).
== Advantages ==
Significant advantages of using discriminative modelling are:
• higher accuracy, which mostly leads to better learning results;
• it allows simplification of the input and provides a direct approach to P(y|x);
• it saves computational resources;
• it generates lower asymptotic errors.
Compared with the advantages of using generative modelling:
• it takes all data into consideration, which could result in slower processing as a disadvantage;
• it requires fewer training samples;
• it offers a flexible framework that can easily cooperate with other needs of the application.
== Disadvantages ==
• The training method usually requires multiple numerical optimization techniques.
• By definition, a discriminative model needs a combination of multiple subtasks to solve a complex real-world problem.

== Typical discriminative modelling approaches ==