Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. The main techniques are:

• Conditional random fields
• Structured support vector machines
• Structured k-nearest neighbours
• Recurrent neural networks, in particular Elman networks
• Transformers
==Structured perceptron==

One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins. This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the
Viterbi algorithm when used on sequence data) and can be described abstractly as follows:

• First, define a function \phi(x, y) that maps a training sample x and a candidate prediction y to a vector of length n (x and y may have any structure; n is problem-dependent, but must be fixed for each model). Let GEN be a function that generates candidate predictions.
• Then:

 Let w be a weight vector of length n
 For a predetermined number of iterations:
  For each sample x in the training set with true output t:
   Make a prediction: \hat{y} = \operatorname{arg\,max}_{y \in GEN(x)} w^\top \phi(x, y)
   Update w, from \hat{y} towards t: w = w + c\,(\phi(x, t) - \phi(x, \hat{y})), where c is the learning rate

In practice, finding the argmax over GEN(x) is done using an algorithm such as Viterbi or max-sum, rather than an exhaustive search through an exponentially large set of candidates; a sketch of such a decoder is given below.
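As an illustration, the following is a minimal Python sketch of such a decoder, assuming a first-order sequence model whose global feature vector decomposes into per-position local features; the names local_phi, score and viterbi_decode are hypothetical helpers for this example, not part of any library.

```python
def score(weights, features):
    """Dot product between a sparse feature dict and the weight vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def viterbi_decode(x, labels, weights, local_phi):
    """Compute arg max_y of sum_i w . local_phi(x, i, y[i-1], y[i]) by dynamic
    programming, instead of enumerating all len(labels)**len(x) candidates."""
    n = len(x)
    best = [{} for _ in range(n)]  # best[i][lab]: score of the best prefix ending in lab
    back = [{} for _ in range(n)]  # back[i][lab]: previous label on that best prefix
    for lab in labels:
        best[0][lab] = score(weights, local_phi(x, 0, None, lab))
    for i in range(1, n):
        for lab in labels:
            best[i][lab], back[i][lab] = max(
                (best[i - 1][prev] + score(weights, local_phi(x, i, prev, lab)), prev)
                for prev in labels
            )
    # Recover the highest-scoring sequence by following the back-pointers.
    lab = max(best[n - 1], key=best[n - 1].get)
    y = [lab]
    for i in range(n - 1, 0, -1):
        lab = back[i][lab]
        y.append(lab)
    return list(reversed(y))
```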
The idea of learning is similar to that for multiclass perceptrons.
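Putting the pieces together, here is a minimal sketch of the training loop described above, reusing the hypothetical viterbi_decode helper from the previous sketch; the global feature map \phi(x, y) is assumed to be the sum of the same local features over all positions.

```python
from collections import defaultdict

def phi(x, y, local_phi):
    """Global feature map: the sum of local features over all positions of y."""
    feats = defaultdict(float)
    prev = None
    for i, lab in enumerate(y):
        for f, v in local_phi(x, i, prev, lab).items():
            feats[f] += v
        prev = lab
    return feats

def train_structured_perceptron(data, labels, local_phi, epochs=10, c=1.0):
    """data: iterable of (x, t) pairs, where x is a sequence and t its true labels."""
    w = defaultdict(float)   # the weight vector, one entry per feature, initially zero
    for _ in range(epochs):  # a predetermined number of iterations
        for x, t in data:
            # Inference: y_hat = arg max over GEN(x) of w . phi(x, y), via Viterbi.
            y_hat = viterbi_decode(x, labels, w, local_phi)
            # Update w from y_hat towards t: w = w + c * (phi(x, t) - phi(x, y_hat)).
            # (The update cancels to zero when y_hat == t, so it can be skipped.)
            if y_hat != t:
                for f, v in phi(x, t, local_phi).items():
                    w[f] += c * v
                for f, v in phi(x, y_hat, local_phi).items():
                    w[f] -= c * v
    return w
```

For part-of-speech tagging, for instance, local_phi(x, i, prev, lab) might emit binary features such as the pairs (x[i], lab) and (prev, lab), so that the model learns both word–tag and tag–tag associations.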