Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. The main techniques are:

• Conditional random fields
• Structured support vector machines
• Structured k-nearest neighbours
• Recurrent neural networks, in particular Elman networks
• Transformers
==Structured perceptron==

One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins. This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the
Viterbi algorithm when used on sequence data) and can be described abstractly as follows:

• First, define a function \phi(x, y) that maps a training sample x and a candidate prediction y to a vector of length n (x and y may have any structure; n is problem-dependent, but must be fixed for each model). Let GEN be a function that generates candidate predictions.
• Then:

 Let w be a weight vector of length n
 For a predetermined number of iterations:
  For each sample x in the training set with true output t:
   Make a prediction: \hat{y} = \operatorname{arg\,max}_{y \in GEN(x)} w^\top \phi(x, y)
   Update w, from \hat{y} towards t: w = w + c\,(\phi(x, t) - \phi(x, \hat{y})), where c is the learning rate

In practice, finding the argmax over GEN(x) is done using an algorithm such as Viterbi or max-sum, rather than an exhaustive search through an exponentially large set of candidates; a sketch of such a decoder is given below.
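As an illustration, the following is a minimal Python sketch of such a decoder, assuming a first-order sequence model whose global feature vector decomposes into per-position local features; the names local_phi, score and viterbi_decode are hypothetical helpers for this example, not part of any library.

```python
def score(weights, features):
    """Dot product between a sparse feature dict and the weight vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def viterbi_decode(x, labels, weights, local_phi):
    """Compute arg max_y of sum_i w . local_phi(x, i, y[i-1], y[i]) by dynamic
    programming, instead of enumerating all len(labels)**len(x) candidates."""
    n = len(x)
    best = [{} for _ in range(n)]  # best[i][lab]: score of the best prefix ending in lab
    back = [{} for _ in range(n)]  # back[i][lab]: previous label on that best prefix
    for lab in labels:
        best[0][lab] = score(weights, local_phi(x, 0, None, lab))
    for i in range(1, n):
        for lab in labels:
            best[i][lab], back[i][lab] = max(
                (best[i - 1][prev] + score(weights, local_phi(x, i, prev, lab)), prev)
                for prev in labels
            )
    # Recover the highest-scoring sequence by following the back-pointers.
    lab = max(best[n - 1], key=best[n - 1].get)
    y = [lab]
    for i in range(n - 1, 0, -1):
        lab = back[i][lab]
        y.append(lab)
    return list(reversed(y))
```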
The idea of learning is similar to that for multiclass perceptrons.
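Putting the pieces together, here is a minimal sketch of the training loop described above, reusing the hypothetical viterbi_decode helper from the previous sketch; the global feature map \phi(x, y) is assumed to be the sum of the same local features over all positions.

```python
from collections import defaultdict

def phi(x, y, local_phi):
    """Global feature map: the sum of local features over all positions of y."""
    feats = defaultdict(float)
    prev = None
    for i, lab in enumerate(y):
        for f, v in local_phi(x, i, prev, lab).items():
            feats[f] += v
        prev = lab
    return feats

def train_structured_perceptron(data, labels, local_phi, epochs=10, c=1.0):
    """data: iterable of (x, t) pairs, where x is a sequence and t its true labels."""
    w = defaultdict(float)   # the weight vector, one entry per feature, initially zero
    for _ in range(epochs):  # a predetermined number of iterations
        for x, t in data:
            # Inference: y_hat = arg max over GEN(x) of w . phi(x, y), via Viterbi.
            y_hat = viterbi_decode(x, labels, w, local_phi)
            # Update w from y_hat towards t: w = w + c * (phi(x, t) - phi(x, y_hat)).
            # (The update cancels to zero when y_hat == t, so it can be skipped.)
            if y_hat != t:
                for f, v in phi(x, t, local_phi).items():
                    w[f] += c * v
                for f, v in phi(x, y_hat, local_phi).items():
                    w[f] -= c * v
    return w
```

For part-of-speech tagging, for instance, local_phi(x, i, prev, lab) might emit binary features such as the pairs (x[i], lab) and (prev, lab), so that the model learns both word–tag and tag–tag associations.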