In computational linguistics, coreference resolution is a well-studied problem in discourse. To derive the correct interpretation of a text, or even to estimate the relative importance of various mentioned subjects, pronouns and other referring expressions must be connected to the right individuals. Algorithms intended to resolve coreferences commonly look first for the nearest preceding individual that is compatible with the referring expression. For example, ''she'' might attach to a preceding expression such as ''the woman'' or ''Anne'', but is less likely to attach to ''Bill''. Pronouns such as ''himself'' have much stricter constraints. As with many linguistic tasks, there is a tradeoff between precision and recall.
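
The nearest-compatible-antecedent heuristic described above can be sketched as follows. This is a minimal illustration rather than any particular published algorithm: the `Mention` class, the `PRONOUNS` feature table, and the function names are hypothetical, and compatibility is reduced to a hard gender and number check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    gender: str  # "f", "m", "n", or "?" when unknown (assumed toy feature)
    number: str  # "sg" or "pl" (assumed toy feature)


# Hypothetical toy lexicon mapping pronouns to (gender, number).
PRONOUNS = {
    "she": ("f", "sg"),
    "he": ("m", "sg"),
    "it": ("n", "sg"),
    "they": ("?", "pl"),
}


def compatible(pronoun: str, candidate: Mention) -> bool:
    """Return True if the candidate agrees with the pronoun in gender and number."""
    gender, number = PRONOUNS[pronoun]
    # An unknown gender on either side is treated as compatible.
    gender_ok = "?" in (gender, candidate.gender) or gender == candidate.gender
    return gender_ok and number == candidate.number


def nearest_antecedent(pronoun: str, preceding: List[Mention]) -> Optional[Mention]:
    """Scan preceding mentions from nearest to farthest; return the first compatible one."""
    for candidate in reversed(preceding):
        if compatible(pronoun, candidate):
            return candidate
    return None  # no compatible antecedent was found


# Mentions in textual order: "Anne" appears before "Bill".
mentions = [Mention("Anne", "f", "sg"), Mention("Bill", "m", "sg")]
result = nearest_antecedent("she", mentions)
print(result.text if result else None)  # prints "Anne"
```

Treating compatibility as a hard filter is a simplification: the text above describes ''Bill'' as merely a less likely antecedent for ''she'', so a more realistic system would score candidates rather than reject them outright.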
Cluster-quality metrics commonly used to evaluate coreference resolution algorithms include the Rand index, the adjusted Rand index, and various mutual information-based methods.

A particular problem for coreference resolution in English is the pronoun ''it'', which has many uses. ''It'' can refer much like ''he'' and ''she'', except that it generally refers to inanimate objects (the rules are actually more complex: animals may be any of ''it'', ''he'', or ''she''; ships are traditionally ''she''; hurricanes are usually ''it'' despite having gendered names).
''It'' can also refer to abstractions rather than beings, e.g. ''He was paid minimum wage, but didn't seem to mind it.'' Finally, ''it'' also has pleonastic uses, which do not refer to anything specific, as in ''It is raining.'' Pleonastic uses are not considered referential, and so are not part of coreference.

Approaches to coreference resolution can broadly be separated into mention-pair, mention-ranking or entity-based algorithms. Mention-pair algorithms make a binary decision about whether a given pair of mentions belongs to the same entity. Entity-wide constraints like
gender are not considered, which leads to error propagation. For example, the pronouns ''he'' and ''she'' can each have a high probability of coreference with ''the teacher'', but cannot be coreferent with each other. Mention-ranking algorithms expand on this idea but instead stipulate that one mention can be coreferent with only one (previous) mention. As a result, each previous mention must be given a score, and the highest-scoring mention (or no mention) is linked. Finally, in entity-based methods, mentions are linked based on information about the whole coreference chain instead of individual mentions. Representing a variable-width chain is more complex and computationally expensive than representing individual mentions, which has led to these algorithms being mostly based on neural network architectures.

==See also==