The following is a list of some of the most commonly researched tasks in natural language processing. Some of these tasks have direct real-world applications, while others more commonly serve as subtasks that are used to aid in solving larger tasks. Though natural language processing tasks are closely intertwined, they can be subdivided into categories for convenience. A coarse division is given below.
Text and speech processing
Optical character recognition (OCR): Given an image representing printed text, determine the corresponding text.
Speech recognition: Given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the opposite of text-to-speech and is one of the extremely difficult problems colloquially termed "AI-complete" (see above). In natural speech there are hardly any pauses between successive words, and thus speech segmentation is a necessary subtask of speech recognition (see below). In most spoken languages, the sounds representing successive letters blend into each other in a process termed coarticulation, so the conversion of the analog signal to discrete characters can be a very difficult process. Also, given that words in the same language are spoken by people with different accents, speech recognition software must recognize this wide variety of input as identical in its textual equivalent.
Speech segmentation: Given a sound clip of a person or people speaking, separate it into words. A subtask of speech recognition, and typically grouped with it.
Text-to-speech: Given a text, produce its spoken representation. Text-to-speech can be used to aid the visually impaired.
Word segmentation (tokenization): Tokenization is a text-processing technique that divides text into individual words or word fragments. It produces two key components: a word index and tokenized text. The word index is a list that maps unique words to numerical identifiers, and the tokenized text replaces each word with its corresponding numerical token. These numerical tokens are then used as input to various deep learning methods.
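As a sketch of the two components just described, the toy functions below build a word index and convert text to numerical tokens (the corpus, the numbering scheme, and the whitespace tokenization are all illustrative):

```python
# Toy word-level tokenization: build a word index (word -> integer id),
# then replace each word in a text with its numerical token.

def build_word_index(texts):
    index = {}
    for text in texts:
        for word in text.lower().split():
            if word not in index:
                index[word] = len(index) + 1  # 0 is often reserved for padding
    return index

def tokenize(text, index):
    # Replace each known word with its numerical token; skip unknown words.
    return [index[word] for word in text.lower().split() if word in index]

corpus = ["the cat sat", "the dog sat down"]
word_index = build_word_index(corpus)
# word_index: {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4, 'down': 5}
tokens = tokenize("the dog sat", word_index)
# tokens: [1, 4, 3]
```

Real tokenizers additionally handle punctuation, casing policies, and subword units (word fragments), but the index-and-replace structure is the same.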
Morphological analysis
Lemmatization: The task of removing inflectional endings only, returning the base dictionary form of a word, known as a lemma. Lemmatization is another technique for reducing words to their normalized form; in this case, the transformation uses a dictionary to map words to their base form.
Morphological segmentation: Separate words into individual morphemes and identify the class of each morpheme. The difficulty of this task depends greatly on the complexity of the morphology (i.e., the structure of words) of the language being considered. English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g., "open, opens, opened, opening") as separate words. In languages such as Turkish or Meitei, a highly agglutinative Indian language, such an approach is not possible, as each dictionary entry has thousands of possible word forms.
Part-of-speech tagging: Given a sentence, determine the part of speech (POS) of each word. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a noun ("the book on the table") or a verb ("to book a flight"); "set" can be a noun, verb or adjective; and "out" can be any of at least five different parts of speech.
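The ambiguity just described can be illustrated with a most-frequent-tag baseline, a common starting point for POS tagging. The tiny tagged corpus and the tagset below are invented for illustration:

```python
from collections import Counter, defaultdict

# Most-frequent-tag baseline: tag each word with the tag it carries most
# often in a (toy) tagged training corpus. It ignores context, which is
# exactly why ambiguous words like "book" trip it up.

training = [
    ("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("red", "ADJ"),
    ("I", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN"),
    ("read", "VERB"), ("the", "DET"), ("book", "NOUN"),
]

counts = defaultdict(Counter)
for word, tag_label in training:
    counts[word][tag_label] += 1

def tag(sentence):
    # Unknown words default to NOUN, a common fallback heuristic.
    return [(w, counts[w].most_common(1)[0][0] if w in counts else "NOUN")
            for w in sentence.split()]

print(tag("I book the flight"))
# [('I', 'PRON'), ('book', 'NOUN'), ('the', 'DET'), ('flight', 'NOUN')]
# "book" gets NOUN (its most frequent tag in the toy corpus) even though
# it is a verb in this sentence -- context-aware taggers fix this.
```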
Stemming: The process of reducing inflected (or sometimes derived) words to a base form (e.g., "close" will be the root of "closed", "closing", "close", "closer", etc.). Stemming yields results similar to lemmatization, but does so using rules rather than a dictionary.
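The rule-based idea can be sketched with a deliberately simple suffix-stripping stemmer (this is not the Porter algorithm; the suffix list and length threshold are illustrative):

```python
# Toy rule-based stemmer: strip the first matching suffix, provided a
# minimal stem length remains. Unlike dictionary-based lemmatization,
# no lexicon is consulted, so the output need not be a real word.

SUFFIXES = ["ing", "ed", "er", "es", "s"]

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["closed", "closing", "closer", "close"]:
    print(w, "->", stem(w))
# closed -> clos, closing -> clos, closer -> clos, close -> close
# The inflected forms collapse to the (non-word) stem "clos", whereas a
# lemmatizer, using a dictionary, would return the lemma "close".
```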
Syntactic analysis
Grammar induction: Generate a formal grammar that describes a language's syntax.
Sentence breaking (also known as "sentence boundary disambiguation"): Given a chunk of text, find the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation marks, but these same characters can serve other purposes (e.g., marking abbreviations).
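A naive rule-based splitter illustrates the abbreviation problem (the abbreviation list and the regular expression are illustrative; production systems typically use trained models):

```python
import re

# Naive sentence splitter: break after ., ! or ? followed by whitespace
# and a capital letter, except when the preceding token is a known
# abbreviation. The abbreviation list here is a tiny illustrative sample.

ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "etc.", "e.g.", "i.e."}

def split_sentences(text):
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1]
        if last_word in ABBREVIATIONS:
            continue  # the period marks an abbreviation, not a boundary
        sentences.append(candidate)
        start = match.end()
    sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith arrived. He sat down."))
# ['Dr. Smith arrived.', 'He sat down.']
```

Without the abbreviation check, the splitter would wrongly cut after "Dr.", which is exactly the disambiguation the task name refers to.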
Parsing: Determine the parse tree (grammatical analysis) of a given sentence. The grammar for natural languages is ambiguous, and typical sentences have multiple possible analyses: perhaps surprisingly, for a typical sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human). There are two primary types of parsing: dependency parsing and constituency parsing. Dependency parsing focuses on the relationships between words in a sentence (marking things like primary objects and predicates), whereas constituency parsing focuses on building out the parse tree using a probabilistic context-free grammar (PCFG) (see also stochastic grammar).
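The scale of the ambiguity can be made concrete with a CYK-style chart over a deliberately ambiguous toy grammar in Chomsky normal form (the grammar is invented for illustration):

```python
# CYK-style chart that counts the parses of a sentence under the toy grammar:
#   S -> S S    (any two adjacent phrases combine into a phrase)
#   S -> 'a'    (any single word is a phrase)
# For n identical words, the parse count is the Catalan number C(n-1),
# which grows exponentially -- a cartoon of real parse-forest explosion.

def count_parses(n):
    # chart[i][j] = number of parse trees covering words i..j inclusive
    chart = [[0] * n for _ in range(n)]
    for i in range(n):
        chart[i][i] = 1  # rule S -> 'a'
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point for rule S -> S S
                chart[i][j] += chart[i][k] * chart[k + 1][j]
    return chart[0][n - 1]

print(count_parses(4))   # 5 parse trees for a 4-word "sentence"
print(count_parses(10))  # 4862 -- thousands of parses for 10 words
```

The same chart, with rule probabilities multiplied in instead of parse counts added, is the core of PCFG parsing.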
Lexical semantics (of individual words in context)
Lexical semantics: What is the computational meaning of individual words in context?
Distributional semantics: How can we learn semantic representations from data?
Named entity recognition (NER): Given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g., person, location, organization). Although capitalization can aid in recognizing named entities in languages such as English, this information cannot aid in determining the type of named entity, and is in any case often inaccurate or insufficient. For example, the first letter of a sentence is also capitalized, and named entities often span several words, only some of which are capitalized. Furthermore, many languages written in non-Western scripts (e.g., Chinese or Arabic) have no capitalization at all, and even languages with capitalization may not use it consistently to distinguish names. For example, German capitalizes all nouns, regardless of whether they are names, and French and Spanish do not capitalize names that serve as adjectives. This task is also referred to as token classification.
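The weakness of capitalization cues just described can be demonstrated with a deliberately naive heuristic (the example sentences are illustrative, and no entity typing is attempted):

```python
# Naive capitalization-based NER: treat any capitalized, non-sentence-initial
# word as a named entity. Skipping the first word avoids false positives
# from sentence-initial capitalization -- at the cost of missing genuine
# sentence-initial names, the failure mode discussed above.

def naive_ner(sentence):
    words = sentence.split()
    return [w for i, w in enumerate(words) if w[0].isupper() and i > 0]

print(naive_ner("Yesterday Alice met Bob in Paris"))
# ['Alice', 'Bob', 'Paris'] -- correct here
print(naive_ner("Paris is the capital of France"))
# ['France'] -- misses the sentence-initial 'Paris'
```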
Sentiment analysis (see also Multimodal sentiment analysis): Sentiment analysis involves identifying and classifying the emotional tone expressed in text, typically determining whether the expressed sentiment is positive, negative, or neutral. Models for sentiment classification typically use inputs such as word n-grams, term frequency-inverse document frequency (TF-IDF) features, or hand-crafted features, or employ deep learning models designed to recognize both long-term and short-term dependencies in text sequences. The applications of sentiment analysis are diverse, extending to tasks such as categorizing customer reviews on various online platforms.
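The TF-IDF features mentioned above can be sketched from scratch (the corpus and tokenization are illustrative, and libraries use various smoothed variants of the idf formula):

```python
import math
from collections import Counter

# TF-IDF features from scratch: term frequency (how often a term occurs
# in a document) weighted by inverse document frequency (how rare the
# term is across the corpus). Common terms are down-weighted.

docs = [
    "great phone great battery",
    "terrible battery terrible screen",
    "great screen",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents containing each term.
df = Counter()
for doc in tokenized:
    df.update(set(doc))

def tfidf(doc):
    tf = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

features = tfidf(tokenized[0])
# "great" appears in 2 of 3 documents, so its idf is log(3/2); the rare
# term "phone" (1 of 3 documents) gets the larger weight log(3).
```

A sentiment classifier would then be trained on these per-document feature vectors together with sentiment labels.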
Terminology extraction: The goal of terminology extraction is to automatically extract relevant terms from a given corpus.
Word-sense disambiguation (WSD): Many words have more than one meaning; we have to select the meaning which makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g., from a dictionary or an online resource such as WordNet.
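A classic baseline for WSD is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the context. The two-sense inventory below is a toy stand-in for a resource like WordNet:

```python
# Simplified Lesk algorithm for word-sense disambiguation: score each
# candidate sense of "bank" by the word overlap between its gloss and
# the context, and pick the highest-scoring sense. Glosses are invented.

SENSES = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a body of water",
}

def lesk(context):
    context_words = set(context.lower().split())
    def overlap(sense):
        return len(set(SENSES[sense].split()) & context_words)
    return max(SENSES, key=overlap)

print(lesk("he sat on the bank of the river watching the water"))
# bank/river
print(lesk("she went to the bank to deposit money and get a loan"))
# bank/finance
```

Real implementations lemmatize, drop stopwords, and extend glosses with related senses; the overlap-counting core is the same.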
Entity linking: Many words, typically proper names, refer to named entities; here we have to select the entity (a famous individual, a location, a company, etc.) which is referred to in context.
Relational semantics (semantics of individual sentences)
Relationship extraction: Given a chunk of text, identify the relationships among named entities (e.g., who is married to whom).
Semantic parsing: Given a piece of text (typically a sentence), produce a formal representation of its semantics, either as a graph (e.g., in AMR parsing) or in accordance with a logical formalism (e.g., in DRT parsing). This challenge typically includes aspects of several more elementary NLP tasks from semantics (e.g., semantic role labelling, word-sense disambiguation) and can be extended to include full-fledged discourse analysis (e.g., discourse analysis, coreference; see natural language understanding below).
Semantic role labelling (see also implicit semantic role labelling below): Given a single sentence, identify and disambiguate semantic predicates (e.g., verbal frames), then identify and classify the frame elements (semantic roles).
Discourse (semantics beyond individual sentences)
Coreference resolution: Given a sentence or larger chunk of text, determine which words ("mentions") refer to the same objects ("entities"). Anaphora resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the nouns or names to which they refer. The more general task of coreference resolution also includes identifying so-called "bridging relationships" involving referring expressions. For example, in a sentence such as "He entered John's house through the front door", "the front door" is a referring expression, and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house (rather than of some other structure that might also be referred to).
Discourse analysis: This rubric includes several related tasks. One task is discourse parsing, i.e., identifying the discourse structure of a connected text, i.e., the nature of the discourse relationships between sentences (e.g., elaboration, explanation, contrast). Another possible task is recognizing and classifying the speech acts in a chunk of text (e.g., yes–no question, content question, statement, assertion, etc.).
Implicit semantic role labelling: Given a single sentence, identify and disambiguate semantic predicates (e.g., verbal frames) and their explicit semantic roles in the current sentence (see semantic role labelling above). Then, identify semantic roles that are not explicitly realized in the current sentence, classify them into arguments that are explicitly realized elsewhere in the text and those that are not specified, and resolve the former against the local text. A closely related task is zero anaphora resolution, i.e., the extension of coreference resolution to pro-drop languages.
Recognizing textual entailment: Given two text fragments, determine if one being true entails the other, entails the other's negation, or allows the other to be either true or false.
Topic segmentation and recognition: Given a chunk of text, separate it into segments, each of which is devoted to a topic, and identify the topic of each segment.
Argument mining: The goal of argument mining is the automatic extraction and identification of argumentative structures from natural language text with the aid of computer programs. Such argumentative structures include the premise, conclusions, the argument scheme, and the relationship between the main and subsidiary argument, or the main and counter-argument, within discourse.
Higher-level NLP applications
Automatic summarization (text summarization): Produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, such as research papers or articles in the financial section of a newspaper.
Grammatical error correction: Grammatical error detection and correction involves a wide range of problems at all levels of linguistic analysis (phonology/orthography, morphology, syntax, semantics, pragmatics). Grammatical error correction is impactful since it affects hundreds of millions of people who use or acquire English as a second language. It has thus been the subject of a number of shared tasks since 2011. As far as orthography, morphology, syntax and certain aspects of semantics are concerned, and thanks to the development of powerful neural language models such as GPT-2, this can now (2019) be considered a largely solved problem and is being marketed in various commercial applications.
Logic translation: Translate a text from a natural language into formal logic.
Machine translation (MT): Automatically translate text from one human language to another. This is one of the most difficult problems, and is a member of a class of problems colloquially termed "AI-complete", i.e., requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) to solve properly.
Natural language understanding (NLU): Convert chunks of text into more formal representations, such as first-order logic structures, that are easier for computer programs to manipulate. Natural language understanding involves identifying the intended semantics among the multiple possible semantics that can be derived from a natural language expression, which usually takes the form of organized notations of natural language concepts. Introducing and creating a language metamodel and ontology are efficient, though empirical, solutions. An explicit formalization of natural language semantics, without confusion with implicit assumptions such as the closed-world assumption (CWA) vs. the open-world assumption, or subjective yes/no vs. objective true/false, is expected to form the basis of semantics formalization.
Natural language generation (NLG): Convert information from computer databases or semantic intents into readable human language.
Book generation: Not an NLP task proper, but an extension of natural language generation and other NLP tasks, is the creation of full-fledged books. The first machine-generated book was created by a rule-based system in 1984 (Racter, "The Policeman's Beard Is Half Constructed"). The first published work by a neural network appeared in 2018: "1 the Road", marketed as a novel, contains sixty million words. Both of these systems are basically elaborate but nonsensical (semantics-free) language models. The first machine-generated science book was published in 2019 (Beta Writer, "Lithium-Ion Batteries", Springer, Cham). Unlike Racter and "1 the Road", this is grounded in factual knowledge and based on text summarization.
Document AI: A Document AI platform sits on top of NLP technology, enabling users with no prior experience of artificial intelligence, machine learning or NLP to quickly train a computer to extract the specific data they need from different document types. NLP-powered Document AI enables non-technical teams, such as lawyers, business analysts and accountants, to quickly access information hidden in documents.
Dialogue management: Computer systems intended to converse with a human.
Question answering: Given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?").
Text-to-image generation: Given a description of an image, generate an image that matches the description.
Text-to-scene generation: Given a description of a scene, generate a 3D model of the scene.
Text-to-video: Given a description of a video, generate a video that matches the description.

== General tendencies and (possible) future directions ==