Text documents
Markup languages like XML and HTML annotate text in a way that is syntactically distinguishable from that text. They can be used to add information about the desired visual presentation, or machine-readable semantic information, as in the semantic web.
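To illustrate how markup stays syntactically distinguishable from the text it annotates, the following minimal sketch uses Python's standard `html.parser` module to split a small HTML fragment into its plain text and its tags. The class name `AnnotationSplitter`, the function `split_markup`, and the sample fragment are illustrative assumptions, not part of any standard.

```python
from html.parser import HTMLParser


class AnnotationSplitter(HTMLParser):
    """Collects the plain text and the markup annotations separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # character data, i.e. the annotated text
        self.tags = []        # (tag, attributes) pairs, i.e. the annotations

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text_parts.append(data)


def split_markup(document):
    """Return (plain_text, list_of_tags) for an HTML fragment."""
    parser = AnnotationSplitter()
    parser.feed(document)
    parser.close()
    return "".join(parser.text_parts), parser.tags


# Hypothetical sample: the span tag adds semantic information to "100 °C".
sample = '<p>Water boils at <span class="temperature">100 °C</span>.</p>'
text, annotations = split_markup(sample)
```

Here `text` holds the sentence without any markup, while `annotations` lists the `p` and `span` tags together with their attributes.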
Tabular data
Tabular data includes formats such as CSV and XLS. The process of assigning semantic annotations to tabular data is referred to as semantic labelling.
Semantic labelling is the process of assigning annotations from ontologies to tabular data. This process is also referred to as semantic annotation. [...] coordinates, and more.
Semantic labelling techniques can be categorised as follows: geometric (using lines and planes, such as support-vector machines and linear regression), probabilistic (e.g., conditional random fields), logical (e.g., decision tree learning), and non-ML techniques (e.g., balancing coverage and specificity). [...] use the Jaccard index and TF-IDF similarity for textual data and the Kolmogorov–Smirnov test for numeric data. Alobaid and Corcho [...] to label numeric columns.
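As a rough illustration of two of the measures named above, the sketch below computes the Jaccard index between two sets of cell values and the two-sample Kolmogorov–Smirnov statistic between two numeric samples. It is a minimal sketch and not the implementation of any cited approach: the function names and sample data are assumptions, and only the test statistic is computed, not a p-value.

```python
from bisect import bisect_right


def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two non-empty samples."""
    sorted_a, sorted_b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical textual column compared with known labels of a class.
column_cells = ["madrid", "paris", "rome"]
known_cities = ["paris", "rome", "berlin", "madrid"]
overlap = jaccard_index(column_cells, known_cities)  # 3 shared / 4 total = 0.75

# Hypothetical numeric column compared with a reference sample.
gap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # 0.5
```

A higher Jaccard index suggests that a textual column resembles the reference values, while a smaller Kolmogorov–Smirnov statistic suggests that a numeric column follows a distribution similar to the reference sample.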
Probabilistic techniques
Limaye et al. use TF-IDF similarity and graphical models. They also use a support-vector machine to compute the weights. Venetis et al. construct an isA database, which consists of (instance, class) pairs, and then compute the maximum likelihood using these pairs. Alobaid and Corcho approximated the q-q plot to predict the properties of numeric columns.
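The idea of scoring classes from (instance, class) pairs can be sketched as follows. This is a simplified illustration under stated assumptions, not the scoring formula published by Venetis et al.: it estimates P(class | value) from pair counts with additive smoothing and picks the class with the highest summed log-probability over a column. The function names, the smoothing constant, and the sample pairs are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_isa_index(pairs):
    """Count how often each (instance, class) pair occurs."""
    counts = defaultdict(Counter)
    for instance, cls in pairs:
        counts[instance][cls] += 1
    return counts


def most_likely_class(column_values, isa_counts, smoothing=0.01):
    """Return the class with the highest summed log-probability over the
    column, with P(class | value) estimated from smoothed pair counts."""
    classes = {cls for by_class in isa_counts.values() for cls in by_class}
    best_class, best_score = None, -math.inf
    for cls in sorted(classes):
        score = 0.0
        for value in column_values:
            by_class = isa_counts.get(value, Counter())
            total = sum(by_class.values())
            prob = (by_class[cls] + smoothing) / (total + smoothing * len(classes))
            score += math.log(prob)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Hypothetical isA pairs, as might be harvested from text.
pairs = [
    ("Madrid", "city"), ("Madrid", "capital"),
    ("Paris", "city"), ("Paris", "city"), ("Paris", "person"),
    ("Rome", "city"),
]
isa_counts = build_isa_index(pairs)
predicted = most_likely_class(["Madrid", "Paris", "Rome"], isa_counts)
```

In this toy data every cell has strong evidence for "city", so that class obtains the highest score; the smoothing term only prevents a single unseen pair from driving a class score to negative infinity.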
Logical techniques
Syed et al. built Wikitology, which is "a hybrid knowledge base of structured and unstructured information extracted from Wikipedia augmented by RDF data from DBpedia and other Linked Data resources."
Semantic labelling common tasks
The following are some of the common semantic labelling tasks presented in the literature:
Entity linking and disambiguation
This is the most common task in semantic labelling. Given the text of a cell and a data source, the approach predicts the entity and links it to the one identified in the given data source. For example, if the input to the approach were the text "Richard Feynman" and a URL to the SPARQL endpoint of DBpedia, the approach would return "http://dbpedia.org/resource/Richard_Feynman", which is the entity from DBpedia. Some approaches use exact match. Some approaches expect the subject column as an input.
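Exact-match entity linking can be sketched as a lookup from cell text to entity identifier. The example below is a minimal sketch that assumes a small in-memory dictionary stands in for the data source; a real approach would query an endpoint such as the one described above. The function name `link_entity` and the dictionary contents are illustrative, and matching here ignores case and surrounding whitespace.

```python
def link_entity(cell_text, label_index):
    """Return the URI whose label equals the cell text after normalising
    case and surrounding whitespace, or None if there is no match."""
    return label_index.get(cell_text.strip().casefold())


# Hypothetical stand-in for a knowledge base: label -> entity URI.
labels = {
    "Richard Feynman": "http://dbpedia.org/resource/Richard_Feynman",
    "Madrid": "http://dbpedia.org/resource/Madrid",
}
label_index = {label.casefold(): uri for label, uri in labels.items()}

linked = link_entity("  richard feynman ", label_index)
missing = link_entity("R. Feynman", label_index)
```

The second call returns `None`, which shows the main limitation of exact matching: abbreviations and spelling variants need a fuzzier comparison or additional disambiguation.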
Relation prediction
The relation between Madrid and Spain is "capitalOf". Such relations can easily be found in ontologies, such as DBpedia. Venetis et al. [...] to extract the relation between two columns. Syed et al. [...]
T2D is the most common gold standard for semantic labelling. Two versions of T2D exist: T2Dv1 (sometimes referred to simply as T2D) and T2Dv2.
Source control
The "annotate" function (also known as "blame" or "praise") used in source control systems such as Git, Team Foundation Server and Subversion determines who committed changes to the source code into the repository. This outputs a copy of the source code where each line is annotated with the name of the last contributor to edit that line (and possibly a revision number). This can help establish blame in the event a change caused a malfunction, or identify the author of brilliant code.
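The line-by-line attribution described above can be sketched with Python's standard `difflib` module. This is a toy model under stated assumptions, not how Git or Subversion implement the feature: it takes a hypothetical list of `(author, lines)` revisions ordered from oldest to newest, carries authorship forward for unchanged lines, and assigns new or changed lines to the author of the revision that introduced them.

```python
from difflib import SequenceMatcher


def annotate(revisions):
    """Return [(author, line)] for the newest revision, where author is the
    last contributor to edit that line."""
    annotated = []
    for author, lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier author.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to the current author;
                # for deletions the range j1:j2 is empty.
                updated.extend((author, line) for line in lines[j1:j2])
        annotated = updated
    return annotated


# Hypothetical history of a two-revision file.
history = [
    ("alice", ["def area(r):", "    return 3.14 * r * r"]),
    ("bob", ["import math", "", "def area(r):", "    return math.pi * r * r"]),
]
blame = annotate(history)
authors = [author for author, _ in blame]
```

In this history, only the `def area(r):` line is still attributed to the first author; the three lines added or rewritten in the second revision are attributed to the second author.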
Programming
Java annotations
A special case is the Java programming language, where annotations can be used as a special form of syntactic metadata in the source code and can be accessed through reflective programming. Classes, methods, variables, parameters and packages may be annotated. The annotations can be embedded in class files generated by the compiler and may be retained by the Java virtual machine and thus influence the run-time behaviour of an application. It is possible to create meta-annotations out of the existing ones in Java. Other languages, such as C#, have a similar feature called "attributes".
C++ features "attributes", which allow the programmer to give indications to the compiler, and C++26 introduces reflection annotations similar to Java annotations.
Image annotation
Automatic image annotation is used to classify images for image retrieval systems.
Computational biology
Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated to make sense of it.
Digital imaging
In the digital imaging community the term annotation is commonly used for visible metadata superimposed on an image without changing the underlying master image, such as sticky notes, virtual laser pointers, circles, arrows, and black-outs (cf. redaction). In the medical imaging community, an annotation is often referred to as a region of interest and is encoded in DICOM format.
== Other uses ==