Wikidata is a document-oriented database, focusing on items, which represent any kind of topic, concept, or object. Each item is allocated a unique persistent identifier called its QID, a positive integer prefixed with the upper-case letter "Q". This makes it possible to provide translations of the basic information describing the topic each item covers without favouring any particular language. Some examples of items and their QIDs are , , , , and . Item labels do not need to be unique. For example, there are two items named "Elvis Presley": one represents the American singer and actor, and the other represents his self-titled album. However, the combination of a label and its description must be unique. To avoid ambiguity, an item's QID is therefore linked to this combination.
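The identifier and uniqueness rules described above can be sketched in a few lines of Python. This is only an illustration: the regular expression assumes QIDs carry no leading zeros, the QIDs `Q1001` and `Q1002` are placeholders rather than the real identifiers, and the sketch ignores the fact that labels and descriptions exist per language.

```python
import re

# Assumption: a QID is "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(identifier: str) -> bool:
    """Return True if the string has the shape of a QID."""
    return QID_PATTERN.fullmatch(identifier) is not None


def find_conflicts(items: dict) -> dict:
    """Map each (label, description) pair claimed by several QIDs to those QIDs."""
    seen = {}
    for qid, label_and_description in items.items():
        seen.setdefault(label_and_description, []).append(qid)
    return {pair: qids for pair, qids in seen.items() if len(qids) > 1}


# Placeholder QIDs: two items share a label but differ in description.
items = {
    "Q1001": ("Elvis Presley", "American singer and actor"),
    "Q1002": ("Elvis Presley", "self-titled album"),
}
conflicts = find_conflicts(items)
```

Here `conflicts` is empty because the two descriptions differ; giving both items the same description would report the pair together with both QIDs.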
Main parts
Fundamentally, an item consists of:
• An identifier (the QID), related to a label and a description.
• Optionally, multiple aliases and some number of statements (and their properties and values).
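As a rough illustration of these parts, an item could be modelled as a small record type. The class name `Item`, its field names, the QID `Q1000`, and the sample description and alias are all assumptions made for this sketch; they do not reflect Wikidata's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Item:
    """Hypothetical in-memory model of an item's main parts."""
    qid: str                    # required identifier
    label: str                  # required, need not be unique on its own
    description: str            # required, disambiguates equal labels
    aliases: list[str] = field(default_factory=list)               # optional
    statements: dict[str, list[Any]] = field(default_factory=dict)  # optional

    def add_statement(self, prop: str, value: Any) -> None:
        """Record one more value for the given property."""
        self.statements.setdefault(prop, []).append(value)


# "Q1000" is a placeholder QID, not the real identifier for milk.
milk = Item(qid="Q1000", label="milk", description="white liquid produced by mammals")
milk.aliases.append("cow's milk")
milk.add_statement("color", "white")
```

Keeping statements as a mapping from property to a list of values leaves room for a property to carry more than one value.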
Statements
[Figure: statements of the item Q111. Values include links to other items and to Wikimedia Commons.]
Statements are how any information known about an item is recorded in Wikidata. Formally, they consist of key–value pairs, which match a property (such as "author" or "publication date") with one or more entity values (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property "color" with the value "white" under the item "milk". Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", to reflect the fact that she engaged in both occupations.
Values may take on many types, including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, some properties may only be paired with values of type "URL".
Optionally, qualifiers can be used to refine the meaning of a statement by providing additional information. For example, a "population" statement could be modified with a qualifier such as "point in time (P585): 2011" (as its own key–value pair). Values in the statements may also be annotated with references, pointing to a source backing up the statement's content. As with statements, all qualifiers and references are property–value pairs.
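The relationship between a statement, its qualifiers, and its references can be sketched as nested property–value pairs. The class names `PropertyValue` and `Statement`, the population figure, and the reference source are placeholders invented for illustration; only the qualifier "point in time (P585): 2011" comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PropertyValue:
    """One property-value pair; used for main values, qualifiers and references."""
    property: str
    value: Any


@dataclass
class Statement:
    main: PropertyValue
    qualifiers: list[PropertyValue] = field(default_factory=list)
    references: list[PropertyValue] = field(default_factory=list)


def values_for(statements: list[Statement], prop: str) -> list[Any]:
    """Collect every main value recorded for a property."""
    return [s.main.value for s in statements if s.main.property == prop]


# One property mapped to two values.
curie = [
    Statement(PropertyValue("occupation", "physicist")),
    Statement(PropertyValue("occupation", "chemist")),
]

# A qualified and referenced statement; the figure and source are made up.
population = Statement(
    main=PropertyValue("population", 1_000_000),
    qualifiers=[PropertyValue("P585", 2011)],  # point in time
    references=[PropertyValue("stated in", "example census report")],
)
```

Calling `values_for(curie, "occupation")` yields both occupations, while the qualifier on `population` ties the figure to a specific year without changing the main value.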
Properties
Each property has a numeric identifier prefixed with a capital P and a page on Wikidata with an optional label, description, aliases, and statements. As such, there are properties with the sole purpose of describing other properties. Properties may also define more complex rules about their intended usage, termed constraints. For example, the "capital" property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules. Before a new property is created, it needs to undergo a discussion process. The most used property is , which is used on more than item pages
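A single value constraint of this kind can be pictured as a check that reports warnings instead of rejecting data. The function name `check_single_value`, the message format, and the sample territory are assumptions of this sketch, not Wikidata's actual constraint machinery.

```python
def check_single_value(statements: dict, single_valued: set) -> list:
    """Return warnings for single-valued properties that carry several values.

    Violations are reported as hints; nothing is rejected or removed.
    """
    warnings = []
    for prop in sorted(single_valued):
        values = statements.get(prop, [])
        if len(values) > 1:
            warnings.append(f"{prop}: expected a single value, found {len(values)}")
    return warnings


# Hypothetical territory with two capitals recorded.
territory = {
    "capital": ["City A", "City B"],
    "official language": ["Language X", "Language Y"],
}
hints = check_single_value(territory, single_valued={"capital"})
```

Only "capital" is flagged here, since "official language" is not declared single-valued, and the data itself is left untouched, which matches the idea that constraints are hints rather than inviolable rules.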
Lexemes
In linguistics, a lexeme is a unit of lexical meaning representing a group of words that share the same core meaning and grammatical characteristics. Similarly, Wikidata's lexemes are items with a structure that makes them more suitable for storing lexicographical data. Since 2016, Wikidata has supported lexicographical entries in the form of lexemes. Lexicographical entries have a different identifier from regular item entries: they are prefixed with the letter L. Lexicographical entries in Wikidata can contain statements, senses, and forms. They allow for documenting word usage, connecting words to items on Wikidata, and recording word translations, and they make lexicographical data machine-readable. In 2020, lexicographical entries on Wikidata exceeded 250,000. The language with the most lexicographical entries was Russian, with a total of 101,137 lexemes, followed by English with 38,122 lexemes. There are over 668 languages with lexicographical entries on Wikidata.
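To make the difference from ordinary items concrete, a lexeme with its forms and senses might be modelled as below. The class and field names, the identifier `L1000`, and the sample English entry are illustrative assumptions; in particular, attaching grammatical features to each form is a modelling choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One inflected form of a word, tagged with its grammatical features."""
    representation: str
    grammatical_features: list[str] = field(default_factory=list)


@dataclass
class Lexeme:
    lid: str                     # identifier prefixed with "L"
    lemma: str
    language: str
    forms: list[Form] = field(default_factory=list)
    senses: list[str] = field(default_factory=list)   # short glosses
    statements: dict[str, list[str]] = field(default_factory=dict)


def find_forms(lexeme: Lexeme, feature: str) -> list[str]:
    """Return the representations of all forms carrying a given feature."""
    return [f.representation for f in lexeme.forms if feature in f.grammatical_features]


# "L1000" is a placeholder identifier.
run = Lexeme(
    lid="L1000",
    lemma="run",
    language="English",
    forms=[
        Form("run", ["infinitive"]),
        Form("runs", ["third person", "singular", "present tense"]),
        Form("ran", ["past tense"]),
        Form("running", ["present participle"]),
    ],
    senses=["to move quickly on foot", "to operate (a machine or program)"],
)
```

For instance, `find_forms(run, "past tense")` picks out "ran", which shows how forms and senses hang off a single lexeme rather than being separate items.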
Entity schemas in Wikidata
In Wikidata, a schema is a data model that outlines the necessary attributes for a data item. For instance, a data item that uses the attribute "instance of" with the value "human" would typically include attributes such as "place of birth," "date of birth," "date of death," and "place of death." Entity schemas in Wikidata use Shape Expressions (ShEx) to describe the data in Wikidata items in the form of Resource Description Framework (RDF) data. The use of entity schemas helps address data inconsistencies and unchecked vandalism. Entity schemas have their own identifiers, distinct from those used for items, properties, and lexemes: they are prefixed with the letter "E", such as E10 for the entity schema of human data instances and E270 for the entity schema of building data instances. The extension that provides entity schemas has been installed on Wikidata and enables contributors to use ShEx for validating and describing Resource Description Framework data in items and lexemes. Any item or lexeme on Wikidata can be validated against an entity schema, which makes it an important tool for quality assurance.
==Content==