Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when an object was created, who created it, when it was last updated, file size, and file extension. In this context an
object refers to any of the following: • A physical item such as a book, CD, DVD, a paper map, chair, table, flower pot, etc. • An electronic file such as a digital image, digital photo, electronic document, program file, database table, etc. A
metadata engine collects, stores and analyzes information about data and metadata in use within a domain.
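The elementary, automatically captured metadata described above can be read directly from a file system. The sketch below (function name and the particular set of fields are illustrative, not any fixed standard) collects a file's name, extension, size, and last-modified time:

```python
import os
import tempfile
from datetime import datetime, timezone

def file_metadata(path):
    """Collect elementary metadata about a file, as an automated process would."""
    st = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "file_extension": os.path.splitext(path)[1],
        "size_bytes": st.st_size,
        "last_updated": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }

# Demonstrate on a temporary file rather than a real document.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(b"hello")
    path = f.name

meta = file_metadata(path)
print(meta["file_extension"], meta["size_bytes"])
os.remove(path)
```

A metadata engine would gather records like this across a whole domain and store them for later analysis.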
Data virtualization Data virtualization emerged in the 2000s as a new software technology to complete the virtualization "stack" in the enterprise. Metadata is used in data virtualization servers, which are enterprise infrastructure components alongside database and application servers. Metadata in these servers is saved in a persistent repository and describes
business objects in various enterprise systems and applications. Structural metadata commonality is also important to support data virtualization.
Statistics and census services Standardization and harmonization work has brought advantages to industry efforts to build metadata systems in the statistical community. Metadata guidelines and standards such as the European Statistics Code of Practice and ISO 17369:2013 (Statistical Data and Metadata Exchange, or SDMX) have been implemented by institutions such as the European System of Central Banks, with the goal of improving "efficiency when managing statistical business processes". Until the 1980s, many library catalogs used 3x5 inch cards in file drawers to display a book's title, author, subject matter, and an abbreviated
alpha-numeric string (
call number) which indicated the physical location of the book within the library's shelves. The
Dewey Decimal System employed by libraries for the classification of library materials by subject is an early example of metadata usage. Each card in the early paper catalog recorded information about the item it described: title, author, subject, and a number indicating where the item could be found. Beginning in the 1980s and 1990s, many libraries replaced these paper file cards with computer databases, which make it much easier and faster for users to do keyword searches. Another, older form of metadata collection is the US Census Bureau's use of what is known as the "Long Form". The Long Form asks questions that are used to create demographic data to find patterns of distribution.
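The shift from card drawers to databases can be sketched concretely: each card's fields become one record, and a keyword search scans all fields at once. The records and call numbers below are illustrative, not taken from any real catalog:

```python
# A card catalog's fields (title, author, subject, call number) stored as
# database records; keyword search across all fields is what made computerized
# catalogs faster than riffling through drawers of 3x5 cards.
catalog = [
    {"title": "On the Origin of Species", "author": "Darwin, Charles",
     "subject": "Evolution", "call_number": "576.82 DAR"},
    {"title": "Silent Spring", "author": "Carson, Rachel",
     "subject": "Pesticides -- Environmental aspects", "call_number": "363.738 CAR"},
]

def keyword_search(records, keyword):
    """Return records whose metadata contains the keyword, case-insensitively."""
    kw = keyword.lower()
    return [r for r in records if any(kw in str(v).lower() for v in r.values())]

hits = keyword_search(catalog, "darwin")
print([h["call_number"] for h in hits])
```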
Libraries employ metadata in
library catalogues, most commonly as part of an
Integrated Library Management System. Metadata is obtained by
cataloging resources such as books, periodicals, DVDs, web pages or digital images. This data is stored in the integrated library management system (ILMS) using the
MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of the items they seek, as well as to provide a description of the item(s) in question. More recent and specialized instances of library metadata include the establishment of
digital libraries including
e-print repositories and digital image libraries. While often based on library principles, the focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloging approaches. Given the custom nature of included materials, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords, or copyright statements. Standard file information such as file size and format is usually included automatically. Library operation has for decades been a key topic in efforts toward
international standardization. Standards for metadata in digital libraries include
Dublin Core,
METS,
MODS,
DDI,
DOI,
URN,
PREMIS schema,
EML, and
OAI-PMH. Leading libraries around the world publish guidance on their metadata standards strategies. A similar concept is the synset, short for
synonym set, a group of one or more
synonyms that share a common meaning in some context. Synsets are a fundamental concept in
computational linguistics and
lexical semantics, most notably used in the
WordNet lexical database of English. The concept has been extended to other languages and multilingual projects such as
EuroWordNet,
BabelNet, and the Global WordNet initiative. Synsets have been shown to play an important role in some areas of
Natural Language Processing, providing a bridge between lexical items and conceptual meaning; for example, they enable word-sense disambiguation and the measurement of semantic similarity between words.
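A synset can be modeled as a set of lemmas plus a gloss. The miniature structure below is a hypothetical sketch in the style of WordNet, not the real WordNet database; the synset identifiers and glosses are made up:

```python
# A minimal WordNet-style synset structure (hypothetical data).
SYNSETS = {
    "car.n.01": {"lemmas": {"car", "auto", "automobile", "machine"},
                 "gloss": "a motor vehicle with four wheels"},
    "cable_car.n.01": {"lemmas": {"cable car", "car"},
                       "gloss": "a conveyance for passengers on a cable railway"},
}

def synsets_of(word):
    """Return the ids of every synset in which the word appears as a lemma."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s["lemmas"])

# "car" is polysemous: it belongs to two synsets, "auto" to only one.
print(synsets_of("car"))
```

Looking a word up by synset rather than by surface string is exactly the bridge between lexical items and conceptual meaning described above.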
Science and
persistent identifiers Metadata for scientific publications is often created by journal publishers and citation databases such as
PubMed and
Web of Science. The data contained within manuscripts or accompanying them as supplementary material is less often subject to metadata creation, though they may be submitted to e.g. biomedical databases after publication. The original authors and database curators then become responsible for metadata creation, with the assistance of automated processes. Comprehensive metadata for all experimental data is the foundation of the
FAIR Guiding Principles, or the standards for ensuring research data are
findable,
accessible,
interoperable, and
reusable. Such metadata can then be utilized, complemented, and made accessible in useful ways.
OpenAlex is a free online index of over 200 million scientific documents that integrates and provides metadata such as sources,
citations,
author information,
scientific fields, and research topics. Its
API and open source website can be used for metascience,
scientometrics, and novel tools that query this
semantic web of
papers. Another project under development,
Scholia, uses the metadata of scientific publications for various visualizations and aggregation features such as providing a simple user interface summarizing literature about a specific feature of the SARS-CoV-2 virus using
Wikidata's "main subject" property. In terms of research labor, transparent metadata about authors' contributions to works has been proposed – e.g. the role played in the production of the paper, the level of contribution, and the responsibilities. Moreover, various metadata about scientific outputs can be created or complemented – for instance, some organizations attempt to track and link citations of papers as 'Supporting', 'Mentioning', or 'Contrasting' the study. Other examples include the development of
alternative metrics – which, beyond helping with assessment and findability, also aggregate many of the public discussions about a scientific paper on social media such as
Reddit, citations on Wikipedia, and reports about the study in the news media – and calls for indicating whether the original findings have been confirmed or reproduced.
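Publication metadata of the kind OpenAlex exposes can be consumed programmatically. The sketch below parses an abridged record shaped like an OpenAlex "work" object; the field names follow the public OpenAlex API, but the values are invented for illustration, and a real client would fetch the JSON over HTTP instead:

```python
import json

# An abridged, illustrative record in the shape of an OpenAlex "work" object
# (field names follow the public API; the values here are made up).
sample = json.loads("""
{
  "id": "https://openalex.org/W0000000",
  "display_name": "An Example Paper",
  "publication_year": 2021,
  "cited_by_count": 42,
  "authorships": [
    {"author": {"display_name": "A. Researcher"}},
    {"author": {"display_name": "B. Scientist"}}
  ]
}
""")

def summarize(work):
    """Reduce a work record to the metadata fields a citation index exposes."""
    return {
        "title": work["display_name"],
        "year": work["publication_year"],
        "citations": work["cited_by_count"],
        "authors": [a["author"]["display_name"] for a in work["authorships"]],
    }

summary = summarize(sample)
print(summary["citations"], summary["authors"])
```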
Museums Metadata in a museum context is the information that trained cultural documentation specialists, such as
archivists,
librarians, museum
registrars and
curators, create to index, structure, describe, identify, or otherwise specify works of art, architecture, cultural objects and their images. Descriptive metadata is most commonly used in museum contexts for object identification and resource recovery purposes. Most collecting institutions and museums use a
relational database to categorize cultural works and their images.
Online content Metadata has been instrumental in the creation of digital information systems and archives within museums and has made it easier for museums to publish digital content online. This has given access to audiences who might otherwise have been unable to view cultural objects due to geographic or economic barriers.
United States In October 2009, the Arizona Supreme Court ruled that metadata records are
public record. Document metadata have proven particularly important in legal environments in which litigation has requested metadata, that can include sensitive information detrimental to a certain party in court. Using
metadata removal tools to "clean" or redact documents can mitigate the risks of unwittingly sending sensitive data. This process partially protects law firms from the potentially damaging leakage of sensitive data (see data remanence) through
electronic discovery. Opinion polls have shown that 45% of Americans are "not at all confident" in the ability of social media sites to ensure their personal data is secure and 40% say that social media sites should not be able to store any information on individuals. 76% of Americans say that they are not confident that the information advertising agencies collect on them is secure and 50% say that online advertising agencies should not be allowed to record any of their information at all.
European Union In the EU, the
Snowden disclosures have been influential in the EU's review of matters concerning privacy and the processing of personal data. As of 2025, of the 18 European countries formerly covered by the
Data Retention Directive, only Germany, the Netherlands, and Romania do not have any data retention rules in force. On 6 October 2020, the Court of Justice of the European Union (CJEU) ruled that indiscriminate mass data retention schemes are illegal under EU law. France, by contrast, passed a law on 18 December 2013 that facilitates the collection of data by the French military and intelligence services, and in April 2022 the Portuguese Constitutional Court declared unconstitutional national laws on the retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks.
Australia In Australia, the need to strengthen national security has resulted in the introduction of a new metadata storage law. This new law means that both security and policing agencies will be allowed to access up to 2 years of an individual's metadata, with the aim of making it easier to prevent terrorist attacks and serious crimes.
Legislation Legislative metadata has been discussed in various forums, such as workshops held by the
Legal Information Institute at the
Cornell Law School on 22 and 23 March 2010. The documentation for these workshops is titled "Suggested metadata practices for legislation and regulations". A handful of key points have been outlined by these discussions, section headings of which are listed as follows: • General Considerations • Document Structure • Document Contents • Metadata (elements of) • Layering • Point-in-time versus post-hoc
Healthcare Australian medical research pioneered the definition of metadata for applications in health care. That approach offers the first recognized attempt to adhere to international standards in medical sciences instead of defining a proprietary standard under the
World Health Organization (WHO) umbrella. The medical community, however, has not yet accepted the need to follow metadata standards, despite research supporting them.
Biomedical research Research studies in the fields of
biomedicine and
molecular biology frequently yield large quantities of data, including results of
genome or
meta-genome sequencing,
proteomics data, and even notes or plans created during the course of research itself. Each data type involves its own variety of metadata and the processes necessary to produce these metadata. General metadata standards, such as ISA-Tab, allow researchers to create and exchange experimental metadata in consistent formats. Specific experimental approaches frequently have their own metadata standards and systems: metadata standards for
mass spectrometry include
mzML and SPLASH, while
XML-based standards such as
PDBML and SRA XML serve as standards for macromolecular structure and sequencing data, respectively. The products of biomedical research are generally realized as peer-reviewed manuscripts, and these publications are yet another source of data.
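ISA-Tab, mentioned above, exchanges experimental metadata in tab-delimited files. The sketch below parses a simplified study-sample table in that tab-delimited spirit; the real ISA-Tab specification defines specific Investigation/Study/Assay files and column semantics, so these headers are only an approximation:

```python
import csv
import io

# A simplified, illustrative study-sample table (tab-delimited, ISA-Tab-like).
tsv = (
    "Sample Name\tOrganism\tProtocol REF\n"
    "S1\tE. coli\tsequencing\n"
    "S2\tE. coli\tproteomics\n"
)

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# Group samples by the experimental protocol that produced them.
by_protocol = {}
for row in rows:
    by_protocol.setdefault(row["Protocol REF"], []).append(row["Sample Name"])

print(by_protocol)
```

Because the format is plain tab-delimited text, the same metadata can be exchanged between labs and re-parsed without specialized software.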
Data warehousing A
data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store data; they differ from
business intelligence (BI) systems because BI systems are designed to use data to create reports and analyze the information, to provide strategic guidance to management. Metadata is an important tool in how data is stored in data warehouses. The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, "cleaned" and timely data, extracted from various operational systems in an organization. The extracted data are integrated in the data warehouse environment to provide an enterprise-wide perspective. Data are structured in a way to serve the reporting and analytic requirements. The design of structural metadata commonality using a
data modeling method such as
entity-relationship model diagramming is important in any data warehouse development effort. Such models detail the metadata on each piece of data in the data warehouse. An essential component of a
data warehouse/
business intelligence system is the metadata and tools to manage and retrieve the metadata.
Ralph Kimball describes metadata as the DNA of the data warehouse as metadata defines the elements of the
data warehouse and how they work together.
Kimball et al. refer to three main categories of metadata: technical, business and process. Technical metadata is primarily
definitional, while business metadata and process metadata are primarily
descriptive. The categories sometimes overlap. •
Technical metadata defines the objects and processes in a DW/BI system, as seen from a technical point of view. The technical metadata includes the system metadata, which defines the data structures such as tables, fields, data types, indexes, and partitions in the relational engine, as well as databases, dimensions, measures, and data mining models. Technical metadata defines the data model and the way it is displayed for the users, with the reports, schedules, distribution lists, and user security rights. •
Business metadata is content from the data warehouse described in more user-friendly terms. The business metadata tells you what data you have, where they come from, what they mean and what their relationship is to other data in the data warehouse. Business metadata may also serve as documentation for the DW/BI system. Users who browse the data warehouse are primarily viewing the business metadata. •
Process metadata is used to describe the results of various operations in the data warehouse. Within the
ETL process, all key data from tasks is logged on execution. This includes start time, end time, CPU seconds used, disk reads, disk writes, and rows processed. When troubleshooting the ETL or
query process, this sort of data becomes valuable. Process metadata acts as the fact measurements recorded while building and using a DW/BI system. Some organizations make a living out of collecting and selling this sort of data to companies; in that case, the process metadata becomes the business metadata for the fact and dimension tables. Collecting process metadata is in the interest of business people, who can use it to identify the users of their products, which products they are using, and what level of service they are receiving.
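The process metadata logged for each ETL task (start time, end time, rows processed, and so on) can be captured by wrapping task execution. The function and field names below are an illustrative sketch, not any particular ETL tool's schema:

```python
import time

def run_etl_task(name, task):
    """Run one ETL task and record process metadata about the execution."""
    start = time.time()
    rows = task()                      # the task reports how many rows it processed
    end = time.time()
    return {
        "task": name,
        "start_time": start,
        "end_time": end,
        "duration_s": end - start,
        "rows_processed": rows,
        "status": "ok",
    }

# A stand-in task that pretends to load 1250 customer rows.
log = run_etl_task("load_customers", lambda: 1250)
print(log["rows_processed"], log["status"])
```

Accumulating these records over many runs is what makes troubleshooting a slow or failing ETL pipeline tractable.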
Internet The
HTML format used to define web pages allows for the inclusion of a variety of types of metadata, from basic descriptive text, dates and keywords to more advanced metadata schemes such as the
Dublin Core,
e-GMS, and
AGLS standards. Pages and files can also be
geotagged with
coordinates, categorized or tagged, including collaboratively such as with
folksonomies. When media has
identifiers set or when such can be generated, information such as
file tags and descriptions can be pulled or
scraped from the Internet – for example about movies. Various online databases are aggregated and provide metadata for various data. The collaboratively built
Wikidata has identifiers not just for media but also for abstract concepts, various objects, and other entities, which can be looked up by humans and machines to retrieve useful information and to link knowledge in other knowledge bases and databases. However, many creators of web pages do not exercise care and diligence when creating their own metadata, and metadata is part of a competitive environment in which it is used to promote its creators' own purposes. Studies show that search engines respond to web pages with metadata implementations, and Google has an announcement on its site showing the meta tags that its search engine understands. Enterprise search startup
Swiftype recognizes metadata as a relevance signal that webmasters can implement for their website-specific search engine, even releasing their own extension, known as Meta Tags 2.
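The meta tags described above are machine-readable with the standard library alone. The sketch below extracts `<meta name="..." content="...">` pairs from an HTML document; the sample page and its tag values are invented for illustration:

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page's head."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

html_doc = """<html><head>
<meta name="description" content="An article about metadata">
<meta name="keywords" content="metadata, cataloging">
<meta name="DC.title" content="Metadata">
</head><body>...</body></html>"""

parser = MetaTagParser()
parser.feed(html_doc)
print(parser.meta["description"])
```

A search engine's crawler does essentially this, at scale, when it reads the meta tags a webmaster provides.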
Broadcast industry In the
broadcast industry, metadata is linked to audio and video
broadcast media to: •
identify the media:
clip or
playlist names, duration,
timecode, etc. •
describe the content: notes regarding the quality of video content, rating, description (for example, during a sport event,
keywords like
goal,
red card will be associated with some clips) •
classify media: metadata allows producers to sort the media or to easily and quickly find a video content (a
TV news could urgently need some
archive content for a subject). For example, the BBC has a large subject classification system,
Lonclass, a customized version of the more general-purpose
Universal Decimal Classification. This metadata can be linked to the video media thanks to the
video servers. Most major broadcast sporting events like the
FIFA World Cup or the
Olympic Games use this metadata to distribute their video content to
TV stations through
keywords. It is often the host broadcaster who is in charge of organizing metadata through its
International Broadcast Centre and its video servers. This metadata is recorded with the images and entered by metadata operators (
loggers) who, working live, associate metadata available in metadata grids through software (such as Multicam (LSM) or IPDirector, used during the FIFA World Cup or Olympic Games).
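Timecode, one of the identifying metadata fields mentioned above, is conventionally written as HH:MM:SS:FF (hours, minutes, seconds, frames). The sketch below converts a frame count to that form, assuming a constant 25 fps frame rate (broadcast systems also use other rates and drop-frame variants); the clip record is illustrative:

```python
def frames_to_timecode(total_frames, fps=25):
    """Convert a frame count to an HH:MM:SS:FF timecode (non-drop frame)."""
    frames = total_frames % fps
    seconds = (total_frames // fps) % 60
    minutes = (total_frames // (fps * 60)) % 60
    hours = total_frames // (fps * 3600)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

# Illustrative clip metadata of the kind a logger might enter live.
clip = {
    "name": "match_highlight_017",
    "keywords": ["goal"],
    "in_point": frames_to_timecode(0),
    "out_point": frames_to_timecode(3 * 25 * 60 + 12),  # 3 min and 12 frames at 25 fps
}
print(clip["out_point"])
```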
Ecology and environment Ecological and environmental metadata is intended to document the "who, what, when, where, why, and how" of data collection for a particular study. This typically means which organization or institution collected the data, what type of data, which date(s) the data was collected, the rationale for the data collection, and the methodology used for the data collection. Metadata should be generated in a format commonly used by the most relevant science community, such as
Darwin Core,
Ecological Metadata Language, or
Dublin Core. Metadata editing tools exist to facilitate metadata generation (e.g. Metavist,
Mercury, Morpho). Metadata should describe the
provenance of the data (where they originated, as well as any transformations the data underwent) and how to give credit for (cite) the data products.
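The "who, what, when, where, why, and how" requirement above lends itself to a completeness check before a dataset is published. The record below is a hypothetical example (the institution, site, and values are invented), and the six-field schema is a simplification of what standards like Ecological Metadata Language actually specify:

```python
REQUIRED = {"who", "what", "when", "where", "why", "how"}

def missing_fields(record):
    """Report which of the who/what/when/where/why/how elements are absent or empty."""
    return sorted(REQUIRED - {k for k, v in record.items() if v})

dataset_metadata = {
    "who": "Example Field Station",                   # collecting institution (illustrative)
    "what": "stream temperature readings",
    "when": "2023-06-01/2023-08-31",
    "where": "hypothetical watershed, 47.1N 122.3W",
    "why": "monitor summer thermal stress on fish",
    "how": "submerged data loggers, 15-minute interval",
}
print(missing_fields(dataset_metadata))
```

A metadata editor such as those named above would enforce a much richer version of this check against the full standard.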
Digital music When first released in 1982, Compact Discs contained only a Table of Contents (TOC) with the number of tracks on the disc and their length in samples. Fourteen years later, in 1996, a revision of the
CD Red Book standard added
CD-Text to carry additional metadata, but CD-Text was not widely adopted. Shortly thereafter, it became common for personal computers to retrieve metadata from external sources (e.g.
CDDB,
Gracenote) based on the TOC. In the 2000s, digital audio files superseded physical music formats such as cassette tapes and CDs. Digital audio files can be labeled with more information than can be contained in just the file name. That descriptive information is called the
audio tag or audio metadata in general. Computer programs specializing in adding or modifying this information are called
tag editors. Metadata can be used to name, describe, catalog, and indicate ownership or copyright for a digital audio file, and its presence makes it much easier to locate a specific audio file within a group, typically through use of a search engine that accesses the metadata. As different digital audio formats were developed, attempts were made to standardize a specific location within the digital files where this information could be stored. As a result, almost all digital audio formats, including
mp3, broadcast wav, and
AIFF files, have similar standardized locations that can be populated with metadata. The metadata for compressed and uncompressed digital music is often encoded in the
ID3 tag. Common editors such as
TagLib support MP3, Ogg Vorbis, FLAC, MPC, Speex, WavPack, TrueAudio, WAV, AIFF, MP4, and ASF file formats.
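The original ID3 tag (ID3v1) illustrates the "standardized location" idea very directly: it is the last 128 bytes of an MP3 file, laid out as the marker "TAG" followed by fixed-width fields. The parser below follows that published layout; the song data is synthetic, built in memory rather than read from a real file:

```python
def parse_id3v1(data):
    """Parse an ID3v1 tag: the last 128 bytes of an MP3 file, laid out as
    'TAG' + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)."""
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(b):
        return b.split(b"\x00")[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre": tag[127],   # numeric genre code
    }

def field(s, width):
    """Encode a string into a fixed-width, NUL-padded ID3v1 field."""
    return s.encode("latin-1").ljust(width, b"\x00")

# A synthetic "file": some fake audio bytes followed by a valid ID3v1 trailer.
fake_mp3 = b"\xffaudio-frames..." + (
    b"TAG" + field("Some Song", 30) + field("Some Artist", 30)
    + field("Some Album", 30) + field("2001", 4) + field("", 30) + bytes([17])
)
print(parse_id3v1(fake_mp3)["artist"])
```

Because every ID3v1 field sits at a fixed offset, any player or tag editor can find the title or artist without scanning the audio itself; later formats such as ID3v2 relaxed the fixed widths but kept the same principle of a standardized location.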
Cloud applications With the availability of cloud applications, which include those to add metadata to content, metadata is increasingly available over the Internet.
cloud applications, which include those to add metadata to content, metadata is increasingly available over the Internet. == Administration and management ==