Linguistic Linked Open Data is applied to address a number of scientific research problems:
• In all areas of empirical linguistics, computational philology, and natural language processing, linguistic annotation and linguistic markup are central elements of analysis. However, progress in this field is hampered by interoperability challenges, most notably differences in the vocabularies and annotation schemes used for different resources and tools. Using Linked Data to connect language resources with ontologies and terminology repositories facilitates re-using shared vocabularies and interpreting them against a common basis.
• In corpus linguistics and computational philology, overlapping markup is a notorious problem for conventional XML formats. Hence, graph-based data models have been suggested since the late 1990s. These are traditionally represented by means of multiple, interlinked XML files (standoff XML), which are poorly supported by off-the-shelf XML technology. Modeling such complex annotations as Linked Data provides a formalism semantically equivalent to standoff XML, but eliminates the need for special-purpose technology and instead relies on the existing RDF ecosystem.
• Multilingual issues, including the linking of lexical resources such as WordNet, as performed in the Interlingual Index of the Global WordNet Association, and the interconnection of heterogeneous resources such as WordNet and Wikipedia, as was done in BabelNet.
• Providing forums for the standardization of linguistic resource information.

Linguistic Linked Open Data is closely related to the development of
• best practices for linking lexical data on the web (for data published in accordance with OntoLex conventions)
• best practices for creating annotations on the web (e.g., using the Web Annotation standard)
• best practices for modelling and sharing textual resources with
overlapping markup

=== Selected research projects ===
Uses and development of LLOD have been the subject of several large-scale research projects, including
• LOD2. Creating Knowledge out of Interlinked Data (11 EU countries + Korea, 2010–2014)
• MONNET. Multilingual Ontologies for Networked Knowledge (5 EU countries, 2010–2013)
• LIDER. Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe (5 EU countries, 2013–2015)
• QTLeap. Quality Translation by Deep Language Engineering Approaches (6 EU countries, 2013–2016)
• LiODi. Linked Open Dictionaries (BMBF eHumanities Early Career Research Group, Goethe University Frankfurt, Germany, 2015–2020)
• FREME. Open Framework of E-Services for Multilingual and Semantic Enrichment of Digital Content (6 EU countries, 2015–2017)
• POSTDATA. Poetry Standardization and Linked Open Data (ERC Starting Grant, UNED, Spain, 2016–2021)
• Linking Latin (ERC Consolidator Grant, Università Cattolica del Sacro Cuore, Italy, 2018–2023)
• Prêt-à-LLOD (5 EU countries, 2019–2021)
• NexusLinguarum. European network for Web-centred linguistic data science (COST Action, 35 COST countries, 2 near-neighbour countries, 1 international partner country, 2019–2023)
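
To illustrate why the overlapping markup mentioned earlier in this section is easier to handle in a graph model than in a single XML tree, the following minimal sketch stores two annotation layers over the same text as subject-predicate-object triples. It is an illustration under stated assumptions, not an LLOD vocabulary or any project's actual data model: the `ex:` node and property names, the sample text, and the helper functions are all hypothetical, and plain Python tuples stand in for a real RDF store.

```python
from itertools import combinations

# Sample text (hypothetical): a sentence boundary falls inside a verse line.
TEXT = "The rain came down. We ran for cover"


def annotate(node, layer, start, end):
    """Return the triples describing one standoff annotation of TEXT[start:end]."""
    return {
        (node, "rdf:type", layer),   # which annotation layer the node belongs to
        (node, "ex:start", start),   # hypothetical property: start offset
        (node, "ex:end", end),       # hypothetical property: end offset (exclusive)
    }


# One graph holds both layers; neither layer has to nest inside the other.
graph = set()
graph |= annotate("ex:s1", "ex:Sentence", 0, 19)
graph |= annotate("ex:s2", "ex:Sentence", 20, 36)
graph |= annotate("ex:l1", "ex:VerseLine", 0, 13)
graph |= annotate("ex:l2", "ex:VerseLine", 14, 36)


def value(graph, subject, predicate):
    """Look up the single object of (subject, predicate, ?) in the graph."""
    (obj,) = [o for s, p, o in graph if s == subject and p == predicate]
    return obj


def span(graph, node):
    """Return the (start, end) offsets recorded for an annotation node."""
    return value(graph, node, "ex:start"), value(graph, node, "ex:end")


def crossing_pairs(graph):
    """Pairs of annotations that overlap without one containing the other."""
    nodes = sorted({s for s, p, _ in graph if p == "rdf:type"})
    pairs = []
    for a, b in combinations(nodes, 2):
        (a0, a1), (b0, b1) = span(graph, a), span(graph, b)
        overlap = max(a0, b0) < min(a1, b1)
        nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
        if overlap and not nested:
            pairs.append((a, b))
    return pairs


print(crossing_pairs(graph))
```

Here the second verse line ("down. We ran for cover") crosses the first sentence ("The rain came down."), a configuration that a single XML hierarchy cannot express without workarounds. In the triple representation, both annotations are ordinary nodes pointing at offsets in the same text.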

=== Selected resources ===
As of October 2018, the 10 most frequently linked resources in the LLOD diagram are (in order of the number of linked datasets):
• The Ontologies of Linguistic Annotation (OLiA, linked with 74 datasets) provide reference terminology for linguistic annotations and grammatical metadata;
• WordNet (linked with 51 datasets), a lexical database for English and pivot for developing similar databases for other languages, with several editions (Princeton edition linked with 36 datasets; W3C edition linked with 8 datasets; VU edition linked with 7 datasets);
• DBpedia (linked with 50 datasets), a multilingual knowledge base of general world knowledge, based on Wikipedia;
• lexinfo.net (linked with 36 datasets) provides reference terminology for lexical resources;
• BabelNet (linked with 33 datasets), a multilingual lexicalized semantic network, based on the aggregation of various other resources, most notably WordNet and Wikipedia;
• lexvo.org (linked with 26 datasets) provides language identifiers and other language-related data. Most importantly, lexvo provides an RDF representation of the ISO 639-3 three-letter language codes and information about these languages;
• The ISO 12620 Data Category Registry (ISOcat; RDF edition, linked with 10 datasets) provides a semistructured repository for various language-related terminology. ISOcat is hosted by The Language Archive, or more specifically the DOBES project, at the Max Planck Institute for Psycholinguistics, but is currently in transition to CLARIN;
• UBY (RDF edition lemon-Uby, linked with 9 datasets), a lexical network for English, aggregated from various lexical resources;
• Glottolog (linked with 7 datasets) provides fine-grained language identifiers for low-resource languages, in particular many not covered by lexvo.org;
• Wiktionary-DBpedia links (wiktionary.dbpedia.org, linked with 7 datasets), Wiktionary-based lexicalizations for DBpedia concepts;
• DBnary, an RDF version of 23 Wiktionary language editions.

== Aspects ==