EIDR is built on a collection of records (which are further subdivided into fields) that are stored in a central registry. These records are referenced externally by DOIs, which are assigned when a record is created and are immutable thereafter. The identifier resolution system underlying DOIs is the Handle System, so each native EIDR Content ID is a handle formatted, in increasing order of specificity, according to the Handle, DOI and EIDR standards.
===Content ID format===
The canonical form of an EIDR Content ID is an instance of a handle and has the format:

10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C

where:
• 10.5240 is the DOI prefix for an EIDR asset. The "10" indicates that the handle is a DOI; other prefixes are assigned to other asset types (e.g. academic publications). The digits between the "." and the "/" form the sub-prefix, which indicates which registration agency within the International DOI Foundation (IDF) has rights to manage these handles. "5240" is assigned to the EIDR Association.
• XXXX-XXXX-XXXX-XXXX-XXXX-C is the DOI suffix. Each "X" denotes a hexadecimal digit (0-9 or A-F), and "C" is an ISO 7064 Mod 37,36 check character.

There is also a 96-bit compact binary form that is intended for embedding in small payloads such as watermarks. It is generated from the canonical form as follows:
• 16-bit sub-prefix: the sub-prefix interpreted as a binary value, e.g. B'0001010001111000' (5240)
• 80-bit suffix: the non-checksum part of the suffix (20 hexadecimal digits), represented as 10 bytes

The Uniform Resource Name form for an EIDR ID is specified in . For use on the web, an EIDR Content ID can be represented as a URI in one of these forms:
• https://doi.org/10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C: an EIDR ID represented as a DOI proxy reference (it is redirected from the DOI resolver to the EIDR registry)
• info: [deprecated]: an EIDR ID represented as an RFC 4452 compliant "info" URI (remembering that all EIDR IDs are also DOIs, but not the converse).
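The format rules above can be sketched in Python. This is a minimal illustration, not EIDR's reference implementation, and the function names are hypothetical. It assumes that the check character is computed over the 20 hexadecimal suffix digits only (dashes and the "10.5240/" prefix excluded); under that assumption it reproduces the check characters of both the tombstone ID and the Blade Runner ID quoted in this article. It also assumes big-endian byte order for the compact form.

```python
import re

# Canonical Content ID: 10.5240/ + five groups of four hex digits + check character.
CONTENT_ID_RE = re.compile(
    r"^10\.(5240)/([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})"
    r"-([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-Z])$"
)
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # character values 0..35


def mod37_36_check(chars: str) -> str:
    """ISO 7064 MOD 37,36 (hybrid system) check character over `chars`."""
    m = 36
    p = m
    for ch in chars:
        s = (p + ALPHABET.index(ch)) % m
        if s == 0:
            s = m
        p = (2 * s) % (m + 1)
    # Choose the character that makes the final running sum equal 1 (mod 36).
    return ALPHABET[(m + 1 - p) % m]


def parse(eidr_id: str) -> tuple[int, str]:
    """Validate a canonical ID; return (sub-prefix, 20-digit hex suffix)."""
    match = CONTENT_ID_RE.match(eidr_id.upper())
    if not match:
        raise ValueError(f"not a canonical EIDR Content ID: {eidr_id!r}")
    hex_digits = "".join(match.group(i) for i in range(2, 7))
    # Assumption: the check character covers the hex digits only.
    if mod37_36_check(hex_digits) != match.group(7):
        raise ValueError(f"check character mismatch: {eidr_id!r}")
    return int(match.group(1)), hex_digits


def to_compact(eidr_id: str) -> bytes:
    """96-bit form: 16-bit sub-prefix + 80-bit suffix (big-endian assumed)."""
    sub_prefix, hex_digits = parse(eidr_id)
    return sub_prefix.to_bytes(2, "big") + bytes.fromhex(hex_digits)


def to_doi_proxy_url(eidr_id: str) -> str:
    """DOI proxy URI form; malformed IDs are rejected first."""
    parse(eidr_id)
    return "https://doi.org/" + eidr_id.upper()
```

For example, `to_compact("10.5240/0000-0000-0000-0000-0000-X")` yields the two bytes 0x14 0x78 (5240) followed by ten zero bytes, 12 bytes or 96 bits in total.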
===Record types===
There are four types of records, each associated with a reserved prefix:
• Content ID (10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C): associated with an entertainment asset such as a movie or TV series. Content records are hierarchical, allowing relationships to be expressed such as a Series, whose children would be Seasons, whose children in turn would be individual episodes. Many other relationships are supported, as described below. Content records form the bulk of the data in the EIDR registry.
• Party ID (10.5237/XXXX-XXXX): identifies entities such as registrants, content producers, and distributors.
• Video Service ID (10.5239/XXXX-XXXX): identifies a video service, colloquially known as a "channel" or "network": a (usually) linear sequence of content scheduled to be broadcast at specified times (e.g. the Service ID for Cartoon Network is 10.5239/8BE5-E3F6). Video services are hierarchical: for example, a parent may have several children to account for regional or language variations.
• User ID (10.5238/[0-9a-zA-Z_.#()]{2,32}): identifies a user with a string of 2 to 32 alphanumeric and selected special characters (shown here as a Perl regular expression). A User is primarily an administrative concept that is subordinate to Parties (from which users inherit access rights). Unlike the other EIDR DOIs, the User ID can only be used within EIDR (e.g. in programming APIs).

The sub-prefixes 5237, 5238, 5239, and 5240 are all assigned to the EIDR Association.
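Because each record type has its own reserved sub-prefix, an EIDR DOI can be classified by the digits between "10." and "/". The sketch below (hypothetical function name) does only that, plus the User ID character rule; note that the regex quantifier for "2 to 32" is written `{2,32}`, with a comma. It deliberately does not enforce the XXXX-XXXX shape for Party IDs, since the registrant example "10.5237/superparty" given under Basic metadata does not follow it.

```python
import re

# Reserved EIDR sub-prefixes and the record type each one denotes.
SUB_PREFIX_TO_TYPE = {
    "5237": "Party",
    "5238": "User",
    "5239": "Video Service",
    "5240": "Content",
}

# User ID suffix rule: 2 to 32 characters from this set.
USER_SUFFIX_RE = re.compile(r"[0-9a-zA-Z_.#()]{2,32}")


def record_type(doi: str) -> str:
    """Classify an EIDR DOI by its sub-prefix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    kind = SUB_PREFIX_TO_TYPE.get(prefix[len("10."):])
    if kind is None:
        raise ValueError(f"not an EIDR sub-prefix: {prefix!r}")
    if kind == "User" and not USER_SUFFIX_RE.fullmatch(suffix):
        raise ValueError(f"invalid User ID suffix: {suffix!r}")
    return kind
```

For instance, `record_type("10.5239/8BE5-E3F6")` returns "Video Service".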
===Content records===
Content records are objects categorized by their types and relationships. Each has three different (orthogonal) kinds of type:
• Object Type: there are 10 of these in total. The first is the Basic Type, which has the minimal fields necessary to describe a content record. The other 9 are derived from the Basic Type and contain extra fields for describing more complex objects.
• Structural Type: these distinguish representations of a work and are listed in increasing order of specificity:
  • Abstraction: used for objects that have no concrete manifestation, such as a series container or the most basic concept of the original work. This corresponds to the International Standard Musical Work Code (ISWC) for musical works, the International Standard Text Code (ISTC) for textual works, or the International Standard Audiovisual Number (ISAN) for audiovisual works.
  • Performance: used for items that are particular versions of a work, such as the original theatrical release or director's cut of a film, or a locally censored version of a TV show. This roughly corresponds to the International Standard Recording Code (ISRC) for musical works and to some uses of the Version ISAN (V-ISAN) for audiovisual works.
  • Digital: a particular digital representation of a work, such as an MPEG-2 encoding of a movie. This corresponds to some uses of the V-ISAN.
• Referent Type: the type of the content asset, independent of a particular manifestation (e.g. a movie shown on TV is still a movie):
  • Series: an Abstraction that contains ordered or unordered individual items.
  • Season: a second level of grouping below a Series, usually covering a time interval.
  • TV: content that first appeared via broadcast.
  • Movie: long-form content that first appeared in a cinema or theater.
  • Short: loosely defined to cover a work that is 40 minutes or less, such as music videos, theatrical newsreels, or theatrical or direct-to-video (DTV) cartoon shorts.
  • Web: content that first appeared on the Web. This is different from content from elsewhere that has been made available on the Web.
  • Interactive Material: content that is not strictly audiovisual. It covers DVD menus, interactive TV overlays, customized players, etc.
  • Compilation: content composed of multiple other assets that cannot be more precisely described, such as a box set of a film franchise.
  • Supplemental: secondary content whose primary purpose is to support, augment, or promote other content. Examples include trailers, outtakes, and promotional documentaries ("making of" pieces).
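Since the type axes are independent, one way to model them is as separate fields. The sketch below is a hypothetical data model, not EIDR's schema: it covers the Structural and Referent Types only (the 10 Object Types are not enumerated here), orders Structural Types by the increasing specificity listed above, and enforces the one constraint the text states explicitly, namely that a Series is an Abstraction.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class StructuralType(IntEnum):
    # Ordered by increasing specificity, as in the list above.
    ABSTRACTION = 1
    PERFORMANCE = 2
    DIGITAL = 3


class ReferentType(Enum):
    SERIES = "Series"
    SEASON = "Season"
    TV = "TV"
    MOVIE = "Movie"
    SHORT = "Short"
    WEB = "Web"
    INTERACTIVE_MATERIAL = "Interactive Material"
    COMPILATION = "Compilation"
    SUPPLEMENTAL = "Supplemental"


@dataclass(frozen=True)
class RecordTypes:
    """Two of the three orthogonal type axes (Object Type omitted)."""
    structural: StructuralType
    referent: ReferentType

    def __post_init__(self) -> None:
        # The text defines a Series as an Abstraction.
        if (self.referent is ReferentType.SERIES
                and self.structural is not StructuralType.ABSTRACTION):
            raise ValueError("a Series must be an Abstraction")

    def is_more_specific_than(self, other: "RecordTypes") -> bool:
        return self.structural > other.structural
```

A director's cut would be `RecordTypes(StructuralType.PERFORMANCE, ReferentType.MOVIE)` and an MPEG-2 encoding of it `RecordTypes(StructuralType.DIGITAL, ReferentType.MOVIE)`; the Referent Type stays Movie while the encoding compares as more specific.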
===Basic metadata===
The following fields (taken from a larger set) make up the base object data of a content record:
• Structural Type: e.g. Abstraction
• Mode: e.g. AudioVisual (for a movie or TV program), Audio (for a radio program), or Visual (for a silent work)
• Referent Type: e.g. Movie
• Title: the primary title. Titles and Alternate Titles are further distinguished by:
  • Lang: the language of the title, expressed as an ISO 639-1 code
  • Class: release or regional
• Alternate Title 1..N: one or more alternate titles (often regional or language variants)
• Original Language: the language of the original release, expressed as an ISO 639-1 code
• Associated Org 1..N: Party ID(s) of the producer, studio, etc.
• Release Date: the date the title was originally released
• Country of Origin: an ISO 3166-1 alpha-2 code, with extensions for defunct countries
• Approximate Length: expressed using the XML Schema xs:duration datatype
• Alternate ID 1..N: one or more equivalent IDs expressed in a different asset ID system (see discussion below)
• Credits: only skeletal credits are provided, typically restricted to the director and up to four of the main actors. As noted, competing with proprietary systems that offer rich metadata (e.g. plot summaries) is a non-goal for EIDR. The main goal is to help disambiguate the title and to assist with validation and de-duplication efforts.
• Registrant: the party that created this content record (e.g. "10.5237/superparty")
• Creation Date: the date this content record was created
• Status: normally "valid" (there are special cases for deleted records)
• Last Modification Date: the last time this content record was changed
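Approximate Length is an xs:duration value such as "PT1H57M" (a hypothetical value, not taken from a real record). A minimal sketch of reading it into a Python `timedelta` follows; it assumes lengths use only the day/time part of xs:duration (PnDTnHnMnS) and rejects years and months, whose length in seconds is not fixed. The function name is hypothetical.

```python
import re
from datetime import timedelta

# Day/time subset of xs:duration: PnDTnHnMnS, every component optional.
DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_approximate_length(value: str) -> timedelta:
    match = DURATION_RE.match(value)
    # A bare "P", or a "T" with no time components after it, is invalid.
    if not match or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported xs:duration: {value!r}")
    parts = match.groupdict()
    return timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float(parts["seconds"] or 0),
    )
```

With this sketch, `parse_approximate_length("PT1H57M")` equals `timedelta(hours=1, minutes=57)`.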
===Deleted content records===
An EIDR ID must always be resolvable, so under normal circumstances the corresponding Content Record is permanent. Two mechanisms are available to deal with errors or other unusual circumstances. The preferred one is aliasing, whereby an EIDR ID is transparently redirected to another content record. Aliasing is commonly employed when an asset has been registered twice. The other mechanism is the use of tombstone records. This is employed when the Content Record is corrupted, or when an otherwise invalid asset was accidentally registered. In this case the ID is aliased to a special tombstone record. Applications can recognize the tombstone because its EIDR ID field is set to the distinguished value "10.5240/0000-0000-0000-0000-0000-X". Note that, unlike the "X" placeholders used above for hexadecimal digits, this "X" is the literal 24th letter of the Latin alphabet (ASCII 0x58 or Unicode U+0058), serving as the check character.
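Client-side handling of aliases and tombstones could look like the sketch below. The `aliases` dictionary is a hypothetical in-memory stand-in for the registry's alias data (aliased ID to target ID), and the function names are illustrative. The text does not say whether aliases can chain, so the loop follows redirects defensively and stops after a fixed number of hops to guard against cycles.

```python
TOMBSTONE_ID = "10.5240/0000-0000-0000-0000-0000-X"


def resolve(eidr_id: str, aliases: dict[str, str], max_hops: int = 16) -> str:
    """Follow alias redirects until reaching an ID that is not aliased."""
    current = eidr_id
    for _ in range(max_hops):
        target = aliases.get(current)
        if target is None:
            return current
        current = target
    raise RuntimeError(f"alias chain too long or cyclic from {eidr_id!r}")


def is_deleted(eidr_id: str, aliases: dict[str, str]) -> bool:
    """An ID counts as deleted if it resolves to the tombstone record."""
    return resolve(eidr_id, aliases) == TOMBSTONE_ID
```

A duplicate registration would appear as an entry mapping the duplicate ID to the surviving one, while an invalid registration maps to `TOMBSTONE_ID`.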
===Alternate ID===
Having a rich set of alternate IDs for content is one of the primary goals of EIDR. This allows EIDR IDs to be used everywhere in content workflows: if an alternate ID is needed, it can be found in the metadata for the EIDR ID. EIDR supports the inclusion of both proprietary and other standard (e.g. ISAN) ID references. Additional alternate IDs can be added when needed (e.g. by parties wanting to support new workflows). Below is an example of alternate IDs for the EIDR asset 10.5240/EA73-79D7-1B2B-B378-3A73-M (the movie Blade Runner).

If an alternate ID is resolvable algorithmically, for example by placing it appropriately in a template URL, EIDR makes that link available. Alternate IDs are partitioned into non-proprietary and proprietary. The former have distinguished, predefined types (e.g. those issued by ISAN, IMDb, and IVA), whereas proprietary IDs are all of type "Proprietary" and are further distinguished by an associated DNS domain. As of July 2017, there were over 2 million alternate IDs directly available through EIDR.
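The distinction between predefined types and domain-qualified proprietary IDs, together with template-URL resolution, can be sketched as a small value type. Everything here is a hypothetical illustration: the type label "IMDB", the URL template, and the ID values in the example are assumptions, not EIDR's actual type names or template list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical template for an algorithmically resolvable ID type.
URL_TEMPLATES = {
    "IMDB": "https://www.imdb.com/title/{id}/",
}


@dataclass(frozen=True)
class AlternateID:
    id_type: str                  # a predefined type, or "Proprietary"
    value: str
    domain: Optional[str] = None  # DNS domain qualifying a proprietary ID

    def __post_init__(self) -> None:
        if self.id_type == "Proprietary" and not self.domain:
            raise ValueError("proprietary IDs need an associated DNS domain")

    def resolution_url(self) -> Optional[str]:
        """Return a link only when a URL template exists for this type."""
        template = URL_TEMPLATES.get(self.id_type)
        return template.format(id=self.value) if template else None
```

Keeping the domain on the value object means two proprietary IDs with the same string but different domains compare as different IDs, which matches the role the domain plays in the text.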
===Relationships between objects===
Content objects can be related to each other according to the following table. These relations are expressed as additional fields in the content record and are thus relative to that object. Note that the subject object is the child and the target is the parent (e.g. subject isOf parent). Additional constraints are noted in the table.

==Use in standards and applications==