Although some types of subtags are derived from
ISO or
UN core standards, they do not follow these standards absolutely, as this could lead to the meaning of language tags changing over time. In particular, a subtag derived from a code assigned by
ISO 639,
ISO 15924,
ISO 3166, or
UN M.49 remains a valid (though deprecated) subtag even if the code is withdrawn from the corresponding core standard. If the standard later assigns a new meaning to the withdrawn code, the corresponding subtag will still retain its old meaning. This stability was introduced in RFC 4646.
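The stability rule can be illustrated with a minimal Python sketch. The `TOY_REGISTRY` dictionary, its field names, and the helper functions are hypothetical and exist only for illustration; the real Language Subtag Registry is a text file with more fields. The two deprecated entries shown (`iw` and `in`, the withdrawn ISO 639 codes for Hebrew and Indonesian) are assumed here as examples of subtags that stay valid after withdrawal.

```python
# Toy excerpt shaped like Language Subtag Registry records.
# Hypothetical structure: the real registry has more fields and entries.
TOY_REGISTRY = {
    ("language", "he"): {"description": "Hebrew"},
    ("language", "iw"): {
        "description": "Hebrew",
        "deprecated": True,
        "preferred_value": "he",
    },
    ("language", "id"): {"description": "Indonesian"},
    ("language", "in"): {
        "description": "Indonesian",
        "deprecated": True,
        "preferred_value": "id",
    },
}


def is_valid(kind, subtag):
    """A subtag is valid if it is registered, whether deprecated or not."""
    return (kind, subtag.lower()) in TOY_REGISTRY


def is_deprecated(kind, subtag):
    """Return True only for registered subtags marked as deprecated."""
    record = TOY_REGISTRY.get((kind, subtag.lower()), {})
    return bool(record.get("deprecated", False))


def preferred(kind, subtag):
    """Return the preferred replacement, or the subtag itself if none exists."""
    record = TOY_REGISTRY.get((kind, subtag.lower()))
    if record is None:
        raise KeyError(f"unregistered {kind} subtag: {subtag}")
    return record.get("preferred_value", subtag.lower())
```

In this sketch a deprecated subtag still passes `is_valid`, which mirrors the rule that withdrawal from a core standard never invalidates an existing subtag.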
=== ISO 639-3 and ISO 639-1 ===
RFC 4646 defined the concept of an "extended language subtag" (sometimes referred to as
extlang), although no such subtags were registered at that time.
=== ISO 639-5 and ISO 639-1/2 ===
ISO 639-5 defines language collections with alpha-3 codes in a different way than they were initially encoded in ISO 639-2 (including one code already present in ISO 639-1, Bihari coded inclusively as
bh in ISO 639-1 and
bih in ISO 639-2). Specifically, ISO 639-5 now defines all language collections as inclusive, whereas ISO 639-2 had defined some of them exclusively. As a result, some collections have a broader scope than before and can encompass languages that were already encoded separately within ISO 639-2. For example, the ISO 639-2 code
afa was previously associated with the name "Afro-Asiatic (Other)", excluding languages such as Arabic that already had their own code. In ISO 639-5, this collection is named "Afro-Asiatic languages" and includes all such languages. ISO 639-2 changed the exclusive names in 2009 to match the inclusive ISO 639-5 names.

To avoid breaking implementations that may still depend on the older (exclusive) definition of these collections, ISO 639-5 defines a grouping type attribute for all collections that were already encoded in ISO 639-2 (no grouping type is defined for the new collections added only in ISO 639-5).

BCP 47 defines a "Scope" property to identify subtags for language collections. However, it does not define any given collection as inclusive or exclusive, and does not use the ISO 639-5 grouping type attribute, although the description fields in the Language Subtag Registry for these subtags match the ISO 639-5 (inclusive) names. As a consequence, BCP 47 language tags that include a primary language subtag for a collection may be ambiguous as to whether the collection is intended to be inclusive or exclusive.

ISO 639-5 does not define precisely which languages are members of these collections; only the hierarchical classification of collections is defined, using the inclusive definition of these collections. Because of this, RFC 5646 does not recommend the use of subtags for language collections for most applications, although they are still preferred over subtags whose meaning is even less specific, such as "Multiple languages" and "Undetermined".

In contrast, the classification of individual languages within their macrolanguage is standardized, in both ISO 639-3 and the Language Subtag Registry.
=== ISO 15924, ISO/IEC 10646 and Unicode ===
Script subtags were first added to the Language Subtag Registry when RFC 4646 was published, from the list of codes defined in
ISO 15924. They are encoded in the language tag after primary and extended language subtags, but before other types of subtag, including region and variant subtags. Some primary language subtags are defined with a property named "Suppress-Script" which indicates the cases where a single script can usually be assumed by default for the language, even if it can be written with another script. When this is the case, it is preferable to omit the script subtag, to improve the likelihood of successful matching. A different script subtag can still be appended to make the distinction when necessary. For example,
yi is preferred over
yi-Hebr in most contexts, because the Hebrew script subtag is assumed for the
Yiddish language. As another example,
zh-Hans-SG may be considered equivalent to
zh-Hans, because the region code is probably not significant; the written form of Chinese used in Singapore uses the same simplified Chinese characters as in other countries where Chinese is written. However, the script subtag is maintained because it is significant. ISO 15924 includes some codes for script variants (for example,
Hans and
Hant for simplified and traditional forms of Chinese characters) that are unified within
Unicode and
ISO/IEC 10646. These script variants are most often encoded for bibliographic purposes, but are not always significant from a linguistic point of view (for example,
Latf and
Latg script codes for the Fraktur and Gaelic variants of the Latin script, which are mostly encoded with regular Latin letters in Unicode and ISO/IEC 10646). They may occasionally be useful in language tags to expose orthographic or semantic differences, with different analysis of letters, diacritics, and digraphs/trigraphs as default grapheme clusters, or differences in letter casing rules.
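The "Suppress-Script" behaviour described above can be sketched in Python as follows. The `SUPPRESS_SCRIPT` table is a two-entry toy excerpt and the function name is hypothetical; a real implementation would read the property from the Language Subtag Registry. The sketch also assumes that the script subtag, if present, directly follows the primary language subtag, so tags containing extended language subtags are left unchanged.

```python
# Toy excerpt of Suppress-Script values (assumed for illustration only).
SUPPRESS_SCRIPT = {
    "yi": "Hebr",
    "en": "Latn",
}


def drop_suppressed_script(tag):
    """Remove the script subtag when it equals the language's Suppress-Script.

    Simplifying assumption: the script subtag, if any, is the second subtag.
    """
    parts = tag.split("-")
    if len(parts) >= 2:
        language = parts[0].lower()
        candidate = parts[1]
        suppressed = SUPPRESS_SCRIPT.get(language, "")
        # Script subtags are exactly four letters long.
        looks_like_script = len(candidate) == 4 and candidate.isalpha()
        if looks_like_script and candidate.lower() == suppressed.lower():
            del parts[1]
    return "-".join(parts)
```

With this table, `drop_suppressed_script("yi-Hebr")` gives `"yi"`, while `"yi-Latn"` and `"zh-Hans-SG"` are returned unchanged because their script subtags carry information.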
=== ISO 3166-1 and UN M.49 ===
Two-letter region subtags are based on codes assigned, or "exceptionally reserved", in
ISO 3166-1. If the ISO 3166 Maintenance Agency were to reassign a code that had previously been assigned to a different country, the existing BCP 47 subtag corresponding to that code would retain its meaning, and a new region subtag based on
UN M.49 would be registered for the new country. UN M.49 is also the source for numeric region subtags for geographical regions, such as 005 for South America. The UN M.49 codes for economic regions are not allowed. Region subtags are used to specify the variety of a language "as used in" a particular region. They are appropriate when the variety is regional in nature, and can be captured adequately by identifying the countries involved, as when distinguishing
British English (
en-GB) from
American English (
en-US). When the difference is one of script or script variety, as for
simplified versus
traditional Chinese characters, it should be expressed with a script subtag instead of a region subtag; in this example,
zh-Hans and
zh-Hant should be used instead of
zh-CN/zh-SG/zh-MY and
zh-TW/zh-HK/zh-MO. When a distinct language subtag exists for a language that could be considered a regional variety, it is often preferable to use the more specific subtag instead of a language-region combination. For example,
ar-DZ (
Arabic as used in
Algeria) may be better expressed as
arq for
Algerian Spoken Arabic.
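The recommendation to express the simplified/traditional distinction with a script subtag can be sketched in Python. The `REGION_TO_SCRIPT` table covers only the six regions named in the example above, and the function name is hypothetical. This is an illustration of the recommendation, not a general conversion rule: the rewrite discards the region, so it is only suitable where the region served purely as a stand-in for the script.

```python
# Regions from the example above, mapped to the script subtag that the
# text recommends using instead (illustrative table, not exhaustive).
REGION_TO_SCRIPT = {
    "CN": "Hans",
    "SG": "Hans",
    "MY": "Hans",
    "TW": "Hant",
    "HK": "Hant",
    "MO": "Hant",
}


def prefer_script_subtag(tag):
    """Rewrite a bare zh-REGION tag to zh-SCRIPT; leave all other tags alone."""
    parts = tag.split("-")
    if len(parts) == 2 and parts[0].lower() == "zh":
        script = REGION_TO_SCRIPT.get(parts[1].upper())
        if script is not None:
            return f"zh-{script}"
    return tag
```

For instance, `prefer_script_subtag("zh-TW")` yields `"zh-Hant"`, while tags that already carry a script subtag, such as `"zh-Hans-SG"`, pass through untouched.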
=== Adherence to core standards ===
Disagreements about language identification may extend to BCP 47 and to the core standards that inform it. For example, some speakers of Punjabi believe that the ISO 639-3 distinction between [pan] "Panjabi" and [pnb] "Western Panjabi" is spurious (i.e. they feel the two are
the same language); that sub-varieties of the
Arabic script should be encoded separately in ISO 15924 (as, for example, the
Fraktur and
Gaelic styles of the Latin script are); and that BCP 47 should reflect these views or overrule the core standards with regard to them. BCP 47 delegates this type of judgment to the core standards, and does not attempt to overrule or supersede them. Variant subtags and (theoretically) primary language subtags may be registered individually, but not in a way that contradicts the core standards.

== Extensions ==