Notation and nomenclature ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters.
Escape sequences allow any of a large registry of graphic character sets to be "designated" into one of four working sets, named G0 through G3, and shorter control sequences specify the working set that is "invoked" to interpret bytes in the stream. Encoding byte values ("bit combinations") are often given in
column-line notation, where two decimal numbers in the range 00–15 (each corresponding to a single hexadecimal digit) are separated by a slash. Hence, for instance, codes 2/0 (0x20) through 2/15 (0x2F) inclusive may be referred to as "column 02". This is the notation used in the ISO/IEC 2022 / ECMA-35 standard itself. They may be described elsewhere using
hexadecimal, as is often used in this article, or using the corresponding ASCII characters, although the escape sequences are actually defined in terms of byte values, and the graphic assigned to that byte value may be altered without affecting the control sequence. Byte values from the 7-bit ASCII graphic range (hexadecimal 0x20–0x7F), being on the left side of a character code table, are referred to as "GL" codes
(with "GL" standing for "graphics left") while bytes from the "high ASCII" range (0xA0–0xFF), if available (i.e. in an 8-bit environment), are referred to as the "GR" codes
("graphics right"). The terms "CL" (0x00–0x1F) and "CR" (0x80–0x9F) are defined for the control ranges, but the CL range always invokes the primary (C0) controls, whereas the CR range always either invokes the secondary (C1) controls or is unused. and are always available when G0 is invoked over GL, irrespective of what character sets are designated. They may not be included in graphical character sets, although other sizes or types of
whitespace character may be.
General syntax of escape sequences Sequences using the ESC (escape) character take the form ESC [...] , where the ESC character is followed by zero or more intermediate bytes () from the range 0x20–0x2F, and one final byte () from the range 0x30–0x7E. The first byte, or absence thereof, determines the type of escape sequence; it might, for instance, designate a working set, or denote a single control function. In all types of escape sequences, bytes in the range 0x30–0x3F are reserved for unregistered private uses defined by prior agreement between parties. Control functions from some sets may make use of further bytes following the escape sequence proper. For example, the
ISO 6429 control function "", which can be represented using an escape sequence, is followed by zero or more bytes in the range 0x30–0x3F, then zero or more bytes in the range 0x20–0x2F, then by a single byte in the range 0x40–0x7E, the entire sequence being called a "control sequence".
Graphical character sets Each of the four working sets G0 through G3 may be a 94-character set or a 94n-character
multi-byte set. Additionally, G1 through G3 may be a 96- or 96n-character set. In a 96- or 96n-character set, the bytes 0x20 through 0x7F when GL-invoked, or 0xA0 through 0xFF when GR-invoked, are allocated to and may be used by the set. In a 94- or 94n-character set, the bytes 0x20 and 0x7F are not used. the box drawing set from
ISO/IEC 10367, and ISO-IR-164 (a subset of the G1 set of
ISO-8859-8 with only the letters, used by
CCITT).
Combining characters Characters are expected to be spacing characters, not combining characters, unless specified otherwise by the graphical set in question. ISO 2022 / ECMA-35 also recognizes the use of the
backspace and carriage return control characters as means of combining otherwise spacing characters, as well as the
CSI sequence "Graphic Character Combination" (GCC) Use of the backspace and carriage return in this manner is permitted by
ISO/IEC 646 but prohibited by
ISO/IEC 4873 / ECMA-43 and by
ISO/IEC 8859, on the basis that it leaves the graphical character repertoire undefined. ISO/IEC 4873 / ECMA-43 does, however, permit the use of the GCC function provided that the sequence of characters is kept the same and merely displayed in one space, rather than being over-stamped to form a character with a different meaning.
Control character sets Control character sets are classified as "primary" or "secondary" control code sets, respectively also called "C0" and "C1" control code sets. A C0 control set must contain the ESC (escape) control character at 0x1B (a C0 set containing only ESC is registered as ISO-IR-104), whereas a C1 control set may not contain the escape control whatsoever. Hence, they are entirely separate registrations, with a C0 set being only a C0 set and a C1 set being only a C1 set. or inclusion of any of those ten in the C1 set, is also prohibited by the ISO/IEC 2022 / ECMA-35 standard. whereas a C1 control function may be invoked over the CR range 0x80 through 0x9F (in an 8-bit environment) or by using escape sequences (in a 7-bit or 8-bit environment), For example, ISO/IEC 4873 specifies CR bytes for the C1 controls which it uses (SS2 and SS3). If necessary, which invocation is used may be communicated using
announcer sequences. In the latter case, single control functions from the C1 control code set are invoked using "type Fe" escape sequences,
Other control functions Additional control functions are assigned to "type Fs" escape sequences (in the range ESC 0x60 (`) through ESC 0x7E (~)); these have permanently assigned meanings rather than depending on the C0 or C1 designations. Registration of control functions to type "Fs" sequences must be approved by
ISO/IEC JTC 1/SC 2. although no "3Ft" sequences are currently assigned (as of 2019). Some of these are specified in ECMA-35 (ISO 2022 / ANSI X3.41), others in ECMA-48 (ISO 6429 / ANSI X3.64). ECMA-48 refers to these as "independent control functions". Escape sequences of type "Fp" (ESC 0x30 (0) through ESC 0x3F (?)) or of type "3Fp" (ESC 0x23 (#) [...] 0x30 (0) through ESC 0x23 (#) [...] 0x3F (?)) are reserved for single private use control codes, by prior agreement between parties. Several such sequences of both types are used by
DEC terminals such as the
VT100, and are thus supported by
terminal emulators. An 8-bit code may have GR codes specifying G1 characters, i.e. with its corresponding 7-bit code using
Shift In and
Shift Out to switch between the sets (e.g.
JIS X 0201), although some instead have GR codes specifying G2 characters, with the corresponding 7-bit code using a single-shift code to access the second set (e.g.
T.51). The codes shown in the table below are the most common encodings of these control codes, conforming to
ISO/IEC 6429. The LS2, LS3, LS1R, LS2R and LS3R shifts are registered as single control functions and are always encoded as the escape sequences listed below, This coding is currently recommended by ISO/IEC 2022 / ECMA-35 for applications requiring 7-bit single-byte representations of SS2 and SS3, and may also be used for SS2 only, although older code sets with SS2 at 0x1C also exist, and were mentioned as such in an earlier edition of the standard. The 0x8E and 0x8F coding of the single shifts as shown below is mandatory for
ISO/IEC 4873 levels 2 and 3. Although officially considered shift codes and named accordingly, single-shift codes are not always viewed as shifts, For instance,
ISO/IEC 4873 specifies GL, whereas
packed EUC specifies GR. In 7-bit environments, only GL is used as the single-shift area. If necessary, which single-shift area is used may be communicated using
announcer sequences. The names "locking shift zero" (LS0) and "locking shift one" (LS1) refer to the same pair of C0 control characters (0x0F and 0x0E) as the names "shift in" (SI) and "shift out" (SO). However, the standard refers to them as LS0 and LS1 when they are used in 8-bit environments and as SI and SO when they are used in 7-bit environments.
Registration of graphical and control code sets The
ISO International register of coded character sets to be used with escape sequences (ISO-IR) lists graphical character sets, control code sets, single control codes and so forth which have been registered for use with ISO/IEC 2022. The procedure for registering codes and sets with the ISO-IR registry is specified by
ISO/IEC 2375. Each registration receives a unique escape sequence, and a unique registry entry number to identify it. For example, the
CCITT character set for
Simplified Chinese is known as
ISO-IR-165. Registration of coded character sets with the ISO-IR registry identifies the documents specifying the character set or control function associated with an ISO/IEC 2022 non‑private-use escape sequence. This may be a standard document; however, registration does not create a new ISO standard, does not commit the ISO or IEC to adopt it as an international standard, and does not commit the ISO or IEC to add any of its characters to the
Universal Coded Character Set. ISO-IR registered escape sequences are also used encapsulated in a
Formal Public Identifier to identify character sets used for numeric character references in
SGML (ISO 8879). For example, the string can be used to identify the International Reference Version of
ISO 646-1983, and the
HTML 4.01 specification uses to identify Unicode. The textual representation of the escape sequence, included in the third element of the FPI, will be recognised by SGML implementations for supported character sets. At the other extreme, no multibyte 96-sets have been registered, so the sequences below are strictly theoretical. As with other escape sequence types, the range 0x30–0x3F is reserved for private-use bytes, or
MARC-8, However, in a graphical set designation sequence, if the second byte (for a single-byte set) or the third byte (for a double-byte set) is 0x20 (space), the set denoted is a "
dynamically redefinable character set" (DRCS) defined by prior agreement, which is also considered private use. The manner in which DRCS sets and associated fonts are transmitted, allocated and managed is not stipulated by ISO/IEC 2022 / ECMA-35 itself, although it recommends allocating them sequentially starting with byte 0x40 (@); however, a manner for transmitting DRCS fonts is defined within some telecommunication protocols such as
World System Teletext. There are also three special cases for multi-byte codes. The code sequences ESC $ @, ESC $ A, and ESC $ B were all registered when the contemporary version of the standard allowed multi-byte sets only in G0, so must be accepted in place of the sequences ESC $ ( @ through ESC $ ( B to designate to the G0 character set. There are additional (rarely used) features for switching control character sets, but this is a single-level lookup, in that (as noted above) the C0 set is always invoked over CL, and the C1 set is always invoked over CR or by using escape codes. As noted above, it is required that any C0 character set include the ESC character at position 0x1B, so that further changes are possible. The control set designation sequences (as opposed to the graphical set ones) may also be used from within
ISO/IEC 10646 (UCS/Unicode), in contexts where processing
ANSI escape codes is appropriate, provided that each byte in the sequence is padded to the code unit size of the encoding. A table of escape sequence bytes and the designation or other function which they perform is below. Note that the registry of bytes is independent for the different types. The 94-character graphic set designated by ESC ( A through ESC + A is not related in any way to the 96-character set designated by ESC - A through ESC / A. And neither of those is related to the 94n-character set designated by ESC $ ( A through ESC $ + A, and so on; the final bytes must be interpreted in context. (Indeed, without any intermediate bytes, ESC A is a way of specifying the C1 control code 0x81.) Also note that C0 and C1 control character sets are independent; the C0 control character set designated by ESC ! A (which happens to be the NATS control set for newspaper text transmission) is not the same as the C1 control character set designated by ESC " A (the
CCITT attribute control set for
Videotex).
Interaction with other coding systems The standard also defines a way to specify coding systems that do not follow its own structure. A sequence is also defined for returning to ISO/IEC 2022; the registrations which support this sequence as encoded in ISO/IEC 2022 comprise (as of 2019) various
Videotex formats,
UTF-8, and
UTF-1. Of the sequences switching to UTF-8, ESC % G is the one supported by, for example,
xterm. Although use of a variant of the standard return sequence from UTF-16 and UTF-32 is permitted, the bytes of the escape sequence must be padded to the size of the code unit of the encoding (i.e. 001B 0025 0040 for UTF-16), i.e. the coding of the standard return sequence does not conform exactly to ISO/IEC 2022. For this reason, the designations for UTF-16 and UTF-32 use a without-standard-return syntax. For specifying encodings by labels, the
X Consortium's
Compound Text format defines five private-use DOCS sequences.
Code structure announcements The sequence "announce code structure" (ESC SP (0x20) ) is used to
announce a specific code structure, or a specific group of ISO 2022 facilities which are used in a particular code version. Although announcements can be combined, certain contradictory combinations (specifically, using locking shift announcements 16–23 with announcements 1, 3 and 4) are prohibited by the standard, as is using additional announcements on top of
ISO/IEC 4873 level announcements 12–14 (which fully specify the permissible structural features). Announcement sequences are as follows: ==ISO/IEC 2022 code versions==