Vietnamese alphabet
==Character encodings==
There are as many as 46 character encodings for representing the Vietnamese alphabet.
Unicode has become the most popular encoding for many of the world's writing systems, owing to its broad compatibility and software support. Diacritics may be encoded either as combining characters or as precomposed characters, which are scattered throughout the Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Latin Extended Additional blocks. The Vietnamese đồng symbol is encoded in the
Currency Symbols block.

Unicode's coverage of Vietnamese has been subject to several changes since the 1990s. Early versions of Unicode encoded the Vietnamese grave and acute tone marks as U+0340 COMBINING GRAVE TONE MARK and U+0341 COMBINING ACUTE TONE MARK, respectively. In 2001, these two characters were deprecated as duplicate encodings of U+0300 COMBINING GRAVE ACCENT and U+0301 COMBINING ACUTE ACCENT; this change was incorporated into Unicode 3.2, released in 2002. With the 2009 release of Unicode 5.2, U+0340 and U+0341 were undeprecated but discouraged.

Historically, the Vietnamese language used other characters beyond the modern alphabet. The
Middle Vietnamese letter B with flourish (ꞗ) is included in the Latin Extended-D block. The apex is not separately encoded in Unicode, because it derives from the Portuguese tilde, whereas , which derives from the Greek perispomeni, has always been misencoded as a tilde. As a workaround, represents the apex on Wikisource and
Wiktionary.

For systems that lack support for Unicode, dozens of 8-bit Vietnamese code pages have been designed. Where ASCII is required, such as when ensuring readability in plain text e-mail, Vietnamese letters are often encoded according to Vietnamese Quoted-Readable (VIQR) or VSCII Mnemonic (VSCII-MNEM), though usage of either variable-width scheme has declined dramatically following the adoption of Unicode on the World Wide Web. For instance, support for all of the above-mentioned 8-bit encodings, with the exception of Windows-1258, was dropped from Mozilla software in 2014. Many Vietnamese fonts intended for desktop publishing are encoded in VNI or TCVN3 (VSCII). Popular web browsers lack support for specialty Vietnamese encodings, so any webpage that uses these fonts appears as unintelligible mojibake on systems without them installed.

Vietnamese often stacks diacritics, so typeface designers must take care to prevent stacked diacritics from colliding with adjacent letters or lines. When a tone mark is used together with another diacritic, offsetting the tone mark to the right preserves consistency and avoids slowing down
saccades. In advertising signage and in cursive handwriting, diacritics often take forms unfamiliar to other Latin alphabets. For example, the lowercase letter I retains its tittle in ì, ỉ, ĩ, and í. These nuances are rarely accounted for in computing environments.
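
As a rough illustration of the combining and precomposed representations described in this section, the sketch below uses Python's standard `unicodedata` module to build each Vietnamese letter from a base vowel plus a combining tone mark and then normalize it to its precomposed form. The vowel list, the tone-mark list, and the helper name `vietnamese_letters` are assumptions made for this example; they are not taken from any standard.

```python
import unicodedata

# Assumed inventory for this sketch: the 12 vowel letters of the modern
# alphabet and the 5 combining tone marks (grave, hook above, tilde,
# acute, dot below). The sixth tone is the unmarked vowel.
VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")
TONE_MARKS = ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]


def vietnamese_letters():
    """Return the set of non-ASCII Vietnamese letters in precomposed form."""
    letters = set()
    for vowel in VOWELS:
        for mark in [""] + TONE_MARKS:
            # NFC turns base letter + combining mark(s) into one code point
            # whenever a precomposed character exists.
            composed = unicodedata.normalize("NFC", vowel + mark)
            letters.add(composed)
            letters.add(composed.upper())
    letters.update("\u0111\u0110")  # đ and Đ carry no tone marks
    # Keep only letters that plain ASCII cannot represent.
    return {ch for ch in letters if ord(ch) > 127}


# One precomposed letter (U+1EC7) and its fully decomposed equivalent.
precomposed = "\u1ec7"
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(ch):04X}" for ch in decomposed])
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(len(vietnamese_letters()))
```

Under these assumptions, every vowel and tone combination normalizes to a single precomposed code point, and the final line prints 134, the number of Vietnamese letters outside ASCII across both cases. Comparing strings only after normalizing both to the same form (NFC or NFD) avoids treating the two representations as different text.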
==Approaches to character encoding==
Vietnamese writing requires 134 additional letters (between both cases) besides the 52 already present in ASCII. An 8-bit code page has only 128 positions beyond ASCII, so not all of these letters fit without compromise.
(as in VNI for DOS).
• Drop the uppercase letters which are least frequently used,
• Drop forms of the letter Y with tone marks, necessitating use of the letter I in those circumstances. This approach was rejected by the designers of VISCII on the basis that a character encoding should not attempt to settle a spelling reform issue.

Unicode includes over 10,000 chữ Nôm characters as part of Unicode's repertoire of
CJK Unified Ideographs. Of these characters, 10,082 can be found in the CJK Unified Ideographs Extension B block, while the rest are distributed between the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension C blocks. A further 1,028 characters, including over 400 characters specific to the Tày language, are encoded in the CJK Unified Ideographs Extension E block. The characters are taken from the Vietnamese standards TCVN 5773:1993 and TCVN 6909:2001 [error for TCVN 6056:1995?], as well as from research by the Han-Nom Research Institute and other groups. All the characters in TCVN 5773:1993 and about 95% of the characters in TCVN 6909:2001 [error for TCVN 6056:1995?] have corresponding codepoints in Unicode 5.1, though TCVN 5773:1993 itself mapped most of its characters to the Private Use Area of Unicode. Unicode 13.0 added two diacritical characters to the
Ideographic Symbols and Punctuation block that were commonly used to indicate borrowed characters in chữ Nôm. The two most comprehensive chữ Nôm fonts are the Vietnamese Nôm Preservation Foundation's Light and the community-developed HAN NOM A/HAN NOM B, both of which place a large number of unstandardized characters in the Private Use Areas. The Unicode Consortium's
Unihan database includes Vietnamese readings of some characters but does not distinguish between Sino-Vietnamese and chữ Nôm readings. Like other CJKV writing systems, chữ Nôm is traditionally written vertically, from top to bottom and right to left, and may also be annotated using ruby characters, which is the same as chữ Quốc Ngữ for Vietnamese.

==Text input==