Vietnamese language and computers

==Vietnamese alphabet==
===Character encodings===
There are as many as 46 character encodings for representing the Vietnamese alphabet. Unicode has become the most popular encoding for many of the world's writing systems, owing to its broad compatibility and software support. Diacritics may be encoded either as combining characters or as precomposed characters, which are scattered throughout the Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Latin Extended Additional blocks. The Vietnamese đồng symbol is encoded in the Currency Symbols block.

Unicode's coverage of Vietnamese has been subject to several changes since the 1990s. Early versions of Unicode encoded the Vietnamese grave and acute tone marks as U+0340 COMBINING GRAVE TONE MARK and U+0341 COMBINING ACUTE TONE MARK, respectively. In 2001, these two characters were deprecated as duplicate encodings of U+0300 COMBINING GRAVE ACCENT and U+0301 COMBINING ACUTE ACCENT; this change was incorporated into Unicode 3.2, released in 2002. With the 2009 release of Unicode 5.2, U+0340 and U+0341 were undeprecated but discouraged.

Historically, the Vietnamese language used other characters beyond the modern alphabet. The Middle Vietnamese letter B with flourish (ꞗ) is included in the Latin Extended-D block. The apex is not separately encoded in Unicode, because it derives from the Portuguese tilde, whereas the ngã tone mark, which derives from the Greek perispomeni, has always been misencoded as a tilde. As a workaround, a substitute combining character represents the apex on Wikisource and Wiktionary.

For systems that lack support for Unicode, dozens of 8-bit Vietnamese code pages have been designed. Where ASCII is required, such as when ensuring readability in plain text e-mail, Vietnamese letters are often encoded according to Vietnamese Quoted-Readable (VIQR) or VSCII Mnemonic (VSCII-MNEM), though usage of either variable-width scheme has declined dramatically following the adoption of Unicode on the World Wide Web. For instance, support for all of these 8-bit Vietnamese encodings, with the exception of Windows-1258, was dropped from Mozilla software in 2014.
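The two Unicode representations described in this section, combining and precomposed, can be converted into one another with standard normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the helper names `to_precomposed`, `to_combining`, and `describe` are illustrative and not part of any standard.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Compose base letters and combining marks where a precomposed form exists (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_combining(text: str) -> str:
    """Split precomposed letters into a base letter plus combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def describe(text: str) -> list[str]:
    """List each code point as 'U+XXXX NAME' for inspection."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    word = "Vi\u1ec7t"  # "Việt" written with the precomposed letter U+1EC7
    # After NFD the single letter becomes e + dot below + circumflex.
    for line in describe(to_combining(word)):
        print(line)
    # U+0340 is canonically equivalent to U+0300, so NFC folds
    # "a" + U+0340 into the precomposed letter U+00E0.
    for line in describe(to_precomposed("a\u0340")):
        print(line)
```

Normalizing text to a single form before comparing or searching avoids mismatches between strings that look identical but use different representations.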
Many Vietnamese fonts intended for desktop publishing are encoded in VNI or TCVN3 (VSCII). Popular web browsers lack support for specialty Vietnamese encodings, so any webpage that uses these fonts appears as unintelligible mojibake on systems without them installed.

Vietnamese often stacks diacritics, so typeface designers must take care to prevent stacked diacritics from colliding with adjacent letters or lines. When a tone mark is used together with another diacritic, offsetting the tone mark to the right preserves consistency and avoids slowing down saccades. In advertising signage and in cursive handwriting, diacritics often take forms unfamiliar to other Latin alphabets. For example, the lowercase letter I retains its tittle in ì, ỉ, ĩ, and í. These nuances are rarely accounted for in computing environments.

===Approaches to character encoding===
Vietnamese writing requires 134 additional letters (between both cases) besides the 52 already present in ASCII, which is more than the 128 extra code positions that an 8-bit extension of ASCII provides. (as in VNI for DOS).
• Drop the uppercase letters which are least frequently used,
• Drop forms of the letter Y with tone marks, necessitating use of the letter I in those circumstances. This approach was rejected by the designers of VISCII on the basis that a character encoding should not attempt to settle a spelling reform issue.

Unicode includes over 10,000 chữ Nôm characters as part of its repertoire of CJK Unified Ideographs. Of these characters, 10,082 can be found in the CJK Unified Ideographs Extension B block, while the rest are distributed between the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension C blocks. A further 1,028 characters, including over 400 characters specific to the Tày language, are encoded in the CJK Unified Ideographs Extension E block. The characters are taken from the Vietnamese standards TCVN 5773:1993 and TCVN 6909:2001 [error for TCVN 6056:1995?], as well as from research by the Han-Nom Research Institute and other groups.
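To see how the ideographs in a piece of text are spread across the blocks named above, a rough sketch like the following can help. The block boundaries are transcribed from the Unicode block list as an assumption to verify against the current `Blocks.txt`, and the names `NOM_BLOCKS`, `block_of`, and `block_histogram` are hypothetical. Membership in one of these blocks does not by itself show that a character is chữ Nôm, since the blocks are shared with other CJK writing systems.

```python
from collections import Counter
from typing import Optional

# (block name, first code point, last code point); ranges assumed from Blocks.txt
NOM_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF),
]


def block_of(ch: str) -> Optional[str]:
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in NOM_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def block_histogram(text: str) -> Counter:
    """Count how many characters of text fall into each listed block."""
    counts: Counter = Counter()
    for ch in text:
        name = block_of(ch)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Two arbitrary code points: one in the basic block, one in Extension B.
    sample = "\u5583" + chr(0x21A38) + " abc"
    for name, count in block_histogram(sample).items():
        print(f"{name}: {count}")
```

Characters outside the Basic Multilingual Plane, such as those in Extension B, are single code points in Python strings, so iterating over the string character by character is sufficient here.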
All the characters in TCVN 5773:1993 and about 95% of the characters in TCVN 6909:2001 [error for TCVN 6056:1995?] have corresponding code points in Unicode 5.1, though TCVN 5773:1993 itself mapped most of its characters to the Private Use Area of Unicode. Unicode 13.0 added two diacritical characters to the Ideographic Symbols and Punctuation block that were commonly used to indicate borrowed characters in chữ Nôm.

The two most comprehensive chữ Nôm fonts are the Vietnamese Nôm Preservation Foundation's Nom Na Tong Light and the community-developed HAN NOM A/HAN NOM B, both of which place a large number of unstandardized characters in the Private Use Areas. The Unicode Consortium's Unihan database includes Vietnamese readings of some characters but does not distinguish between Sino-Vietnamese and chữ Nôm readings.

Like other CJKV writing systems, chữ Nôm is traditionally written vertically, from top to bottom and right to left. Chữ Hán and chữ Nôm may also be annotated using ruby characters, with chữ Quốc Ngữ serving as the annotation for Vietnamese.

==Text input==