==CJK Unified Ideographs==
The basic block named CJK Unified Ideographs (U+4E00–U+9FFF) contains 20,992 basic Chinese characters. The block includes not only characters used in the Chinese writing system but also kanji used in the Japanese writing system, hanja in Korea, and chữ Nôm characters in Vietnamese. Many characters in this block are used in all three of the Chinese, Japanese and Korean writing systems, while others are used in only one or two of the three. This block is also known as the Unified Repertoire and Ordering (URO), especially when it needs to be differentiated from the other CJK Unified Ideographs blocks.
The first 20,902 characters in the block are arranged according to the Kangxi Dictionary ordering of radicals. In this system, the characters written with the fewest strokes are listed first within each radical. The remaining characters were added later, and so are not in radical order. The block is the result of
Han unification, which was somewhat controversial within East Asia. Because a single character used in more than one of Chinese, Japanese and Korean was coded in one location, and because modern typographical conventions and handwriting curricula differ slightly between regions (not necessarily along language boundaries; for example, Hong Kong and Taiwan, which both use Traditional Chinese, have slightly different local conventions), the appearance of a given character can depend on the particular font being used. However, the URO applies the source separation rule, meaning that pairs of characters treated as distinct in a character set used as a source for the URO (e.g. JIS X 0208, as used in e.g. Shift JIS) remain pairs of separate characters in the Unicode encoding.
Using variation selectors, it is possible to specify certain variant CJK ideographs within Unicode. The Adobe-Japan1 character set, which has 14,684 ideographic variation sequences, is an extreme example of the use of variation selectors.
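As a rough illustration of how a variation selector attaches to an ideograph, the Python sketch below builds and detects ideographic variation sequences. It assumes that the ideographic variation selectors occupy U+E0100 through U+E01EF (VS17 to VS256). The helper names `make_ivs` and `find_ivs` are hypothetical, and the code does not check whether a sequence is actually registered in a collection such as Adobe-Japan1.

```python
# Assumed range of the ideographic variation selectors (VS17..VS256).
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def make_ivs(base: str, index: int) -> str:
    """Return base followed by the index-th ideographic variation selector (0-based)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    selector = IVS_FIRST + index
    if not IVS_FIRST <= selector <= IVS_LAST:
        raise ValueError("selector index out of range")
    return base + chr(selector)


def find_ivs(text: str) -> list:
    """List (base character, selector code point) pairs found in text."""
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if IVS_FIRST <= ord(cur) <= IVS_LAST:
            pairs.append((prev, ord(cur)))
    return pairs


sequence = make_ivs("\u845b", 0)  # U+845B followed by U+E0100
print([f"U+{ord(c):04X}" for c in sequence])
print(find_ivs("x" + sequence + "y"))
```

Whether the variant is actually displayed depends on the font: a renderer without a matching glyph is expected to fall back to the default form of the base character.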
Charts 4E00–62FF, 6300–77FF, 7800–8CFF, 8D00–9FFF.
Sources Note: Most characters appear in multiple sources, so the sum of individual character counts (108,493) is far greater than the number of encoded characters (20,992). In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to code points between U+9FA6 and U+9FBB. Since then, further characters have been added to this block for various reasons, all summarized in the version history section below.
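The figures quoted for this block can be cross-checked with a little arithmetic. The sketch below is a minimal Python illustration, not an authoritative classifier: the function name `uro_region` and its labels are made up for this example, and the U+9FA5 boundary is derived here from the statement that the first 20,902 characters follow radical order.

```python
URO_FIRST, URO_LAST = 0x4E00, 0x9FFF
RADICAL_ORDERED = 20_902  # characters arranged in Kangxi radical order

# Derived boundary: last code point of the radical-ordered run.
RADICAL_LAST = URO_FIRST + RADICAL_ORDERED - 1


def uro_region(char: str):
    """Classify one character relative to the URO (labels are illustrative)."""
    cp = ord(char)
    if not URO_FIRST <= cp <= URO_LAST:
        return None
    return "radical-ordered" if cp <= RADICAL_LAST else "later addition"


print(URO_LAST - URO_FIRST + 1)   # 20992, the size of the block
print(f"U+{RADICAL_LAST:04X}")    # U+9FA5, end of the radical-ordered run
print(0x9FBB - 0x9FA6 + 1)        # 22 code points assigned in Unicode 4.1
print(uro_region("\u4e00"), uro_region("\u9fa6"), uro_region("A"))
```

The 22 code points in the last range match the 14 HKSCS-2004 plus 8 GB 18030 characters mentioned above, and they begin immediately after the radical-ordered run.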
==CJK Unified Ideographs Extension A==
The block named CJK Unified Ideographs Extension A (U+3400–U+4DBF) contains 6,592 additional characters.
Charts 3400–4DBF.
Sources Note: Most characters appear in more than one source, so the sum of individual character counts (23,997) is far greater than the number of encoded characters (6,592).
Charts 30000–3134F.
Sources Note: Some characters appear in more than one source, so the sum of individual character counts (5,239) is greater than the number of encoded characters (4,939).
Charts 31350–323AF.
Sources Note: Some characters appear in more than one source, so the sum of individual character counts (4,541) is greater than the number of encoded characters (4,192).
Charts 2EBF0–2EE5F.
Sources Note: Some characters appear in more than one source, making the sum of individual character counts (625) more than the number of encoded characters (622).
==CJK Unified Ideographs Extension J==
A block named CJK Unified Ideographs Extension J was added to Unicode in the Tertiary Ideographic Plane. It contains 4,298 characters in the range U+323B0–U+33479.
Charts 323B0–3347F.
Sources Note: Some characters appear in more than one source, making the sum of individual character counts (4,406) more than the number of encoded characters (4,298).
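The character count for Extension J follows directly from its range, and comparing it with the chart range suggests how many code points at the end of the block are left unassigned. A small Python check, using a hypothetical helper `span_size` and assuming that the chart range 323B0–3347F is the full extent of the block:

```python
def span_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


assigned = span_size(0x323B0, 0x33479)  # range quoted for the characters
block = span_size(0x323B0, 0x3347F)     # chart range (assumed block extent)

print(assigned)          # 4298
print(block - assigned)  # 6 trailing code points not covered by the quoted range
```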
==CJK Compatibility Ideographs==
The block named
CJK Compatibility Ideographs (F900–FAFF) was created to retain round-trip compatibility with other standards. However, twelve characters in this block actually have the "Unified Ideograph" property: U+FA0E 﨎, U+FA0F 﨏, U+FA11 﨑, U+FA13 﨓, U+FA14 﨔, U+FA1F 﨟, U+FA21 﨡, U+FA23 﨣, U+FA24 﨤, U+FA27 﨧, U+FA28 﨨, and U+FA29 﨩. None of the other characters in this and other "Compatibility" blocks relate to CJK unification. While 龜 and 亀 are not considered unifiable, is considered a duplicate to .
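One way to observe this distinction in practice is through canonical decompositions: an ordinary compatibility ideograph decomposes to a unified ideograph, whereas the twelve characters listed above have no decomposition. The Python sketch below relies on the standard `unicodedata` module. The constant and function names are illustrative, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The twelve code points in F900–FAFF that carry the Unified Ideograph property.
UNIFIED_IN_COMPAT = {
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
}


def undecomposed_in_block(first: int = 0xF900, last: int = 0xFAFF) -> set:
    """Assigned code points in the block that have no decomposition mapping."""
    found = set()
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":  # skip unassigned code points
            continue
        if not unicodedata.decomposition(ch):
            found.add(cp)
    return found


# U+F900 is an ordinary compatibility ideograph: normalization replaces it.
print(unicodedata.normalize("NFC", "\uf900") == "\uf900")  # False
# U+FA0E is one of the twelve: normalization leaves it unchanged.
print(unicodedata.normalize("NFC", "\ufa0e") == "\ufa0e")  # True
# Compare the characters without a decomposition against the list above.
print(undecomposed_in_block() == UNIFIED_IN_COMPAT)
```

A practical consequence is that normalizing text silently replaces ordinary compatibility ideographs with their unified counterparts, while the twelve listed characters survive normalization.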
Charts F900–FAFF.
Sources Note: All characters appear in more than one source, so the sum of individual character counts (40) is greater than the number of encoded characters (12).
==Known issues==