
Basic introduction
Unicode: a character encoding standard for computers, also known as "Universal Code" or "Unified Code".
Unicode defines a single character set shared by all languages. The bulk of the characters used by Chinese, Japanese and Korean (CJK) occupy the range 0x3000 to 0x9FFF. In its two-byte form, UCS-2, Unicode uses two bytes to encode each character. For example, the encoding of the Chinese character "经" is 0x7ECF. Character codes are generally written in hexadecimal, with a 0x prefix to distinguish them from decimal; 0x7ECF is 32463 in decimal.

Two bytes are 16 bits, and 2 to the 16th power is 65536, so UCS-2 can encode at most 65536 characters. Characters with codes 0 to 127 are the same as in ASCII. For example, the Unicode encoding of the letter "a" is 0x0061, which is 97 in decimal, and the ASCII encoding of "a" is 0x61, which is also 97 in decimal.

UCS-2 cannot cover every Chinese character. Simplified and Traditional Chinese together contain roughly 60,000 to 70,000 characters, while UCS-2 has only 65,536 code positions, and those must be shared with every other script. The two-byte range therefore leaves out some rarely used Chinese characters. Fortunately, only a little over 7,000 Simplified Chinese characters are in common use. To represent all Chinese characters, Unicode also has the UCS-4 specification, which uses 4 bytes to encode each character.
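The figures above can be checked with a short Python sketch. The helper names describe and fits_in_ucs2 are made up for this illustration. Python has no codec named "ucs-2", so the sketch assumes "utf-16-be" as a stand-in (it produces the same two bytes for any character in the two-byte range) and "utf-32-be" as a stand-in for UCS-4.

```python
def describe(ch: str) -> str:
    """Return the character's code in hexadecimal and decimal."""
    cp = ord(ch)
    return f"U+{cp:04X} = {cp}"


def fits_in_ucs2(ch: str) -> bool:
    """True if the character's code fits in two bytes (0x0000 to 0xFFFF)."""
    return ord(ch) <= 0xFFFF


print(describe("经"))  # U+7ECF = 32463
print(describe("a"))   # U+0061 = 97

# Codes 0 to 127 are identical in ASCII and Unicode.
print("a".encode("ascii")[0] == ord("a"))  # True

# Two bytes per character for a character in the two-byte range.
# Assumption: utf-16-be stands in for UCS-2 here.
print("经".encode("utf-16-be").hex())  # 7ecf

# Two bytes give 2**16 distinct codes.
print(2 ** 16)  # 65536

# A rarely used CJK character outside the two-byte range needs the
# four-byte form. Assumption: utf-32-be stands in for UCS-4 here.
rare = chr(0x20000)
print(fits_in_ucs2(rare))               # False
print(rare.encode("utf-32-be").hex())   # 00020000
```

The character at 0x20000 is used only as an example of a code above 0xFFFF; any character beyond the two-byte range behaves the same way.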


















