Basic introduction
Unicode: a character encoding standard for computers. Its Chinese names translate as "Unified Code", "Universal Code" and "Single Code".
Unicode has a single character set. Chinese, Japanese and Korean (CJK) characters occupy the part of Unicode from 0x3000 to 0x9FFF. Unicode is commonly used in its UCS-2 form, which encodes each character in two bytes, as with the Chinese character "经". Two bytes are 16 bits, and 2 to the 16th power is 65,536, so UCS-2 can encode at most 65,536 characters. Characters with codes 0 to 127 are the same as in ASCII. For example, the Unicode code of the letter "a" is 0x0061, which is 97 in decimal, and the ASCII code of "a" is 0x61, which is also 97 in decimal.
UCS-2 does not fully cover Chinese characters, and nothing can be done about that within two bytes. Simplified and traditional Chinese together have some 60,000 to 70,000 characters, while UCS-2 can represent at most 65,536 codes in total, shared with every other script, so some rarely used Chinese characters have to be left out. Fortunately, only a little over 7,000 simplified Chinese characters are in common use. To represent all Chinese characters, Unicode also has the UCS-4 specification, which uses four bytes per character.
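The arithmetic above can be checked with a short Python sketch. It is an illustration only, under one stated assumption: Python has no codec named "ucs-2" or "ucs-4", so the big-endian UTF-16 and UTF-32 codecs stand in for them here. For characters in the two-byte range, such as "a" and "经", they should produce the same bytes.

```python
# Characters used as examples in the text above.
letter = "a"
hanzi = "经"

# ord() returns a character's Unicode code as an integer.
letter_code = ord(letter)
hanzi_code = ord(hanzi)

# Two bytes are 16 bits, so a two-byte encoding has 2**16 possible codes.
ucs2_capacity = 2 ** 16

# Assumption: "utf-16-be" stands in for UCS-2 (identical bytes for
# characters that fit in two bytes); "utf-32-be" stands in for UCS-4.
letter_two_bytes = letter.encode("utf-16-be")
hanzi_two_bytes = hanzi.encode("utf-16-be")
hanzi_four_bytes = hanzi.encode("utf-32-be")

# The ASCII byte for "a" carries the same number as its Unicode code.
letter_ascii = letter.encode("ascii")

print(f"capacity of a two-byte code: {ucs2_capacity}")
print(f"'a': Unicode 0x{letter_code:04X} = {letter_code}, ASCII 0x{letter_ascii[0]:02X}")
print(f"'经': Unicode 0x{hanzi_code:04X}, in CJK range: {0x3000 <= hanzi_code <= 0x9FFF}")
print(f"'经' in two bytes:  {hanzi_two_bytes.hex()} ({len(hanzi_two_bytes)} bytes)")
print(f"'经' in four bytes: {hanzi_four_bytes.hex()} ({len(hanzi_four_bytes)} bytes)")
```

The four-byte form of "经" is its two-byte form padded with leading zero bytes, which is why a four-byte code leaves room for the characters that a two-byte code has to leave out.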