Revision as of 00:33, 30 August 2005 by Mordemur (m wikify) → Revision as of 01:17, 30 August 2005 by Plugwash (→Considerations other than size)
Line 6:

==Considerations other than size==

===For processing===
For processing, a format should be easy to search, truncate, and generally process safely. All normal Unicode encodings use some form of fixed-size code unit. Depending on the format and the code point to be encoded, one or more of these code units represent a Unicode code point. To allow easy searching and truncation, the code-unit sequence for one code point must not occur within a longer sequence or across the boundary of two other sequences. UTF-8, UTF-16 and UTF-32 have these important properties, but UTF-7 and GB18030 do not.

Fixed-size characters can be helpful, but it should be remembered that even if there is a fixed width per code point (as in UTF-32), there is not a fixed width per displayed character, because of [[combining character]]s.

If you are working heavily with a particular [[application programming interface|API]] that has standardised on a particular Unicode encoding, it is generally a good idea to use that encoding, to avoid the need to convert before every call to the API.
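The searching property described in this section can be illustrated with a short Python sketch. The helper name <code>byte_search_agrees</code> and the sample text are hypothetical, chosen only for illustration. The sketch assumes the standard <code>utf-8</code> and <code>gb18030</code> codecs shipped with Python, and that the byte pair 0x81 0x41 decodes to a single GB18030 character.

```python
def byte_search_agrees(text: str, needle: str, encoding: str) -> bool:
    """Return True if a raw byte search gives the same answer as a character search."""
    found_in_bytes = needle.encode(encoding) in text.encode(encoding)
    found_in_text = needle in text
    return found_in_bytes == found_in_text


# Hypothetical sample: the two-byte GB18030 sequence 0x81 0x41, whose second
# byte has the same value as ASCII "A" (0x41). The decoded text holds one
# character and does not contain the letter "A".
sample = b"\x81\x41".decode("gb18030")

# In UTF-8 the bytes of "A" never occur inside another character's sequence,
# so the byte search correctly reports no match.
utf8_ok = byte_search_agrees(sample, "A", "utf-8")

# In GB18030 the byte 0x41 also occurs as a trail byte, so the byte search
# reports a match that the character search does not.
gb18030_ok = byte_search_agrees(sample, "A", "gb18030")

print("UTF-8 byte search agrees with character search:", utf8_ok)
print("GB18030 byte search agrees with character search:", gb18030_ok)
```

Under these assumptions, a byte-level search is safe on UTF-8 data, whereas GB18030 data has to be decoded, or scanned from a known character boundary, before it can be searched reliably.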
When writing a network daemon, it may also simplify matters to use the same format for processing that you communicate in.

UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width. Unfortunately, using UTF-16 makes characters outside the BMP a special case, which increases the risk of oversights in their handling.

===For communication===
Some protocols may be limited to a specific set of encodings, but even when they are not, some encodings may offer better compatibility with existing implementations than others. The cost of converting between your processing format and your communication format should also be considered, both in terms of program size (e.g. GB18030 requires a huge mapping table) and run-time requirements.

==In detail==

Comparison of Unicode encodings: Difference between revisions