Revision as of 14:56, 31 August 2005 by 206.139.208.13 (talk): spelling corrections
Revision as of 15:00, 31 August 2005 by 206.139.208.13 (talk): who in the hell wrote this thing?
Line 6:
==Considerations other than size==
===For processing===
For processing, a format should be easy to search, truncate, and generally process safely. All normal Unicode encodings use some form of fixed-size code unit. Depending on the format and the code point to be encoded, one or more of these code units will represent a Unicode code point. To allow easy searching and truncation, a sequence must not occur within a longer sequence or across the boundary of two other sequences. UTF-8, UTF-16 and UTF-32 have these important properties, but UTF-7 and GB18030 do not.

Fixed-size characters can be helpful, but even if there is a fixed width per code point (as in UTF-32), there is not a fixed width per displayed character, because of [[combining character]]s.

If you are working heavily with a particular [[application programming interface|API]] that has standardised on a particular Unicode encoding, it is generally a good idea to use that encoding, to avoid converting before every call to the API. Similarly, if you are writing a network daemon, it may simplify matters to use the same format for processing that you use for communication.

UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width. Unfortunately, using UTF-16 makes characters outside the BMP a special case, which increases the risk of oversights in their handling.
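
As a rough illustration of the points above, the following Python sketch counts code units per code point in UTF-8, UTF-16 and UTF-32, truncates UTF-8 data without splitting a code point, and checks whether the encoding of a non-ASCII character contains ASCII-range bytes (which is what makes naive byte searching unsafe in a byte-oriented encoding such as UTF-7). The function names <code>code_units</code>, <code>safe_truncate_utf8</code> and <code>contains_ascii_byte</code> are hypothetical helpers written for this example, not part of any library; only Python's standard codecs are assumed.

```python
def code_units(text, encoding):
    """Count the code units needed to encode text.

    encoding must be 'utf-8', 'utf-16' or 'utf-32' (illustrative helper).
    """
    unit_size = {"utf-8": 1, "utf-16": 2, "utf-32": 4}[encoding]
    # Use the explicit little-endian codecs so that no byte order mark
    # is prepended and counted as an extra code unit.
    codec = encoding if unit_size == 1 else encoding + "-le"
    return len(text.encode(codec)) // unit_size


def safe_truncate_utf8(data, max_bytes):
    """Cut valid UTF-8 bytes to at most max_bytes on a code point boundary."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes look like 10xxxxxx. If the first dropped byte is
    # one, the cut would fall inside a sequence, so back up to its lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


def contains_ascii_byte(ch, encoding):
    """Report whether the encoded form of ch contains a byte below 0x80.

    Only meaningful for byte-oriented encodings and a non-ASCII ch.
    """
    return any(b < 0x80 for b in ch.encode(encoding))


if __name__ == "__main__":
    # U+0061, U+20AC (in the BMP) and U+1F600 (outside the BMP).
    for ch in ("a", "\u20ac", "\U0001F600"):
        counts = [code_units(ch, enc) for enc in ("utf-8", "utf-16", "utf-32")]
        print(f"U+{ord(ch):04X}", counts)

    # One displayed character made of two code points (e + combining acute).
    print(code_units("e\u0301", "utf-32"))

    # A 4-byte limit falls inside the 3-byte euro sign, so it is dropped whole.
    print(safe_truncate_utf8("ab\u20ac".encode("utf-8"), 4))

    # UTF-8 never reuses ASCII bytes inside a multi-byte sequence; UTF-7 does.
    print(contains_ascii_byte("\u00e9", "utf-8"),
          contains_ascii_byte("\u00e9", "utf-7"))
```

In this sketch the character outside the BMP needs two UTF-16 code units (a surrogate pair) but only one UTF-32 code unit, while the combining sequence needs two UTF-32 code units for a single displayed character. This is the distinction between width per code point and width per displayed character made above.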

Comparison of Unicode encodings: Difference between revisions