Revision as of 11:56, 7 September 2005 by 213.115.254.252 (→For processing); revision as of 15:42, 4 October 2005 by 193.49.124.107 (Typo)
==Considerations other than size==
===For processing===
For processing, a format should be easy to search, truncate, and generally process safely. All common Unicode encodings use some form of fixed-size code unit. Depending on the encoding and the code point being encoded, one or more of these code units represent a single Unicode code point. To allow easy searching and truncation, the code-unit sequence for one code point must not occur within a longer sequence or across the boundary of two other sequences. UTF-8, UTF-16 and UTF-32 have this property, but UTF-7 and GB18030 do not.

Fixed-size characters can be helpful, but even where there is a fixed width per code point (as in UTF-32), there is no fixed width per displayed character, because of [[combining character]]s.

If you are working heavily with a particular [[application programming interface|API]] and that API has standardised on a particular Unicode encoding, it is generally a good idea to use the same encoding as the API, to avoid converting before every call to it. Similarly, if you are writing server-side software, it may simplify matters to process text in the same format that you communicate in.
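As an illustration of why the non-overlapping property makes truncation safe, the following minimal Python sketch cuts a UTF-8 byte string to a byte limit without splitting a code point. It relies only on the fact that UTF-8 continuation bytes always have the bit pattern <code>10xxxxxx</code>, so the cut can be moved back to the nearest lead byte without decoding anything before it. The function name <code>truncate_utf8</code> is hypothetical and not part of any library.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Return a prefix of UTF-8 `data` that is at most `max_bytes` long
    and never ends in the middle of a code point (hypothetical helper)."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut splits a code point, so move
    # back until data[cut] is a lead byte or an ASCII byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


text = "naïve €10".encode("utf-8")  # 'ï' is 2 bytes, '€' is 3 bytes
print(truncate_utf8(text, 8).decode("utf-8"))  # cut would split '€'; it is dropped whole
```

The same property is what lets a plain byte search for an encoded UTF-8 needle avoid false matches that start or end inside another character. In an encoding such as GB18030, a byte-level cut or search can land inside a multi-byte sequence.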
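The gap between code points and displayed characters can be seen with a letter followed by a combining accent. The Python sketch below encodes the same visible character "é" in two ways. The decomposed form takes two UTF-32 code units, while the precomposed form takes one. The helper <code>approx_display_length</code> is hypothetical and only a rough approximation: it skips code points that have a non-zero combining class, whereas real user-perceived character segmentation (Unicode Standard Annex #29) handles many more cases.

```python
import unicodedata


def approx_display_length(text: str) -> int:
    """Rough count of displayed characters: count code points that are
    not combining marks (hypothetical helper, not full UAX #29)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# UTF-32 has a fixed width per code point (4 bytes)...
print(len(decomposed.encode("utf-32-le")), len(precomposed.encode("utf-32-le")))
# ...but both strings display as a single character.
print(approx_display_length(decomposed), approx_display_length(precomposed))
```

Normalising to NFC (<code>unicodedata.normalize("NFC", ...)</code>) merges this particular pair into one code point. Many combining sequences have no precomposed form, however, so normalisation does not make UTF-32 fixed-width per displayed character.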

Comparison of Unicode encodings: Difference between revisions