Comparison of Unicode encodings: Difference between revisions

==Considerations other than size==
===For processing===
For processing, a format should be easy to search, truncate, and generally process safely. All normal Unicode encodings use some form of fixed-size code unit. Depending on the format and the code point to be encoded, one or more of these code units will represent a Unicode code point. To allow easy searching and truncation, the code unit sequence for one code point must not occur within a longer sequence or across the boundary of two other sequences. UTF-8, UTF-16 and UTF-32 have these important properties, but UTF-7 and GB18030 do not.
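As a rough illustration of why this property matters, the following Python sketch truncates UTF-8 data at a byte limit without splitting a code point. It relies on the fact that UTF-8 continuation bytes always have the bit pattern <code>10xxxxxx</code>, so a boundary can be found by backing up from the cut point. The function name <code>truncate_utf8</code> and the sample string are assumptions made for this example, and the sketch assumes well-formed UTF-8 input.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 data to at most max_bytes without splitting a code point.

    Hypothetical helper for illustration; assumes `data` is well-formed UTF-8.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (0b10xxxxxx), the cut falls inside a multi-byte
    # sequence, so back up until the cut sits just before a lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


encoded = "naïve café".encode("utf-8")

# A cut at 3 bytes would fall inside the two-byte sequence for "ï".
short = truncate_utf8(encoded, 3)
print(short.decode("utf-8"))

# A byte-level search cannot match across code point boundaries in UTF-8,
# so the position found always marks the start of a code point.
needle = "é".encode("utf-8")
position = encoded.find(needle)
print(encoded[:position].decode("utf-8"))
```

Because the encoded form of one code point never appears inside another, both the truncated result and the prefix before the match decode cleanly. The same approach would not be safe for an encoding without this property.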
 
Fixed-size characters can be helpful, but it should be remembered that even if there is a fixed width per code point (as in UTF-32), there is not a fixed width per displayed character, because of [[combining character]]s. If you are working heavily with a particular [[application programming interface|API]] that has standardised on a particular Unicode encoding, it is generally a good idea to use the same encoding as the API, to avoid the need to convert before every call. Similarly, if you are writing server-side software, it may simplify matters to use the same format for processing that you are communicating in.
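The point about combining characters can be illustrated with a short Python sketch. The string below is one displayed character, "é", written as a base letter followed by a combining acute accent. In UTF-32 it still occupies two fixed-width code units. The helper <code>count_base_characters</code> is a hypothetical name used only here. It counts code points whose canonical combining class is zero, which is only a rough approximation of displayed characters and not full grapheme cluster segmentation.

```python
import unicodedata


def count_base_characters(text: str) -> int:
    """Count code points that are not combining marks.

    Hypothetical helper: a rough approximation of displayed characters,
    not a full grapheme cluster algorithm.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as a single "é".
decomposed = "e\u0301"

code_points = len(decomposed)                   # number of code points
utf32_bytes = len(decomposed.encode("utf-32-le"))  # 4 bytes per code point
base_characters = count_base_characters(decomposed)

print(code_points, utf32_bytes, base_characters)

# Normalising to NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed))
```

Not every combining sequence has a precomposed form, so normalisation reduces this effect but does not remove it.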