spelling corrections |
who in the hell wrote this thing? |
Line 6:
==Considerations other than size==
===For processing===
For processing, a format should be easy to search, truncate, and generally process safely. All normal Unicode encodings use some form of fixed-size code unit. Depending on the format and the code point to be encoded, one or more of these code units will represent a Unicode code point.
Fixed-size characters can be helpful, but even if there is a fixed width per code point (as in UTF-32), there is not a fixed width per displayed character, because of [[combining character]]s. When working heavily with a particular [[application programming interface|API]] that has standardised on a particular Unicode encoding, it is generally a good idea to use that same encoding, which avoids the need to convert before every call to the API.
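The difference between code units, code points and displayed characters can be illustrated with a minimal Python sketch. The helper <code>code_units</code> is a hypothetical name introduced here for illustration, and the example assumes the letter "e" followed by U+0301 (combining acute accent) as the sample text.

```python
import unicodedata


def code_units(text, encoding, unit_size):
    """Count the code units needed to encode text (hypothetical helper).

    The little-endian codec names are used so that no byte order mark
    is prepended and the byte length divides evenly by the unit size.
    """
    return len(text.encode(encoding)) // unit_size


# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# one displayed character, but two code points.
decomposed = "e\u0301"

# The precomposed form U+00E9 displays the same way as a single code point.
composed = unicodedata.normalize("NFC", decomposed)

# UTF-32 has a fixed width per code point, not per displayed character.
utf32_decomposed = code_units(decomposed, "utf-32-le", 4)  # 2 code units
utf32_composed = code_units(composed, "utf-32-le", 4)      # 1 code unit

# The same decomposed text in the variable-width encodings.
utf16_decomposed = code_units(decomposed, "utf-16-le", 2)  # 2 code units
utf8_decomposed = code_units(decomposed, "utf-8", 1)       # 3 code units
```

Both strings display as a single accented letter, yet even in UTF-32 they occupy a different number of code units, so a fixed width per code point does not by itself make it safe to truncate at an arbitrary position.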
UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width. Unfortunately, using UTF-16 makes characters outside the BMP a special case, which increases the risk of oversights related to their handling.
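The kind of oversight described above can be sketched in Python as follows. The function names <code>utf16_units</code> and <code>safe_truncate</code> are hypothetical and are not taken from any particular API; the sample characters (U+20AC and U+1F600) are likewise chosen only for illustration.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    """True if the code unit is the first half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDBFF


def safe_truncate(units, max_units):
    """Truncate to at most max_units code units without splitting a pair.

    Hypothetical helper: if the cut would fall between a high surrogate
    and its low surrogate, the high surrogate is dropped as well.
    """
    if 0 < max_units < len(units) and is_high_surrogate(units[max_units - 1]):
        max_units -= 1
    return units[:max_units]


bmp_units = utf16_units("\u20ac")         # EURO SIGN, inside the BMP
astral_units = utf16_units("\U0001F600")  # GRINNING FACE, outside the BMP

# Code that assumes "one code unit per character" works for the BMP
# character but cuts the surrogate pair in half.
naive = astral_units[:1]
careful = safe_truncate(astral_units, 1)
```

Here <code>bmp_units</code> holds one code unit while <code>astral_units</code> holds two (a surrogate pair). The naive slice leaves a lone high surrogate, which is not valid UTF-16, whereas <code>safe_truncate</code> returns an empty list rather than half a character.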