Unicode equivalence: Difference between revisions

Combining and precomposed characters: Capital form of ij is rare, changed to commonly used lowercase form. While here, changed handcrafted to template
Bender the Bot (talk | contribs)
 
(One intermediate revision by one other user not shown)
Line 18:
For consistency with some older standards, Unicode provides single code points for many characters that could be viewed as modified forms of other characters (such as U+00F1 for "ñ" or U+00C5 for "Å") or as combinations of two or more characters (such as U+FB00 for the ligature "ff" or U+0133 for the [[Dutch alphabet|Dutch letter]] "[[IJ (digraph)|ij]]").
 
For consistency with other standards, and for greater flexibility, Unicode also provides codes for many elements that are not used on their own, but are meant instead to modify or combine with a preceding [[base character]]. Examples of these [[combining character]]s are {{unichar|0303|cwith=◌|nlink=}} and the [[Japanese script|Japanese]] diacritic [[dakuten]] ({{unichar|3099|cwith=◌|use=lang|use2=ja}}).
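As an illustrative sketch (not part of the Unicode standard itself), the two combining characters named above can be inspected with Python's standard <code>unicodedata</code> module. The helper name <code>describe</code> and the sample strings are hypothetical and chosen only for this example; a non-zero canonical combining class indicates a mark that attaches to a preceding base character.

```python
import unicodedata

TILDE = "\u0303"    # combining tilde
DAKUTEN = "\u3099"  # combining dakuten (voiced sound mark)


def describe(mark):
    """Return the Unicode name and canonical combining class of one character."""
    return unicodedata.name(mark), unicodedata.combining(mark)


# A base character followed by a combining character is still two code points,
# even though it is normally displayed as a single glyph.
n_tilde = "n" + TILDE        # displayed as "ñ"
ga = "\u304b" + DAKUTEN      # hiragana "ka" + dakuten, displayed as "ga"

for mark in (TILDE, DAKUTEN):
    name, ccc = describe(mark)
    print(f"U+{ord(mark):04X} {name} (combining class {ccc})")

print(len(n_tilde), len(ga))  # number of code points in each sequence
```

Base letters such as "n" report a combining class of 0, which is how the module distinguishes them from combining marks.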
 
In the context of Unicode, '''character composition''' is the process of replacing the code points of a base letter followed by one or more combining characters with a single [[precomposed character]]; '''character decomposition''' is the opposite process.
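Composition and decomposition can be sketched with Python's standard <code>unicodedata.normalize</code> function, assuming the canonical normalization forms NFC (composed) and NFD (decomposed) as stand-ins for the two processes. The wrapper names <code>compose</code> and <code>decompose</code> are hypothetical and used only for this illustration.

```python
import unicodedata


def compose(text):
    """Replace base + combining sequences with precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)


def decompose(text):
    """Replace precomposed characters with base + combining sequences (NFD)."""
    return unicodedata.normalize("NFD", text)


decomposed = "n\u0303"   # "n" followed by the combining tilde U+0303
precomposed = "\u00f1"   # the single code point U+00F1

print(compose(decomposed) == precomposed)    # composition
print(decompose(precomposed) == decomposed)  # decomposition

# The ligature U+FB00 has only a compatibility decomposition, so canonical
# decomposition leaves it unchanged; NFKD would be needed to split it.
print(decompose("\ufb00") == "\ufb00")
print(unicodedata.normalize("NFKD", "\ufb00"))
```

Canonical forms are used here because they preserve meaning exactly; the compatibility forms (NFKC and NFKD) additionally fold characters such as ligatures and may discard formatting distinctions.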
Line 96:
 
==Errors due to normalization differences==
When two applications share Unicode data, but normalize them differently, errors and data loss can result. In one specific instance, [[OS X]] normalized Unicode filenames sent from the [[Netatalk]] and [[Samba (software)|Samba]] file- and printer-sharing software. Netatalk and Samba did not recognize the altered filenames as equivalent to the original, leading to data loss.<ref>{{cite web|url=https://sourceforge.net/tracker/?func=detail&aid=2727174&group_id=8642&atid=108642|title=netatalk / Bugs / #349 volcharset:UTF8 doesn't work from Mac|website=[[SourceForge]]|access-date=20 November 2014}}</ref><ref>{{cite web |url=http://forums.macosxhints.com/archive/index.php/t-99344.html |title=rsync, samba, UTF8, international characters, oh my! |archive-url=https://web.archive.org/web/20100109162824/http://forums.macosxhints.com/archive/index.php/t-99344.html |year=<!--03-01-2009-->2009 |archive-date=January 9, 2010}}</ref> Resolving such an issue is non-trivial, as normalization is not losslessly invertible.
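The following minimal sketch shows how such a mismatch can arise and why it cannot simply be undone. It does not reproduce the behaviour of Netatalk, Samba, or any particular file system; the filename and the helper <code>same_name</code> are hypothetical, and the choice of NFD for the stored name is an assumption made for illustration.

```python
import unicodedata


def same_name(a, b, form="NFC"):
    """Compare two names after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


sent = "ma\u00f1ana.txt"                     # precomposed name, as one side sends it
stored = unicodedata.normalize("NFD", sent)  # decomposed name, as the other side keeps it

print(sent == stored)                                  # code-point comparison fails
print(sent.encode("utf-8") == stored.encode("utf-8"))  # byte comparison fails too
print(same_name(sent, stored))                         # equivalent once both are normalized

# Normalization is not invertible: the angstrom sign U+212B and the letter
# U+00C5 both normalize to U+00C5, so the original code point cannot be recovered.
angstrom_sign = "\u212b"
print(unicodedata.normalize("NFC", angstrom_sign) == "\u00c5")
```

Comparing names only after normalizing both sides to one agreed form avoids the mismatch, but it does not restore the exact sequence of code points that was originally sent.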
 
==See also==