Unicode equivalence

===Combining and precomposed characters===
For consistency with some older standards, Unicode provides single code points for many characters that could be viewed as modified forms of other characters (such as U+00F1 for "ñ" or U+00C5 for "Å") or as combinations of two or more characters (such as U+FB00 for the ligature "ﬀ" or U+0132 for the [[Dutch alphabet|Dutch letter]] "[[IJ (digraph)|Ĳ]]").
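
The following Python sketch uses only the standard-library `unicodedata` module to show how these two groups of precomposed code points differ. "ñ" and "Å" carry canonical decompositions, so the NFD form splits each into a base letter followed by a combining mark. U+FB00 and U+0132 carry only compatibility decompositions, so NFD leaves them intact and only NFKD replaces them with separate letters. The helper `code_points` is a hypothetical name introduced for display only.

```python
import unicodedata


def code_points(s: str) -> list:
    """Hypothetical display helper: the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


# Precomposed letters with canonical decompositions: NFD splits them
# into a base letter followed by a combining mark.
for ch in ("\u00f1", "\u00c5"):  # ñ, Å
    print(code_points(ch), "NFD ->", code_points(unicodedata.normalize("NFD", ch)))

# Ligatures with only compatibility decompositions: NFD leaves them unchanged,
# while NFKD replaces them with the separate letters.
for ch in ("\ufb00", "\u0132"):  # ff ligature, Dutch IJ
    print(
        code_points(ch),
        "NFD ->", code_points(unicodedata.normalize("NFD", ch)),
        "NFKD ->", code_points(unicodedata.normalize("NFKD", ch)),
    )
```
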
 
For consistency with other standards, and for greater flexibility, Unicode also provides codes for many elements that are not used on their own but are instead meant to modify or combine with a preceding [[base character]]. Examples of these [[combining character]]s are the combining tilde ({{unichar|0303|cwith=◌|nlink=}}) and the [[Japanese script|Japanese]] diacritic [[dakuten]] ("◌゛", {{unichar|3099|cwith=◌|use=lang|use2=ja}}).
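
As a sketch using Python's standard `unicodedata` module, the two combining characters named above can be inspected directly. Both belong to the general category Mn (nonspacing mark) and have a nonzero canonical combining class. Each also forms a two-code-point sequence with the base character it follows. The example base letters ("n" and hiragana "か") are illustrative choices, not taken from a particular source.

```python
import unicodedata

# The two combining characters named in the text.
for mark in ("\u0303", "\u3099"):
    print(
        f"U+{ord(mark):04X}",
        unicodedata.name(mark),
        "| category:", unicodedata.category(mark),          # Mn = nonspacing mark
        "| combining class:", unicodedata.combining(mark),  # nonzero for both marks
    )

# Neither mark is used on its own: each follows the base character it modifies.
tilde_n = "n\u0303"         # LATIN SMALL LETTER N + COMBINING TILDE
voiced_ka = "\u304b\u3099"  # HIRAGANA LETTER KA + COMBINING VOICED SOUND MARK
print(len(tilde_n), len(voiced_ka))  # two code points each
```
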
 
In the context of Unicode, '''character composition''' is the process of replacing the code points of a base letter followed by one or more combining characters with a single [[precomposed character]]; '''character decomposition''' is the opposite process.
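
In Python's standard library, canonical composition and decomposition of this kind can be approximated with `unicodedata.normalize`: the NFC form decomposes and then recomposes canonically, while NFD fully decomposes. The sketch below wraps these calls in hypothetical `compose` and `decompose` helpers and applies them to a few base-plus-combining sequences, including one with two combining marks. It covers canonical composition only, not compatibility mappings.

```python
import unicodedata


def compose(s: str) -> str:
    """Hypothetical helper: canonical composition via NFC."""
    return unicodedata.normalize("NFC", s)


def decompose(s: str) -> str:
    """Hypothetical helper: canonical decomposition via NFD."""
    return unicodedata.normalize("NFD", s)


def hex_points(s: str) -> list:
    """Display helper: the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


# Base letter followed by one or more combining characters.
samples = [
    "n\u0303",        # n + combining tilde
    "A\u030a",        # A + combining ring above
    "\u304b\u3099",   # hiragana ka + combining dakuten
    "e\u0302\u0301",  # e + combining circumflex + combining acute
]

for seq in samples:
    composed = compose(seq)
    # Composition yields a single precomposed code point here;
    # decomposition restores the original sequence.
    print(hex_points(seq), "->", hex_points(composed),
          "| round trip:", decompose(composed) == seq)
```
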
 
==Errors due to normalization differences==
When two applications share Unicode data but normalize it differently, errors and data loss can result. In one specific instance, [[OS X]] normalized Unicode filenames sent from the [[Netatalk]] and [[Samba (software)|Samba]] file- and printer-sharing software. Netatalk and Samba did not recognize the altered filenames as equivalent to the originals, leading to data loss.<ref>{{cite web|url=https://sourceforge.net/tracker/?func=detail&aid=2727174&group_id=8642&atid=108642|title=netatalk / Bugs / #349 volcharset:UTF8 doesn't work from Mac|website=[[SourceForge]]|access-date=20 November 2014}}</ref><ref>{{cite web |url=http://forums.macosxhints.com/archive/index.php/t-99344.html |title=rsync, samba, UTF8, international characters, oh my! |archive-url=https://web.archive.org/web/20100109162824/http://forums.macosxhints.com/archive/index.php/t-99344.html |year=<!--03-01-2009-->2009 |archive-date=January 9, 2010}}</ref> Resolving such an issue is non-trivial, as normalization is not losslessly invertible.
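
The following Python sketch is a deliberately simplified, hypothetical model of this kind of mismatch; it is not code from OS X, Netatalk, or Samba. A server stores a filename in precomposed form, a client sends the name back in decomposed form, and an exact-match lookup fails. A lookup that normalizes both sides before comparing succeeds, but the last lines show why the original spelling cannot be recovered afterwards: three distinct code point sequences for "Å" all collapse to the same NFC result.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical server-side table: filenames stored exactly as created (NFC here).
stored: Dict[str, bytes] = {"caf\u00e9.txt": b"contents"}

# Hypothetical client that converts names to NFD before sending them back.
requested = unicodedata.normalize("NFD", "caf\u00e9.txt")

# Exact comparison fails: the two names are different code point sequences.
print(requested in stored)  # False


def lookup(name: str, table: Dict[str, bytes]) -> Optional[bytes]:
    """Hypothetical lookup that compares names after converting both sides to NFC."""
    target = unicodedata.normalize("NFC", name)
    for key, value in table.items():
        if unicodedata.normalize("NFC", key) == target:
            return value
    return None


print(lookup(requested, stored))  # b'contents'

# Normalization is not losslessly invertible: distinct inputs collapse to one form,
# so the original spelling cannot be reconstructed from the normalized name.
originals = ["\u00c5", "A\u030a", "\u212b"]  # precomposed Å, A + combining ring, ANGSTROM SIGN
print({unicodedata.normalize("NFC", s) for s in originals})  # {'Å'}
```

Because several stored names can normalize to the same string, a normalizing lookup like the hypothetical `lookup` above may also treat distinct files as one, which is part of why such issues are hard to resolve after the fact.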
 
==See also==