Unicode equivalence: Difference between revisions

Revision as of 02:48, 6 November 2024 by Gamapamani (minor edit: "extra tag"); latest revision as of 04:01, 11 August 2025 by Bender the Bot (minor edit in "Errors due to normalization differences": "HTTP to HTTPS for SourceForge", tag: AWB). Three intermediate revisions by two users are not shown.
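
As a minimal sketch of the equivalence this article is about, the following Python snippet compares two spellings of "ñ" and a hiragana-plus-dakuten sequence. It assumes Python 3 and its standard-library `unicodedata` module, which the article text does not mention; the helper name `canonically_equivalent` is hypothetical and chosen only for illustration.

```python
import unicodedata

# Two spellings of "ñ": the precomposed code point U+00F1, and the base
# letter "n" followed by U+0303 COMBINING TILDE.
precomposed = "\u00f1"
combining = "n\u0303"

# Hiragana "ka" (U+304B) followed by U+3099, the combining form of the
# dakuten; canonical composition turns this pair into "ga" (U+304C).
ka_dakuten = "\u304b\u3099"


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A raw comparison looks only at code points, so the spellings differ.
print(precomposed == combining)  # False
print(canonically_equivalent(precomposed, combining))  # True

# Composition (NFC) and decomposition (NFD) convert between the forms.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True
print(unicodedata.normalize("NFC", ka_dakuten) == "\u304c")  # True

# Both spellings map to the same NFC string, so the original spelling
# cannot be recovered from the normalized result.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", combining)
print(nfc_a == nfc_b)  # True
```

Choosing NFC rather than NFD for the comparison is arbitrary here: canonically equivalent strings compare equal under either form, provided both sides use the same one.
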
===Combining and precomposed characters===
For consistency with some older standards, Unicode provides single code points for many characters that could be viewed as modified forms of other characters (such as U+00F1 for "ñ" or U+00C5 for "Å") or as combinations of two or more characters (such as U+FB00 for the ligature "ﬀ" or U+0132 for the [[Dutch alphabet|Dutch letter]] "[[IJ (digraph)|Ĳ]]").

For consistency with other standards, and for greater flexibility, Unicode also provides codes for many elements that are not used on their own, but are meant instead to modify or combine with a preceding [[base character]]. Examples of these [[combining character]]s are {{unichar|0303|cwith=◌|nlink=}} and the [[Japanese script|Japanese]] diacritic [[dakuten]] ({{unichar|3099|cwith=◌|use=lang|use2=ja}}).

In the context of Unicode, '''character composition''' is the process of replacing the code points of a base letter followed by one or more combining characters with a single [[precomposed character]], and '''character decomposition''' is the opposite process.

==Errors due to normalization differences==
When two applications share Unicode data but normalize them differently, errors and data loss can result. In one specific instance, [[OS X]] normalized Unicode filenames sent from the [[Netatalk]] and [[Samba (software)|Samba]] file- and printer-sharing software. Netatalk and Samba did not recognize the altered filenames as equivalent to the originals, leading to data loss.<ref>{{cite web|url=https://sourceforge.net/tracker/?func=detail&aid=2727174&group_id=8642&atid=108642|title=netatalk / Bugs / #349 volcharset:UTF8 doesn't work from Mac|website=[[SourceForge]]|access-date=20 November 2014}}</ref><ref>{{cite web |url=http://forums.macosxhints.com/archive/index.php/t-99344.html |title=rsync, samba, UTF8, international characters, oh my!
|archive-url=https://web.archive.org/web/20100109162824/http://forums.macosxhints.com/archive/index.php/t-99344.html |year=<!--03-01-2009-->2009 |archive-date=January 9, 2010}}</ref> Resolving such an issue is non-trivial, because normalization is not losslessly invertible: once several equivalent spellings have been mapped to a single normal form, the original spelling cannot be recovered.

==See also==