Unicode collation algorithm: Difference between revisions

mostly corrected the basic description ... the UCA is NOT an algorithm for comparing strings, but rather for producing keys that can be byte-wise compared in order to sort the original strings. (Still lots more work to be done on this article.)
Adding local short description: "String collation algorithm", overriding Wikidata description "algorithm"
 
{{Short description|String collation algorithm}}
{{no footnotes|date=September 2016}}
__NOTOC__
The '''Unicode collation algorithm''' ('''UCA''') is an algorithm defined in Unicode Technical Standard #10. It is a customizable method for producing binary keys from [[String (computer science)|strings]] representing text in any [[writing system]] and [[language]] that can be represented with [[Unicode]]. These keys can then be efficiently compared byte by byte in order to [[collate]] or sort the original strings according to the rules of the language, with options for ignoring case, accents, etc.<ref name=":0">{{Cite web |last1=Whistler |first1=Ken |last2=Scherer |first2=Markus |last3=Davis |first3=Mark |author-link3=Mark Davis (Unicode) |date=2022-08-26 |title=UTS #10: Unicode Collation Algorithm |url=https://www.unicode.org/reports/tr10/ |access-date=2023-08-16 |website=[[Unicode]]}}</ref>
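
The key-building idea can be illustrated with a minimal sketch. It assumes a tiny hand-made table of collation elements whose weights are invented for this example (they are not DUCET values), and the names <code>TOY_TABLE</code> and <code>sort_key</code> are hypothetical. The sketch lays out all primary weights first, then a zero separator, then the secondary weights, then the tertiary weights, so that a plain bytewise comparison of the keys orders the strings. It omits normalization, contractions, expansions and variable weighting, all of which a real implementation has to handle.

```python
# Toy collation element table: character -> (primary, secondary, tertiary).
# These weights are invented for illustration; they are NOT DUCET values.
TOY_TABLE = {
    "a": (0x0100, 0x0020, 0x0002),
    "A": (0x0100, 0x0020, 0x0008),  # differs from "a" only at the tertiary (case) level
    "á": (0x0100, 0x0021, 0x0002),  # differs from "a" only at the secondary (accent) level
    "b": (0x0101, 0x0020, 0x0002),
    "B": (0x0101, 0x0020, 0x0008),
}

# A separator lower than every real weight keeps shorter strings first.
LEVEL_SEPARATOR = 0x0000


def sort_key(text, table=TOY_TABLE, levels=3):
    """Build a binary key: all primary weights, a separator,
    all secondary weights, a separator, all tertiary weights."""
    # Raises KeyError for characters outside the toy table.
    elements = [table[ch] for ch in text]
    weights = []
    for level in range(levels):
        if level > 0:
            weights.append(LEVEL_SEPARATOR)
        weights.extend(ce[level] for ce in elements if ce[level] != 0)
    # Big-endian 16-bit units, so bytewise order matches weight order.
    return b"".join(w.to_bytes(2, "big") for w in weights)


words = ["b", "Ab", "ab", "áb", "B", "aa"]
print(sorted(words, key=sort_key))
# ['aa', 'ab', 'Ab', 'áb', 'b', 'B']
```

Truncating the key to fewer levels corresponds to the options mentioned above: in this sketch <code>levels=2</code> ignores case, and <code>levels=1</code> ignores both case and accents.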
 
Unicode Technical Standard #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages,<ref name=":0" /><ref name=":1">{{Cite book |last=Hosken |first=Martin |url=https://scriptsource.org/cms/scripts/render_download.php?format=file&media_id=..%2Fsites%2Fs%2Fmedia%2Fdatabase%2Fssproto%2Fentries%2Fpn%2Frn%2Fpnrnlhkrq9_sort_tutorial.pdf&filename=sort_tutorial.pdf |title=Unicode Sort Tailoring: Tutorial |date=2021-09-23 |publisher=[[SIL International|SIL Writing Systems Technology]] |edition=1.3 |pages=2–3 |access-date=2023-08-16}}</ref> and some such customizations can be found in the Unicode [[Common Locale Data Repository]] (CLDR).<ref>{{Cite web |title=CLDR Releases/Downloads |url=https://cldr.unicode.org/index/downloads |access-date=2023-08-16 |website=[[Common Locale Data Repository|Unicode CLDR]] |language=}}</ref>
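
As a rough illustration of what customizing the table means, the sketch below overrides one entry of a small default table. All weights are invented for the example and are not DUCET or CLDR data, and the names <code>DEFAULT_TABLE</code>, <code>SWEDISH_LIKE_TABLE</code> and <code>sort_key</code> are hypothetical. In the default table "ö" is treated as an accented "o"; the tailored table instead gives it its own primary weight after "z", in the style of Swedish alphabetical order.

```python
# Hypothetical default table (invented weights, NOT DUCET):
# character -> (primary, secondary). "ö" shares the primary weight of "o"
# and differs only at the secondary (accent) level.
DEFAULT_TABLE = {
    "o": (0x0200, 0x0020),
    "ö": (0x0200, 0x0021),
    "p": (0x0201, 0x0020),
    "z": (0x0210, 0x0020),
}

# Tailoring in the style of Swedish: "ö" becomes a separate letter after "z".
SWEDISH_LIKE_TABLE = {**DEFAULT_TABLE, "ö": (0x0211, 0x0020)}


def sort_key(text, table):
    """Two-level key: primary weights, a zero separator, secondary weights."""
    elements = [table[ch] for ch in text]
    primaries = [p for p, _ in elements]
    secondaries = [s for _, s in elements]
    weights = primaries + [0x0000] + secondaries
    # Big-endian 16-bit units, so bytewise order matches weight order.
    return b"".join(w.to_bytes(2, "big") for w in weights)


words = ["öp", "zo", "op", "oz"]
default_order = sorted(words, key=lambda w: sort_key(w, DEFAULT_TABLE))
tailored_order = sorted(words, key=lambda w: sort_key(w, SWEDISH_LIKE_TABLE))
print(default_order)   # ['op', 'öp', 'oz', 'zo']
print(tailored_order)  # ['op', 'oz', 'zo', 'öp']
```

Only the table changes between the two orderings; the key-building and comparison steps stay the same, which is what makes this kind of per-language customization practical.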
 
An open source implementation of UCA is included with the [[International Components for Unicode]] (ICU).<ref>{{Cite web |title=ICU - International Components for Unicode |url=https://icu.unicode.org/home |access-date=2023-08-16 |website=[[Unicode]]}}</ref><ref>{{Cite web |title=Collations |url=https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html |access-date=2023-08-16 |website=SyBooks Online}}</ref> ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.<ref>{{Cite web |title=Customization |url=https://unicode-org.github.io/icu/userguide/collation/customization/ |access-date=2023-08-16 |website=ICU Documentation |language=}}</ref><ref name=":1" />
 
==See also==
* [[Collation]]
* [[ISO 14651|ISO/IEC 14651]]
* [[European ordering rules]] (EOR)
* [[Common Locale Data Repository]] (CLDR)
 
== References ==
<references />
 
==External links==
* [https://www.unicode.org/reports/tr10/ Unicode Collation Algorithm]: Unicode Technical Standard #10
* [http://developer.mimer.com/sql-unicode-collation-charts/ Mimer SQL Unicode Collation Charts]
* [http://www.collation-charts.org/mysql60/ MySQL Collation Charts]
 
===Tools===
* [https://icu4c-demos.unicode.org/icu-bin/locexp?_=en_US&x=col ICU Locale Explorer] An online demonstration of the Unicode Collation Algorithm using [[International Components for Unicode]]
* [https://icu4c-demos.unicode.org/icu-bin/collation.html An ICU collation demo]
* [http://billposer.org/Software/msort.html msort] A sort program that provides an unusual level of flexibility in defining collations and extracting keys.
 
[[Category:Unicode algorithms|Collation]]
[[Category:Collation]]
 
 
{{algorithm-stub}}