{{Short description|String collation algorithm}}
__NOTOC__
The '''Unicode collation algorithm''' ('''UCA''') is an algorithm defined in Unicode Technical Standard #10. It is a customizable method for producing binary keys from [[String (computer science)|strings]] representing text in any [[writing system]] and [[language]] that can be represented with [[Unicode]]. These keys can then be efficiently compared byte
Unicode Technical Standard #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET), a data file that defines a default collation ordering. The DUCET is customizable for different languages,<ref name=":0" /><ref name=":1">{{Cite book |last=Hosken |first=Martin |url=https://scriptsource.org/cms/scripts/render_download.php?format=file&media_id=..%2Fsites%2Fs%2Fmedia%2Fdatabase%2Fssproto%2Fentries%2Fpn%2Frn%2Fpnrnlhkrq9_sort_tutorial.pdf&filename=sort_tutorial.pdf |title=Unicode Sort Tailoring: Tutorial |date=2021-09-23 |publisher=[[SIL International|SIL Writing Systems Technology]] |edition=1.3 |pages=2–3 |access-date=2023-08-16}}</ref> and
An open source implementation of the UCA is included with the [[International Components for Unicode]] (ICU).<ref>{{Cite web |title=ICU - International Components for Unicode |url=https://icu.unicode.org/home |access-date=2023-08-16 |website=[[Unicode]]}}</ref><ref>{{Cite web |title=Collations |url=https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html |access-date=2023-08-16 |website=SyBooks Online}}</ref> ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.
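
The idea of producing binary sort keys that are then compared byte-wise can be illustrated with a minimal sketch. The weight table below is a hypothetical toy table, not the DUCET, and the function name <code>sort_key</code> is chosen for illustration only. The sketch forms a key by appending the non-zero primary weights, then a zero level separator, then the secondary weights, and so on, with each weight stored as a 16-bit big-endian value. It omits normalization, contractions, variable weighting and tailoring.

```python
import struct

# Hypothetical toy collation element table (NOT the real DUCET values).
# Each character maps to a list of (primary, secondary, tertiary) weights.
TOY_TABLE = {
    "a": [(0x0100, 0x0020, 0x0002)],
    "A": [(0x0100, 0x0020, 0x0008)],
    # "a with acute" modelled as the base letter plus an accent element
    # whose primary weight is zero (ignored at the primary level).
    "\u00e1": [(0x0100, 0x0020, 0x0002), (0x0000, 0x0024, 0x0002)],
    "b": [(0x0101, 0x0020, 0x0002)],
    "B": [(0x0101, 0x0020, 0x0008)],
}


def sort_key(text):
    """Build a binary sort key from the toy table (three levels)."""
    elements = [ce for ch in text for ce in TOY_TABLE[ch]]
    weights = []
    for level in range(3):
        if level > 0:
            weights.append(0)  # level separator
        # Zero weights are skipped at every level.
        weights.extend(ce[level] for ce in elements if ce[level] != 0)
    # Big-endian 16-bit units make byte-wise order match weight order.
    return struct.pack(f">{len(weights)}H", *weights)


words = ["b", "A", "\u00e1", "a", "ab", "B"]
# Python compares bytes objects byte by byte, so the keys sort directly.
ordered = sorted(words, key=sort_key)
```

Because the keys are plain byte strings, they can be computed once per string and reused for every comparison. In this toy table, differences in base letters outrank accent differences, which in turn outrank case differences.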
==See also==
* [[Collation]]
* [[European ordering rules]] (EOR)
* [[Common Locale Data Repository]] (CLDR)
== References ==
<references />
==External links==
* [https://www.unicode.org/reports/tr10/ Unicode Collation Algorithm]: Unicode Technical Standard #10
* [http://developer.mimer.com/sql-unicode-collation-charts/ Mimer SQL Unicode Collation Charts]
===Tools===
* [https://icu4c-demos.unicode.org/icu-bin/collation.html An ICU collation demo]
* [http://billposer.org/Software/msort.html msort]: a sort program that provides an unusual level of flexibility in defining collations and extracting keys
[[Category:Unicode algorithms|Collation]]
[[Category:Collation]]
{{algorithm-stub}}