{{Short description|String collation algorithm}}
The '''Unicode collation algorithm''' ('''UCA''') is an algorithm defined in Unicode Technical Standard #10. It is a customizable method for producing binary keys from [[String (computer science)|strings]] that represent text in any [[writing system]] and [[language]] that can be represented with [[Unicode]]. These keys can then be efficiently compared byte by byte in order to [[collate]] or sort the strings according to the rules of the language, with options for ignoring case, accents, etc.<ref name=":0">{{Cite web |last=Whistler |first=Ken |last2=Scherer |first2=Markus |last3=Davis |first3=Mark |author-link3=Mark Davis (Unicode) |date=2022-08-26 |title=UTS #10: Unicode Collation Algorithm |url=https://www.unicode.org/reports/tr10/ |access-date=2023-08-16 |website=[[Unicode]]}}</ref>
__NOTOC__
Unicode Technical Standard #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET).
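The way weights from a collation element table become a byte-comparable key can be illustrated with a minimal sketch. The table below is a hypothetical toy covering only the letters a to z and three combining accents; it is not the DUCET, and the names <code>PRIMARY</code>, <code>ACCENTS</code>, <code>collation_elements</code> and <code>sort_key</code> are assumptions made for illustration. Each character yields a triple of primary, secondary and tertiary weights, and the key lists all non-zero primary weights, then the secondary weights, then the tertiary weights, with a zero byte between levels.

```python
import unicodedata

# Hypothetical toy table (NOT the DUCET): primary weights for a-z only.
PRIMARY = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
# Hypothetical secondary weights for a few combining accents.
ACCENTS = {"\u0301": 2, "\u0300": 3, "\u0308": 4}  # acute, grave, diaeresis


def collation_elements(text):
    """Map text to (primary, secondary, tertiary) weight triples."""
    elements = []
    # NFD splits a precomposed letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if ch in ACCENTS:
            # Accents have no primary weight; they differ at level 2.
            elements.append((0, ACCENTS[ch], 1))
        elif ch.lower() in PRIMARY:
            # Case is distinguished only at level 3.
            tertiary = 2 if ch.isupper() else 1
            elements.append((PRIMARY[ch.lower()], 1, tertiary))
        # Characters outside the toy table are simply skipped here.
    return elements


def sort_key(text, strength=3):
    """Build a byte key; strength 1 ignores accents and case, 2 ignores case."""
    elements = collation_elements(text)
    levels = []
    for level in range(strength):
        # Zero weights are omitted from each level.
        levels.append(bytes(e[level] for e in elements if e[level] != 0))
    # A zero byte, lower than any weight, separates the levels.
    return b"\x00".join(levels)


words = ["resumes", "résumé", "Resume", "resume"]
print(sorted(words, key=sort_key))
```

Because the keys are plain byte strings, ordinary byte-wise comparison is enough to sort them. Lowering <code>strength</code> drops the later levels, which is one simple way to model the options for ignoring case or accents.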
An open-source implementation of the UCA is included in [[International Components for Unicode]] (ICU).<ref>{{Cite web |title=ICU - International Components for Unicode |url=https://icu.unicode.org/home |access-date=2023-08-16 |website=[[Unicode]]}}</ref><ref>{{Cite web |title=Collations |url=https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html |access-date=2023-08-16 |website=SyBooks Online}}</ref> ICU supports tailoring, and it includes the collation tailorings from the [[Common Locale Data Repository]] (CLDR).<ref>{{Cite web |title=Customization |url=https://unicode-org.github.io/icu/userguide/collation/customization/ |access-date=2023-08-16 |website=ICU Documentation |language=}}</ref><ref name=":1" />
===Tools===
* [https://icu4c-demos.unicode.org/icu-bin/locexp?_=en_US&x=col ICU Locale Explorer] An online demonstration of the Unicode Collation Algorithm using [[International Components for Unicode]]
* [https://icu4c-demos.unicode.org/icu-bin/collation.html An ICU collation demo]
* [http://billposer.org/Software/msort.html msort] A sort program that provides an unusual level of flexibility in defining collations and extracting keys.