Standard Compression Scheme for Unicode: Difference between revisions

History & use: Added the current version of UTR17 as a source, since the article claims SCSU's classification still hasn't changed
Line 6:
[[Reuters]] originally developed SCSU under the name RCSU (Reuters Compression Scheme for Unicode).<ref>{{Cite web|url=https://unicode.org/iuc/iuc9/Friday2.html#b3|title = Ninth International Unicode Conference - Friday - Track B}}</ref><ref>{{Cite web|url=https://unicode.org/iuc/iuc10/program.html|title=Tenth International Unicode Conference - Conference Program}}</ref><ref>{{Cite web|url=https://unicode.org/reports/tr6-10.html|title=Compression Scheme for Unicode}}</ref><ref name=Ewellic>{{Cite web|url=https://www.unicode.org/notes/tn14/UnicodeCompression.pdf|title = A survey of Unicode compression}}</ref>
 
The Unicode Consortium at first considered SCSU to be a character encoding,<ref>{{Cite web|url=https://unicode.org/reports/tr17/tr17-2.html|title = UTR#17: Character Encoding Model}}</ref> but changed its position in 1999: SCSU was still considered a transfer encoding syntax, but was no longer classed as a character encoding, because different compressors might yield different outputs for the same text.<ref>https://unicode.org/reports/tr17/tr17-3.html#Transfer Encoding Syntax</ref> In 2004 this decision was reversed, and SCSU is now considered a ''compressing'' character encoding scheme, as opposed to a simple or compound character encoding scheme.<ref>{{Cite web|url=https://unicode.org/L2/L2004/04288-tr17-5d2.html#CharacterEncodingScheme|title=UTR#17: Character Encoding Model|date=2004-07-14}}</ref><ref>{{Cite web |title=UTR#17: Unicode Character Encoding Model |url=https://unicode.org/reports/tr17/ |access-date=2023-11-14 |website=unicode.org}}</ref>
 
Roman Czyborra (of [[GNU Unifont]]) wrote a decompressor.<ref>https://czyborra.com/scsu/scsu.c {{Bare URL plain text|date=March 2022}}</ref> An IBM-contributed decompressor is included in [[International Components for Unicode]], along with a compressor written in Java.<ref>{{Cite web|url=https://github.com/unicode-org/icu/blob/3f043c7693e20c8cded76035918dad104e7256e3/icu4j/main/classes/charset/src/com/ibm/icu/charset/CharsetSCSU.java|title = International Components for Unicode|website = [[GitHub]]|date = 22 October 2021}}</ref> Simpler reference codecs are available as attachments to TR6.