Revision as of 20:21, 14 November 2023 by Warudo (edit summary: →History & use; tag: 2017 wikitext editor)
Revision as of 21:47, 17 December 2024 by Rofraja (minor edit; edit summary: Replaced 2 bare URLs by {{Cite web}}; Replaced "Archived copy" by actual titles)
Line 8:
At first the Unicode Consortium considered SCSU to be a character encoding,<ref>{{Cite web|url=https://unicode.org/reports/tr17/tr17-2.html|title = UTR#17: Character Encoding Model}}</ref> but in 1999 it changed its position: SCSU was still considered a transfer encoding syntax, but for a while it was no longer considered a character encoding, because different compressors might yield different outputs for the same text.<ref>https://unicode.org/reports/tr17/tr17-3.html#Transfer Encoding Syntax</ref> In 2004 this decision was reversed, and SCSU is now considered a ''compressing'' character encoding scheme, as opposed to a simple or compound character encoding scheme.<ref>{{Cite web|url=https://unicode.org/L2/L2004/04288-tr17-5d2.html#CharacterEncodingScheme|title=UTR#17: Character Encoding Model|date=2004-07-14}}</ref><ref>{{Cite web |title=UTR#17: Unicode Character Encoding Model |url=https://unicode.org/reports/tr17/ |access-date=2023-11-14 |website=unicode.org}}</ref> Roman Czyborra (of [[GNU Unifont]]) wrote a decompressor.<ref>{{Cite web| title=This is a deflator to UTF-8 output for input compressed in SCSU | url=https://czyborra.com/scsu/scsu.c | archive-url=https://web.archive.org/web/19990908230458/http://czyborra.com:80/scsu/scsu.c | archive-date=1999-09-08}}</ref> The IBM-contributed decompressor is found in [[International Components for Unicode]], along with a compressor written in Java.<ref>{{Cite web|url=https://github.com/unicode-org/icu/blob/3f043c7693e20c8cded76035918dad104e7256e3/icu4j/main/classes/charset/src/com/ibm/icu/charset/CharsetSCSU.java|title = International Components for Unicode|website = [[GitHub]]|date = 22 October 2021}}</ref> Simpler reference codecs are available as attachments to TR6. [[Symbian OS]], an operating system for mobile phones and other mobile devices, uses SCSU to serialize strings.
Line 26:
== Comparison with general-purpose plain text compression schemes ==
Because UTF-16 or UTF-8 text might occupy more space than its equivalent in pre-Unicode encodings did, one might want to use compression such as SCSU to mitigate this problem.<ref>{{Cite web| title=Implementation Guidelines | url=https://unicode.org/versions/Unicode3.0.0/ch05.pdf | archive-url=https://web.archive.org/web/20150730234318/http://www.unicode.org/versions/Unicode3.0.0/ch05.pdf | archive-date=2015-07-30}}</ref> Compared with general-purpose compressors, however, SCSU is not necessarily advantageous.<ref name=Ewellic/> Also, although SCSU can be used as a text encoding, the algorithm is stateful, so difficulties may arise when it is used as an internal text representation: basic text operations become non-trivial. Treated purely as a compression algorithm, SCSU is inferior to most commonly used general-purpose algorithms for texts of over a few kilobytes.
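The size difference that motivates SCSU, and the behaviour of a general-purpose compressor on short versus long input, can be illustrated with a minimal sketch. It does not measure SCSU itself, since Python's standard library does not appear to ship an SCSU codec. The function name `encoded_sizes`, the sample text, and the choice of ISO 8859-5 as the pre-Unicode encoding and zlib as the general-purpose compressor are all assumptions made for illustration.

```python
import zlib


def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` under several representations.

    Hypothetical helper for illustration only; SCSU is not included
    because no SCSU codec is assumed to be available.
    """
    utf8 = text.encode("utf_8")
    return {
        "iso8859_5": len(text.encode("iso8859_5")),  # pre-Unicode 8-bit Cyrillic
        "utf_8": len(utf8),
        "utf_16_le": len(text.encode("utf_16_le")),  # no byte order mark
        "utf_8+zlib": len(zlib.compress(utf8, 9)),   # general-purpose compressor
    }


short_text = "Привет, мир!"    # 12 characters, 9 of them Cyrillic
long_text = short_text * 500   # artificially repetitive, so it compresses unusually well

print(encoded_sizes(short_text))
print(encoded_sizes(long_text))
```

For the short string, ISO 8859-5 needs 12 bytes, UTF-8 needs 21 and UTF-16 needs 24, while the zlib output is larger than its 21-byte UTF-8 input because of the fixed header and checksum overhead. For the long string, zlib output is far smaller than any of the uncompressed forms, although real text is much less repetitive than this sample.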

Standard Compression Scheme for Unicode: Difference between revisions