Indian Script Code for Information Interchange: Difference between revisions

m drop linefeeds from tooltips
Bender the Bot (talk | contribs)
m Codepage layout: HTTP to HTTPS for SourceForge
 
(19 intermediate revisions by 14 users not shown)
Line 1:
{{short description|Coding scheme for Indian writing systems}}
{{Contains special characters|Indic}}
{{More citations needed|date=February 2022}}
'''Indian Script Code for Information Interchange''' ('''ISCII''') is a coding scheme for representing various writing systems of [[India]]. It encodes the main [[Indic script]]s and a Roman transliteration. The supported scripts are: [[Eastern Nagari|Bengali–Assamese]], [[Devanagari]], [[Gujarāti script|Gujarati]], [[Gurmukhi]], [[Kannada script|Kannada]], [[Malayalam script|Malayalam]], [[Odia script|Odia]], [[Tamil script|Tamil]], and [[Telugu script|Telugu]]. ISCII does not encode the writing systems of India that are based on [[Persian language|Persian]], but its writing system switching codes nonetheless provide for [[Kashmiri language|Kashmiri]], [[Sindhi language|Sindhi]], [[Urdu]], [[Persian language|Persian]], [[Pashto language|Pashto]] and [[Arabic]]. The Persian-based writing systems were subsequently encoded in the [[Perso-Arabic Script Code for Information Interchange|PASCII]] encoding.
 
ISCII has not been widely used outside certain government institutions, although a variant without the {{ctrl|ATR|internal=yes}} mechanism, [[Mac OS Devanagari encoding|Mac OS Devanagari]], was used on [[classic Mac OS]].<ref name="appledevanagari"/> ISCII has now been rendered largely obsolete by [[Unicode]]. Unicode uses a separate block for each Indic writing system, and largely preserves the ISCII layout within each block.<ref name="unicode">{{cite book |title=The Unicode Standard v15.0 Chapter 12 |publisher=The Unicode Consortium |url=https://www.unicode.org/versions/Unicode15.0.0/ch12.pdf |access-date=13 August 2024}}</ref>{{rp|p=462}}
 
==Background==
 
The Brahmi-derived writing systems have a similar structure.<ref name="unicode"/>{{rp|p=462}} ISCII therefore encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent [ki]. This is rendered as കി in [[Malayalam]], कि in Devanagari, ਕਿ in Gurmukhi, and கி in Tamil. The writing system can be selected in rich text by markup or in plain text by means of the {{ctrl|ATR|internal=yes}} code described below.
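
The overlay can be illustrated with a minimal Python sketch. It covers only the two bytes of the [ki] example and relies on the fact that Unicode places each script in its own block with a layout that largely follows ISCII. The names <code>SCRIPT_BASE</code>, <code>ISCII_TO_OFFSET</code> and <code>render</code> are hypothetical, and a real converter would need the full code table plus the special cases that do not map one-to-one.

```python
# Start of the Unicode block for a few of the scripts mentioned above.
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "gurmukhi": 0x0A00,
    "tamil": 0x0B80,
    "malayalam": 0x0D00,
}

# Illustrative two-entry table: ISCII byte -> offset inside a Unicode block.
# 0xB3 is the consonant KA, 0xDB is the vowel sign I.
ISCII_TO_OFFSET = {0xB3: 0x15, 0xDB: 0x3F}


def render(data: bytes, script: str) -> str:
    """Map the same ISCII bytes into the Unicode block of the chosen script."""
    base = SCRIPT_BASE[script]
    return "".join(chr(base + ISCII_TO_OFFSET[b]) for b in data)


ki = bytes([0xB3, 0xDB])
for name in SCRIPT_BASE:
    print(name, render(ki, name))
```

The same two bytes yield a different pair of Unicode code points for each script, which is the behaviour the paragraph above describes.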
 
One motivation for the use of a single encoding is the idea that it will allow easy [[transliteration]] from one writing system to another.<ref name="unicode"/>{{rp|p=462}} In practice, however, the scripts differ enough that this approach is not workable.
 
ISCII is an 8-bit encoding.<ref name="std"/>{{rp|p=4}} The lower 128 code points are plain [[American Standard Code for Information Interchange|ASCII]]; the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII makes use of a code point with mnemonic {{ctrl|ATR|internal=yes}} that indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or end-of-line. Another set of values selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
 
== Codepage layout ==
 
The following table shows the character set for [[Devanagari]]. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu are similar, with each Devanagari form replaced by the [[Brahmic family of scripts|equivalent form in each writing system]].{{r|name=unicode|p=462}} Each character is shown with its decimal code and its [[Unicode]] equivalent.
 
{|{{chset-table-header1|ISCII Devanagari<ref name="std">{{Cite book|url=https://varamozhi.sourceforge.net/iscii91.pdf|publisher=Bureau of Indian Standards|title=IS13194:1991 (Soft copy)|year=1999}}</ref>{{rp|p=14}}}}
|-
| {{chset-left1|0x}}
Line 52 ⟶ 53:
| {{chset-ctrl1 | 28 U+001C: Control (alias INFORMATION SEPARATOR FOUR) (alias FILE SEPARATOR) (alias FS) | &nbsp;[[File separator|FS]]&nbsp; }}
| {{chset-ctrl1 | 29 U+001D: Control (alias INFORMATION SEPARATOR THREE) (alias GROUP SEPARATOR) (alias GS) | &nbsp;[[Group separator|GS]]&nbsp; }}
| {{chset-ctrl1 | 30 U+001E: Control (alias INFORMATION SEPARATOR TWO) (alias RECORD SEPARATOR) (alias RS) | &nbsp;[[Record separator|RS]]&nbsp; }}
| {{chset-ctrl1 | 31 U+001F: Control (alias INFORMATION SEPARATOR ONE) (alias UNIT SEPARATOR) (alias US) | &nbsp;[[Unit separator|US]]&nbsp; }}
|-
Line 58 ⟶ 59:
| {{chset-ctrl1 | 32 U+0020: SPACE (alias SP) | &nbsp;[[Space character|SP]]&nbsp; }}
| {{chset-cell1 | 33 U+0021: EXCLAMATION MARK | [[!]] }}
| {{chset-cell1 | 34 U+0022: QUOTATION MARK | [[&quot;]] }}
| {{chset-cell1 | 35 U+0023: NUMBER SIGN | [[Number sign|#]] }}
| {{chset-cell1 | 36 U+0024: DOLLAR SIGN | [[$]] }}
Line 121 ⟶ 122:
| {{chset-cell1 | 89 U+0059: LATIN CAPITAL LETTER Y | [[Y]] }}
| {{chset-cell1 | 90 U+005A: LATIN CAPITAL LETTER Z | [[Z]] }}
| {{chset-cell1 | 91 U+005B: LEFT SQUARE BRACKET | [[Left square bracket|&#x5B;]] }}
| {{chset-cell1 | 92 U+005C: REVERSE SOLIDUS | [[Backslash|&#x5C;]] }}
| {{chset-cell1 | 93 U+005D: RIGHT SQUARE BRACKET | [[Right square bracket|&#x5D;]] }}
| {{chset-cell1 | 94 U+005E: CIRCUMFLEX ACCENT | [[^]] }}
| {{chset-cell1 | 95 U+005F: LOW LINE | [[Underscore|_]] }}
|-
| {{chset-left1|6x}}
| {{chset-cell1 | 96 U+0060: GRAVE ACCENT | [[`]] }}
| {{chset-cell1 | 97 U+0061: LATIN SMALL LETTER A | [[a]] }}
| {{chset-cell1 | 98 U+0062: LATIN SMALL LETTER B | [[b]] }}
Line 157 ⟶ 158:
| {{chset-cell1 | 121 U+0079: LATIN SMALL LETTER Y | [[y]] }}
| {{chset-cell1 | 122 U+007A: LATIN SMALL LETTER Z | [[z]] }}
| {{chset-cell1 | 123 U+007B: LEFT CURLY BRACKET | [[Left curly bracket|&#x7B;]] }}
| {{chset-cell1 | 124 U+007C: VERTICAL LINE | [[Vertical bar|&#x7C;]] }}
| {{chset-cell1 | 125 U+007D: RIGHT CURLY BRACKET | [[Right curly bracket|&#x7D;]] }}
| {{chset-cell1 | 126 U+007E: TILDE | [[~]] }}
| {{chset-ctrl1 | 127 U+007F: Control (alias DELETE) (alias DEL) | [[Delete character|DEL]] }}
Line 315 ⟶ 316:
; {{anchor|ATR}}ATR character—code point EF (239): The ATR (attribute) character followed by a byte code is used to switch to a different font attribute (such as bold) or to a different ISCII or [[PASCII]] language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.
{| class="wikitable collapsible"
|+ Presentational attributes {{r|name=std|p=31}}
|-
!ATR + byte!!Mnemonic!!Formatting option
Line 340 ⟶ 341:
|}
{| class="wikitable collapsible"
|+ Shifts to ISCII scripts {{r|name=std|p=31}}
|-
!ATR + byte!!Mnemonic!!ISCII script
Line 398 ⟶ 399:
|}
; {{anchor|nuqta}}Nukta character ़—code point E9 (233): The [[nukta]] character, placed after another ISCII character, is used for a number of rarer characters that do not exist in the main ISCII set. For example क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.
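
A small Python sketch of the ka + nukta example, using only the two ISCII bytes named in this article (0xB3 for ka and 0xE9 for nukta) and the standard <code>unicodedata</code> module. The table <code>ISCII_TO_DEVANAGARI</code> and the function <code>decode</code> are hypothetical and cover just these two bytes. The sketch checks that the byte-by-byte conversion is canonically equivalent to the precomposed character U+0958.

```python
import unicodedata

KA, NUKTA = 0xB3, 0xE9  # ISCII code points for ka and nukta

# Illustrative two-entry table for Devanagari only.
ISCII_TO_DEVANAGARI = {KA: "\u0915", NUKTA: "\u093c"}


def decode(data: bytes) -> str:
    """Convert ISCII bytes to Unicode one byte at a time (sketch only)."""
    return "".join(ISCII_TO_DEVANAGARI[b] for b in data)


sequence = decode(bytes([KA, NUKTA]))  # U+0915 followed by U+093C
precomposed = "\u0958"                 # DEVANAGARI LETTER QA

# The precomposed letter decomposes to exactly the ka + nukta sequence.
print(unicodedata.normalize("NFD", precomposed) == sequence)
```

Comparing after normalization, rather than comparing raw strings, lets a converter emit either form without breaking equality checks.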
{| class="wikitable collapsible Unicode" style="font-size:120%;"
|+ Single Unicode characters corresponding to ISCII nukta sequences
! ISCII<br>code point !! Original<br>character !! Character<br>with nukta !! Unicode<br>code point
Line 448 ⟶ 449:
* 57011: Punjabi (Gurmukhi)
 
==Code points for all languages==
 
{| class="wikitable collapsible collapsed Unicode" border="1" style="text-align:center; font-size:100%;"
|+ Code set for all abugidas using ISCII{{refn|{{multiref|This table can be derived from the correspondence between tables 2 and 3 in the ISCII standard<ref name="std"/> and the [[Unicode Standard]] code charts.}}}}
! Hex !! Official<br>Listing !! [[ISO 15919]]!! colspan="2"| [[Devanagari]]!! colspan="2"| [[Bengali alphabet|Bengali]]
! colspan="2" |Assamese!! colspan="2" | [[Gurmukhi script|Gurmukhi]]!! colspan="2"| [[Gujarati script|Gujarati]]!! colspan="2"| [[Oriya script|Oriya]]!! colspan="2"| [[Tamil script|Tamil]]!! colspan="2"| [[Telugu script|Telugu]]!! colspan="2"| [[Kannada script|Kannada]]!! colspan="2"| [[Malayalam script|Malayalam]]
|-
! A0 || Sign [[Om|OM]] || Ōm̐ || ॐ || 0950 || colspan="2"|
! colspan="2" | || colspan="2" | || ૐ || 0AD0 || colspan="2"| || colspan="2"| || colspan="2"| || colspan="2"| || colspan="2"|
|-
! A1 || Vowel-modifier [[Chandrabindu|CHANDRABINDU]] || || ँ || 0901 || ঁ || 0981
!ঁ
!0981|| ਁ || 0A01 || ઁ || 0A81 || ଁ || 0B01 || colspan="2"| || ఁ || 0C01 || colspan="2"| || colspan="2"|
Line 864 ⟶ 865:
== External links ==
* [http://ltrc.iiit.ac.in/showfile.php?filename=downloads/FC-1.0/fc.html Converters from/to ISCII to/from various fonts]
* [http://padma.mozdev.org Padma – Mozilla extension for transforming ISCII to Unicode] {{Webarchive|url=https://web.archive.org/web/20191001172317/http://padma.mozdev.org/ |date=2019-10-01 }}
* [http://varamozhi.sourceforge.net/iscii91.pdf The ISCII 1991 standard (PDF)]
* [https://web.archive.org/web/20091026134653/http://geocities.com/vnagarjuna/padma.html Padma – Transformer from ISCII to Unicode for Telugu]
* [http://www.phpclasses.org/browse/package/2991.html PHP script for ISCII to and from Unicode]