Optical Character Recognition (Unicode block)

Latest revision as of 16:17, 26 July 2024 by Drmccreedy (edit summary: "→History: add sticky header"), compared with the revision as of 13:36, 18 December 2020 by HarJIT (edit summary: "→Subheadings"); 11 intermediate revisions by 5 users are not shown.
…
|script1 = [[Script (Unicode)#Special script property values|Common]]
|symbols = OCR controls
|sources = [[ISO 2033]]
|1_0_0 = 11
|note = <ref>{{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|accessdate=2023-07-26}}</ref><ref>{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|accessdate=2023-07-26}}</ref>
}}
…

===OCR-A===
{{further|OCR-A}}
[[File:Verrechnungsscheck, WestLB, Landeshauptkasse Düsseldorf, 2004.jpg|right|thumb|A partly redacted German [[cheque]], showing use of ⑂, ⑀ and ⑁ in the machine-readable line]]
The OCR-A subheading contains six characters taken from the [[OCR-A]] font described in the ISO 1073-1:1976 standard: {{unichar|2440|OCR HOOK}}, {{unichar|2441|OCR CHAIR}}, {{unichar|2442|OCR FORK}}, {{unichar|2443|OCR INVERTED FORK}}, {{unichar|2444|OCR BELT BUCKLE}}, and {{unichar|2445|OCR BOW TIE}}. The OCR bow tie is given the [[Unicode character property#Name|informative alias]] "unique asterisk".
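As an illustrative sketch (not drawn from any cited source), the six OCR-A characters and their formal names can be listed with Python's standard `unicodedata` module. It assumes the characters sit contiguously at U+2440..U+2445, as the code points in the list above indicate; the helper name `ocr_a_characters` is hypothetical.

```python
import unicodedata

# The six OCR-A characters occupy U+2440..U+2445 (end of range is exclusive).
OCR_A_RANGE = range(0x2440, 0x2446)


def ocr_a_characters():
    """Return (code point label, character, formal name) for each OCR-A character."""
    return [
        (f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
        for cp in OCR_A_RANGE
    ]


if __name__ == "__main__":
    # Print only ASCII (label and name) so the output is safe on any console.
    for label, _char, name in ocr_a_characters():
        print(label, name)
```

The formal names returned by `unicodedata.name` are the ones given above; the informative alias "unique asterisk" is not exposed by this module.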
The hook, chair and fork, in addition to a long vertical bar, are included in the most basic "numeric" implementation level of OCR-A, which includes digits but excludes letters and conventional punctuation.<ref>{{cite web |url=https://ecma-international.org/wp-content/uploads/ECMA-8_2nd_edition_january_1977.pdf |title=Nominal Character Dimensions of the Numeric OCR-A Font |edition=2nd |id=ECMA-8 |year=1977 |author=European Computer Manufacturers Association |author-link=Ecma International}}</ref> By contrast, the most basic implementation level of [[OCR-B]] includes the digits, [[plus sign]], [[less-than sign]], [[greater-than sign]], long vertical bar and seven of the capital letters;<ref>{{cite web |url=https://www.open-std.org/JTC1/SC2/WG3/docs/n470.pdf#page=12 |page=8 |title=9.1: Subset 1: Minimal alphanumeric subset |work=Proposal for Type 3 Technical Report, TR 15907, Information technology—Revision of OCR-B standard (ISO 1073-2:1976) |id=ISO/IEC JTC1/SC2/WG3 N470 |date=1998-09-28 |author=ISO/IEC JTC1/SC2/WG3 |author-link=ISO/IEC JTC 1/SC 2}}</ref> as such, there are no characters specific to OCR-B in the Optical Character Recognition block.

===MICR===
{{further|Magnetic ink character recognition}}
[[File:NIXON, Richard M (signed check).jpg|right|thumb|A cheque signed by [[Richard Nixon]], showing use of ⑆, ⑇, ⑈ and ⑉ in the machine-readable line]]
The MICR subheading contains four punctuation characters for [[cheque|bank cheque]] identifiers, taken from the [[magnetic ink character recognition]] E-13B font (codified in the ISO 1004:1995 standard): {{unichar|2446|OCR BRANCH BANK IDENTIFICATION}}, {{unichar|2447|OCR AMOUNT OF CHECK}}, {{unichar|2448|OCR DASH}}, and {{unichar|2449|OCR CUSTOMER ACCOUNT NUMBER}}.
The latter two characters are misnamed: their names are inadvertently swapped. WG 2 attributes the mix-up to the 1993 (first) edition of [[ISO/IEC 10646]],<ref>{{citation|mode=cs1 |url=https://www.unicode.org/wg2/docs/n4103.pdf |page=29 |section=T.3. Optical Character Recognition |title=Unconfirmed minutes of WG 2 meeting 58 |author=ISO/IEC JTC 1/SC 2/WG 2 |author-link=ISO/IEC JTC 1/SC 2 |date=2012-01-03 |id=SC2 N4188 / WG2 N4103 |quotation=These Magnetic Ink Character Recognition (MICR) symbols are used by banks on checks. The names of these characters were inadvertently mixed up in the 1993 edition of ISO/IEC 10646.}}</ref> although the swapped names were already present in Unicode 1.0.0.<ref>{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=version 1.0 |title=3.8: Block-by-Block Charts |publisher=[[Unicode Consortium]]}}</ref> Although their formal names remain unchanged because of the Unicode stability policy, both have corrected [[Unicode character property#Name|normative alias]]es: U+2448 ⑈ is {{sc|MICR ON US SYMBOL}}, and U+2449 ⑉ is {{sc|MICR DASH SYMBOL}}<ref>{{citation|mode=cs1 |url=https://www.unicode.org/notes/tn27/tn27-4.html |title=Known Anomalies in Unicode Character Names |publisher=[[Unicode Consortium]] |id=Unicode Technical Note #27 |first1=Asmus |last1=Freytag |first2=Rick |last2=McGowan |first3=Ken |last3=Whistler |date=2017-04-10 |edition=4}}</ref> (the standard notes that "the Unicode character names include several misnomers").
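The difference between a frozen formal name and a corrected alias can be observed with Python's standard `unicodedata` module. This is a minimal sketch, assuming Python 3.3 or later, where `unicodedata.lookup` also accepts name aliases. Because `unicodedata.name` reports only the formal name and the module offers no reverse lookup from a character to its aliases, the two aliases are hard-coded from the paragraph above; the `CORRECTED_ALIASES` table and `describe` helper are hypothetical.

```python
import unicodedata

# Hypothetical table: the two corrected (normative) aliases for the misnamed
# MICR characters. unicodedata cannot list aliases, so they are hard-coded.
CORRECTED_ALIASES = {
    0x2448: "MICR ON US SYMBOL",
    0x2449: "MICR DASH SYMBOL",
}


def describe(cp):
    """Return (formal name, corrected alias) for one misnamed MICR character."""
    char = chr(cp)
    alias = CORRECTED_ALIASES[cp]
    # lookup() resolves aliases as well as formal names, so the alias
    # must lead back to the same character.
    if unicodedata.lookup(alias) != char:
        raise ValueError(f"alias {alias!r} does not resolve to U+{cp:04X}")
    # name() always returns the formal name kept by the stability policy.
    return unicodedata.name(char), alias


if __name__ == "__main__":
    for cp in CORRECTED_ALIASES:
        formal, alias = describe(cp)
        print(f"U+{cp:04X} formal={formal!r} alias={alias!r}")
```

For U+2448 this reports the formal name OCR DASH next to the alias MICR ON US SYMBOL, so looking up the misnomer "OCR DASH" yields the on-us symbol rather than the dash.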
These symbols had previously been encoded by the ISO-IR-98 encoding defined by [[ISO 2033]]:1983, in which they were simply named {{sc|SYMBOL ONE}} through {{sc|SYMBOL FOUR}}.<ref>{{cite iso-ir |number=98 |title=E13B Graphic Character Set |id-in-title=yes |sponsor=ISO/TC97/SC2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |date=1985-08-01}}</ref> All four characters have informative aliases in the Unicode charts: "transit", "amount", "on us", and "dash" respectively.

===OCR===
…

==History==
The following Unicode-related documents record the purpose and process of defining specific characters in the Optical Character Recognition block:

{{sticky header}}
{| class="wikitable sticky-header"
|-
! [[Unicode#Versions|Version]] !! {{nobr|Final code points<ref group=lower-alpha name=final/>}} !! Count !! [[International Committee for Information Technology Standards|L2]] ID !! [[ISO/IEC JTC 1/SC 2|WG2]] ID !! Document
|-
| rowspan="34" | 1.0.0 || rowspan="34" | U+2440..244A || rowspan="34" | 11 || || || (to be determined)
|-
| {{nobr|[https://www.unicode.org/L2/L2010/10416.htm L2/10-416R]}} || || {{Citation|title=UTC #125 / L2 #222 Minutes|date=2010-11-09|first=Lisa|last=Moore|ref=none|section=Consensus 125-C39|quote=Create two formal aliases, U+2448 MICR ON US SYMBOL and U+2449 MICR DASH SYMBOL for Unicode 6.1.}}
|-
| || [https://www.unicode.org/wg2/docs/n4103.pdf N4103] || {{Citation|title=Unconfirmed minutes of WG 2 meeting 58|date=2012-01-03|ref=none|section=T.3. Optical Character Recognition}}
|-
| {{nobr|[https://www.unicode.org/L2/L2022/22065-edcom-rept-utc171.html L2/22-065]}} || || {{Citation|title=Editorial Committee Report and Recommendations for UTC #171 Meeting|date=2022-04-13|first=Ken|last=Whistler|ref=none|section=Opt Subject: Unicode 14.0 "Optical Character Recognition" code chart [Affects U+2447]}}
|- class="sortbottom"
| colspan="6" | {{reflist|group=lower-alpha|refs=<ref name=final>Proposed code points and character names may differ from final code points and names</ref>}}