Talk:Windows code page: Difference between revisions

No edit summary
m Reverted 1 edit by 2409:40C2:6018:4DEB:287B:76FF:FE61:65C4 (talk) to last revision by Cewbot
 
(28 intermediate revisions by 22 users not shown)
Line 1:
{{WikiProject banner shell|class=B|
The generator code is given below. readtxt.pas can be obtained from http://bewareserv.sourceforge.net/. Most of the data used came from http://www.unicode.org/PUBLIC/mappings. The .mspx.html files, used where unicode.org didn't have a mapping available for the code page in question, are from http://www.microsoft.com/globaldev/reference/oem.mspx
{{WikiProject Computing }}
{{WikiProject Microsoft Windows |importance=mid}}
}}
 
== Way too big? ==
<pre>
This page should link to the individual code pages instead of listing all of the tables at once (for one thing it is missing most of the mappings anyway, like for 932). --''unsigned''
program byposition;
 
: {{Done}}. Someone else has apparently already cleaned up the article.--[[User:Makkachin|Makkachin]] ([[User talk:Makkachin|talk]]) 00:50, 21 July 2013 (UTC)
uses
sysutils,readtxt; //we use our own text reader as the delphi one can't handle
//unix format text
const
maxcharset =12;//9;
startat = $80;
{$define showallchars}
{$define breakbeforecodepoint}
var
buildarray : array[0..255,0..maxcharset] of longint;
names : array[0..31] of string;
procedure processcharset(name:string;number:byte;filename:string);
var
t: treadtext;
line:string;
i,j:integer;
begin
names[number] := name;
 
== ANSI or not? ==
readtext_init(t,filename);
repeat
line := readtext_line(t);
if (length(line)>=11) and (line[2]='x') and (line[7]='x') then begin;
//writeln('processing line '+line);
//unicode.org format
buildarray[strtoint('$'+copy(line,3,2)),number] := strtoint('$'+copy(line,8,4));
end else if (length(line)>=11) and (copy(line,3,5)=' = U+') then begin;
//ms format
buildarray[strtoint('$'+copy(line,1,2)),number] := strtoint('$'+copy(line,8,4));
end;
until readtext_eof(t);
end;
var
t: textfile;
i,j,k : integer;
firstline : boolean;
goodline : boolean;
rowcounter : integer;
comparevalue : integer;
begin
for i := 0 to 255 do for j := 0 to maxcharset do buildarray[i,j] := -1;
 
Once and for all, is it correct to call the Windows code pages "ANSI"? Currently, some pages on Wikipedia say it's wrong (as ANSI never defined these code pages, but Microsoft calls them "ANSI" anyway), while this article gives the impression that it is OK. --[[User:Abdull|Abdull]] 23:53, 17 March 2006 (UTC)
{processcharset('windows-874|874',0,'CP874.txt');
:Well, Microsoft's technical documents use that term all over the place and I don't believe anyone uses the term ANSI code page for anything else. I can't imagine ANSI is particularly happy about having its name put to something that isn't its own, though. I guess it all depends on how you define right and wrong ;) [[User:Plugwash|Plugwash]] 10:44, 18 March 2006 (UTC)
processcharset('windows-1250|1250',1,'CP1250.txt');
:::ANSI explicitly disavows any understanding of the standards they publish, and explicitly repudiates any suggestion that they have an opinion about the meaning of their standards. So if you want to say, for example, that the first 127 code points all represent the same character .... ANSI doesn't have an opinion on that. Wikipedia on the other hand.... <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/124.188.134.186|124.188.134.186]] ([[User talk:124.188.134.186#top|talk]]) 06:05, 25 October 2020 (UTC)</small> <!--Autosigned by SineBot-->
processcharset('windows-1251|1251',2,'CP1251.txt');
::Microsoft is now leaning away from "ansi" and uses "active" instead to describe the current code page. <small><span class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:131.107.0.73|131.107.0.73]] ([[User talk:131.107.0.73|talk]] • [[Special:Contributions/131.107.0.73|contribs]]) </span></small><!-- Template:Unsigned -->
processcharset('windows-1252|1252',3,'CP1252.txt');
processcharset('windows-1253|1253',4,'CP1253.txt');
processcharset('windows-1254|1254',5,'CP1254.txt');
processcharset('windows-1255|1255',6,'CP1255.txt');
processcharset('windows-1256|1256',7,'CP1256.txt');
processcharset('windows-1257|1257',8,'CP1257.txt');
processcharset('windows-1258|1258',9,'CP1258.txt');}
 
== Redundant comment ==
processcharset('code page 437|437',0,'CP437.txt');
processcharset('code page 720|720',1,'720.mspx.html');
processcharset('code page 737|737',2,'CP737.txt');
processcharset('code page 775|775',3,'CP775.txt');
processcharset('code page 850|850',4,'CP850.txt');
processcharset('code page 852|852',5,'CP852.txt');
processcharset('code page 855|855',6,'CP855.txt');
processcharset('code page 857|857',7,'CP857.txt');
processcharset('code page 858|858',8,'858.mspx.html');
processcharset('code page 862|862',9,'CP862.txt');
processcharset('code page 866|866',10,'CP866.txt');
processcharset('windows-874|874',11,'CP874.txt');
processcharset('windows-1258|1258',12,'CP1258.txt');
 
"Recent Microsoft products and APIs use Unicode internally, but many applications and APIs '''(including Java)''' continue to…"
assignfile(t,'output.txt');
: Does the Java comment seem relevant? I mean, there are a million and one applications that use the older methods; shouldn't we name them here too?
rewrite(t);
writeln(t,'<table {{prettytable}}>');
 
== Removal of chart ==
 
I have removed the charts of the code pages from this article, and put all the information that was left in a miscellaneous section. I am using the latest version of the screen reader [[JAWS (screen reader)|JAWS]] on a fairly fast computer and JAWS froze for thirty seconds when I entered this page. I almost couldn't edit the section with the chart to remove it. I can understand that happening at somewhere like [[wikipedia:articles for deletion/yesterday]] or [[wikipedia:requests for adminship]] when it is busy, but JAWS should never freeze for more than a few seconds when I enter an article. Not even the article [[United States]] is that taxing on JAWS resources. The charts for code pages are available elsewhere on the Internet in an uneditable form. '''[[User:Graham87|Graham]]'''[[User talk:Graham87|<span style="color:green;">87</span>]] 08:29, 21 June 2007 (UTC)
firstline := true;
rowcounter := 0;
for i := startat to 255 do begin
goodline := false;
comparevalue := buildarray[i,0];
{$ifdef showallchars}
goodline := true;
{$else}
for j := 1 to maxcharset do begin
if comparevalue <> buildarray[i,j] then goodline := true;
end;
{$endif}
if goodline then begin
if (rowcounter and ($1F shr(0{$ifndef twocol}+1{$endif} {$ifdef breakbeforecodepoint}+1{$endif} ))) = 0 then begin
write(t,'<tr>');
{$ifdef twocol}for j := 1 to 2 do{$endif} begin
 
== Available only to managed applications? ==
write(t,'<td>position<br>([[hexadecimal|hex]])');
for k := 0 to maxcharset do begin;
write(t,'<td>[['+names[k]+']]');
end;
end;
end;
{$ifdef twocol}if (rowcounter and 1) =0 then{$endif} write(t,'<tr>');
write(t,'<td>'+inttohex(i,2));
inc(rowcounter);
//if firstline then begin
// firstline := false;
// write(t,'<td>{{uplusfirst}}'+inttohex(i,4));
//end else begin
 
1201 — Unicode (BMP of ISO 10646, UTF-16BE) has the following description: "Available only to managed applications"
//end;
Does it make any sense to include this explanation?
The 1200 (UTF-16) encoding is also available only to managed applications according to the linked MS page, but that note is missing here.
[[User:Hubalu|Hubalu]] ([[User talk:Hubalu|talk]]) 10:24, 30 January 2014 (UTC)
 
== Parked stuff found on category page ''Category:Windows code pages'' ==
 
Still missing: [[windows-874]] (redirects to ISO/IEC 8859-11),
[[code page 50220]], [[code page 50221]], [[code page 50222]], which are variations on [[ISO-2022-JP]].
 
External links
*[http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx Full list of Windows code pages]
*[http://www.microsoft.com/globaldev/reference/cphome.mspx Windows Code Page reference chart]
*[http://www.iana.org/assignments/charset-reg IANA Charset Name Registrations]
*[http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS Unicode mapping table for Windows code pages]
*[http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit Unicode mappings of windows code pages with "best fit"]
 
windows-874:
for j := 0 to maxcharset do begin
*[http://www.microsoft.com/globaldev/reference/sbcs/874.mspx Windows 874 reference chart]
write(t,'<td>');
*[http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT Unicode mapping table for Windows 874]
case buildarray[i,j] of
*[http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit874.txt Unicode mappings of windows 874 with "best fit"]
-1 : ;
$00 : write(t,'[[NUL]]');
$01 : write(t,'[[SOH]]');
$02 : write(t,'[[STX]]');
$03 : write(t,'[[ETX]]');
$04 : write(t,'[[EOT]]');
$05 : write(t,'[[ENQ]]');
$06 : write(t,'[[ACK]]');
$07 : write(t,'[[BEL]]');
$08 : write(t,'[[BS]]');
$09 : write(t,'[[TAB]]');
$0A : write(t,'[[LF]]');
$0B : write(t,'[[VT]]');
$0C : write(t,'[[FF]]');
$0D : write(t,'[[CR]]');
$0E : write(t,'[[SO]]');
$0F : write(t,'[[SI]]');
 
End of stuff moved from category page.
$10 : write(t,'[[DLE]]');
$11 : write(t,'[[DC1]]');
$12 : write(t,'[[DC2]]');
$13 : write(t,'[[DC3]]');
$14 : write(t,'[[DC4]]');
$15 : write(t,'[[NAK]]');
$16 : write(t,'[[SYN]]');
$17 : write(t,'[[ETB]]');
$18 : write(t,'[[CAN]]');
$19 : write(t,'[[EM]]');
$1A : write(t,'[[SUB]]');
$1B : write(t,'[[ESC]]');
$1C : write(t,'[[FS]]');
$1D : write(t,'[[GS]]');
$1E : write(t,'[[RS]]');
$1F : write(t,'[[US]]');
 
== More than two groups of code pages ==
$80 : write(t,'[[PAD]]');
$81 : write(t,'[[HOP]]');
$82 : write(t,'[[BPH]]');
$83 : write(t,'[[NBH]]');
$84 : write(t,'[[IND]]');
$85 : write(t,'[[NEL]]');
$86 : write(t,'[[SSA]]');
$87 : write(t,'[[ESA]]');
$88 : write(t,'[[HTS]]');
$89 : write(t,'[[HTJ]]');
$8A : write(t,'[[VTS]]');
$8B : write(t,'[[PLD]]');
$8C : write(t,'[[PLU]]');
$8D : write(t,'[[RI]]');
$8E : write(t,'[[SS2]]');
$8F : write(t,'[[SS3]]');
 
{{Cquote|text=There are two groups of code pages in Windows systems: OEM and ANSI code pages.}}
$90 : write(t,'[[DCS]]');
There are actually four groups of code pages under Windows: ANSI, OEM, Mac and EBCDIC.
$91 : write(t,'[[PU1]]');
All of them can vary depending on the locale of the Windows version.
$92 : write(t,'[[PU2]]');
See LOCALE_IDEFAULTMACCODEPAGE and LOCALE_IDEFAULTEBCDICCODEPAGE in this link: https://msdn.microsoft.com/de-de/library/windows/desktop/dd373761(v=vs.85).aspx
$93 : write(t,'[[STS]]');
$94 : write(t,'[[CCH]]');
$95 : write(t,'[[MW]]');
$96 : write(t,'[[SPA]]');
$97 : write(t,'[[EPA]]');
$98 : write(t,'[[SOS]]');
$99 : write(t,'[[SGCI]]');
$9A : write(t,'[[SCI]]');
$9B : write(t,'[[CSI]]');
$9C : write(t,'[[ST]]');
$9D : write(t,'[[OSC]]');
$9E : write(t,'[[PM]]');
$9F : write(t,'[[APC]]');
 
== KOI8 ==
$A0 : write(t,'[[NBSP]]');
$AD : write(t,'[[SHY]]');
 
Where is the source for “20866” designation by Microsoft? [[User:Incnis Mrsi|Incnis Mrsi]] ([[User talk:Incnis Mrsi|talk]]) 10:28, 3 August 2019 (UTC)
 
else write(t,'[[&#x'+inttohex(buildarray[i,j],4)+';]]');
end;
{$ifdef breakbeforecodepoint}
if buildarray[i,j] >=0 then write(t,'<br><small>U+'+inttohex(buildarray[i,j],4)+'</small>');
{$else}
if buildarray[i,j] >=0 then write(t,'<sub>U+'+inttohex(buildarray[i,j],4)+'</sub>');
{$endif}
end;
writeln(t,'</td>');
 
 
end;
end;
writeln(t,'</table>');
closefile(t);
//for counter := 0 to 65535 do begin;
 
end.
</pre>