Character encodings in HTML: Difference between revisions

Suffice (talk | contribs)
</code></blockquote>
 
Either method tells the receiver that the file being sent uses the specified character set. Sending incorrect information is a very bad idea. For example, a server where multiple users place files created on different machines cannot promise that every file it sends will conform, because some users may have machines with different character sets. For this reason, many servers do not send the information at all, to avoid making false promises. Note also that the specification in the HTTP headers overrides a specification given as a meta tag in the document itself, which causes problems if the server is set up wrongly and the author lacks the access or the knowledge to change it.
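The precedence rule described above can be illustrated with a minimal Python sketch. The function names (<code>charset_from_header</code>, <code>charset_from_meta</code>, <code>effective_charset</code>) are hypothetical, and the regular expressions are a deliberate simplification rather than the parsing algorithm a real browser uses.

```python
import re


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html_bytes):
    """Return the charset declared in a meta tag near the start of the file, or None."""
    # Only the first 1024 bytes are scanned (an assumption of this sketch);
    # non-ASCII bytes are replaced so that decoding can never fail.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(
        r'<meta[^>]+charset\s*=\s*["\']?([^\s"\'/>;]+)', head, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def effective_charset(content_type, html_bytes):
    """The HTTP header wins; the meta tag is consulted only if the header is silent."""
    return charset_from_header(content_type) or charset_from_meta(html_bytes)


page = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=iso-8859-1"></head></html>'
)
print(effective_charset("text/html; charset=UTF-8", page))  # header overrides meta
print(effective_charset("text/html", page))                 # falls back to meta
print(effective_charset(None, b"<html></html>"))            # nothing declared
```

When neither source declares a character set, the sketch returns <code>None</code>, which corresponds to the case where the receiver has to guess.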
 
Browsers receiving a file with no character set information must make a blind assumption. The safest is probably to assume [[windows-1252]] (which is similar to [[ISO-8859-1]] but has printable characters in place of some control codes that are forbidden in HTML anyway), but it is also common for browsers to assume the character set native to the machine on which they are running. The consequence of choosing incorrectly is that characters outside the printable ASCII range (32 to 126) may appear incorrectly. This presents few problems for English-speaking users, but other languages require characters outside that range for everyday use. In [[CJK]] environments, where several different multibyte encodings are in use, autodetection is often employed.
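The effect of a wrong guess can be reproduced with Python's built-in codecs. The following is a small sketch, not a model of any particular browser; the sample strings are chosen only for illustration.

```python
# Text in the printable ASCII range survives a wrong guess unchanged.
ascii_bytes = "cafe".encode("utf-8")
print(ascii_bytes.decode("windows-1252"))

# A character outside that range does not: the two UTF-8 bytes of the
# accented letter are read as two separate windows-1252 characters.
utf8_bytes = "caf\u00e9".encode("utf-8")
misread = utf8_bytes.decode("windows-1252")
print(misread)

# windows-1252 and ISO-8859-1 differ in the range 0x80 to 0x9F:
# windows-1252 has printable characters there (here, curly quotes),
# while ISO-8859-1 maps the same bytes to control codes.
quoted = b"\x93quoted\x94"
as_cp1252 = quoted.decode("windows-1252")
as_latin1 = quoted.decode("iso-8859-1")
print(repr(as_cp1252))
print(repr(as_latin1))
```

The last comparison shows why assuming windows-1252 is the more forgiving choice for a page that is labelled, or guessed, as ISO-8859-1.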
 
For maximum compatibility, it is increasingly common for multilingual websites to use the [[UTF-8]] encoding of the [[ISO 10646]]/[[Unicode]] character set, which provides a superset of almost all existing character sets.
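As a brief illustration of why a multilingual site might prefer UTF-8, the sketch below checks which encodings can represent a mixed-script string. The helper <code>can_encode</code> and the sample text are hypothetical and exist only for this example.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# English, Greek and Japanese in a single string.
sample = "English, \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac, \u65e5\u672c\u8a9e"

for name in ("iso-8859-1", "iso-8859-7", "utf-8"):
    print(name, can_encode(sample, name))

# UTF-8 round-trips the whole string without loss.
round_trip = sample.encode("utf-8").decode("utf-8")
print(round_trip == sample)
```

Each single-byte encoding covers only part of the sample, whereas UTF-8 covers all of it.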