→See also: rm line break to make proper HTML list for screen readers |
Fix high-priority Linter errors. I hope you don't mind this minor cleanup edit in your user space. |
Line 32:
Sometimes the semicolon is erroneously omitted. The bot attempts to detect this and suggests a repair, subject to manual approval by the bot operator.
However, some entities are not checked for missing semicolons because they would cause too many false positives due to URLs of the form <nowiki>http://xxxxx.yyy?aaaa=....</nowiki>'''&bbbb'''<nowiki>=...</nowiki>. For example <
==== Numeric character references ====
Line 105:
More generally, <nowiki>[[A|B]]</nowiki> is simplified to <nowiki>[[B]]</nowiki> if A and B differ only trivially (first letter case-insensitive and disregarding leading and trailing blanks).
If A and B cannot be simplified, any leading and trailing blanks in the "A" part of <nowiki>[[A|B]]</nowiki> are removed; however, they are not removed in the "B" part of <nowiki>[[A|B]]</nowiki> or <nowiki>[[B]]</nowiki> (because we could have, for instance, <
A flag can be set to do further link simplification when certain conditions are fulfilled (such as the page at "A" being a redirect to B). This functionality is described at the [[User:Curpsbot-unicodify/redirects|/redirects]] sub-page. '''This is still in development''' and is currently turned off.
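The basic simplification rule described above (leaving aside the redirect-based flag) can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the names `PIPED_LINK`, `simplify_link` and `simplify_links` are hypothetical, and the sketch assumes links contain no nested brackets.

```python
import re

# Matches a piped wiki link [[A|B]] with no nested brackets or extra pipes.
# (Assumption: the real bot may handle more link forms than this.)
PIPED_LINK = re.compile(r"\[\[([^\[\]|]*)\|([^\[\]|]*)\]\]")


def _normalize(title):
    """Comparison key: strip blanks and ignore the case of the first letter."""
    stripped = title.strip()
    return stripped[:1].lower() + stripped[1:]


def simplify_link(match):
    target, label = match.group(1), match.group(2)
    if _normalize(target) == _normalize(label):
        # A and B differ only trivially: keep B exactly as written,
        # including any leading or trailing blanks.
        return "[[" + label + "]]"
    # Otherwise trim blanks from the "A" part only; "B" is left untouched.
    return "[[" + target.strip() + "|" + label + "]]"


def simplify_links(wikitext):
    """Apply the simplification to every piped link in the text."""
    return PIPED_LINK.sub(simplify_link, wikitext)
```

With this sketch, `simplify_links("[[apple|Apple]]")` gives `[[Apple]]`, while `simplify_links("[[ Apple |apples ]]")` gives `[[Apple|apples ]]`.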
Line 173:
Numeric character references (&#<num>;) or character entity references (&<name>;) are not converted when they represent ASCII characters (e.g., &#39; &amp; &gt; &lt; &quot;). This is because such usage may be intended to keep the character from being interpreted as wiki markup; for instance [http://en.wikipedia.org/w/index.php?title=Battle_of_Calabria&oldid=21409945]:
:<
where <
:''Warspite'''s 381 mm rounds
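The "leave ASCII alone" rule can be sketched as below. This is a minimal illustration under stated assumptions, not the bot's implementation: the names `REFERENCE`, `convert_reference` and `unicodify` are hypothetical, and named entities are looked up in Python's standard `html.entities.name2codepoint` table (the HTML 4 set), which may differ from the list the bot uses.

```python
import re
from html.entities import name2codepoint

# Decimal reference, hexadecimal reference, or named entity, each ending in ";".
REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def convert_reference(match):
    body = match.group(1)
    if body.startswith("#"):
        digits = body[1:]
        code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
    else:
        code = name2codepoint.get(body)  # None for unknown entity names
    if code is None or code < 128 or code > 0x10FFFF:
        # Unknown, ASCII (possibly shielding wiki markup), or out of range:
        # leave the reference exactly as written.
        return match.group(0)
    if 0xD800 <= code <= 0xDFFF:
        return match.group(0)  # surrogates are not valid characters
    return chr(code)


def unicodify(wikitext):
    """Replace non-ASCII character references with the characters themselves."""
    return REFERENCE.sub(convert_reference, wikitext)
```

Under these assumptions `unicodify("caf&eacute;")` returns `café`, whereas `&#39;`, `&amp;`, `&gt;`, `&lt;` and `&quot;` pass through unchanged.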
Line 180:
However, printable ASCII (not [[control character]]s or [[DEL]]) is almost always converted when it occurs in the form %NN in link page names, for instance:
:<
The exceptions are %5B ( [ ), %5D ( ] ) and %7C ( | ), which are converted to numeric character references instead, because otherwise they would interfere with the <nowiki>[[ | ]]</nowiki> syntax. This is mostly hypothetical, since these characters are unlikely ever to occur in article titles.
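The %NN handling for link page names might look something like the sketch below. It is illustrative only: `PERCENT`, `decode_escape` and `decode_link_target` are hypothetical names, the use of decimal (rather than hexadecimal) numeric character references is an assumption, and escapes outside printable ASCII are simply left alone here.

```python
import re

PERCENT = re.compile(r"%([0-9A-Fa-f]{2})")

# Code points of "[", "]" and "|", which would clash with [[ | ]] syntax.
LINK_SYNTAX = {0x5B, 0x5D, 0x7C}


def decode_escape(match):
    code = int(match.group(1), 16)
    if code in LINK_SYNTAX:
        # Emit a numeric character reference instead of the raw character.
        return "&#%d;" % code
    if 0x20 <= code <= 0x7E:
        # Printable ASCII: replace the escape with the character itself.
        return chr(code)
    # Control characters, DEL and non-ASCII bytes are not touched here.
    return match.group(0)


def decode_link_target(target):
    """Decode %NN escapes in the page-name part of a wiki link."""
    return PERCENT.sub(decode_escape, target)
```

For example, `decode_link_target("Foo_%28bar%29")` yields `Foo_(bar)`, and `decode_link_target("a%7Cb")` yields `a&#124;b`.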
Line 200:
==Missing semicolons==
The bot will try to detect missing final semicolons in character entity references (such as "<
: <
* Some entities are not checked because they are substrings of another entity (for instance, "&sigma" is not checked because every occurrence of "&sigmaf" would be a false positive).
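A rough sketch of the missing-semicolon check for named entities, including the substring exclusion just described, is given below. It is an illustration rather than the bot's code: the entity list comes from Python's standard `html.entities.name2codepoint` table, the names `AMBIGUOUS`, `CHECKED`, `find_missing_semicolons` and `suggest_repair` are hypothetical, and the separate exclusion of entities that tend to appear in URLs is not modelled.

```python
import re
from html.entities import name2codepoint

NAMES = sorted(name2codepoint)

# Names that are a proper prefix of another name, e.g. "sigma" vs "sigmaf".
# Checking these would flag every occurrence of the longer entity.
AMBIGUOUS = {a for a in NAMES for b in NAMES if a != b and b.startswith(a)}

CHECKED = [name for name in NAMES if name not in AMBIGUOUS]

# "&name" that is not immediately followed by ";".
MISSING = re.compile(r"&(" + "|".join(CHECKED) + r")(?!;)")


def find_missing_semicolons(text):
    """Return (offset, entity name) pairs for suspected missing semicolons."""
    return [(m.start(), m.group(1)) for m in MISSING.finditer(text)]


def suggest_repair(text):
    """Proposed fix only; the operator would still approve it manually."""
    return MISSING.sub(lambda m: m.group(0) + ";", text)
```

Here `find_missing_semicolons("caf&eacute au lait")` reports one hit for `eacute`, while `"&sigma x"` produces no hits because `sigma` is in the excluded set.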
Line 207:
The bot will also try to detect missing final semicolons in numeric character references (such as "<
==Right-to-left and bidirectional text==
Line 226:
When these numeric character references are converted to Unicode, the link is displayed (in the browser's editor or in the diffs [http://en.wikipedia.org/w/index.php?title=Ani_Maamin&diff=22549317&oldid=17321508]) as:
[[<span dir="ltr">he:אני מאמין (פיוט)</span>]]
when really it should display as:
[[he:<span dir="rtl">אני מאמין (פיוט)</span>]]
Note that this is only a display issue: the actual underlying Unicode characters are all in proper sequence and the [[:he:אני מאמין (פיוט)|Hebrew interwiki link itself]] works fine and takes you to the correct page. The issue is that the browser display can't decide whether the final closing parenthesis should attach to the preceding Hebrew letter ("ט") and display as "(" as a right-to-left closing parenthesis, or whether it should attach to the following ASCII character "]" and display as ")" as a left-to-right closing parenthesis.
When embedded within article text — like this: אני מאמין (פיוט) — there may also be display issues, but in this case it is sufficient to enclose the text within <span dir="rtl"> … </span> to make it display properly: <span dir="rtl">אני מאמין (פיוט)</span>.
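For text embedded in article prose, the wrapping described above could be sketched as follows. The helper names `has_rtl` and `wrap_rtl` are hypothetical, and detecting right-to-left text through the Unicode bidirectional classes "R" and "AL" is an assumption made for this illustration, not something the bot is stated to do.

```python
import unicodedata


def has_rtl(text):
    """True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def wrap_rtl(text):
    """Enclose right-to-left text in a span so trailing brackets display correctly."""
    if not has_rtl(text):
        return text
    return '<span dir="rtl">' + text + "</span>"
```

This only addresses text embedded in prose; as noted above, interwiki links are a separate case.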
In the case of Arabic or Hebrew interwiki links, I'll usually go ahead and manually approve the change: the convenience to Arabic- or Hebrew-speaking editors to be able to actually read the interwiki link (instead of dealing with &# soup) outweighs the single misplaced parenthesis. Other cases are handled on a case-by-case basis. In some especially complicated cases of embedding (for example [[Template:User ar-1]]) it will be preferable to leave the numeric character references rather than convert to Unicode.