Module talk:Lang-zh: Difference between revisions

{{archive box | auto=yes }}
__TOC__
 
== Trailing bold in l= not being removed ==
 
In <syntaxhighlight>{{zh|t=竹子林站|j=Zuk1 Zi2 Lam4 Zaam6|l = '''Bamboo Forest station'''}}</syntaxhighlight>, the opening bold markup is properly removed, but the trailing bold markup is not removed. It looks like the regular expression at <syntaxhighlight>term = string.gsub(term, "^([ \"']*)(.*)([ \"']*)$", "%2")</syntaxhighlight> needs some adjustment to the middle wildcard search. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 13:23, 16 September 2024 (UTC)
 
:{{ping|Jonesey95}} This is because the * operator is greedy, so .* matches everything else in the string. Changing .* to .*? would make it lazy, so that the final term catches all trailing characters. In other words, change the line of code to: <syntaxhighlight>term = string.gsub(term, "^([ \"']*)(.*?)([ \"']*)$", "%2")</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 13:51, 16 September 2024 (UTC)
::Thanks! That fixed the problem at [[Zhuzilin station]] and probably other pages. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 17:26, 16 September 2024 (UTC)
::Thank you for fixing my shoddy regex, by the way. <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff">&nbsp;‥&nbsp;</span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 13:05, 19 September 2024 (UTC)
:::{{re|Jonesey95|Remsense}} On further reflection, this doesn't work as intended. I had thought the string was a regex, but it is in fact a Lua pattern, which is slightly different. The Lua equivalent of *? is - which would give: <syntaxhighlight>term = string.gsub(term, "^([ \"']*)(.-)([ \"']*)$", "%2")</syntaxhighlight> Writing .*? in Lua (as I suggested above) actually means greedily matching all characters (.*) followed by a single question mark (? can also be an operator, but Lua pattern operators can't be nested so in this context it is interpreted as a literal). So actually the new pattern usually doesn't make a substitution, unless there is a question mark. This means it usually fails, e.g. where there are multiple glosses separated by commas and spaces, the spaces are not stripped. However, looking at what the pattern match applies to, I'm not completely sure I understand why the quotes should be stripped in the first place (is there a set of testcases to check against?). At [[Zhuzilin station]], the current code makes no substitution, and so it keeps the bold formatting, presumably as intended. The old code meant that the bold formatting was stripped at the beginning and not the end, so the rest of the article became bold (which was a bad and confusing error). Correcting .*? to .- as above would strip both, making it impossible to add bold formatting. Is the intention to catch cases where an editor unnecessarily adds quotes to the gloss? Is this a common problem? If so, is removing the ability to add bold and italic formatting a fair price to pay?
::: If we want to strip one quote mark but no more (so that we catch editors manually adding quotes, but allow formatting), pattern matching is a bit more complicated. I think it would be easiest to separate the stripping of whitespace and quotes. When stripping one single quote, we need to check that there isn't more than one, but we also need to allow the string to contain an apostrophe (so we can't just use [^']- in the middle) and a gloss could potentially be a single character (so we can't just use [^'].-[^'] in the middle). So it seems easiest to strip the leading and trailing quotes separately. This gives three lines (I've also removed two sets of brackets that were capturing substrings that weren't used): <syntaxhighlight>term = string.gsub(term, "^ *(.-) *$", "%1")
term = string.gsub(term, "^[\"']?([^\"'].-)$", "%1")
term = string.gsub(term, "^(.-[^\"'])[\"']?$", "%1")</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 15:43, 24 September 2024 (UTC)
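The difference between the three patterns discussed above can be checked in plain Lua. This is only a minimal sketch: the sample strings are illustrative, and <code>strip_single_quote</code> is a hypothetical helper name wrapping the three-line variant, not something that exists in the module.

```lua
-- Sample gloss wrapped in bold markup, as in the Zhuzilin station report.
local sample = "'''Bamboo Forest station'''"

-- Original pattern: the greedy .* also consumes the trailing quotes.
local greedy = sample:gsub("^([ \"']*)(.*)([ \"']*)$", "%2")

-- Regex-style .*? is read by Lua as ".*" followed by a literal "?", so with
-- no question mark in the input nothing matches and the string is unchanged.
local regex_style = sample:gsub("^([ \"']*)(.*?)([ \"']*)$", "%2")

-- Lua's lazy quantifier is "-", which leaves the trailing quotes to the last group.
local lazy = sample:gsub("^([ \"']*)(.-)([ \"']*)$", "%2")

-- Hypothetical helper: strip spaces, then at most one quote mark per side.
local function strip_single_quote(term)
    term = term:gsub("^ *(.-) *$", "%1")
    term = term:gsub("^[\"']?([^\"'].-)$", "%1")
    term = term:gsub("^(.-[^\"'])[\"']?$", "%1")
    return term
end

print(greedy)                           -- Bamboo Forest station'''
print(regex_style)                      -- '''Bamboo Forest station'''
print(lazy)                             -- Bamboo Forest station
print(strip_single_quote("'gloss'"))    -- gloss
print(strip_single_quote("'''bold'''")) -- '''bold'''
```

The last two lines show the trade-off raised in the thread: the three-line variant removes a single manually added quote but leaves bold or italic markup alone, whereas the <code>.-</code> pattern strips any run of quote marks.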
::::I think it's fine to strip all quote marks, in any quantity. That was the original intent of the code, and I don't see any complaints on this page. Adding bold to text is probably against [[WP:MOS]], and adding italics should be done with a parameter. People can use {{tag|b}} and {{tag|i}} tags if they insist on them. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 15:51, 24 September 2024 (UTC)
:::::Okay. I had taken your comment about fixing the Zhuzilin station article to mean that keeping the bold markup was intended, but I can see why it could be discouraged. I've also just found [[Template:Lang-zh/testcases]] (I had only looked under Module:Lang-zh before), and I don't see any testcases for stripping markup. So, if stripping markup is the desired functionality, the .- version above would work. I think it would make sense to document this, since there are three different kinds of thing being stripped: whitespace, markup, and quotes (double quotes aren't markup). It could be documented either on [[Template:Lang-zh/doc]] or directly as a code comment next to the line we're discussing, e.g. "remove trailing and leading spaces, quotes, and bold/italic markup". [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 20:39, 24 September 2024 (UTC)
:::::Currently, this stripping only applies to literal glosses and not translations, but they should reasonably be treated the same. So, fixing the pattern, matching all whitespace (not just spaces), expanding the comments, and applying the same to the translation, I suggest changing lines 236-247 to the following:<syntaxhighlight>
elseif (part == "l") then
    local terms = ""
    -- put individual, potentially comma-separated glosses in single quotes
    -- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
    for term in val:gmatch("[^;,]+") do
        term = string.gsub(term, "^([%s\"']*)(.-)([%s\"']*)$", "%2")
        terms = terms .. "&apos;" .. term .. "&apos;, "
    end
    val = string.sub(terms, 1, -3)
elseif (part == "tr") then
    -- put translations in double quotes
    -- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
    val = string.gsub(val, "^([%s\"']*)(.-)([%s\"']*)$", "%2")
    val = "&quot;" .. val .. "&quot;"
end</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 09:31, 25 September 2024 (UTC)
:::::{{re|Jonesey95|Remsense}} What do you think? Are you happy with the above suggestion?
{{edit template-protected|Module:Lang-zh|answered=yes}}
:::::Also, instead of directly using a Lua string pattern, it might be more readable and maintainable to use an existing function for stripping leading and trailing characters, namely mw.text.trim:<syntaxhighlight>
elseif (part == "l") then
    local terms = ""
    -- put individual, potentially comma-separated glosses in single quotes
    -- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
    for term in val:gmatch("[^;,]+") do
        term = mw.text.trim(term, "%s\"'")
        terms = terms .. "&apos;" .. term .. "&apos;, "
    end
    val = string.sub(terms, 1, -3)
elseif (part == "tr") then
    -- put translations in double quotes
    -- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
    val = mw.text.trim(val, "%s\"'")
    val = "&quot;" .. val .. "&quot;"
end</syntaxhighlight> [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 09:02, 27 September 2024 (UTC)
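To try the proposed behaviour outside Scribunto, something like the following sketch may help. It assumes a local <code>trim</code> function as a plain-Lua stand-in for <code>mw.text.trim</code> (which is only available inside Scribunto), and the helper names <code>format_glosses</code> and <code>format_translation</code> are hypothetical. It also builds the list with <code>table.concat</code> rather than trimming a trailing separator, which is a stylistic substitution and not what the proposal above does.

```lua
-- Plain-Lua stand-in for Scribunto's mw.text.trim(s, charset).
-- charset is text suitable for the inside of a Lua pattern's [...] set.
local function trim(s, charset)
    charset = charset or "%s"
    return (s:gsub("^[" .. charset .. "]*(.-)[" .. charset .. "]*$", "%1"))
end

-- Hypothetical helper mirroring the proposed "l" branch:
-- split on commas/semicolons, strip each gloss, wrap it in single quotes.
local function format_glosses(val)
    local parts = {}
    for term in val:gmatch("[^;,]+") do
        parts[#parts + 1] = "&apos;" .. trim(term, "%s\"'") .. "&apos;"
    end
    return table.concat(parts, ", ")
end

-- Hypothetical helper mirroring the proposed "tr" branch.
local function format_translation(val)
    return "&quot;" .. trim(val, "%s\"'") .. "&quot;"
end

print(format_glosses(" staff ; 'stick', '''pole''' "))
-- &apos;staff&apos;, &apos;stick&apos;, &apos;pole&apos;
print(format_translation(' "hello world" '))
-- &quot;hello world&quot;
```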
::::::{{re|Jonesey95|Remsense}} pinging again. The current code inserts quotes incorrectly, e.g. on the following pages you can see an opening quote followed by a space: [[Gun (staff)]], [[Indonesian slang]], [[Ping On]]. The code above would fix this. [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 14:23, 8 October 2024 (UTC)
::::::: Reping {{u|Remsense}}. Could you please look at this? I think you're one of the only template editors who has enough understanding of Chinese orthography to understand what is being changed here and why. [[User:Pppery|* Pppery *]] [[User talk:Pppery|<sub style="color:#800000">it has begun...</sub>]] 17:19, 16 November 2024 (UTC)
::::::::Will have this looked at ASAP. Huge apologies for letting it slip through the cracks for months. <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff">&nbsp;‥&nbsp;</span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 23:39, 16 November 2024 (UTC)
:::::::::{{done}}{{snd}}so sorry for the delay, again. <span style="border-radius:2px;padding:3px;background:#1E816F">[[User:Remsense|<span style="color:#fff">'''Remsense'''</span>]]<span style="color:#fff">&nbsp;‥&nbsp;</span>[[User talk:Remsense|<span lang="zh" style="color:#fff">'''论'''</span>]]</span> 23:37, 18 November 2024 (UTC)
 
== The unnamed parameter ==
 
::{{re|Folly Mox|Toadspike}} I agree that this is confusing behaviour. It looks like the module currently only processes an unnamed argument at the end, on lines 287-309, in the case that it has not constructed any output. This section of code also duplicates some of the code earlier in the module, which is bad practice. As well as fixing the problem above, it would also be simpler and more maintainable to remove this section of code and instead map {{para|1}} to {{para|c}} at the beginning (e.g. just after line 103, where two other aliases are defined). [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 15:37, 22 April 2025 (UTC)
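As a rough illustration of the aliasing idea above, the unnamed parameter could be folded into <code>c</code> before any other processing. This is a sketch only: <code>normalize_args</code> is a hypothetical name, the argument table is an ordinary Lua table rather than the module's real one, and letting an explicit {{para|c}} take precedence over {{para|1}} is an assumption, not something settled in this discussion.

```lua
-- Hypothetical sketch: treat the first unnamed argument as an alias of "c".
local function normalize_args(args)
    local out = {}
    for key, value in pairs(args) do
        out[key] = value
    end
    -- Assumption: an explicitly named c wins over the unnamed argument.
    if out.c == nil and out[1] ~= nil then
        out.c = out[1]
    end
    out[1] = nil
    return out
end

print(normalize_args({ "中文" }).c)             -- 中文
print(normalize_args({ "甲", c = "乙" }).c)     -- 乙
```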
 
== Use / between simplified and traditional when labels=no ==
 
I was looking at [[Chinese classifier]], which uses a lot of inline Chinese, and realized that it would be better, when labels are off, to have a slash between the simplified and traditional versions of a phrase instead of the current semicolon. As an example, "The classifier 个; 個, pronounced..." would look better as "The classifier 个/個, pronounced...". Since these are alternate versions of the same character (which is not explained when labels are off), I think the slash conveys this better than the semicolon. [[User:Toadspike|<span style="color:#21a81e;font-variant: small-caps;font-weight:bold;">'''Toadspike'''</span>]] [[User talk:Toadspike|<span style="color:#21a81e;font-variant: small-caps;font-weight:bold;">[Talk]</span>]] 21:12, 30 January 2025 (UTC)
 
== no-merging s and t ==
Line 107 ⟶ 53:
::{{tqbm|In your updated version, you have used the {{para|l}} parameter, but I would have put "Cheung Po the Kid" as the value, per the linked article.}} I just used what was already there and it's the title of the linked article. I don't speak Chinese anyway so I wouldn't feel comfortable changing it. But this is beside the point of my question. — [[User:W.andrea|W.andrea]] ([[User talk:W.andrea|talk]]) 13:53, 2 May 2025 (UTC)
:::I see what you mean. This module appears to separate each parameter with semicolons. The list of parameters in lines 18–52 is undifferentiated. I think someone would have to adjust the module code to precede "lit." with a comma. – [[User:Jonesey95|Jonesey95]] ([[User talk:Jonesey95|talk]]) 17:24, 2 May 2025 (UTC)
 
== L switch throwing Linter errors when value ends with a closing italics tag ==
 
Just noticed this on the [[Mangtong]] page while clearing old "missing end tag" errors:
 
{|-
!Code...
|
!Renders as...
|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=reformed ''mangtong''}}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l= reformed ''mangtong''}}), was developed in the 20th century.
|}
 
 
It seems that passing a value that ends with italicized text via the <code>l=</code> parameter triggers the Linter error. Adding a <code>&amp;nbsp;</code> or an equivalent after the closing italics markup doesn't resolve the error:
 
{|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=reformed ''mangtong''&nbsp;}}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l= reformed ''mangtong''&nbsp;}}), was developed in the 20th century.
|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=reformed ''mangtong''{{nbsp}}}}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l= reformed ''mangtong''{{nbsp}}}}), was developed in the 20th century.
|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=reformed ''mangtong'' }}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l= reformed ''mangtong'' }}), was developed in the 20th century.
|}
 
 
The only solution appears to be placing a character that is neither an apostrophe nor a quotation mark after the closing italics markup but before the }}:
 
{|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=′reformed ''mangtong''′}}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=′reformed ''mangtong''′}}), was developed in the 20th century.
|}
 
...which seems a bit of a hack.
 
 
Interestingly, attempting to italicize the entire value results in both italics tags being ignored (same as when leaving italics tags out altogether)...
 
{|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=''reformed mangtong''}}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=''reformed mangtong''}}), was developed in the 20th century.
|-
|<code><nowiki>A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=reformed mangtong}}), was developed in the 20th century.</nowiki></code>
|
|A modernized version of the ''mangtong'', called ''gǎigé mángtǒng'' ( {{zh|c=改革芒筒|l=reformed mangtong}}), was developed in the 20th century.
|}
 
Any chance this can be fixed?
 
[[User:SirOlgen|SirOlgen]] ([[User talk:SirOlgen|talk]]) 16:28, 21 August 2025 (UTC)
 
:So to put it in non-Lint terms, it doesn't appear as though this template's "literal" parameter is properly handling italics tags which occur either first or last in the passed string. [[User:SirOlgen|SirOlgen]] ([[User talk:SirOlgen|talk]]) 18:09, 21 August 2025 (UTC)
::Yes, see this previous discussion: [[Module_talk:Lang-zh/Archive_5#Trailing_bold_in_l=_not_being_removed]]. A workaround is to use HTML italic tags, e.g. <code><nowiki>{{zh|l=reformed <i>mangtong</i>}}</nowiki></code> for {{zh|l=reformed <i>mangtong</i>}}. The stripping of bold and italic markup is commented in the code but not mentioned in the template documentation. At the time of the previous discussion, I wasn't convinced that stripping quotation marks was necessary, but I also didn't know of any use cases where it would cause a problem and so I didn't push the point further. However, this looks like a valid use case, and I think we should revisit the question of why quotes should be stripped. In contrast, this doesn't happen for {{tl|lit}}, which otherwise has extremely similar functionality to the {{para|l}} parameter. [[User:Freelance Intellectual|Freelance Intellectual]] ([[User talk:Freelance Intellectual|talk]]) 21:57, 22 August 2025 (UTC)
:::Thanks a million for the background info and workaround (I'm embarrassed for not having thought to try that, LOL). This does seem like a pretty obscure use case, but it also seems a near certainty there will be more examples out there among the 1.9M outstanding "[[Special:LintErrors/missing-end-tag|missing end tag]]" errors.
:::Thanks again!
:::[[User:SirOlgen|SirOlgen]] ([[User talk:SirOlgen|talk]]) 02:11, 24 August 2025 (UTC)
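The effect described in this section can be reproduced with the stripping pattern quoted in the earlier discussion. The sketch below is plain Lua under stated assumptions: <code>strip_gloss</code> and <code>count_italic_markers</code> are hypothetical helper names, and the pattern is the one proposed in the September 2024 thread, which may not match the live module exactly.

```lua
-- Stripping pattern from the earlier thread: removes leading/trailing
-- whitespace, double quotes and apostrophes (and so '' / ''' markup).
local function strip_gloss(term)
    return (term:gsub("^([%s\"']*)(.-)([%s\"']*)$", "%2"))
end

-- Count occurrences of the '' italics marker; an odd count means unbalanced markup.
local function count_italic_markers(s)
    local _, n = s:gsub("''", "")
    return n
end

local trailing = strip_gloss("reformed ''mangtong''")
print(trailing)                        -- reformed ''mangtong
print(count_italic_markers(trailing))  -- 1

print(strip_gloss("''reformed mangtong''"))    -- reformed mangtong
print(strip_gloss("reformed <i>mangtong</i>")) -- reformed <i>mangtong</i>
```

In the first case only the closing <code><nowiki>''</nowiki></code> is at the edge of the string, so it is removed while the opening one stays, leaving a single unclosed italics marker in the output. When the whole value is italicized, both markers are at the edges and both are removed. HTML tags are untouched, which is why the <code><nowiki><i>…</i></nowiki></code> workaround holds.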