Module talk:WikidataIB/Archive 7

:::::Weird that I still see the underline on my browser (Windows/Chrome) even with text-decoration: none &mdash;&nbsp;Martin <small>([[User:MSGJ|MSGJ]]&nbsp;·&nbsp;[[User talk:MSGJ|talk]])</small> 18:48, 24 September 2020 (UTC) <div style="text-decoration: none; display: inline-block; width: 6em; background-color: #EEF;">Jane Belson&nbsp;[[#top|&#10000;]]</div>
:::::I'm obviously not understanding how to use the CSS properly &mdash;&nbsp;Martin <small>([[User:MSGJ|MSGJ]]&nbsp;·&nbsp;[[User talk:MSGJ|talk]])</small> 20:36, 26 September 2020 (UTC)
 
== Strip slash from the end of domain-only URLs ==
 
In the second and third infobox of [[Microsoft Office]], the Wikidata value of <code><nowiki>https://office.com/</nowiki></code> is displayed as [https://office.com/ office.com/]. {{tlx|URL|<nowiki>https://office.com/</nowiki>}} is displayed as {{URL|https://office.com/}} (without a trailing slash), and I think the template got it right. Can someone change this module so that it removes the trailing slash if (and only if) the URL consists of only a domain name? (So e.g. <code><nowiki>https://example.org/page/</nowiki></code> would still display the trailing slash.) Thanks in advance! —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 20:34, 23 September 2020 (UTC)
:I removed the slash on Wikidata. Problem solved? &mdash;&nbsp;Martin <small>([[User:MSGJ|MSGJ]]&nbsp;·&nbsp;[[User talk:MSGJ|talk]])</small> 08:18, 24 September 2020 (UTC)
:: {{re|Tacsipacsi}} I've fixed the code:
::* <code><nowiki>{{#invoke:WikidataIB |url2 |url=https://office.com}}</nowiki></code> → {{#invoke:WikidataIB |url2 |url=https://office.com}}
::* <code><nowiki>{{#invoke:WikidataIB |url2 |url=https://office.com/}}</nowiki></code> → {{#invoke:WikidataIB |url2 |url=https://office.com/}}
::* <code><nowiki>{{#invoke:WikidataIB |url2 |url=https://example.org/page/}}</nowiki></code> → {{#invoke:WikidataIB |url2 |url=https://example.org/page/}}
:: Let me know if you find any more problems. --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 15:58, 24 September 2020 (UTC)
:::{{reply to|MSGJ}} No, that only hides the issue—there may be tens of thousands of Wikidata items (or even more) that still have trailing slash, and new ones can be created any time.
:::{{reply to|RexxS}} Your [https://en.wikipedia.org/w/index.php?title=Module:WikidataIB&diff=980096523 approach] of “everything that contains a period is a domain name” is not 100% correct, see e.g. <code><nowiki>{{#invoke:WikidataIB |url2 |url=https://example.org/wiki/index.php/}}</nowiki></code> → {{#invoke:WikidataIB |url2 |url=https://example.org/wiki/index.php/}}; it actually fails for internationalized domain names like [[.рф]] as well (<code><nowiki>{{#invoke:WikidataIB |url2 |url=http://кц.рф/}}</nowiki></code> → {{#invoke:WikidataIB |url2 |url=http://кц.рф/}}), although those are probably not that common. <code>^([^/]+)/$</code> (i.e. anything that contains no slash except for the trailing one) looks better. —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 20:34, 25 September 2020 (UTC)
:::: {{re|Tacsipacsi}} I think that https://example.org/wiki/index.php/ is an invalid url. As far as I know, it doesn't actually hurt to strip a trailing / from any url, but perhaps you can think of a counter-example? I agree about the non-ascii tlds, but I think I'll wait for an actual example to crop up so that I can see what might arise before I hammer the server with yet another code change. It had a tough day yesterday. --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 22:58, 25 September 2020 (UTC)
:::::{{reply to|RexxS}} I don’t know where the structure of URLs is defined, but I’m pretty sure it’s valid—why would it not be? Actually, I’ve seen MediaWiki wikis with such URL scheme (e.g. the main page would be <code><nowiki>https://example.org/wiki/index.php/Main_Page</nowiki></code>), although unfortunately I don’t remember which ones. In the <code>index.php/</code> case, the trailing slash probably doesn’t matter, but https://en.wikipedia.org/wiki/Module_talk:WikidataIB is quite different from https://en.wikipedia.org/wiki/Module_talk:WikidataIB/. I could create a page named https://en.wikipedia.org/wiki/Module:WikidataIB.lua/, where dropping trailing slash would change the link to point to something entirely different (even though this actual example doesn’t seem to make much sense, but I hope you get the idea). —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 23:35, 25 September 2020 (UTC)
:::::: {{re|Tacsipacsi}} The point is that a webserver will try to match the final segment with a file in the directory pointed to by the preceding part; if that does not exist, it will then attempt to treat the final segment as a directory and look inside that for index.htm, index.php, default.asp, etc. depending on its configuration. That means that http://www.metropolis2.co.uk/StRexx, http://www.metropolis2.co.uk/StRexx/ and http://www.metropolis2.co.uk/StRexx/index.htm all return the same file. You'll find that MediaWiki servers respond to urls of the form <code><nowiki>https://example.org/w/index.php?title=Main_Page</nowiki></code>, but that requests the index.php ''file'' and uses it to read the name of the required page from the title parameter. The "Main_Page" part isn't actually part of the path to the executable file. Because MediaWiki servers process the segment after /wiki/ internally, we are allowed to create pages whose titles contain characters that would not normally be allowed in names of parts of urls. You can see the issue when you try to construct the sandbox or doc page for https://en.wikipedia.org/wiki/Module:WikidataIB.lua/ – but that issue does not generally arise on normal webservers and is very unlikely to present a problem for the external websites found in our infoboxes. --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 00:13, 26 September 2020 (UTC)
:::::::{{reply to|RexxS}} What do you mean by “normal webservers”? If you mean an Apache server serving static HTML files from the <code>/var/www/html</code> directory, then yes, those don’t mind whether there’s a trailing slash after the final directory name. But I’m pretty sure these are quite a minority of today’s web traffic; it’s dominated by dynamic websites ranging from MediaWiki/Drupal/Joomla! running on [[LAMP (software bundle)|LAMP]], to Java application servers that even do TCP socket management on their own. Application servers process path information on their own and are free to decide whether the trailing slash is significant. (Probably most of the time it isn’t, but there’s no guarantee; I’ve seen websites that annoyingly consider the trailing slash significant.) Yes, it’s unlikely that this causes problems, but it may. (By the way, I like the way [[c:Template:Wikidata Infobox]] is developed: active development takes place in the sandbox, and the main version is updated roughly monthly, so that the servers don’t need to reparse 3.1M pages too often. I also don’t see what the issue is with creating [[Module:WikidataIB.lua//doc]]. Yes, its title looks ugly, but my point was that it’s possible, not that it’s nice.) —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 18:40, 26 September 2020 (UTC)
:::::::: {{re|Tacsipacsi}} you assert "I’m pretty sure these are quite a minority of today’s web traffic". I disagree completely. I just tried out the first dozen or so articles from "What links here" for [[Template:Url]] and couldn't find one where adding a final slash didn't return exactly the same site. No sensible website admin would configure their webserver to give different results for the two cases. There's just too much room for visitors to add or omit a trailing slash. You need to show some concrete examples of where an official website shows the difference you claim occurs before I'll be convinced you're not worrying about a problem that doesn't exist. --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 22:21, 26 September 2020 (UTC)
:::::::::{{reply to|RexxS}} No, I state that dynamic websites are the majority. These dynamic websites may, or may not, consider trailing slashes significant. For example, on the official website of the [[Eötvös Loránd University]], https://www.elte.hu/en/equalaccess is a live link, while https://www.elte.hu/en/equalaccess/ is 404. —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 23:24, 26 September 2020 (UTC)
:::::::::: {{re|Tacsipacsi}} that's odd. The official website in the infobox of [[Eötvös Loránd University]] is given as https://www.elte.hu/en/ which works with or without the trailing slash. The official website in the ''External links'' section is fetched from Wikidata and is the Hungarian version at https://www.elte.hu/ (which also works with or without the trailing /). I think you've just got a misconfigured subpage in your example. --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 23:41, 26 September 2020 (UTC)
:::::::::::{{reply to|RexxS}} This is the official website of the ELTE. Not its homepage, but an official website, which is what you asked for. If you want a {{Property|P856}} value, here you are: {{Q|Q31093231}}’s homepage runs on the same buggy software. By the way, I honestly don’t understand why you fight so much over this for the sake of fighting—except for not wanting to change, you haven’t shown any reason not to change to this more robust yet just as simple regexp. —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 17:35, 27 September 2020 (UTC)
:::::::::::: {{re|Tacsipacsi}} I'm talking about how the call is used - in an infobox for an official website. I'm not trying to fight you, but I want to tease out the issues here. Before I retired I set up a lot of websites both dynamic and static for clients and I'm not used to finding customer-facing websites where mistyping a '/' would give them a 404. Nevertheless, it's been eight years since I retired and things change: I'm merely interested in the change so I don't make mis-assumptions in future. I've already changed the regex to your version in my master copy of the module, but I haven't updated the live version yet. Did you miss my comment "{{tq|I agree about the non-ascii tlds, but I think I'll wait for an actual example to crop up so that I can see what might arise before I hammer the server with yet another code change}}"? If you think we've thrashed out all the issues, then I can make the new version live, but I didn't think there was any rush. --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 21:55, 27 September 2020 (UTC)
:::::::::::::{{reply to|RexxS}} There’s no rush, but I felt you don’t want to implement this change at all. I haven’t missed that comment, but probably misunderstood it. I’ve always been confident with my approach (and don’t even think retrospectively that this confidence was unjustified at any point), so it’s not me who’s yet to come to the conclusion that everything is fine. —[[User:Tacsipacsi|Tacsipacsi]] ([[User talk:Tacsipacsi|talk]]) 21:00, 28 September 2020 (UTC)
:::::::::::::: {{re|Tacsipacsi}} I've had to fix some errors caused by trying to solve the [[#Wrapping of pencil icon]] thread above, so I took the opportunity to implement your regex for url2 at the same time. Please let me know if you spot any problems now. Thanks --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 14:36, 29 September 2020 (UTC)
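
For readers of this archive, the rule proposed in this thread can be sketched roughly as follows. This is only an illustration in Python, not the code of [[Module:WikidataIB]]: the function name <code>display_text</code> and the way the scheme is removed are assumptions, and only the pattern <code>^([^/]+)/$</code> is taken from the discussion.

```python
import re

# Hypothetical helper, not the actual Module:WikidataIB code.
# Assumption: the scheme ("https://" etc.) is removed before the check.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

# Pattern suggested in the thread: no slash anywhere except the trailing one.
DOMAIN_ONLY = re.compile(r"^([^/]+)/$")


def display_text(url: str) -> str:
    """Return link text for url, dropping a trailing slash only for domain-only URLs."""
    text = SCHEME.sub("", url.strip(), count=1)
    match = DOMAIN_ONLY.match(text)
    # Strip the slash only when nothing but a host name precedes it.
    return match.group(1) if match else text


if __name__ == "__main__":
    for example in (
        "https://office.com/",
        "https://example.org/page/",
        "https://example.org/wiki/index.php/",
        "http://кц.рф/",
    ):
        print(example, "->", display_text(example))
```

Because the pattern tests for the absence of further slashes rather than for the presence of a period, a path such as <code>example.org/wiki/index.php/</code> keeps its trailing slash, and a non-ASCII host such as <code>кц.рф</code> is handled the same way as an ASCII one.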
 
=== Add wbr for / ===
Slightly related to the above. The {{tl|url}} template seems to cause the url to line break nicely so it takes up less horizontal space. But when I used this module, the amount of space taken up by the infobox increased (on my browser) and it now looks quite ugly. Please compare [https://en.wikipedia.org/w/index.php?title=Northern_Spire_Bridge&oldid=945682399 this version] to [https://en.wikipedia.org/w/index.php?title=Northern_Spire_Bridge&direction=next&oldid=945682399 this version] &mdash;&nbsp;Martin <small>([[User:MSGJ|MSGJ]]&nbsp;·&nbsp;[[User talk:MSGJ|talk]])</small> 14:16, 24 September 2020 (UTC)
:
: {{re|MSGJ|label=Martin}} That's odd. I gave a solution for that in response to [[Module talk:WikidataIB/Archive 7 #Add more wbr in URLs]], but either I never implemented it, or it was lost in the sandbox shuffling. Just put it down to senility. I've implemented the fix:
:* <code><nowiki>{{#invoke:WikidataIB |url2 |url=https://www.sunderland.gov.uk/article/14608/Northern-Spire}}</nowiki></code> in a 14em column gives: <div style="display: inline-block; width: 14em; background-color: #F4F4FF;">{{#invoke:WikidataIB |url2 |url=https://www.sunderland.gov.uk/article/14608/Northern-Spire}}</div>
: Let me know if that doesn't solve the problem for you. Cheers --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 14:53, 24 September 2020 (UTC)
::I was using getValue|P856. Should I change to use url2? &mdash;&nbsp;Martin <small>([[User:MSGJ|MSGJ]]&nbsp;·&nbsp;[[User talk:MSGJ|talk]])</small> 16:55, 24 September 2020 (UTC)
:::
::: I wrote url2 specifically to format values fetched from Wikidata (they could be blank, of course, which trips up {{tl|url}}). I would expect it to be used something like this in an infobox:
:::* <code><nowiki>|data99 = {{#invoke:WikidataIB |url2 |url={{wdib |P856 |qid={{{qid|}}} |fwd=ALL |osd=n |{{{website|}}} }} }}</nowiki></code>
::: where {{para|website}} is a local parameter that will override fetching from Wikidata. The template {{tl|wdib}} is just a convenience wrapper for <code><nowiki>{{#invoke:WikidataIB |getValue}}</nowiki></code> --[[User:RexxS|RexxS]] ([[User talk:RexxS|talk]]) 17:39, 24 September 2020 (UTC)
::::Thanks - that's working well now &mdash;&nbsp;Martin <small>([[User:MSGJ|MSGJ]]&nbsp;·&nbsp;[[User talk:MSGJ|talk]])</small> 13:43, 25 September 2020 (UTC)
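
The behaviour described in this subsection (a blank value produces no output, and the displayed URL may wrap after a slash) might look roughly like the Python sketch below. It is not the module's implementation: the function name <code>url2_sketch</code>, the placement of <code>&lt;wbr&gt;</code> after every slash, and the exact link markup are assumptions made for illustration.

```python
import re

# Hypothetical sketch, not the actual url2 function of Module:WikidataIB.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def url2_sketch(url: str) -> str:
    """Build a wikitext external link whose label may wrap after each slash."""
    url = url.strip()
    if not url:
        # A blank value (for example, nothing found on Wikidata) yields nothing.
        return ""
    label = SCHEME.sub("", url, count=1)
    # Assumption: a <wbr> goes after every slash that is followed by more text.
    label = re.sub(r"/(?=.)", "/<wbr>", label)
    return f"[{url} {label}]"


if __name__ == "__main__":
    print(url2_sketch("https://www.sunderland.gov.uk/article/14608/Northern-Spire"))
```

The link target is left untouched; only the visible label receives the break opportunities, so the browser can wrap a long URL inside a narrow infobox column.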