{{ping|JPxG}} About [[Special:Diff/1139235436]]: it looks like [[User:WegweiserBot|WegweiserBot]] is using display names for author names, but [[WP:SPT]] is using usernames. This is going to cause the two scripts to keep overriding each other, so we should choose one approach. I would prefer to include only username, as that makes it easier to do things like make user links. I can see that there would be a case for listing display names instead, though. Or if we really want, we could include both, in tables like <syntaxhighlight inline lang="lua">{user = "Bluerasberry", display = "Lane Rasberry"}</syntaxhighlight>. What do you think would be best? — '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 04:28, 14 February 2023 (UTC)
:{{ping|Mr. Stradivarius}} Sorry for not having responded to this for a million years (I thought I had at the time). I pondered this for a while; what I eventually came up with was for each writer to be credited under one canonical name, which in a great many cases was the person's real name -- or failing that, whatever name they want to be credited under. Just going by username would simplify the process of linking to their userpage, but it's sometimes the case that we run articles from people with different home wikis, or Heaven forfend, people who aren't Wikimedians at all -- so I think whatever scheme we use should account for this. The idea of having two fields for the author does put another idea in my head, which is to have some kind of structured data for authors.
:Most newspapers and magazines have bylines for authors, which appear under their articles, and on the CMS's author page (e.g. https://slate.com/author/mary-c-curtis shows Mary Curtis's articles, but also "Mary C. Curtis is a columnist at Roll Call and host of its Equal Time podcast" etc etc). It occurs to me that this would be nice to do for ''Signpost'' articles as well, which would allow for the freedom to link to whatever someone wanted (like say they wanted the link to go to their blog or personal website), as well as to give more detailed bylines than just a name.
:I don't know how complicated it would be to implement something like that in the module -- my guess is "very" -- so it is more of a pipe dream I'm plopping out here than a thing I think is likely to actually happen, but it is worth considering. Maybe. '''[[User:JPxG|jp]]'''×'''[[User talk:JPxG|g]]''' 08:19, 6 August 2023 (UTC)
::On that matter, I absolutely do not appreciate you/the bot having mass-changed all these bylines in old stories some months ago without consulting with the rest of the team first. There are various arguments against it, for example that it is ahistorical to retroactively change the byline of something that was published over a decade ago - it does make a difference whether a story was written under a pseudonym or a real name, for instance. But however one weighs the pros and cons here (also about which version of a contributor's name to standardize on), the main point is that it is not a technical decision but a content decision, one that can also greatly affect [[Search engine results page|SERPs]] for people's names btw.
::Regards, [[User:HaeB|HaeB]] ([[User talk:HaeB|talk]]) 20:59, 6 August 2023 (UTC)
:::Perhaps this was dumb. I did post about this to [[Wikipedia_talk:Wikipedia_Signpost/Technical#Byline_pages|Wikipedia talk:Wikipedia Signpost/Technical]] back in January; perhaps some more discussion would have been good. From the height of a few months, this seems like it would have been a better choice, but at the time, I was just trying to make author search function. What I discovered was that the lack of machine-readable Signpost archives was concealing a [[Special:Permalink/1136012646|bizarre shitpile]] of some 1,212 distinct author fields, about [[Wikipedia:Wikipedia Signpost/Statistics/Authors|300 of which]] were nonsense. For example, sorting alphabetically gives <code>[User:GerardM</code>, <code><nowiki>{{{2}}}</nowiki></code>, <code>§hep</code>, <code>\/</code>, <code>+sj +</code>, <code>2 May</code>, <code>3 July 2006</code>, <code>03 July 2006</code>, <code>3family6</code>, <code>3family6 1 April 2016 19:58 (UTC)</code>, <code>05 November 2007</code>, <code>10 other editors</code>, <code>11 other editors</code>, <code>12 August 2015</code>, <code>14 April</code>, <code>18 August</code>, <code>19 November 2012</code>, <code>22mikpau</code>, <code>24 April</code>, <code>24 Apri</code>, <code>26 April 2010</code>, <code>27 other editors,</code>, <code>28 other editors</code>, <code>32 other editors</code>, <code>38 editors</code>, <code>51 other editors</code>, <code>53 other editors</code>, <code>79 other editors</code>, <code>91 editors</code>, <code>106 editors on the French Wikipedia; translated for The Signpost by JohnNewton8</code>, <code>273 other editors</code>, <code>1233</code>, <code>2008</code>, <code>16912 Rhiannon 15 July 2015</code> (literally all of which are parsing errors except for three); your articles, specifically, were under <code>HaeB</code>, <code>Tilman Bayer</code>, <code>Tilman Bayer</code>, <code>Tilman Bayer 1 April 2016 19:58 (UTC)</code>, <code>Tilman Bayer 03:08 (UTC)</code> (two of those ''appearing'' to be identical strings, but one of them with a nonprinting character). I guess what I am trying to say here is that if you want me to de-alias your name changes, I can do that, but I think that given the volume of work being done it was not something I could realistically post about at [[WT:Signpost]] (I recall that around December/January you had been saying we should be stricter about posting stuff on the right talk pages, which is why it was at /Technical). '''[[User:JPxG|jp]]'''×'''[[User talk:JPxG|g]]''' 23:14, 6 August 2023 (UTC)
::::Appreciate your reply, but my point above was exactly that this is a major content decision (and one affecting authors - and interested readers! - who may not be interested in following the Signpost's technical side). In that respect, /Technical was the wrong place for such discussions (at least without announcing them in more content-focused locations too, which I don't recall seeing at the time).
::::As for the specific content examples you're raising: Yes, I'm sure there were a lot of uncontroversial fixes among these mass changes, but also a lot of deliberate choices being altered. (I wasn't mainly thinking about my own bylines here, but apart from nonprinting characters etc., these were also not accidentally different.) Lastly, I would recommend distinguishing between building a more systematic, consistent way of handling bylines in future editions (great!) and retroactively changing reader-facing content in old issues. Regards, [[User:HaeB|HaeB]] ([[User talk:HaeB|talk]]) 00:02, 7 August 2023 (UTC)
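The "identical-looking strings, one with a nonprinting character" case mentioned above can be illustrated with a short Python sketch. It is not taken from Wegweiser or any existing script: the function name <code>normalize_author</code> and the choice of U+200E as the stray character are assumptions made for the example. It only removes invisible differences; which visible name a byline should carry remains the content decision discussed in this thread.

```python
import unicodedata


def normalize_author(raw: str) -> str:
    """Strip invisible format characters and collapse runs of whitespace.

    Hypothetical helper: it does not merge aliases or pick a canonical name.
    """
    # Unicode category "Cf" covers format characters such as U+200E (LRM).
    visible = "".join(ch for ch in raw if unicodedata.category(ch) != "Cf")
    # split()/join collapses tabs, newlines and repeated spaces to one space.
    return " ".join(visible.split())


plain = "Tilman Bayer"
# Assumption for illustration: the stray character is a left-to-right mark.
tainted = "Tilman\u200e Bayer"

# The two strings render alike but compare unequal until normalized.
assert plain != tainted
assert normalize_author(plain) == normalize_author(tainted)
```

Because the helper leaves every visible character alone, deliberately different bylines (a username versus a real name, say) stay distinct.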
== JSON for the indices ==
{{ping|Mr. Stradivarius}} Per [https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#mw.loadJsonData this], it's now possible to have Lua modules load JSON rather than Lua tables. I think this would be a lot better to work with (i.e. all utilities wouldn't have to constantly serialize and deserialize Lua tables using the idiosyncratic whitespace/indentation/etc format). Additionally, a page with the JSON content model would be constrained and sanitized by MediaWiki, rather than a Lua table which can just have wrong stuff in it etc. I would like to write some more utilities to work with these indices but the Lua tables are kind of an awkward sticking point. What would the procedure be for converting them? I would be able to write patches for the Signpost tagger and Wegweiser (and may be able to help with the Lua module itself). <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 23:52, 8 December 2023 (UTC)
:{{ping|JPxG}} I guess you could do that, but you would still have the issue of making sure the JSON is indented and sorted the same way across all tools that write to the data pages. Just switching to JSON doesn't guarantee that all JSON parsers format everything the exact same way. Also, you can't save syntactically incorrect Lua pages; Mediawiki won't allow it. So I don't think there is a real difference between Lua and JSON there. One pitfall is that you would have to change the content model to JSON when creating the data pages, which as far as I'm aware only admins can do. This would mean that WegweiserBot would have to become an admin bot, which would require another BRFA. Also, non-admin users would no longer be able to use SignpostTagger to tag pages for which the data page is not yet created; they would have to wait until either WegweiserBot or an admin user changes the content model. All in all, to me this sounds like a solution looking for a problem. — '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 08:47, 9 December 2023 (UTC)
::My thinking is mostly that converting the JSON to Lua tables adds another dependency for every program that puts things into or out of the indices; JSON is a pretty widely used standard and there's lots of utilities available for parsing it. It just seems like it's more straightforwardly compatible with what exists now, and more forward-compatible with whatever the future brings.
::As for the content models, I hadn't thought about that. It is true that [[WP:TEMPLATEEDITOR|TE]]s can change content models, but that's still not very great. Fortunately my RfA was successful so I'd be happy to manually pre-create and set content models for blank JSON indices up to, say, 2060 or so ;) <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 02:41, 10 December 2023 (UTC)
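On the concern raised above that different tools may format the same JSON differently: one way to limit this is for each writer to pin its serialization settings. The following is a minimal Python sketch using only the standard <code>json</code> module; the function name <code>dump_index</code> and the particular settings (sorted keys, four-space indent) are assumptions for illustration, not what any of the existing tools do.

```python
import json


def dump_index(entries):
    """Serialize index entries to a single canonical JSON text.

    Hypothetical helper: sorted keys and a fixed indent mean the output
    does not depend on the order in which a tool built each entry.
    """
    return json.dumps(entries, sort_keys=True, indent=4, ensure_ascii=False) + "\n"


entry_a = {"date": "2023-12-04", "subpage": "Essay", "title": "I am going to die"}
entry_b = {"title": "I am going to die", "subpage": "Essay", "date": "2023-12-04"}

# Same data, different insertion order: the serialized text is identical.
assert dump_index([entry_a]) == dump_index([entry_b])
# Round-tripping returns the original data.
assert json.loads(dump_index([entry_a])) == [entry_a]
```

This only covers a Python writer. Any other tool saving to the same pages would need to reproduce the same key order and indentation, which the sketch does not address.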
== New subheading and image fields ==
Lately it has occurred to me that the way old Signpost archives get generated is very ass-backwards. We ''have'' a giant database of every article, its title, its author, et cetera... we're just not using it. It does get used ''sometimes'', like in [[Wikipedia:Wikipedia_Signpost/Templates/Single_talk]]. But for the archive issues, we have hundreds of individual pages, like [[Wikipedia:Wikipedia Signpost/Single/2022-11-28]], that redundantly store article titles, subheadings, et cetera. The modifications I'm working on now (which I've incorporated into the snippet template and the publishing script) allow articles to be associated with custom images, so the archives will have that too, but then this creates a problem: modifying an image for an article requires me to edit the article, then update [[Wikipedia:Wikipedia Signpost]], then update [[Wikipedia:Wikipedia Signpost/2023-12-04]], and such.
Anyway: I'd like to, as much as possible, use the module for stuff instead of static wikitext pages. However, this will again require some more fields to be added. Right now, my thinking is:
* <code>subhed</code> (subheading, or blurb, or whatever) -- string, nowadays this is a sentence or so but in some old issues it's GIGANTIC (paragraph or more). Has various random crap in it (templates, quotation marks, etc).
* <code>piccy</code> -- string, image file for the article. These, ideally, would be a 1:1 aspect ratio, but might sometimes not be, which brings me to:
* <code>piccy-meta</code> -- coordinates for CSS crop of the image. I am not 100% on this. It might just be four CSS crop coordinates. But I am also contemplating other attributes, like filters (what if we want to desaturate an image, etc).
Like above, I am fine to incorporate these into the publishing script, Wegweiser, and the tagging script, but may need some assistance with the Lua part (and I don't know if this should be done before or after the lua table-json thing). {{ping|Mr. Stradivarius}} what do you think? <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 00:04, 9 December 2023 (UTC)
:{{ping|JPxG}} This sounds fine to me, although I would avoid abbreviating the fields: <code>subheading</code> and <code>image</code> should be fine. I would also make "image" a table, so you could do something like <syntaxhighlight lang="lua" inline>image = {filename = "Example.png", width = 100, height = 100}</syntaxhighlight>, where "width" and "height" (or whatever metadata you need) are optional. I'm not aware of any way to crop an image in Mediawiki using inline styles - I think we are limited to the options provided at [[mw:Help:Images/en]], but let me know if I'm missing something. Best — '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 09:03, 9 December 2023 (UTC)
::{{ping|Mr. Stradivarius}} I was able to figure out what was going on in {{tl|CSS crop}} well enough to implement it (far slimmer) on the snippet template; specifics (and examples) at [[Wikipedia talk:Wikipedia Signpost/Newsroom#Template for article images]]. Here is what the template looks like with all its arguments:
<nowiki>{{Signpost/snippet|2023-12-04|Essay|I am going to die|And so are you.|0.3 MB|sub=0.3 MB|by=[[User:WhatamIdoing|WhatamIdoing]]|pic=File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG|credit-name=William Michael Harnett|credit-license=PD|pic-p=800|pic-x=350|pic-y=100}}</nowiki>
{{Signpost/snippet|2023-12-04|Essay|I am going to die|And so are you.|0.3 MB|sub=0.3 MB|by=[[User:WhatamIdoing|WhatamIdoing]]|pic=File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG|credit-name=William Michael Harnett|credit-license=PD|pic-p=800|pic-x=350|pic-y=100}}
:After actually working it out enough to build something that functioned, it turns out there are only three meta attributes needed to fully specify a scale and crop: <code>scale</code>, <code>x</code> and <code>y</code>. All are integers, although I think it might also be useful to permit values like <code>top</code>, <code>bottom</code>, <code>left</code>, <code>right</code> and <code>center</code> (these aren't handled by the template yet but they could be in the future). There are also two more params I didn't think of above, <code>author</code> and <code>license</code>, which are necessary for image attribution. <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 03:59, 11 December 2023 (UTC)
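The three-attribute scheme described above (<code>scale</code>, <code>x</code>, <code>y</code>, with keyword offsets as a possible future extension) could be validated on the tooling side roughly as follows. This is a Python sketch under stated assumptions: the names <code>CropSpec</code>, <code>parse_offset</code> and <code>parse_crop</code> are invented here, treating an empty offset as 0 is a guess, and the keywords are only accepted and passed through, since how the template would turn them into pixel offsets is not specified.

```python
from dataclasses import dataclass
from typing import Union

# Keyword values floated in the discussion; not yet handled by the template.
X_KEYWORDS = {"left", "center", "right"}
Y_KEYWORDS = {"top", "center", "bottom"}


@dataclass(frozen=True)
class CropSpec:
    """Hypothetical container for the three crop attributes."""
    scale: int
    x: Union[int, str]
    y: Union[int, str]


def parse_offset(raw: str, keywords: set) -> Union[int, str]:
    """Return an integer offset, or a recognised keyword unchanged."""
    value = raw.strip().lower()
    if value == "":
        return 0  # assumption: an empty field means "no offset"
    if value in keywords:
        return value
    return int(value)  # raises ValueError for anything else


def parse_crop(scale: str, x: str, y: str) -> CropSpec:
    """Build a CropSpec from the string values stored in template params."""
    return CropSpec(int(scale), parse_offset(x, X_KEYWORDS), parse_offset(y, Y_KEYWORDS))


# Values from the snippet example above: pic-p=800, pic-x=350, pic-y=100.
assert parse_crop("800", "350", "100") == CropSpec(800, 350, 100)
```

Rejecting unknown strings early, rather than storing them, would keep malformed values out of the indices.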
=== Subheadings ===
Need to be supported by various scripts in order to work properly.
*Accepts {{yeac}} processes {{yeac}} Wegweiser
*Accepts {{yeac}} processes {{yeac}} Signpost tagger
*Accepts {{yeac}} processes {{yeac}} Module
I have rewritten Wegweiser to fetch metadata from parsing article wikitext instead of HTML pages; this posed some slight difficulties with respect to user tags in author fields but is now good. It can now work a lot faster, and it also provides subheadline metadata. There's a line I have commented out right now in the script, but can enable to make it store subheading data when it parses metadata. When the subheading data is in the module indices, the module still works fine to retrieve articles etc (of course it can't do anything to ''parse'' or use the subheading yet, but it doesn't break anything). However, the Signpost tagger chokes trying to save tags for articles with subheading data and won't work on them, so I won't do all of the indices with it for now. <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 22:06, 15 December 2023 (UTC)
:Now done. <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 12:23, 23 December 2023 (UTC)
== best practices for image information fields ==
So right now I've integrated into Wegweiser and SignpostTagger the fields for piccy information, like this:
<pre> {
date = "2023-12-04",
subpage = "Essay",
title = "I am going to die",
authors = {"WhatamIdoing"},
tags = {"essay"},
views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029},
piccycredits = "William Michael Harnett",
piccyfilename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG",
piccylicense = "PD",
piccyscaling = "400",
piccyxoffset = "70",
piccyyoffset = "",
subhead = "And so are you.",
},
</pre>
496 chars. This seems, uh, stupid. It works but I have temporarily reverted. Since these six fields are all about the same thing, there's no good reason to have them occupy six whole fields -- they should probably just be a dict like the viewcounts are. Like such:
<pre> {
date = "2023-12-04",
subpage = "Essay",
title = "I am going to die",
authors = {"WhatamIdoing"},
tags = {"essay"},
views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029},
piccy = {filename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG", credits = "William Michael Harnett", license = "PD", scaling = "400", xoffset = "70", yoffset = ""},
subhead = "And so are you.",
},</pre>
467. But, upon thinking this thought, something rather devious popped into my mind: aren't these labels kind of long? It seems pretty insubstantial, but... there are a lot of articles. Six letters for each key, across {{Signpost/Number of articles}} (currently 5462) articles (plus 3 extra chars for the <code> = </code> necessitated by using a label at all) is {{#expr: {{Signpost/Number of articles}} * 9}} bytes (currently 49158). For reference, the size of all extant module indices is {{#expr: {{PAGESIZE:Module:Signpost/index/2005|R}} + {{PAGESIZE:Module:Signpost/index/2006|R}} + {{PAGESIZE:Module:Signpost/index/2007|R}} + {{PAGESIZE:Module:Signpost/index/2008|R}} + {{PAGESIZE:Module:Signpost/index/2009|R}} + {{PAGESIZE:Module:Signpost/index/2010|R}} + {{PAGESIZE:Module:Signpost/index/2011|R}} + {{PAGESIZE:Module:Signpost/index/2012|R}} + {{PAGESIZE:Module:Signpost/index/2013|R}} + {{PAGESIZE:Module:Signpost/index/2014|R}} + {{PAGESIZE:Module:Signpost/index/2015|R}} + {{PAGESIZE:Module:Signpost/index/2016|R}} + {{PAGESIZE:Module:Signpost/index/2017|R}} + {{PAGESIZE:Module:Signpost/index/2018|R}} + {{PAGESIZE:Module:Signpost/index/2019|R}} + {{PAGESIZE:Module:Signpost/index/2020|R}} + {{PAGESIZE:Module:Signpost/index/2021|R}} + {{PAGESIZE:Module:Signpost/index/2022|R}} + {{PAGESIZE:Module:Signpost/index/2023|R}} + {{PAGESIZE:Module:Signpost/index/2024|R}} + {{PAGESIZE:Module:Signpost/index/2025|R}}}} (currently 1406253). So that's, uh, {{#expr:4915800/1406253 round 3}}% of the total index size being taken up just by key names. These have to be parsed by, basically, everything, and it's not quite clear that repeating the field names all these thousands of times is more efficient than just having them as an array whose ordering is documented. Perhaps it would be better to do this? {{ping|Mr. Stradivarius}} <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 13:51, 23 December 2023 (UTC)
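The arithmetic above can be reproduced with a few lines of Python. This sketch uses the figures quoted in the comment as of that snapshot (5462 articles, 1406253 bytes of index); the helper name <code>key_overhead</code> is made up for the example. The last calculation is an extra assumption of mine, an upper bound in which every article carries all six image keys.

```python
ARTICLES = 5462          # article count quoted above
INDEX_BYTES = 1_406_253  # combined size of the module indices quoted above


def key_overhead(key_len: int, articles: int, sep_len: int = 3) -> int:
    """Bytes spent on one key name plus its ' = ' across all articles."""
    return (key_len + sep_len) * articles


# One six-letter key per article, as in the estimate above.
overhead = key_overhead(6, ARTICLES)
percent = round(100 * overhead / INDEX_BYTES, 3)
assert overhead == 49158

# Upper bound (assumption): every article has all six keys of the dict form.
IMAGE_KEYS = ["filename", "credits", "license", "scaling", "xoffset", "yoffset"]
all_keys_overhead = sum(key_overhead(len(k), ARTICLES) for k in IMAGE_KEYS)

print(overhead, percent, all_keys_overhead)
```

The separators between values are left out of the count because a positional array would need them as well; only the labels and their <code> = </code> are specific to the keyed form.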