Module talk:Signpost: Difference between revisions

Revision as of 08:47, 9 December 2023 by Mr. Stradivarius (→JSON for the indices: reply)
Latest revision as of 13:51, 23 December 2023 by JPxG (→best practices for image information fields: new section)
(8 intermediate revisions by 2 users not shown)
{{ping|Mr. Stradivarius}} Per [https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#mw.loadJsonData this], it's now possible to have Lua modules load JSON rather than Lua tables. I think this would be a lot better to work with (i.e. all utilities wouldn't have to constantly serialize and deserialize Lua tables using the idiosyncratic whitespace/indentation/etc format). Additionally, a page with the JSON content model would be constrained and sanitized by MediaWiki, rather than a Lua table which can just have wrong stuff in it etc. I would like to write some more utilities to work with these indices but the Lua tables are kind of an awkward sticking point. What would the procedure be for converting them? I would be able to write patches for the Signpost tagger and Wegweiser (and may be able to help with the Lua module itself). <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 23:52, 8 December 2023 (UTC)
:{{ping|JPxG}} I guess you could do that, but you would still have the issue of making sure the JSON is indented and sorted the same way across all tools that write to the data pages. Just switching to JSON doesn't guarantee that all JSON parsers format everything the exact same way. Also, you can't save syntactically incorrect Lua pages; MediaWiki won't allow it. So I don't think there is a real difference between Lua and JSON there. One pitfall is that you would have to change the content model to JSON when creating the data pages, which as far as I'm aware only admins can do. This would mean that WegweiserBot would have to become an admin bot, which would require another BRFA. Also, non-admin users would no longer be able to use SignpostTagger to tag pages for which the data page is not yet created; they would have to wait until either WegweiserBot or an admin user changes the content model. All in all, to me this sounds like a solution looking for a problem. — '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 08:47, 9 December 2023 (UTC)
::My thinking is mostly that converting the JSON to Lua tables adds another dependency for every program that puts things into or out of the indices; JSON is a pretty widely used standard and there's lots of utilities available for parsing it. It just seems like it's more straightforwardly compatible with what exists now, and more forward-compatible with whatever the future brings.
::As for the content models, I hadn't thought about that. It is true that [[WP:TEMPLATEEDITOR|TE]]s can change content models, but that's still not very great. Fortunately my RfA was successful so I'd be happy to manually pre-create and set content models for blank JSON indices up to, say, 2060 or so ;) <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 02:41, 10 December 2023 (UTC)

== New subheading and image fields ==

Lately it has occurred to me that the way old Signpost archives get generated is very ass-backwards. We ''have'' a giant database of every article, its title, its author, et cetera... we're just not using it. It does get used ''sometimes'', like in [[Wikipedia:Wikipedia_Signpost/Templates/Single_talk]]. But for the archive issues, we have hundreds of individual pages, like [[Wikipedia:Wikipedia Signpost/Single/2022-11-28]], that redundantly store article titles, subheadings, et cetera.
The modifications I'm working on now (which I've incorporated into the snippet template and the publishing script) allow articles to be associated with custom images, so the archives will have that too, but then this creates a problem: modifying an image for an article requires me to edit the article, then update [[Wikipedia:Wikipedia Signpost]], then update [[Wikipedia:Wikipedia Signpost/2023-12-04]], and such.

[...]

* <code>piccy-meta</code> -- coordinates for CSS crop of the image. I am not 100% on this. It might just be four CSS crop coordinates. But I am also contemplating other attributes, like filters (what if we want to desaturate an image, etc).

Like above, I am fine to incorporate these into the publishing script, Wegweiser, and the tagging script, but may need some assistance with the Lua part (and I don't know if this should be done before or after the Lua table-to-JSON thing). {{ping|Mr. Stradivarius}} what do you think? <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 00:04, 9 December 2023 (UTC)
:{{ping|JPxG}} This sounds fine to me, although I would avoid abbreviating the fields: <code>subheading</code> and <code>image</code> should be fine. I would also make "image" a table, so you could do something like <syntaxhighlight lang="lua" inline>image = {filename = "Example.png", width = 100, height = 100}</syntaxhighlight>, where "width" and "height" (or whatever metadata you need) are optional. I'm not aware of any way to crop an image in MediaWiki using inline styles - I think we are limited to the options provided at [[mw:Help:Images/en]], but let me know if I'm missing something. Best — '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 09:03, 9 December 2023 (UTC)
::{{ping|Mr. Stradivarius}} I was able to figure out what was going on in {{tl|CSS crop}} well enough to implement it (far slimmer) on the snippet template; specifics (and examples) at [[Wikipedia talk:Wikipedia Signpost/Newsroom#Template for article images]]. Here is what the template looks like with all its arguments:

<nowiki>{{Signpost/snippet|2023-12-04|Essay|I am going to die|And so are you.|0.3 MB|sub=0.3 MB|by=[[User:WhatamIdoing|WhatamIdoing]]|pic=File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG|credit-name=William Michael Harnett|credit-license=PD|pic-p=800|pic-x=350|pic-y=100}}</nowiki>

{{Signpost/snippet|2023-12-04|Essay|I am going to die|And so are you.|0.3 MB|sub=0.3 MB|by=[[User:WhatamIdoing|WhatamIdoing]]|pic=File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG|credit-name=William Michael Harnett|credit-license=PD|pic-p=800|pic-x=350|pic-y=100}}

:After actually working it out enough to build something that functioned, it turns out there are only three meta attributes needed to fully specify a scale and crop: <code>scale</code>, <code>x</code> and <code>y</code>. All are integers, although I think it might also be useful to permit values like <code>top</code>, <code>bottom</code>, <code>left</code>, <code>right</code> and <code>center</code> (these aren't handled by the template yet but they could be in the future). There are also two more params I didn't think of above, <code>author</code> and <code>license</code>, which are necessary for image attribution. <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 03:59, 11 December 2023 (UTC)

=== Subheadings ===

Need to be supported by various scripts in order to work properly.
* Accepts {{yeac}}, processes {{yeac}}: Wegweiser
* Accepts {{yeac}}, processes {{yeac}}: Signpost tagger
* Accepts {{yeac}}, processes {{yeac}}: Module

I have rewritten Wegweiser to fetch metadata from parsing article wikitext instead of HTML pages; this posed some slight difficulties with respect to user tags in author fields but is now good. It can now work a lot faster, and it also provides subheadline metadata. There's a line I have commented out right now in the script, but can enable to make it store subheading data when it parses metadata. When the subheading data is in the module indices, the module still works fine to retrieve articles etc (of course it can't do anything to ''parse'' or use the subheading yet, but it doesn't break anything). However, the Signpost tagger chokes trying to save tags for articles with subheading data and won't work on them, so I won't do all of the indices with it for now. <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 22:06, 15 December 2023 (UTC)
:Now done. <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 12:23, 23 December 2023 (UTC)

== best practices for image information fields ==

So right now I've integrated into Wegweiser and SignpostTagger the fields for piccy information, like this:
<pre>
{
    date = "2023-12-04",
    subpage = "Essay",
    title = "I am going to die",
    authors = {"WhatamIdoing"},
    tags = {"essay"},
    views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029},
    piccycredits = "William Michael Harnett",
    piccyfilename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG",
    piccylicense = "PD",
    piccyscaling = "400",
    piccyxoffset = "70",
    piccyyoffset = "",
    subhead = "And so are you.",
},
</pre>
496 chars. This seems, uh, stupid. It works but I have temporarily reverted. Since these six fields are all about the same thing, there's no good reason to have them occupy six whole fields -- they should probably just be a dict like the viewcounts are. Like such:
<pre>
{
    date = "2023-12-04",
    subpage = "Essay",
    title = "I am going to die",
    authors = {"WhatamIdoing"},
    tags = {"essay"},
    views = {d007 = 1526, d015 = 1919, d030 = 2029, d060 = 2029, d090 = 2029, d120 = 2029, d180 = 2029},
    piccy = {filename = "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG", credits = "William Michael Harnett", license = "PD", scaling = "400", xoffset = "70", yoffset = ""},
    subhead = "And so are you.",
},
</pre>
467. But, upon thinking this thought, something rather devious popped into my mind: aren't these labels kind of long? It seems pretty insubstantial, but... there are a lot of articles. Six letters for each key, across {{Signpost/Number of articles}} (currently 5462) articles (plus 3 extra chars for the <code> = </code> necessitated by using a label at all) is {{#expr: {{Signpost/Number of articles}} * 9}} bytes (currently 49158).
For reference, the size of all extant module indices is {{#expr: {{PAGESIZE:Module:Signpost/index/2005|R}} + {{PAGESIZE:Module:Signpost/index/2006|R}} + {{PAGESIZE:Module:Signpost/index/2007|R}} + {{PAGESIZE:Module:Signpost/index/2008|R}} + {{PAGESIZE:Module:Signpost/index/2009|R}} + {{PAGESIZE:Module:Signpost/index/2010|R}} + {{PAGESIZE:Module:Signpost/index/2011|R}} + {{PAGESIZE:Module:Signpost/index/2012|R}} + {{PAGESIZE:Module:Signpost/index/2013|R}} + {{PAGESIZE:Module:Signpost/index/2014|R}} + {{PAGESIZE:Module:Signpost/index/2015|R}} + {{PAGESIZE:Module:Signpost/index/2016|R}} + {{PAGESIZE:Module:Signpost/index/2017|R}} + {{PAGESIZE:Module:Signpost/index/2018|R}} + {{PAGESIZE:Module:Signpost/index/2019|R}} + {{PAGESIZE:Module:Signpost/index/2020|R}} + {{PAGESIZE:Module:Signpost/index/2021|R}} + {{PAGESIZE:Module:Signpost/index/2022|R}} + {{PAGESIZE:Module:Signpost/index/2023|R}} + {{PAGESIZE:Module:Signpost/index/2024|R}} + {{PAGESIZE:Module:Signpost/index/2025|R}}}} (currently 1406253). So that's, uh, {{#expr:4915800/1406253 round 3}}% of the total index size being taken up just by key names. These have to be parsed by, basically, everything, and it's not quite clear that repeating the field names all these thousands of times is more efficient than just having them as an array whose ordering is documented. Perhaps it would be better to do this? {{ping|Mr. Stradivarius}} <b style="font-family: monospace; color:#E35BD8">[[User:JPxG|<b style="color:#029D74">jp</b>]]×[[Special:Contributions/JPxG|<b style="color: #029D74">g</b>]][[User talk:JPxG|🗯️]]</b> 13:51, 23 December 2023 (UTC)
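
The arithmetic in the post above, and the keyed-versus-positional trade-off it raises, can be sketched roughly as follows. This is only an illustration in Python, not code from Wegweiser, SignpostTagger or the module: the function names, the <code>IMAGE_FIELDS</code> ordering and the sample entry are assumptions made up for the example, and the article count and total index size are the "currently" figures quoted in the post, which will drift over time.

```python
# Figures quoted in the post above; the live values change over time.
NUM_ARTICLES = 5462
TOTAL_INDEX_BYTES = 1_406_253

# Hypothetical field order for a positional image array. No such ordering
# has been agreed on; it is assumed here purely for illustration.
IMAGE_FIELDS = ("filename", "credits", "license", "scaling", "xoffset", "yoffset")

# Sample image data taken from the example entry in the post.
SAMPLE_IMAGE = {
    "filename": "File:Memento Mori 'To This Favour' by William Michael Harnett, c. 1879.JPG",
    "credits": "William Michael Harnett",
    "license": "PD",
    "scaling": "400",
    "xoffset": "70",
    "yoffset": "",
}


def key_overhead(num_articles, key_length=6, separator=" = "):
    """Bytes spent repeating one key name and its separator in every entry."""
    return num_articles * (key_length + len(separator))


def percent_of_total(part, total):
    """Share of the total as a percentage, rounded to three decimal places."""
    return round(100 * part / total, 3)


def lua_string(value):
    """Quote a Python string as a double-quoted Lua string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def keyed_table(image):
    """Render the image fields as a Lua table with named keys."""
    pairs = (f"{name} = {lua_string(image[name])}" for name in IMAGE_FIELDS)
    return "{" + ", ".join(pairs) + "}"


def positional_table(image):
    """Render the same fields as a Lua array in the documented order."""
    return "{" + ", ".join(lua_string(image[name]) for name in IMAGE_FIELDS) + "}"


if __name__ == "__main__":
    overhead = key_overhead(NUM_ARTICLES)
    print("bytes per repeated key:", overhead)
    print("percent of index size:", percent_of_total(overhead, TOTAL_INDEX_BYTES))
    saved = len(keyed_table(SAMPLE_IMAGE)) - len(positional_table(SAMPLE_IMAGE))
    print("characters saved per entry by the positional form:", saved)
```

The positional form is shorter, but every reader and writer of the indices would then depend on the documented field order, and an omitted optional field still needs a placeholder so the later positions stay aligned.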