Module talk:Signpost

This is an old revision of this page, as edited by JPxG (talk | contribs) at 03:16, 8 January 2023 (New fields). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Problem

To editor Mr. Stradivarius: I'm trying to use this, but when I bring up the tags interface, all the fields are greyed out and I can't add a title or tags to new Signpost articles. Past articles already tagged are fine. Chris Troutman (talk) 23:08, 6 February 2022 (UTC)

@Chris troutman: Hm, there seems to be a problem with User:Mr. Stradivarius/gadgets/SignpostTagger. I'll take a look at it later on. — Mr. Stradivarius ♪ talk ♪ 00:58, 7 February 2022 (UTC)
@Chris troutman: It turns out I didn't build SignpostTagger to handle the situation when a year module such as Module:Signpost/index/2019 does not exist. This meant that it was failing for all articles from 2020 onwards, as those index modules never got created. I have now fixed the gadget, so you should be able to save tags for any article now. If the year index module does not exist, the gadget will create it. Best — Mr. Stradivarius ♪ talk ♪ 14:52, 10 February 2022 (UTC)
@Mr. Stradivarius: Many thanks! Chris Troutman (talk) 22:20, 11 February 2022 (UTC)

Adding authorship

Hi @Mr. Stradivarius! Thanks so much for creating this. Would it be possible to add authors (bylines) to the metadata? This could be used to generate profile pages of Signpost writers and their articles. Cheers! 🐶 EpicPupper (he/him | talk) 05:08, 6 June 2022 (UTC)

Hi EpicPupper. It's certainly possible. Is this something that's been discussed with other Signpost editors? It would be a not-insignificant amount of work to update the module and the gadget, and I'd like to be sure that it's a feature that people actually want before putting in the work to build it. Best — Mr. Stradivarius ♪ talk ♪ 07:07, 6 June 2022 (UTC)
Hi @Mr. Stradivarius! This has been discussed with the other EiC and some other people, and we think that this is a good idea :) Another (separate) proposal would be to incorporate some type of tagging system per-publication or during publication, so that editors can add tags using a template, and perhaps the publication script would automatically add it to the module. Cheers 🐶 EpicPupper (he/him | talk) 01:30, 22 June 2022 (UTC)

Apologies for the extremely slowpoke.jpg followup on this, but I think author tags would be a very good idea, and implementing them here would save me a large amount of work versus implementing them separately in an independent module. As one example, they would allow us to link authors' bylines to lists of their articles, as basically all modern news outlets do. I am also willing to assist in modifying the script / module (or assist with harmonization of input data on the Signpost pages themselves, automation to update old indices etc.) if additional work is required. jp×g 15:53, 4 November 2022 (UTC)

@EpicPupper and JPxG: I have updated Module:Signpost and WP:SPT to support adding authorship. The index modules can now have an "authors" table, which you can see some examples of at Module:Signpost/index/2019. To really make this useful, though, we need to add author data for all 4000 Signpost articles, or at least a significant subset. I plan on writing a script that can do this automatically when I next find some free time. Best — Mr. Stradivarius ♪ talk ♪ 06:54, 12 November 2022 (UTC)
@Mr. Stradivarius: Excellent! One loves to see it. I think we are aligned on automatically filling out article information, to the point that I was already beginning to write it when I saw this. My thinking is a simple Python script that runs on a user's computer and does the following:
- Retrieves Signpost article pages as wikitext from the server
- Parses out departments, headlines, subheadings, author information (and potentially other metadata)
- Retrieves the Signpost module indices
- Integrates the scraped metadata (either by upserting, or only inserting where fields are blank)
- ???
- Profit!
Parenthetically: contemplating this got me thinking that it might be a good idea to run this automatically after each issue is published (i.e. it seems like a lot of work, and rather error-prone, for people to manually insert index entries for each article, rather than simply adding tags manually). Let me know what you think of this, and if it's a good idea to go forward with running it for each issue. jp×g 22:48, 16 November 2022 (UTC)
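The integration step in the list above (upserting versus only filling blank fields) can be sketched roughly as follows. This is a minimal illustration, not the script discussed in this thread: it assumes each index entry is a plain dictionary keyed by its "date" and "subpage" fields, and the function names `merge_entry` and `merge_index` are hypothetical.

```python
from typing import Any

Entry = dict[str, Any]


def _is_blank(value: Any) -> bool:
    """Treat None, empty strings and empty containers as 'not filled in'."""
    return value is None or value == "" or value == [] or value == {}


def merge_entry(existing: Entry, scraped: Entry, overwrite: bool = False) -> Entry:
    """Merge scraped metadata into one index entry.

    overwrite=False: only fill fields that are currently blank.
    overwrite=True:  upsert, i.e. scraped values replace existing ones.
    Blank scraped values never clobber existing data in either mode.
    """
    merged = dict(existing)
    for field, value in scraped.items():
        if _is_blank(value):
            continue
        if overwrite or _is_blank(merged.get(field)):
            merged[field] = value
    return merged


def merge_index(index: list[Entry], scraped: list[Entry],
                overwrite: bool = False) -> list[Entry]:
    """Merge scraped entries into an index, keyed by (date, subpage)."""
    by_key = {(e["date"], e["subpage"]): dict(e) for e in index}
    for item in scraped:
        key = (item["date"], item["subpage"])
        by_key[key] = merge_entry(by_key.get(key, {}), item, overwrite)
    return sorted(by_key.values(), key=lambda e: (e["date"], e["subpage"]))
```

With `overwrite=False`, a title that an editor has already corrected by hand is left alone, while a missing author list is filled in; with `overwrite=True`, the scraped values win.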
@JPxG: Yes, this sounds like a good idea. I suspect that it will be easier to get the Signpost article metadata from the HTML rather than from the wikitext. We can insert IDs into the elements we need to parse, as I did here for the article authors, which then allowed me to write the author-parsing code without too much trouble.

I see article subheadings on Wikipedia:Wikipedia Signpost, but I don't see them on article pages or anywhere in the archives. Would it be acceptable to leave these out of the index modules? Adding these would also mean adding them to WP:SPT, and I would prefer to keep things simple if the subheadings are not used all that much. Also, I couldn't find any mention of departments in the Signpost articles I checked - do you have any examples of the department metadata that you mentioned?

Also, yes, this script, or a variant of it, should be run after each article is published (or we could probably just run it daily). It is not that much of a stretch from running a script on a user's computer to running a script every day automatically on Toolforge. Best — Mr. Stradivarius ♪ talk ♪ 00:12, 17 November 2022 (UTC)
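As a rough illustration of the ID-based HTML parsing suggested above, the following sketch collects the text of every element whose `id` starts with a given prefix, using only Python's standard library. The prefix `signpost-author` is a made-up placeholder; the IDs actually added to the Signpost templates are not stated in this thread.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class IdTextCollector(HTMLParser):
    """Collect the text content of elements whose id starts with a prefix."""

    def __init__(self, id_prefix: str):
        super().__init__(convert_charrefs=True)
        self.id_prefix = id_prefix
        self.results: list[str] = []
        self._depth = 0            # > 0 while inside a matching element
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1       # nested tag inside a matching element
        elif (dict(attrs).get("id") or "").startswith(self.id_prefix):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Normalise runs of whitespace to single spaces.
            self.results.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_authors(html: str, id_prefix: str = "signpost-author") -> list[str]:
    """Return author names found in elements with a hypothetical id prefix."""
    collector = IdTextCollector(id_prefix)
    collector.feed(html)
    return collector.results
```

The appeal of this approach is that the parser does not need to understand byline wording or template wikitext; it only depends on the IDs staying stable.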

Hello everyone, and also bug?

Hi. I first heard of this module a while ago, but I didn't have the time to go into great detail with it -- now I am trying to do a comprehensive review of Signpost technical infrastructure, so I am here. First of all, I think it whips ass. This is great! I have a few ideas for how I could use it to accomplish a few new features (and probably some ideas for new features the module could have).

Second of all, I notice that something strange seems to be happening in Module:Signpost/index/2022 (and possibly elsewhere): a bunch of article titles have "subscribe subscribe" at their beginning for no apparent reason. If I have time I will try to go figure out what is causing this (probably some templates not playing well together) but I am not very familiar with Lua so it is unlikely I can fix it very well myself if it ends up being something in the module. jp×g 15:45, 4 November 2022 (UTC)

Looks like they begin some time around August. Here is a diff with the weird text -- looks like it is coming in from SPT. jp×g 15:49, 4 November 2022 (UTC)
@JPxG: I have now fixed this. It was due to SPT trying to get the article title by converting everything inside the <h2>...</h2> tags to text, but this included the subscribe link added by mw:Extension:DiscussionTools. This link was presumably added around August. I chose to fix this by getting the title from a new "data-signpost-article-title" attribute added to Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2, and as a backup, getting it from the span inside the h2 tag with the class "mw-headline". I also went through and fixed all the instances where the "subscribe subscribe" links were added to the index modules. All signpost articles newly tagged since August had the "subscribe subscribe" text added, so it was not just limited to 2022 articles, although that's where the problem was most common. — Mr. Stradivarius ♪ talk ♪ 08:52, 11 November 2022 (UTC)
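The lookup order described in the fix above (prefer the "data-signpost-article-title" attribute, fall back to the "mw-headline" span inside the h2) can be sketched in Python as below. This is only an illustration of the fallback logic, not the gadget's actual JavaScript; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class TitleFinder(HTMLParser):
    """Find an article title via a data attribute, or an h2 headline span."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.attr_title = None     # from data-signpost-article-title
        self.headline = None       # from <h2><span class="mw-headline">
        self._in_h2 = False
        self._depth = 0            # > 0 while inside the headline span
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.attr_title is None and attrs.get("data-signpost-article-title"):
            self.attr_title = attrs["data-signpost-article-title"]
        if tag == "h2":
            self._in_h2 = True
        elif self._depth:
            self._depth += 1
        elif (self._in_h2 and tag == "span" and self.headline is None
              and "mw-headline" in (attrs.get("class") or "").split()):
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "h2":
            self._in_h2 = False
            self._depth = 0
        elif self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.headline = "".join(self._buffer).strip()

    def handle_data(self, data):
        # Only text inside the headline span counts; sibling links
        # (such as a subscribe link) are ignored.
        if self._depth:
            self._buffer.append(data)


def extract_title(html: str):
    """Return the attribute title if present, else the headline text."""
    finder = TitleFinder()
    finder.feed(html)
    return finder.attr_title or finder.headline
```

Because text is only collected inside the headline span, any extra links placed elsewhere in the h2 no longer leak into the title.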

New fields

@Mr. Stradivarius: Today I succeeded in writing something that I have wanted for quite some time, viz. a way to look at Signpost viewership statistics that isn't bad and useless. I'll be posting the source code soon, but basically, it does something very simple: it finds and records view counts for Signpost articles after publication (for a standardized interval afterwards, for purposes of comparison). Anyway, the reason this involves this module is as follows:

Storing this data necessitates the creation of some large index of all Signpost articles, and rather than reinvent the wheel, I reckon it would be useful to do so in this module's indices, and I've found a way to make my script parse and update the Lua tables properly. I tested it briefly on Module:Signpost/index/2022 (diff here of what it looks like with the extra fields). I'm not very hot with Lua, so I don't know what this does on the backend utilities that use this module, but SPT works fine with these extra fields, as does Wikipedia talk:Wikipedia Signpost/Single/2022-01-30 (which uses Wikipedia:Wikipedia Signpost/Templates/Single talk, which uses Wikipedia:Wikipedia Signpost/Templates/Article list maker, which uses Module:Signpost).

Anyway, I have everything working, and I am ready to add the fields to all the indices (only back to 2015 since per-page view counts aren't available before then), but I wanted to hold off and make sure that this isn't going to break everything first. What do you say? jp×g 08:19, 5 January 2023 (UTC)

@JPxG: What will the data be used for? My first reaction is that unless you need to make the data available via a template, you could use the Pageviews API to get the data dynamically and not have to worry about storing it in the index modules. If we do need to store the data in the index modules, WP:SPT will need to be updated; with the current way that it is written, it will delete all the extra view fields when it changes any tags (see this diff for an example). Also, rather than using fields like views30, views60 etc., I would prefer that the page view statistics are put into their own subtable, like views = {[7] = 642, [30] = 1966, [60] = 2279, [90] = 2419}. The data would be more structured this way. Best — Mr. Stradivarius ♪ talk ♪ 06:51, 6 January 2023 (UTC)
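For reference, fetching the data dynamically from the Pageviews API might look something like the sketch below. It builds a request URL for the Wikimedia REST per-article pageviews endpoint and sums the daily counts in a response. The choices of "all-access" and "user" traffic, and of counting the publication day as day one of the interval, are assumptions made for illustration; the helper names are hypothetical, and the HTTP request itself is left out.

```python
from datetime import date, timedelta
from urllib.parse import quote

API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, published: date, days: int,
                  project: str = "en.wikipedia") -> str:
    """Build the per-article URL for the first `days` days after publication.

    Assumption: the interval starts on the publication date and is inclusive,
    so days=7 covers the publication day plus the following six days.
    """
    end = published + timedelta(days=days - 1)
    # Page titles use underscores and must be fully percent-encoded,
    # including the slashes in Signpost subpage names.
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{API_BASE}/{project}/all-access/user/{article}"
            f"/daily/{published:%Y%m%d}/{end:%Y%m%d}")


def total_views(payload: dict) -> int:
    """Sum the daily 'views' values from a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))
```

A caller would fetch the URL with any HTTP client (sending a descriptive User-Agent), decode the JSON body, and pass it to `total_views`.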
✓ I thought restructuring the views would be hard, but it wasn't really. Anyway, yeah -- the pageviews thing is a little strange. Basically, it is necessary because {{Graph:PageViews}} is bizarrely broken (it can only return graphs, and is completely incapable of returning straightforward numbers -- tried to figure this out for quite some time to no avail). That is to say, if we want to just look at "how many pageviews did the traffic report get versus the discussion report", we either have to manually enter each page title into the pageviews website, or wild-ass-guess the area under the curve on a graph...
At any rate, if it's possible, I would be glad to help rewrite whatever part of the JS poses issues for passthrough of extra parameters (since this might prove useful for other stuff as well). jp×g 03:16, 8 January 2023 (UTC)
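The restructuring discussed above, from flat fields such as views30 and views60 to a single views subtable, could be done along these lines. This is a hedged sketch rather than the code actually used: it assumes entries are held as Python dictionaries before being written back out as Lua, and `nest_views` and `lua_views` are hypothetical names.

```python
import re

# Matches flat fields such as "views7" or "views30" (digits required).
VIEW_FIELD = re.compile(r"^views(\d+)$")


def nest_views(entry: dict) -> dict:
    """Return a copy of entry with viewsN fields folded into a 'views' dict."""
    nested = {k: v for k, v in entry.items() if not VIEW_FIELD.match(k)}
    views = dict(entry.get("views") or {})
    for key, value in entry.items():
        match = VIEW_FIELD.match(key)
        if match:
            views[int(match.group(1))] = value
    if views:
        nested["views"] = dict(sorted(views.items()))
    return nested


def lua_views(views: dict) -> str:
    """Render the subtable in the Lua form suggested in this thread."""
    inner = ", ".join(f"[{days}] = {count}" for days, count in sorted(views.items()))
    return "views = {" + inner + "}"
```

For an entry with views7 = 642, views30 = 1966, views60 = 2279 and views90 = 2419, `lua_views(nest_views(entry)["views"])` gives `views = {[7] = 642, [30] = 1966, [60] = 2279, [90] = 2419}`, matching the format proposed above.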

Fleshing

The current version of Wegweiser, while not perfect, now has the ability to pull article lists from the PrefixIndex API and generate skeleton entries (no tags, but date and subpage) in the indices. I filled them out from 2005 to present, which added several hundred previously unindexed articles (e.g. 2017 only had a couple of articles in the index for some reason). jp×g 01:33, 7 January 2023 (UTC)
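To illustrate what generating skeleton entries involves, here is a small sketch that turns a list of page titles into entries containing only a date and a subpage. It is not Wegweiser's code: it assumes article titles follow the pattern "Wikipedia:Wikipedia Signpost/YYYY-MM-DD/Subpage", it takes the title list as input rather than calling the API, and `skeleton_entries` is a hypothetical name.

```python
import re

# Assumed title layout: "Wikipedia:Wikipedia Signpost/<date>/<subpage>".
TITLE_RE = re.compile(r"^Wikipedia:Wikipedia Signpost/(\d{4}-\d{2}-\d{2})/(.+)$")


def skeleton_entries(titles, known=()):
    """Build date/subpage-only entries for titles not already indexed.

    `known` is an iterable of (date, subpage) pairs already in the index.
    Titles that do not match the article pattern are skipped.
    """
    seen = set(known)
    entries = []
    for title in titles:
        match = TITLE_RE.match(title)
        if not match:
            continue
        key = (match.group(1), match.group(2))
        if key in seen:
            continue
        seen.add(key)
        entries.append({"date": key[0], "subpage": key[1]})
    return sorted(entries, key=lambda e: (e["date"], e["subpage"]))
```

Passing the existing (date, subpage) pairs as `known` keeps the step idempotent, so re-running it over a year that is already indexed adds nothing.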