Wikipedia talk:Arbitration Committee/Requests for comment/Article creation at scale/Archive 4

This is an archive of past discussions on Wikipedia talk:Arbitration Committee/Requests for comment/Article creation at scale. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Formatting for Q2

I'm not sure how you want the lists in Question 2 to be presented? Do you want a section for each person's statement that contains a numbered list and explanation? A horizontal list? Something else? Thryduulf (talk) 19:11, 3 October 2022 (UTC)

I was thinking something like this:

A, B, A2, B2, C, C2, D (explanation of reasoning) Valereee (talk) 19:17, 3 October 2022 (UTC)

And then if lengthier discussion was felt necessary, it could go in the sectioned comments.

Or similar -- something that would make it fairly easy for closers to parse. If you have a suggestion, by all means! I know this is all very complicated, and if we can make it simpler I'm all for it. :D Valereee (talk) 19:17, 3 October 2022 (UTC)

I have no better suggestions so I'll go with that! Thanks. Thryduulf (talk) 19:18, 3 October 2022 (UTC)

@Valereee: Why do we have to list all seven? If, for example, I agree with C and think all the others are wrong, how am I supposed to order the wrong ones? Scolaire (talk) 10:16, 4 October 2022 (UTC)

Yes, that is a silly requirement (you can't force someone to agree with a proposal), and since there are six "do something" options and one "do nothing", it intrinsically biases the outcome in favour of a change from the status quo. – Joe (talk) 11:52, 4 October 2022 (UTC)

Ranked choices can help the closers determine whether and where there is consensus. In the case of multiple choices, some of which overlap, often consensus is difficult to determine. There's nothing silly about it, and it wasn't a decision made for arbitrary reasons or for the joy of making rules. Valereee (talk) 12:30, 4 October 2022 (UTC)

Most forms of ranked choice voting don't require voters to rank each candidate. To repeat Scolaire's question, what are people supposed to do if they don't support some options under any circumstances? – Joe (talk) 12:45, 4 October 2022 (UTC)

If you leave them off completely, I'm sure the closers can take that into account and assume you are completely opposed to anything you didn't rank and don't see any of them as being any better than the others. It means they won't know whether you consider D the absolute worst choice and A2, B2, and C2 only lesser evils, but if you don't want to give them that information, they're experienced closers and completely capable of making their own assumptions. :D Ranking all the choices means you're telling them exactly what your opinion is instead of hoping they're assuming correctly. You're also free to make a statement with your !vote to explain. Valereee (talk) 12:56, 4 October 2022 (UTC)

So it's not required? Can I take that part out of the rules? As I said, this isn't a theoretical point: requiring or even encouraging people to rank all choices biases the outcome. – Joe (talk) 13:06, 4 October 2022 (UTC)

It isn't in the rules. It's in the instructions for answering that question with a "Please" in front of it, and no, I'd like it to stay in the instructions as I believe more people being aware that's requested (and perhaps being willing to understand the reasoning behind it) will help the closers ascertain whether there is any consensus. Valereee (talk) 13:18, 4 October 2022 (UTC)

Are people supposed to understand that it's optional because it's expressed politely? And I'm trying to explain that it doesn't help clarify a consensus, it obscures consensus by forcing people to pick second preferences from a severely biased deck. This is the problem with 'moderators'... do I really have to take this to an ARCA just so we can have a normal RfC format? – Joe (talk) 13:31, 4 October 2022 (UTC)

If that's what you need to do, then I guess so. I can only do my best here, Joe. I know you've been opposed to pretty much everything about this process right from the start, but I am doing my literal best in a difficult situation. Valereee (talk) 13:40, 4 October 2022 (UTC)

I've made an edit to clarify no one is going to be p-blocked for not ranking all choices. :D Valereee (talk) 13:49, 4 October 2022 (UTC)

Good edit - it's true that in ranked choice voting, the usual advice is to not rank options voters consider completely unsuitable. –xeno^talk 14:40, 5 October 2022 (UTC)

Adding 'that you don't consider completely unsuitable'. Not sure I'm in love with "unsuitable", but that's what's there for now. Valereee (talk) 15:13, 5 October 2022 (UTC)

Yes, I know it's not you, Valereee. It's the initial short-sighted decision to give one (or two?) people the responsibility of managing amendments to projectwide policy. I know you have a lot on your shoulders and I'm sorry to add to it. Thank you for the edit. – Joe (talk) 14:04, 4 October 2022 (UTC)

Introduction/background

"There exists a policy that automated or semi-automated creation requires a bot request for approval. More recently, concerns have been raised in multiple venues that the continuing creation of such articles has overwhelmed editors’ ability to track and assess these articles, and that the churn has become a waste of time and a cause of disruption" - The policy that automated or semi-automated creation requires a bot request is pretty uncontroversial, I think. The issue at hand isn't really continuing creation of such articles but the mass creation of articles by other means (i.e. manually). — Rhododendrites ^talk \\ 19:29, 3 October 2022 (UTC)

Valereee: Should we add "(or article creation at scale performed manually)" to the introduction/background - to make it clear it's the scale, and not the method that is of concern? –xeno^talk 14:46, 5 October 2022 (UTC)

Totally, edit it however makes sense. I didn't because I wasn't immediately sure how to improve. Valereee (talk) 14:55, 5 October 2022 (UTC)

Cool - done at Special:Diff/1114259437. Does that address your comment Rhododendrites? –xeno^talk 16:12, 5 October 2022 (UTC)

More or less, yeah.

— Rhododendrites ^talk \\ 19:55, 5 October 2022 (UTC)

Rules questions

Within your own section you may present your !votes - but there are separate sections for support/oppose, too, so where do !votes go? — Rhododendrites ^talk \\ 19:35, 3 October 2022 (UTC)

!votes into support/oppose sections. Other comments, if needed, into the comments sections following the support/oppose sections. (Not trying to be a dick, just trying to make life easier for closers.) Valereee (talk) 20:37, 3 October 2022 (UTC)

Q3 (definition)

I struck my vote and moved it after getting confused by Q3. The header asks if we should have a definition, with "rate, source, similarity, other" seemingly options that could go in said definition. The text of the proposal, however, answers the question with a specific definition. It may be useful to either offer multiple definitions for people to choose from (with a "none of the above" that requires explanation), or to break it up into two or three questions about rate (with multiple options) and relatedness (with multiple options).

Thought about just adding a Q4 for a different option, but that seems like it could spiral. Or perhaps we should just add other possibilities and rely on moderators/clerks to organize accordingly? — Rhododendrites ^talk \\ 20:21, 3 October 2022 (UTC)

@Rhododendrites, if you have a better suggestion, one of the things we can do here is refine this definition. It was discussed pretty much ad nauseam in the workshop (and even before that in the drafting phase of the workshop) without being able to come up with a definition to !vote on here. I think what I'd suggest is that all suggested better definitions maybe go into subs of Q3? So Q3A, for example? Valereee (talk) 20:35, 3 October 2022 (UTC)

Question 6: New mass creator permission

Joe Roe has kindly pointed out the idiocy of my Question 6 (sorry, late at night here). Am I allowed to edit the original proposal text or are we stuck now? Espresso Addict (talk) 06:59, 5 October 2022 (UTC)

Hey, EA! We'd need to ~~strike~~ and add new wording (and ideally ping previous voters/commenters, although in this case these are people unlikely to wander off). It might be better to simply strike the whole thing and collapse, then start fresh with Q6A? Valereee (talk) 12:44, 5 October 2022 (UTC)

@Valereee: Should I just withdraw it? On further consideration it to some extent pends off agreeing a computer-readable definition of mass creation, which seems not to be the way the discussion is heading. Espresso Addict (talk) 19:17, 5 October 2022 (UTC)

@Espresso Addict, you can withdraw it if you like, and/or replace it with something reflecting your new thinking. I'm not following the 'pends off agreeing a computer-readable definition of mass creation, which seems not to be the way the discussion is heading.' I don't actually know what a computer-readable definition of mass creation is <g> but if your feeling is that it's something we should be discussing, by all means bring it up with a proposal. Valereee (talk) 19:28, 5 October 2022 (UTC)

Out of scope

Hey, @Thryduulf, re: this comment. I was looking at that question, too, and wondering if it should be held for the AfD-at-scale RfC. I was waffling a bit because it does specifically discuss mass creations, but my first thought was that it should be held. I don't want to be hasty or overbearing; can you expand a bit on the out-of-scope question? (Other commenters specifically also invited in.) Valereee (talk) 12:53, 5 October 2022 (UTC)

The way I see it is that defining speedy deletion criteria is pretty much exclusively about deletion not creation and only after we have some sort of agreement about what mass creation is and what types of mass creation are and are not acceptable (given that almost everybody seems to agree it's not always inappropriate) can we usefully begin to work out speedy deletion criteria that meet the four requirements. Thryduulf (talk) 13:04, 5 October 2022 (UTC)

Which is basically what this RfC is about: if the community can figure out how to regulate mass creation, mass deletion becomes less of an issue. If the community can't figure out how to regulate mass creation, the folks at AfD need some tools to handle the bad ones, but first we needed to see if the community could, in fact, figure out a solution to the problem of bad mass creation. And this proposal concerns a mass deletion tool. Am I making any sense? Valereee (talk) 13:08, 5 October 2022 (UTC)

Not really. This RfC is about mass creation but the proposal is about speedy deletion. How to deal with the deletion of pages that have been mass created is the purpose of the next RfC. Proposals here should be about (a) defining what mass creation is and (b) how to deal with the issue prior to pages being mass created; dealing with the pages after they have been created should be (as I understand things) left until the next RfC. Thryduulf (talk) 13:29, 5 October 2022 (UTC)

Yes, that's what I was trying to get at, probably not enough coffee yet. I've collapsed, with a link here for any discussion. Valereee (talk) 13:32, 5 October 2022 (UTC)

@Valereee: My rationale for including the solution in this RfC is that the purpose is to determine consensus about policy going forward surrounding creation of articles at scale and to form consensus on those solutions. I believe that creating a speedy deletion criterion is not altering any AfD policy in any way, but instead is proposing a solution to future mass-creations at scale. I would kindly ask that you unhat the section and that you allow for discussion to take place; the question seems to have attracted interest among editors and it is specific to solving the issue of mass creations rather than reforming AfD. — Red-tailed hawk _(nest) 16:22, 5 October 2022 (UTC)

RTH, can you discuss why this would be helpful as part of our due diligence here before we get to the AfD RfC, rather than simply being a part of that RfC? I'm not sure I'm seeing how this is better here than there. Why do we need to deal with this before we go to that RfC? How will having this question dealt with here answer questions we need to have answered before that RfC? Valereee (talk) 16:30, 5 October 2022 (UTC)

@Valereee: First off, speedy deletion is not AfD, and the issue of article deletions at scale has historically been WP:CLUSTERFUCK nominations when there is a mix of notable and non-notable subjects. What this solution would do is, rather than deleting articles at-scale in one discussion, allow editors to tag individual mass-created articles that fail this criterion for deletion. Since we have already determined that questions like whether or not All WP:MASSCREATEd articles (except those not required to meet GNG) must be cited to at least one source which would plausibly contribute to GNG: that is, which constitutes significant coverage in an independent reliable secondary source are in-scope, I don't really see a reason that the discussion (which already began and drew a decent number of editors) should be pushed off.

Second off, very simply put, this fits because the sole purpose of this RfC is to determine consensus about policy going forward surrounding creation of articles at scale and to form consensus on those solutions. That language is quite strong regarding this RfC's purpose and does not frame this as a mere precursor to another RfC that we should push off things to that would also be in that second RfC's scope, contrary to what has been argued above. If you think that the question is equally suited for the next RfC, that's fine, but it's also nonsensical to strike this as out-of-scope if it falls within the sole purpose of this RfC. — Red-tailed hawk _(nest) 16:51, 5 October 2022 (UTC) (clarified 17:16, 5 October 2022 (UTC))

Alternatively, I could very well (given that Question #2C is in-scope) create a proposal that says "all mass-created articles must have a reliably sourced indication of importance". I could leave it at that, but that also lacks any sort of enforcement mechanism, which weakens the proposed solution. I think that the proposal is better when it includes a specific mechanism of enforcement, which is part of why this was framed as creating a new speedy deletion criterion.

If the objection is simply that the solution involves deletion, the alternative of "all mass-created articles must have a reliably sourced indication of importance. Mass-creating articles without doing so is considered disruptive editing" would clearly be in-scope in this RfC and not the one about deletion at scale. But it also would shoehorn the policy solution into framing this as a per se explicit editor conduct issue, rather than merely treating it as a content issue that may or may not be a conduct issue. — Red-tailed hawk _(nest) 17:02, 5 October 2022 (UTC)

I'm trying to follow...you wrote: Since we have already determined that questions like whether or not All WP:MASSCREATEd articles (except those not required to meet GNG) must be cited to at least one source which would plausibly contribute to GNG: that is, which constitutes significant coverage in an independent reliable secondary source, I don't really see a reason that the discussion (which already began and drew a decent number of editors) should be pushed off. What have we already determined about that proposal? I feel like there's a missing clause? Valereee (talk) 17:09, 5 October 2022 (UTC)

We've determined that the proposal is in-scope. I was missing a clause there, sorry. — Red-tailed hawk _(nest) 17:14, 5 October 2022 (UTC)

Gotcha, thanks. I think your suggestion of a new creations-focused proposal would be a good solution, maybe a question 2A since it's closely related to Q2. (Though you might want to consider not specifying 'is considered disruptive editing' or other statements of exactly how it should be dealt with. That kind of turns it into a two-part proposal, and they're often doomed because some don't object to the first part but do object to the second and end up joining those who oppose the entire idea. If it passes, something can be proposed at the AfD RfC like 'Proposed: develop a CSD to handle creations covered by proposal 2A, which passed at the previous RfC' or whatever. But don't take my word for it, ask for other input.) Valereee (talk) 17:25, 5 October 2022 (UTC)

I understand where you're coming from, but I don't see a rational reason based in the rules of this RfC why the proposal:

All mass-created articles must have a reliably sourced indication of importance. This will be enforced by the creation of a new speedy deletion criterion:
A12: No reliably sourced indication of importance (mass-created articles).
This criterion applies to any mass-created article that does not have a reliably sourced indication of importance. This would apply to any mass-created article that does not indicate why its subject is important or significant. This is a lower standard than notability. If the sourced claim's importance or significance is unclear, you can improve the article yourself, propose deletion, or list the article at articles for deletion.

is out-of-scope, while the proposal:

All mass-created articles must have a reliably sourced indication of importance. This will be enforced by some method determined in the future.

is in-scope. I can seek clarification from ArbCom if you'd like me to, but I really don't see how this distinction proceeds from ArbCom's mandate. I understand that two-part proposals are more likely to have problems gaining consensus, but I also think it's important for the whole solution to be considered as a whole rather than to propose a partial solution that is vulnerable to criticism that it lacks a clear enforcement mechanism. — Red-tailed hawk _(nest) 17:36, 5 October 2022 (UTC)

Because one is focused solely on creation, which is what this RfC is about, and the other is focused primarily on creating a new criterion for speedy deletion and doesn't feel like it's something that is keeping bad articles from being mass created. It's instead about dealing with those mass creations after they've happened, which is what the deletion process does and which I'm planning on covering at the AfD RfC. I get that there's a blurry line here. I'm drawing it where it makes sense to me, but you totally should go get other input at ARCA if you think that's helpful. It very well may be that they'll agree with you. Valereee (talk) 18:07, 5 October 2022 (UTC)

Fair enough. I've opened an amendment discussion at WP:ARCA in light of this. — Red-tailed hawk _(nest) 18:56, 5 October 2022 (UTC)

@Valereee: I've created Question 2A11 along these lines, notwithstanding the pending ARCA. — Red-tailed hawk _(nest) 22:22, 5 October 2022 (UTC) (updated 13:49, 6 October 2022 (UTC))

RTH, are you using 'importance' rather than 'notability' for a particular reason? Valereee (talk) 14:49, 6 October 2022 (UTC)

I'm borrowing from the language of WP:A7. — Red-tailed hawk _(nest) 00:55, 7 October 2022 (UTC)

Are we leaving 'credibly' out intentionally here, to distinguish the language from A7? That is, a claim of importance in this case needn't be credible? Valereee (talk) 10:41, 7 October 2022 (UTC)

Defining "mass creation at scale" with rough numbers (and the ability of any admin to deem other projects in the spirit of the definition)

A few people have argued that we shouldn't have any sort of number associated with the definition of "mass creation at scale", because it will lead to wikilawyering, even if the definition is clear that admins can deem projects which skirt the numbers "mass creation at scale" in the spirit of the definition. (And, presumably, having no clear number to go by will lead to less arguing... like now?). This seems so backwards to me that I'm hoping someone can provide an example.

A major problem we have now is so many people "know it when they see it" and yet see and know very different things. If sourcing is the only issue in play, then the RfC may resolve it, but I see rate come up at least as often when people are complaining about mass creation. How would you have a different sourcing requirement kick in with a particular rate or quantity of articles while refusing to set a rate or quantity? What am I missing? — Rhododendrites ^talk \\ 12:56, 5 October 2022 (UTC)

As I see it, the issue isn't an approximate number of articles, but an approximate number of articles with similar characteristics over a period of time. 10 articles in 20 days about different subjects is very different to 10 articles in 5 days about the same subject. Numbers do come into it, but never on their own. Thryduulf (talk) 13:01, 5 October 2022 (UTC)

Right, which is why the one I suggested defines it with rate and relatedness both. — Rhododendrites ^talk \\ 13:13, 5 October 2022 (UTC)

My problem is the one I've already articulated in the RfC. The fundamental issue is not the speed, it's that the topics are drawn from a database or off-wiki list that does not ensure notability. This typically happens at a high rate, but the rate isn't the defining feature, it's the lack of effort to ensure notability. Vanamonde (Talk) 16:39, 5 October 2022 (UTC)
+1 Donald Albury 21:07, 5 October 2022 (UTC)

Are you saying if someone creates 500 articles in a week, nobody will mind and they don't need to ask permission, so long as they're not using a database, off-wiki list, or similarly lousy source? — Rhododendrites ^talk \\ 21:19, 5 October 2022 (UTC)
I would have no problem with the creation of multiple articles that each included sufficient reliable sources to establish notability. I think the act of adding suitable reliable sources to each article would effectively slow the rate of creation of new articles. Donald Albury 21:53, 5 October 2022 (UTC)
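
Editorial illustration (not part of the archived discussion): the "rate and relatedness both" idea raised in this thread can be sketched roughly as below. This is only a minimal sketch under stated assumptions. The function name, the 10-article threshold, and the 7-day window are hypothetical, since no numbers were agreed in the discussion, and "same subject" is reduced to an exact topic label match.

```python
from datetime import date, timedelta

# Hypothetical thresholds, chosen only for illustration; the discussion set none.
MIN_ARTICLES = 10
WINDOW_DAYS = 7


def looks_like_mass_creation(creations, min_articles=MIN_ARTICLES,
                             window_days=WINDOW_DAYS):
    """Return True if any single topic has at least `min_articles`
    creations inside a sliding window shorter than `window_days`.

    `creations` is an iterable of (creation_date, topic) pairs for one editor.
    """
    by_topic = {}
    for created, topic in creations:
        by_topic.setdefault(topic, []).append(created)

    window = timedelta(days=window_days)
    for dates in by_topic.values():
        dates.sort()
        start = 0
        for end, current in enumerate(dates):
            # Drop creations from the left until the window is short enough.
            while current - dates[start] >= window:
                start += 1
            if end - start + 1 >= min_articles:
                return True
    return False


first_day = date(2022, 10, 1)
# 10 articles in 5 days about the same subject.
same_subject = [(first_day + timedelta(days=i // 2), "river") for i in range(10)]
# 10 articles spread over 20 days, each about a different subject.
different_subjects = [(first_day + timedelta(days=2 * i), f"topic-{i}")
                      for i in range(10)]

print(looks_like_mass_creation(same_subject))
print(looks_like_mass_creation(different_subjects))
```

Under these assumed thresholds the first batch is flagged and the second is not, which mirrors the contrast drawn above. The sketch says nothing about sourcing or notability, which other commenters considered the defining feature.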

Q2 options are out of scope

@Rhododendrites: has already made the point before me ("Whoa. 2B (and to a much lesser extent 2A) extends far beyond the scope of this RfC IMO, applying to all articles. This would be a radical change and should be separated if anyone wants to really propose it."), but I have to agree and further argue that this does not belong here. Firstly, you could be someone firmly interested in notability while still not choosing to read the MASSCREATE policy discussion and you'd rightly be pretty irked to find this suddenly in place. If someone wants these options, they need their own RfC. This is especially the case where this RFC (and its deletion counterpart) have specific rules and a specific scope. Nosebagbear (talk) 15:00, 5 October 2022 (UTC)

I wonder if a way forward - if that option gains support here - would be to have a single-issue RFC so that it can more explicitly be considered by all WP:N-interested editors? –xeno^talk 16:10, 5 October 2022 (UTC)

@Xeno: that would certainly be the logical option, but should be made clear in the question that that would be the result. Nosebagbear (talk) 21:58, 5 October 2022 (UTC)

Question about Q2

I see several of the options for Q2 are "requiring" SIGCOV be in the article. Let's say one of those passes. Someone goes and creates an article without SIGCOV. Then what happens? BeanieFan11 (talk) 16:13, 5 October 2022 (UTC)

Andrew Davidson votes

I would like to request the striking of all Andrew Davidson votes in this RFC due to his topic ban from deletion-related activities. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 18:16, 5 October 2022 (UTC)

Hm...is an RfC about creation which is to be followed by an RfC on deletion included in this editing restriction?

The text of the closing/listing at edit restrictions mentions only deletions, but obviously this first RfC's subtext is (mass creations leading to) deletions. It's going to lead directly to the second, which is about deletion, so it's arguably an issue. IMO it's possibly at least a violation of the spirit of the t-ban. Number 57, can you provide any clarification?

I don't think there's any urgency in resolving, but Andrew Davidson, probably best you should stop contributing here until this is settled. Valereee (talk) 19:00, 5 October 2022 (UTC)

I'm struggling to see how this is a violation tbh. Number 5 7 20:13, 5 October 2022 (UTC)

Then I'm going to assume it's not until further arguments are made. Thanks, N57! Valereee (talk) 20:21, 5 October 2022 (UTC)

So, if Andrew is allowed over here, are 7&6=thirteen, TenPoundHammer and Johnpacklambert (who also received similar topic bans) permitted over here too? — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 20:29, 5 October 2022 (UTC)

In this case, I would argue that anything that is making them unable to edit here should also be specifically removed from this RfC, as clearly out of scope of a "mass creation" RfC, as it was divvied (correctly, I feel) up specifically on the create/delete basis. Nosebagbear (talk) 21:28, 5 October 2022 (UTC)
Sorry, NBB...in what case? Valereee (talk) 21:32, 5 October 2022 (UTC)

My main question of "are all four of the topic-banned editors I mentioned allowed to participate in this RFC?" has not been answered. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 22:02, 5 October 2022 (UTC)
To the extent that the discussion overleaf is specifically about article creation, my belief would be they can participate in this RfC (but not the following one), as long as their comments don't drift into discussing deletion. Similar to Number 57, I'm struggling to see how it would be a violation - this is just my personal interpretation: they are free to seek clarification from the arbitration committee if they want to be standing on firmer ground. –xeno^talk 23:15, 5 October 2022 (UTC)

Thought for the next RfC

If there is a better place for this then please move it, but if a definition of what constitutes mass creation arises out of this RfC then the next RfC (or some discussion before then) should have a question asking whether mass deletion should use the same definition or something different. (To avoid getting way off topic, please do not give answers to this question now). Thryduulf (talk) 15:08, 6 October 2022 (UTC)

To clarify, by same definition, do you mean that mass deletion is the deletion of mass-created articles, or just taking the same criteria and swapping in the word "delete" for "create" (leaving any necessary grammar tweaks aside)? (Much like this RfC, I imagine the definition will depend on what each individual proposal is targeting, but as you said, discussion should be deferred to the next RfC.) isaacl (talk) 15:35, 6 October 2022 (UTC)

I was meaning that if this RfC comes up with a definition of what makes something mass creation as opposed to just normal creation (whatever that definition is), we should ask whether we should define the difference between deletion and mass deletion the same way (with, as you say, necessary tweaks to verbs and grammar). Thryduulf (talk) 13:43, 7 October 2022 (UTC)

I'd think it would be a good post at Wikipedia talk:Arbitration Committee/Requests for comment/AfD at scale, which is where we'll be workshopping the next RfC. Xeno and I have discussed the pros and cons of opening that workshopping once things slow down here, whenever that is, versus waiting until the RfC is closed and the closing announcement made here. Valereee (talk) 13:26, 7 October 2022 (UTC)

I've updated the status header box thingie to reflect that. Valereee (talk) 13:33, 7 October 2022 (UTC)

Overlength comment reduced

@Valereee and Xeno: I've pared the prose size of my comment in Question 2 to 296, excluding signatures. Do I have permission to uncollapse the comment? — Red-tailed hawk _(nest) 14:57, 7 October 2022 (UTC)

Absolutely! You don't need to ask permission -- just pare it and uncollapse! Valereee (talk) 15:06, 7 October 2022 (UTC)

Workshopping of RfC on article deletions at scale

I've added Q7, Q7A, and Q10 to a discussion section at WT:ADAS. We're considering whether starting that discussion is appropriate once discussion here has slowed significantly. If there are comments about that, please take them to WT:ADAS, where I've started a section at Wikipedia_talk:Arbitration_Committee/Requests_for_comment/AfD_at_scale#Timeline_for_workshopping_RfC_on_article_deletion_at_scale. Valereee (talk) 17:57, 7 October 2022 (UTC)

It is unclear what editors are supposed to do about these proposals. Are we supposed to wait until the questions are posted at WP:ADAS? –LaundryPizza03 (d c̄) 09:54, 11 October 2022 (UTC)

Yes, please. Valereee (talk) 12:52, 11 October 2022 (UTC)

Proposal phase

May want to extend this (re: Please make all additional proposals within seven days of the start of this discussion. Subsequent proposals may be brought up in an editor's own section for consideration and inclusion at the discretion of the moderators.). It's not even clear there's agreement on what the subject is at this stage. — Rhododendrites ^talk \\ 15:07, 10 October 2022 (UTC)

Mm. The point was to prevent the addition of tons of late proposals, but I'm not sure we're in danger of that? And, hell, I am starting not to care about the whole damn thing.

This requirement for consensus always ends up meaning the people doing the work, outnumbered by those who aren't doing it (and so don't understand it) and those who simply object to any change, never can get consensus for any solution they come up with. If this ends up back at ArbCom -- who wanted to see if the community could find a solution -- and ArbCom ends up having to find a solution, the same people who are objecting to everything from format to timing to proposals will be accusing them of overreach. Valereee (talk) 13:12, 11 October 2022 (UTC)

RfC name not compelling enough?

@FOARP, per Special:Diff/1115448733, should we reannounce with that title piped here? Valereee (talk) 14:53, 11 October 2022 (UTC)

It can't hurt, can it? "What to do about mass created articles?" or something even more snappy if you can think of it. NSPORTS2022 was also a slow burner though so it could be the community is just digesting the multiple proposals we've got there, or even they think we've argued the subject to death already ;) FOARP (talk) 15:02, 11 October 2022 (UTC)

"What shall we do about the problems caused by mass creation of articles"? Not snappier, but even a slight bit more descriptive and maybe more of an encouragement to participate?

Oh, we've argued the subject to death, that's for sure. I'm sure one of the problems is burnout on the entire subject, and I wouldn't be surprised if a lot of people are taking a look, seeing that very few proposals have any chance of passing, and just moving on. But it can't hurt to make sure it's understood that this is worth at least dropping by. Would you be willing to handle this? Links to all the places it's been posted are in the status box at the top of the page. Valereee (talk) 15:14, 11 October 2022 (UTC)

That wording presupposes that mass creation causes problems. It may well often do so, but it should not be presupposed by an RFC title. Phil Bridger (talk) 17:34, 11 October 2022 (UTC)

Hm, you're so right. Doesn't "What to do about mass created articles" have the same problem, though? It assumes we have to do something. Valereee (talk) 18:24, 11 October 2022 (UTC)

Not if the responder says "nothing". Phil Bridger (talk) 18:36, 11 October 2022 (UTC)

Yes, I can see that argument. Hm. @FOARP, do you have an opinion? Valereee (talk) 18:46, 11 October 2022 (UTC)

I announced it at a few extra places with an emphasis on the wide-ranging discussion. I think one problem is that the folk who create lots of articles unproblematically tend not to frequent all the usual bulletin boards, as they're just a timesink. We need to figure out where they do all hang out, if anywhere, and what kind of wording might make them prepared to wade through all this verbiage to offer their advice. Espresso Addict (talk) 23:42, 11 October 2022 (UTC)
@Espresso Addict, would you add those extra places to the status box, so we have a record of all the places it was announced? Valereee (talk) 17:48, 12 October 2022 (UTC)

Sure, I'll dig them out, but I'm aware of at least one other notification I didn't send. Espresso Addict (talk) 20:03, 12 October 2022 (UTC)

If you can LMK of any you're aware of, it would be helpful. Valereee (talk) 20:37, 12 October 2022 (UTC)

It doesn't look like NPP was notified directly? I'd be interested in their views on the viable flow rates. Every time I stop by there they seem to be drowning. Espresso Addict (talk) 05:53, 14 October 2022 (UTC)

Thanks! I've posted to Wikipedia talk:New pages patrol/Reviewers. Valereee (talk) 15:09, 14 October 2022 (UTC)

Question 2A

Latest comment: 2 years ago · 2 comments · 2 people in discussion

I am not sure how helpful the addition of question 2a to the list of proposed questions is, 10+ days into the RfC. Many editors commented on question 2 and this slightly new proposal might get lost in the (now) lengthy and wide-ranging discussion. - Enos733 (talk) 17:01, 12 October 2022 (UTC)

Question 17 was added yesterday and has received a substantial response, so I figured this one would be OK, but I'll defer to Valereee. I'm not sure how it would get lost since it's in its own section. This doesn't seem to have been an issue for other subproposals intended to address concerns raised in the initial question. — Preceding unsigned comment added by Dlthewave (talk • contribs)

I think adding it in as Q2a probably does make it disappear, at this point. But the proposal is so similar to the one it revises, and (from someone who is very willing to go with the flow on this) the idea that we'd go with sources instead of citations...we've for a long time now asked for inline citations. So I'm not sure this is going to solve the problems the opposers to 2 were seeing. Valereee (talk) 00:07, 14 October 2022 (UTC)

If you are going to create more than X (GNG-based) articles from a database/directory/list, you must get community consensus that the source contributes to GNG and is predictive of further GNG

Latest comment: 2 years ago · 2 comments · 2 people in discussion

X can be determined later. RSN could be the place to take such questions. This would not overrule GNG-based guidance that already has minimum requirements (e.g. you can't bypass NSPORT rules etc.). "Contributes to" could include situations in which it's expected every source entry contains links to significant SIRS. JoelleJay (talk) 00:55, 14 October 2022 (UTC)

"If you are going to create..." My question here still applies: How are we to communicate this to a user who decides to create X articles but isn't aware of this RfC? Scolaire (talk) 15:31, 14 October 2022 (UTC)

Database discussion

Latest comment: 2 years ago · 75 comments · 9 people in discussion

[moved from RfC on deletions]

@S Marshall, databases that don't contain original prose written specifically about the subject are comparable to specialized web browsers or news aggregators: they may be reliable, but they are not synthesizing anything specifically for any entry, they are merely tools to autocurate and present facts without further analysis. Some of those facts might be from a secondary publication, but they are more likely to be cited to a primary research article or even uploaded directly by a researcher. For the facts that are referenced to secondary sources or have secondary coverage elsewhere, those publications should be the basis of the article rather than the database.

Quoting my most recent response, and adding an expansion to what I said there:

As someone whose dissertation utilizes many massive, public, professional scientific databases, and whose published primary research generates many data points that are scraped by special webcrawlers and automatically uploaded into those databases (where they remain primary data), I am going to say interpreting their contents is definitely not something editors should be doing, let alone as the basis for an article. That goes directly against NOT: "Information should not be included in this encyclopedia solely because it is true or useful... Verifiable and sourced statements should be treated with appropriate weight." How do we know which attributes are DUE? Outside of the general items that show up in any species infobox, what makes any piece of info on the fish page actually encyclopedic? How can we treat any fact with acceptable weight if it only appears as one of thousands of equally-prominent attributes in a database entry? Relying on an editor with specialized knowledge to choose which non-prose items in a db entry should be on WP, and on top of that to expand on those facts, is exactly what NPOV and OR are supposed to prevent: "A primary source may be used on Wikipedia only to make straightforward, descriptive statements of facts that can be verified by any educated person with access to the primary source but without further, specialized knowledge." The fish example is indeed making straightforward, descriptive statements of facts--that nothing more can be said is another indicator that the source is primary--but it is not verifiable by anyone without specialized knowledge. And anyway, how do we know the topic is even notable? Just because it shows up in some scientific database? Do you know how many billions of astronomical objects and protein features and chemicals have exactly as much data, in exactly as professional a database, as the fish example?
Merely appearing in a database might be a predictor of further secondary coverage, but if it is then that coverage should be used as the basis of the article. JoelleJay (talk) 19:55, 10 October 2022 (UTC)

Is it critical that information be synthesized specifically for any subtopic? I've seen a source that calculates the cost of raw ingredients for some small molecule drugs (per-pill and per-kilogram).

If someone synthesized the data on raw ingredients for a single drug (e.g., in a short peer-reviewed paper), would that feel okay to you? For example: Someone writes that you can buy some salicylic acid ($200/kg at Sigma-Aldrich today, but presumably they'd be getting bulk-price quotes) and some acetic anhydride ($100/L at S-A today) and some cornstarch (price varies depending on purity considerations), and the result is a big pile of Aspirin powder for little more than US$0.01 per 325 mg pill.
If it feels okay to do it for one drug, does it still feel okay to do the same thing for 100 drugs, e.g., in a table in a somewhat longer paper?
If it's okay to do it for 100 drugs in a table, is it okay to do it for 100 drugs in a publicly available spreadsheet?

If not, why not? WhatamIdoing (talk) 20:35, 10 October 2022 (UTC)

...what are you asking? Who is writing a standalone article on the price of a particular drug that would need to source things that way? How would the price of particular ingredients of a drug ever be DUE for inclusion without secondary coverage? Like, NO, of course merely showing up in a peer-reviewed primary paper (although for what possible reason would they note the price of reagents anyway??) would not be sufficient. If what you're really asking is "do we need secondary synthesis for the routine non-controversial background info on the subject that is included in all articles of that type", then no, that's literally the one acceptable use case for primary sources. But obviously such info can't be the ENTIRE article, per PRIMARY and NOT, so I don't see how that's relevant? JoelleJay (talk) 21:27, 11 October 2022 (UTC)

WAID is referring to Wikipedia:Arbitration/Requests/Case/Medicine, which (to save you a massive amount of reading) is a case where certain Wikipedians including important Wikipedians advocated putting drug prices in our articles about drugs and treatments, and used some atrocious methodologies to work out the prices.

I take your point about primary and secondary sources. I think your case is very arguable in policy but, because Wikipedia's definitions of "primary source" and "secondary source" are confused and vary from editor to editor, I also think the policy is hard to apply and needs rewriting.

I feel that the relatively rare cases we're talking about here, where you might want to source a whole article exclusively to databases, all happen in the places where Wikipedia goes outside being an encyclopaedia and takes on its functions as an almanac or gazetteer. Asteroids are good examples: we have a list of minor planets and individual articles on a couple of thousand of them. I'm much more relaxed about this than I am about the list of Olympians because the asteroid articles aren't about biographies of living people.

Another good example is chemicals. Chemicals have an awful lot of properties, so the infoboxes were getting unmanageably large. In the end WikiProject Chemistry decided that important chemicals should have their own data pages and I'm sure many of these are populated from databases. Again, I'm relaxed about this.—S Marshall T/C 22:34, 11 October 2022 (UTC)

I think this is a perceptive comment from S Marshall. The database articles that cause few problems tend to be about chemicals, species, asteroids, or the like. Generally no-one is greatly harmed if the information is sparse, or isn't 100% perfect, or even if the article gets vandalised without anyone noticing. Where we need a great deal more care is for living people, and increasing the number of articles on living (or not clearly deceased) subjects where the notability is borderline greatly increases the risk that someone will be harmed by incorrect information getting in, or vandalism occurring unnoticed. I'd certainly like to explore options for different burdens of proof to be necessary to start (or not delete) articles on living or recently deceased human subjects vs say asteroids. Though perhaps that's the other RfC?

The primary/secondary/tertiary thing is a complete mess; at the moment one can quote policy to say almost anything on this topic. I think we should focus on reliable enough for the fact in question (and the risk of harm) vs not reliable enough. This means BLPs and medical subjects are held to higher referencing standards than asteroids. Espresso Addict (talk) 23:14, 11 October 2022 (UTC)

I agree that, policy aside, there is a lot of nuance to which types of articles the community "accepts" despite their poor sourcing. But for the purposes of at least mass creation and AfDs, I am steadfastly going to insist that the language in WP:SECONDARY should be governing how we treat databases when it comes to any GNG-based subject. That is, no "creation at scale" based on non-prose-containing databases (no author giving their specific analysis of the subject), and if no one can find actual secondary coverage of a db-sourced subject after 7 days at AfD then the article should be deleted. This should be the case for BLPs as well as asteroid articles. I don't think it's useful for us to host what are essentially drug databases when more detailed and most importantly more up-to-date databases already exist; that easily lends itself to real-world harm. I also don't see the value in being a directory for asteroids, either, and that is definitely supported by our guidelines: "Coverage must be specific and substantial: notability is not ensured just because an object is listed in a scientific paper or included in a large-scale astronomical survey. To establish notability, the astronomical object must have significant commentary in reliable sources, such as being one of the primary targets of a study with in-depth discussion (beyond discovery and basic parameters). Being listed in a database does not make an object notable. Some astronomical databases and surveys, such as the JPL Small-Body Database, SIMBAD or the Gaia catalogue, list millions or billions of objects. Many objects listed in catalogues and databases have little information beyond their basic parameters and discovery circumstances. Wikipedia does not duplicate content in these databases." This approach should really be taken for every (non-prose-containing) scientific database, all of which suffer from the same issues pointed out in WP:NASTRO, i.e. millions or billions of objects with little information beyond their basic parameters and discovery circumstances. There is just no reason to have standalone articles that simply reproduce/proseify a subset of a database, and I think the "secondary prose" test would provide a meaningful distinction between sources that can count towards GNG and those that cannot. JoelleJay (talk) 02:26, 12 October 2022 (UTC)

When you describe scientific database sources as "poor", I join issue with you. My position is that with an almanac-style article on a chemical or a comet or an annelid worm, which ought to be full of facts and figures, we shouldn't rely on a journalist for information where there's a reliable scientific database maintained by appropriate experts. I mean, where a newspaper article about a star contradicts SIMBAD, it would take strong evidence to convince me that SIMBAD was wrong.

You will recall from previous discussions that nowadays I take a hardline view on notability as it relates to people who're alive or lived in the 20th century. In those topic areas I insist that the GNG should be the main standard and I always advocate for it becoming the only standard. You and I have often agreed about this.

But with the almanac- or gazetteer-style articles that we're considering here, where there's no BLP risk and little or no risk of promotional or COI activity, I get much more relaxed. My understanding is that astronomical features such as craters or asteroids, where they have names, are notable under the fourth limb of WP:GEOLAND and get their own articles where there's more than statistics and co-ordinates to write. And dinosaurs are even worse: they nearly all get their own articles even where all we've got is a fossil of a tooth or fragment of bone.—S Marshall T/C 17:58, 12 October 2022 (UTC)

I don't see how Wikipedia can cover species without using taxonomic databases. Every asteroid is covered; i.e. mentioned in a list sourced to databases. I assume every Olympian is covered (mentioned in a list for their country's Olympic team). Presumably every species should be mentioned somewhere in Wikipedia.

Taxonomy has a degree of subjectivity; given the same evidence taxonomists can legitimately disagree on whether something should be considered a species or a subspecies. Taxonomists have a great deal of freedom in naming new species; they do not have to seek consensus from the taxonomic community to describe (what they believe to be) a new species. Provided certain rules are followed, anybody can publish a new species name. That doesn't mean the taxonomic community has to accept every proposed species. The number of species names is several times the number of accepted species.

Describing (proposing) a new species requires a sufficiently detailed description that could provide enough information to write a non-stub Wikipedia article, and designating a type specimen that can be examined by future researchers. Taxonomic experts produce monographs, where they consider every previously published species name (in a genus or family), examine the type specimens associated with the names and come to conclusions as to which species names really represent distinct species and which names should be regarded as synonyms. Taxonomic databases are mostly tertiary sources; the original description of a species is a primary source, a taxonomic monograph is a secondary source, and a taxonomic database compiles the results of dozens or hundreds of monographs to cover all species in a larger group of organisms. The original description of a species can provide enough information for a Wikipedia article, but it cannot address the question of whether that species is currently accepted by the taxonomic community. Plantdrew (talk) 19:39, 12 October 2022 (UTC)

I didn't say databases were "poor", I said articles reliant on just databases are poorly sourced. And I wasn't suggesting we have to rely on newspaper journalists for science info: review articles, textbooks, discussion of the primary coverage in the background of a different primary research article are standard and would be perfectly acceptable.

Space objects are not covered by GEOLAND; NASTRO makes this explicit: "On Earth, named geographical features are generally notable. This is not true for astronomical objects: the naming of a body in space (such as an asteroid) does not guarantee notability."

@Plantdrew, databases are perfectly fine to use for basic material and certainly mentions in lists -- we just shouldn't have standalone articles that can only be sourced to a database. The problems that the "prose coverage" criterion would address are items that don't have that secondary monograph but still appear in databases. JoelleJay (talk) 20:40, 12 October 2022 (UTC)

Depends on the space object. I stand corrected on asteroids; apparently I last read that page before 2012, which makes me feel really old. But NASTRO thinks in terms of whole bodies. If we needed a guideline on the notability of named craters, mountains, fissures, or other selenological/areological/whatever features, surely it would be GEOLAND.

I think that we might be reaching a consensus here, with JJ dissenting.—S Marshall T/C 22:13, 12 October 2022 (UTC)

GEOLAND explicitly does not cover non-Earth objects (This guideline does not apply to geographical features in fictional works or to the features of other astronomical objects.), and NASTRO does not say it restricts itself to "whole objects". Why would GEOLAND's inherent notability for a "named feature" apply to some mountain on a minor planet when the planet itself isn't considered notable by NASTRO?
A five-person discussion is not even faintly close to the consensus needed to overturn portions of N, NOT, OR, and NASTRO, come on! JoelleJay (talk) 23:35, 12 October 2022 (UTC)

We're not overturning notability rules. What we're trying to decide here is what rules, if any, should apply to mass article creation. We don't define mass article creation because we can't agree what that means, so the general rules for article creation apply, and they are: (1) A biography of a living or recently-deceased person needs to cite at least one source when it's started; and (2) Other articles don't need to cite any sources at all when they're started. Notability doesn't come into play until the article gets to AfD.

Lugnuts' database-only article creations were within all the rules and guidelines at the time. The consensus is that they shouldn't have been.

The proposal on the table is to ban mass-creation of articles based on database sources only. The subject of this talk page discussion is whether to allow an exception for reputable scientific databases.

It's a novel proposal that introduces new rules into the encyclopaedia and not an attempt at an end-run around notability.—S Marshall T/C 22:56, 13 October 2022 (UTC)

@JoelleJay, I'm trying to understand your desire to have information synthesized "specifically".

It seems to me that if I calculate the cost of raw ingredients for aspirin by hand (a mole of this plus a mole of that, mix in a handful of cornstarch, et voilà) – just doing aspirin and nothing else – then that would be an example of "synthesizing specifically".

I know that Wikipedia:I am not a reliable source, and this would obviously support one sentence in an article rather than a whole article, but I'm just trying to figure out what makes a source count as "synthesizing specifically". Is that an example of "synthesizing specifically"?

If so, if I did this a hundred times, for a hundred common molecules, would it still be "synthesizing specifically"? Or is there something about doing these calculations in bulk that makes it more like "synthesizing indiscriminately"? WhatamIdoing (talk) 05:11, 13 October 2022 (UTC)

To ensure that a person has put thought into analyzing the article subject itself. Software that automatically interprets data and spits out "analysis" is not equivalent to someone actually discussing the analysis itself. And I still don't see what relevance your example has--the prose criterion would only be for cases where a database is being used as the sole justification for a standalone article. I cannot imagine how the product of a trivial calculation would ever merit its own article in the absence of secondary sources. JoelleJay (talk) 05:33, 13 October 2022 (UTC)

I'm not sure about other issues, but I don't believe a database should ever be considered reliable. To be clear I mean if we have nothing else to confirm a fact but a single database. The issue is that no database is 100% accurate or reliable. So by basing something on a database entry alone, with no other supporting sources, we run the risk of perpetuating citogenesis. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 00:18, 13 October 2022 (UTC)
Really? Not even the US Census, whose results are delivered to the world in the form of a database? WhatamIdoing (talk) 05:03, 13 October 2022 (UTC)

(edit conflict) I think the point is that the figures for the population in Anytown, Michigan aren't necessarily 100% accurate, even from the US census. Someone will have refused to fill it in, failed to, or there will have been a transcription error etc... This is to be expected, and just because a source isn't 100% accurate doesn't make it something we shouldn't use. That applies to databases in the same way it applies to errors in prose - I've read far too many books with obvious errors in them, some of them caused by synthesising databases, some by mistakes and others by new information coming to light after the book was written. Does that mean we should also reject all books? Or, horror of horrors, webpages? That would be silly.

The point is whether, on the whole, the database is suitably reliable. And then in what context can we just use a database to create a factual article.

Fwiw, databases copy each other. The number of databases that say something doesn't matter that much - it's the overall reliability of the database(s) I'm more interested in. I'm quite happy to point out multiple cases where two databases show or showed the same thing which is clearly not correct. That doesn't make those databases unreliable. And that's before we get to data traps. Blue Square Thing (talk) 05:40, 13 October 2022 (UTC)

My point is less about the population figures of a town being wrong by 30k, and more if we base an article on a database entry (and only that database) in which the name is incorrect. If we created an article for Anytown, Michigan because it appeared in a database without any other supporting sources (not even another database), then there's a genuine risk that the entry should actually be for Anyton, Michigan. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 12:06, 13 October 2022 (UTC)

I agree that sources don't have to be accurate to be reliable. @ActivelyDisinterested, it sounds like your concern (i.e., any single source could be wrong) really has nothing to do with databases. A typo in a newspaper article could have the same result. WhatamIdoing (talk) 14:26, 13 October 2022 (UTC)

No, my concern is specific to databases. If a typo exists in a newspaper at random then it's likely to be noticed because a word is wrong; the same is true of books or other sources. But due to the limited state of data stored in databases, any error is magnified. "The town of Anytown, Michigan" is only "Anytown" "MI" in a database. A random error in a newspaper may affect any part of the sentence ("The tpwn of Anytown, Michigan"), but in a database any random error can only affect critical information ("Anytopn" "MI"). So the effect of random errors in databases is always critical, while random errors in other sources are more likely to be inconsequential. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 14:49, 13 October 2022 (UTC)

The effect is magnified the less information is held by the database, so the instance of Fishbase below is a bad example. A better one was an article about a first class cricketer I saw; the database held no more than name / date / statistics. I was looking for any other sources to help the article when it occurred to me that any simple mistake in the name would make this impossible (old English name with odd spelling, and the entry was for the 1890s). So if that was the case, our article was only perpetuating and amplifying that mistake. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 14:56, 13 October 2022 (UTC)

As I said, I think my concern is much narrower than others', but it makes me concerned about an editor's ability to verify the information. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 14:57, 13 October 2022 (UTC)

If a reliable source says "Anytopn, MI" exists, then that material is verifiable in that source, even though the source is wrong. Verifiability is about whether you can find a (non-Wikipedia) source that says something, not about whether it's true. It looks like you've only been editing for about a year, so you probably haven't run across Wikipedia:Verifiability, not truth. That explains some of the theory. A short version might sound something like "Don't ever just make stuff up, but also don't bother us about whether it's Really True™ that the Earth is round. Wikipedia reports what the sources say, not what's Really True™ According to Editors". WhatamIdoing (talk) 00:51, 14 October 2022 (UTC)

We're talking about creation, not verification. Absolutely if the scientific consensus was that the world is flat, Wikipedia should go with that consensus. But not everything that is verifiable is included in Wikipedia. I'm not saying that such databases can't be used in an article to verify details. My point is that we probably shouldn't be using sources that could easily be in error as the only source for article creation. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 10:57, 17 October 2022 (UTC)

All sources could easily have a typo in them. Even official sources, even authoritative sources, could easily have typos in them. I saw news articles last year about a town needing to rename a school because it was named after a prominent citizen, and historians decided the town had been misspelling the name for almost a century. Birth certificates end up with spelling errors. People post the wrong date for births and deaths on social media. These things happen. That's why one of our key signs for identifying reliable sources isn't that they're always correct, but that they're willing to publish corrections when they're inevitably wrong. WhatamIdoing (talk) 15:54, 17 October 2022 (UTC)

Indeed; recently I was amused to find that a set of articles I've been using misspelled J. J. Stevenson the architect as J. J. Stephenson the activist, and that the listing body, a supposedly authoritative source, had cribbed from these articles more than once without correcting the misspelling. Espresso Addict (talk) 23:30, 17 October 2022 (UTC)

Database only

Mass created articles example

No, this is not okay:

Entomocorus benjamini is a species of driftwood catfish found in the Madeira River system in Bolivia and Brazil.[1]

[1] Froese, Rainer; Pauly, Daniel (eds.). "Entomocorus benjamini". FishBase. December 2011 version.

Yes, this is okay:

Entomocorus benjamini is a species of driftwood catfish found in the Madeira River system in Bolivia and Brazil.[1][2]

[1] Froese, Rainer; Pauly, Daniel (eds.). "Entomocorus benjamini". FishBase. December 2011 version.

[2] Ferraris, C.J. Jr., 2003. Auchenipteridae (Driftwood catfishes). p. 470-482. In R.E. Reis, S.O. Kullander and C.J. Ferraris, Jr. (eds.) Checklist of the Freshwater Fishes of South and Central America. Porto Alegre: EDIPUCRS, Brasil. ISBN 9788574303611

From the above comments and on the main page, I'm gathering that the first is supposed to be bad, because it only cites a database entry, and the second is okay, because it cites the same database entry plus a 700-page-long book. Am I right? WhatamIdoing (talk) 05:38, 13 October 2022 (UTC)

If the main source of a database entry is a book, why not cite the book in the first place (and include the database as an additional ref)? Again, the criterion is meant to prevent creation of articles that can only be sourced to databases/primary research articles or for which there is no way to determine whether any specific fact is DUE. So if every fishbase entry is sourced to lengthy articles in such books, then the books should be cited, and maybe you can make the argument that inclusion in the database itself is a halfway-mark for GNG and thus it deserves to be whitelisted. But as someone whose published primary data is automatically added as attributes to various "reliable" database entries without secondary appraisal, I can say that only a very small minority of even scientific databases correlate to GNG for all their entries. Provisions for those that do can always be added to our guidance, that's not unreasonable. JoelleJay (talk) 07:37, 13 October 2022 (UTC)

I also want to reiterate that primary sources like research articles, whether in the original form or as data reproduced elsewhere without analysis, are not supposed to be the bases of articles, and this should be no different in science. Merely repeating the info in a single paper describing a species for the first time doesn't transform it into a secondary source, so any databases that were whitelisted would need to show that each entry was in fact derived from a review article/book or synthesized from multiple studies, and not just from one primary study. JoelleJay (talk) 07:54, 13 October 2022 (UTC)

Why not cite the book in the first place? Well, I'd start by pointing out that the book costs US$48, and the database is free. That fact alone is a major barrier for most editors.

As you have perhaps noticed, the book is cited in the Fishbase entry. It is listed on the main entry page, under the heading "Main reference". So I wonder, in terms of demonstrating that the database isn't the only source, isn't that database entry itself good enough? You click through to the link, and it gives you the full citation for the book. Right next to their "Main reference", there's another link to a list of other sources about the subject, plus the first line of the database entry has a link to the original research paper (a primary source).

Does copying and pasting the book ref into the Wikipedia article actually matter in reality? This database entry identifies 15 sources. It's the same information in our article whether I copy over the main source in the database entry (or all 15 of them) or not. The subject is notable whether I copy over that second source or not. Or is the goal really just a sort of defensive behavior, to ward off reviewers who don't know much (we obviously can't have subject-matter experts for everything) and who don't click through to see that there are many sources cited in the cited database entry?

(As for whether this entry is typical: All Fishbase entries seem to have multiple sources, and most of them seem to cite at least one book chapter or an academic paper of similar quality. One might not want to set a bot to import the database the way Rambot created articles about US cities from the 2000 US Census, but I'd never be concerned about an editor picking and choosing hundreds or thousands of fish to write about from this database. It is "reliable" in the sense that we can safely "rely" on it.) WhatamIdoing (talk) 14:17, 13 October 2022 (UTC)

Like I said, we can make exceptions for databases where a clear secondary source is always available for every entry. The prose criterion is supposed to be for databases where the existence of secondary sourcing is not evident and is not expected to exist for every entry (I guess I should make that clearer in the proposal? I thought the "based on" part was sufficient there). For mass creation from databases, it should be absolutely certain that secondary sourcing can be achieved through visiting the database, whether through actual secondary prose in the database or through the database itself being essentially a 95% predictor of GNG (which would have exempted them from the criterion regardless since that's basically an SNG...). JoelleJay (talk) 17:44, 13 October 2022 (UTC)

I suspect that, to achieve a meeting of the minds, you will have to spell it out in detail, with a long list of examples and circles and arrows and paragraphs typed on the back, preferably using real examples.

(Also, WP:Based upon might be useful to you.) WhatamIdoing (talk) 00:55, 14 October 2022 (UTC)

I think one problem here is that in the case of the fish, the likelihood is that the article has been created purely from online sources. The creator may well not own the book and can't check that the details are indeed verified. I tend to address this using Further reading, which asserts the further source exists, but makes no claims to have looked at it. In 9/10 of the green examples, the creator (or someone else helpfully adding the book) will merely be assuming that the book supports the article as written. Espresso Addict (talk) 10:35, 13 October 2022 (UTC)

As I've qualified above my only issue would be if that specific database was the only source for Entomocorus benjamini that we had. I'm not saying databases are unreliable or even in this example that Fishbase is unreliable, only that using only Fishbase and no other sources (in the entire article) is problematic. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 12:10, 13 October 2022 (UTC)

Since the database entry itself cites another 15 sources, is it really important that the article here also cite some of those sources?

As @Espresso Addict notes, it would be easy for an editor to just copy those over. The fact that they exist demonstrates notability. There is no way for you to know whether the editor read any of those 15 sources. What do we really gain by adding another citation, aside from reviewers not needing to click through to the database and see that there are 15 sources listed there with their own eyes? WhatamIdoing (talk) 14:20, 13 October 2022 (UTC)

I'd argue that by forcing people to add additional sources, we potentially lose reliability from the project - we all know that there are cases where sources have been, err, manufactured or assumptions have been made. If you insist that I add book sources, I'll add them, but it's much better that I add the ones I've actually checked - and maybe picked up where they disagree with the database entry or where errors exist somewhere along the line. So which is more reliable: an article where I cite the one database I used, or an article where I cite two other sources that I've not actually got access to but were cited in the database? The latter would appear to a casual reader to be more reliable - and that's misleading. Blue Square Thing (talk) 15:08, 13 October 2022 (UTC)

I don't understand this line of reasoning. Yes, people do manufacture references, most of the time because they have something they want to write, and only then find sources for it. That problem won't be affected one way or another; it's already back to front. If you don't have references, don't write it, and if you can't find references that support you, then read what the references say and summarise that instead. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 15:33, 13 October 2022 (UTC)

I missed this earlier; as per my explanation, your point has nothing to do with the example I'm concerned about. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 10:59, 17 October 2022 (UTC)

Everyone agrees the version with the second source is preferable. But the first version shouldn't be banned and would I think survive AfD.—S Marshall T/C 12:35, 13 October 2022 (UTC)
If the first version came to AfD, I'd !vote to keep it. The given source appears both in-depth and reliable, and it offers possibilities for expansion. It collects information from primary sources, making it a secondary source. One might develop a reasonable antipathy to superficial databases, but I don't think that aversion translates over to here. XOR'easter (talk) 16:18, 13 October 2022 (UTC)
Except the same can be said for many of the databases explicitly excluded by NASTRO and by NOTINDISCRIMINATE...and anyway, merely reproducing primary data without additional context still runs afoul of PRIMARY. The factor that would exempt this database is the citation to a clearly secondary SIGCOV source within the entry and the assurance that this is available for each entry; that will not be the case for the vast majority of databases. JoelleJay (talk) 17:52, 13 October 2022 (UTC)
NASTRO discourages certain databases specifically because of the paucity of information in most of the entries – the many entries that have "little information beyond their basic parameters and discovery circumstances", to quote NASTRO on databases. Those individual entries don't, to use @XOR'easter's words, appear to be in-depth and reliable and offer possibilities for expansion. The entries that do offer substantial information do not appear to be banned by NASTRO.

The only databases mentioned in WP:NOTINDISCRIMINATE are lyrics databases, and it is actually objecting to Wikipedia becoming a lyrics database, not to editors citing them. The rest of it says: "To provide encyclopedic value, data should be put in context with explanations referenced to independent sources. As explained in § Encyclopedic content above, merely being true, or even verifiable, does not automatically make something suitable for inclusion in the encyclopedia."
Putting information in context means that you say things like "Entomocorus benjamini is a species of driftwood catfish found in the Madeira River system in Bolivia and Brazil." That's "<Subject> is a kind of <category> in <part of the world>." That's all that "putting in context" requires. By contrast, an article that says only "He makes people laugh" has no context. WhatamIdoing (talk) 01:08, 14 October 2022 (UTC)
On the primary/secondary question:

WP:N says that it doesn't matter what the article says. Notability is judged on real-world sources, not the sources cited in the article.

WP:N does not require any sources to be cited. This is an important point, so I repeat it: The GNG does not require that sources be cited in the article. None. Actually zero.

But it sounds like you object to an article being created about a subject that is notable, and for which secondary sources are known (at least to you and me, though not necessarily to others) to exist, unless a secondary source is cited.

Over here, I see a guideline that says zero sources are (technically) acceptable. Over there, I see an editor arguing that not only is at least one source required, but also that this one source had better live up to her standards for sourcing, and not merely be a source that contains a bibliography to sources that live up to her standards.

Can you understand why I'm thinking that One of These Things (Is Not Like the Others)? WhatamIdoing (talk) 01:14, 14 October 2022 (UTC)

Re: "basic parameters", I think you are underestimating the number of "severable facts" each entry in the NASTRO-excluded databases contains--generally 11 orbit parameters and numerous "miscellaneous details" for the small-body database.

Re: putting info into context, I am talking about database sources, not our own articles. We require the secondary source to be the site of context, which can't happen if it just lists or reproduces attributes from primary sources.

WP:N still requires sources to exist. If an article creator does not know that sufficient sources exist (either to demonstrate notability through coverage, i.e. GNG, or to verify the subject meets some non-coverage-based SNG), they should not be creating the article. And if they do know with certainty the subject is notable, they surely have access to the sources and can cite them. Wikipedia does not allow zero-knowledge proof of notability. This is why it is extremely important that mass creation sourced to databases has a high confidence in notability for each article; this is not possible if the database does not contain original analysis (to conform with WP:SECONDARY) or direct references to SIGCOV in SIRS. JoelleJay (talk) 17:11, 15 October 2022 (UTC)
JoelleJay It's not true that one will always have access to secondary sources. For example, English listed buildings have one obvious online source (the listing – not a database, in the sense you are talking about, but pretty close) but in my experience any building or major structure (not so much the milestones & bollards) always has other sources: the Pevsner guides (expensive & not available online), other architecture books, architecture journals (not online), local/national newspapers, and local history books (mainly only available via local libraries or by picking up second hand). For any listed building I can be 99.9% sure there will be other sources, but won't be able to check that without spending money, going to a particular library (hundreds of miles from my current library-free location), or asking at the resources exchange. Now I personally don't find it satisfying to just use the listing, so I have not created an article unless I'd first bought book(s) or, in the past, been able to visit an appropriate library. But there's no problem with doing so in principle.

The same (hopefully) applies to all the entities that have an assumed notability; certainly it's true of species, drugs, populated settlements in the UK; I don't know anything about astronomical entities. Detailed subject-specific notability guides are essential to guiding whether sources are likely to exist or not. If they are inaccurate, then that's a problem with the subject-specific notability guide that needs to be addressed, as I gather has happened with some of the sports guidelines. Espresso Addict (talk) 00:37, 16 October 2022 (UTC)
"And if they do know with certainty the subject is notable, they surely have access to the sources and can cite them."
This is a statement of alleged objective facts, and it is untrue. Knowing that there is a book all about _____ does not mean that I "surely" have access to that book.

"Wikipedia does not allow zero-knowledge proof of notability."
Yes, we do. Hang out at AFD for a while, and see people posting sources. Nobody stops them and says, "Whoa, that's an 800-page book titled The Sun is Really Big by the noted Sun expert, Alice. But you haven't read it, so you have zero knowledge of its contents, and we don't allow zero-knowledge proof of notability for subjects like the Sun."

Even more relevantly, we technically allow zero-source non-BLP articles up until the moment that the article's existence is challenged. You might want to change that, but the existing rule is that the sources must exist in the real world, regardless of whether they are cited in the article yet. User:WhatamIdoing/Christmas candy has zero sources, I am personally unaware of any in-depth secondary sources on the subject, and it is still extremely unlikely that the article would be deleted at AFD. There would probably be some grumbling about lazy editors and a lousy zero-source substub, and maybe even what a spam magnet an article like that could be, but nobody from the Western world is going to suggest that nothing's ever been written on the subject. I sometimes feel like you have a somewhat rose-tinted view of how the notability process works. In both theory and practice, we accept completely unsourced articles when the subject feels "obviously" notable to most editors. I recommend that every time you're tempted to say that sources are required, or that some particular type of source is required, you first ask yourself how certain you are that you could get an unsourced article on Christmas candy deleted at AFD.

"This is why it is extremely important that mass creation sourced to databases has a high confidence in notability for each article"
Is it more important for mass-created articles (e.g., 1000 articles by a single editor) than for non-mass-created articles (e.g., 1 article each by 1000 inexperienced editors)? Or do you think that it's about the same importance for mass-created and non-mass-created articles?

WhatamIdoing (talk) 02:23, 16 October 2022 (UTC)
By "access" I mean "have enough info on the source to confidently name it as an SIRS with SIGCOV".

That's not what a zero-knowledge proof is. Providing a source is still providing information as to how one "knows" or could know a subject is notable. What WP doesn't allow is claims of notability that could never be validated. By our definition of notability, a source must exist somewhere for something to be notable; therefore, any assertion that a subject "is notable" must be theoretically falsifiable with sourcing even if it's not cited in the article.

I think your Christmas candy page would very likely be deleted or merged into or redirected to some other page if it was brought to AfD and no one could find any secondary sources on the topic. Out of nearly 700 AfDs I have never seen an article kept on the basis that "editors feel the topic is notable" despite nothing being found to support it as a standalone and no one arguing that sources exist offline somewhere.

I guess in theory if the article state and creation timing are identical these would be equivalent? But clearly in practice that is not the case, which is why we have this RfC... JoelleJay (talk) 03:23, 16 October 2022 (UTC)
Iff it was brought to AfD (which is unlikely) and iff nobody could find suitable sources on the subject (which is extremely unlikely, since something around a billion dollars' worth of Christmas candy is sold each year just in North America), then perhaps it would end up merged to the larger topic of Christmas food. But do you really imagine that both of those unlikely events would actually happen in practice?

I'd first expect nobody to nominate it for deletion unless someone is taking a WP:POINTY action (or let's imagine a gentler form of that, e.g., an autistic teen trying to make every article conform to The Rules™ as best as he understands them). In the unlikely event that it did get nominated, I'd expect a whole lot of scornful claims that There must be sources, followed by someone finding the sources. WhatamIdoing (talk) 20:50, 16 October 2022 (UTC)
To be clear, you think the only editors who would object to "Christmas candy" being an unsourced stub in mainspace are people with an axe to grind and... autistic children? Never mind that it's not even an example of a zero-knowledge proof. And never mind that things like Halloween candy and Valentine's Day flowers are redirects despite being billion dollar industries. JoelleJay (talk) 23:35, 18 October 2022 (UTC)
More precisely, I would expect experienced editors to know that when they object to an article on an obviously notable subject being an unsourced stub, the appropriate solution is to find the edit button instead of finding the AFD directions. Wikipedia:Deletion is not cleanup, and when your only concern is "not enough words on the page" and "not enough sources on the page" – both of which have nothing to do with whether the subject qualifies for a separate page – you normally shouldn't be trying to get the page deleted.

Halloween candy is a notable subject that seems to have been created as a redirect because someone wanted the link not to be red, rather than because someone judged it to be an inappropriate subject for an article. We could write a whole article there, just like we already have a sourced, non-stub article on Halloween cake, whose claim to notability is far less obvious to me. We also have an article about Poisoned candy myths, which is a sub-topic of Halloween candy. As a dyed-in-the-wool mergeist (also as someone who has given non-candy treats to the neighbor kids for several decades), I'm personally in no hurry to split that, but it could be done, and I believe it would be uncontroversial.

Valentine's Day flowers was neither a stub nor unsourced at the time it was merged, and none of the comments at AFD suggested that it wasn't a notable subject. They did suggest that it was a spam magnet. WhatamIdoing (talk) 19:28, 19 October 2022 (UTC)

@Espresso Addict, are "listed buildings" not covered by the first bullet of NBUILD? I'm not talking about requiring secondary sources for subjects under SNGs that confer notability (per "either to demonstrate notability through coverage, i.e. GNG, or to verify the subject meets some non-coverage-based SNG"). JoelleJay (talk) 02:31, 16 October 2022 (UTC)
JoelleJay Sure, but I was attempting to refute your assertion that "And if they do know with certainty the subject is notable, they surely have access to the sources and can cite them." Espresso Addict (talk) 02:34, 16 October 2022 (UTC)
Re WhatamIdoing's comments, if one tries to clean up unsourced articles from the early years they often turn out to have been written from a very reliable encyclopedia entry, just not explicitly sourced to that. Espresso Addict (talk) 02:40, 16 October 2022 (UTC)

But if they're writing from an encyclopedia entry then they do have access to the source and so could cite it, they just don't/didn't. I still don't really know where this point is going? JoelleJay (talk) 03:29, 16 October 2022 (UTC)
This is meant to be a discussion about mass deletion efforts; it's often been written/implied in my hearing that all long-unsourced articles should just be deleted, because if they have not been sourced yet, they "clearly" cannot be sourced, yet in my experience many are trivially sourceable to an encyclopedia available in the WL bundle. Espresso Addict (talk) 03:43, 16 October 2022 (UTC)
Oh, this was only mistakenly put in the deletion RfC talk, per the section above. I have many Thoughts on the argument you bring up now, but those should be saved for another place and time. JoelleJay (talk) 03:58, 16 October 2022 (UTC)

I know we're supposed to be concentrating on output not editors, but I find there's a growing disjunction between editors who create/improve articles and those who primarily participate in deletion or policy discussions. Article creation, in my experience, is often a complex balancing act, involving multiple compromises. I fear this is one of the reasons why AfDs are often perceived as acrimonious. Espresso Addict (talk) 01:32, 14 October 2022 (UTC)

You guys are incredibly impressive, but I'd encourage you to think about who you're persuading. Valereee (talk) 01:38, 14 October 2022 (UTC)

I'm so sorry, I didn't mean to post anything cryptic! I meant the greater !voter who is either misinformed or uninformed. Most come in with preconceived ideas, often wrongheaded, or with almost no understanding of the issue. Many fear any change. You are workshopping not only for yourselves but for them.

Again, apologies, I didn't mean that as any kind of moderator comment, just a comment from someone who isn't an expert but would really love to see us find something that can gain consensus. Valereee (talk) 12:53, 14 October 2022 (UTC)

Taking stock

I have questions about where we are now.

Have we decided to create a new rule that people who mass-create articles should cite a source in every article, starting from the first revision that appears in the mainspace, even if the article isn't a BLP?
If so, have we also decided that, where the mass-created article is a BLP, the source may not be a database?
And, have we also decided that, where the mass-created article isn't a BLP, the source can be a database if that database is scholarly and reliable?
And if so, do we need to think about enforcement mechanisms?

I'm conscious that the moderators might be wanting to move on.—S Marshall T/C 17:31, 14 October 2022 (UTC)

Discuss to your heart's content. The only thing the moderators want to move on with is the actual starting of the RfC, which we'd like to do soon, but since we're not doing a workshop phase (that didn't seem to help much with the first one) we're going to skip it, so at this point we're just trying to decide on format, then we'll start. Valereee (talk) 17:34, 14 October 2022 (UTC)

My problem with this is it seems to be relating to the creation RfC not the deletion one; "we" have no power to decide anything, all we can do is provide proposals on which the community votes, and it's probably now too late to do that in the other RfC. Espresso Addict (talk) 22:41, 14 October 2022 (UTC)

@Espresso Addict:, this thread absolutely pertains to the creation RfC. It is on the wrong page. In the previous thread on this page, there was concern about how to format comments for the deletion RfC, with the observation that some discussions in the creation RfC were getting very long and would benefit by being threaded (with the comments about databases by JoelleJay and WAID being given as example). Valereee suggested moving the database discussion to "this talk" and then made subsequent comments that indicated she wasn't sure which page she had just been editing. Plantdrew (talk) 00:46, 16 October 2022 (UTC)

@Plantdrew, Valereee, and Xeno: I think it might make sense to move it to the other talk page. Espresso Addict (talk) 00:52, 16 October 2022 (UTC)

Agreed. JoelleJay (talk) 01:53, 16 October 2022 (UTC)

@Espresso Addict, go right ahead, and apologies it ended up here in the first place! Valereee (talk) 14:37, 16 October 2022 (UTC)

@S Marshall, I'd answer your questions like this:

No. I think that a very simple statement that "a source is required" is something a lot of editors could support, except that the last time I checked, nobody seems to have proposed it at WP:ACAS. Instead, the proposals have not been "a source"; they've been "a source of a particular type", just like yours here is not "a source" but "a source at a particular point in time". In "Question 2", the most popular response (40% of participants) is no change to the rules.
No. I don't think that particular combination has been discussed much by anyone except you (not much opposition, but not much support, either). I'm personally open to that, pending some definition of what constitutes "a database" and how you would know which databases would be acceptable. Imagine, e.g., the various 40 under 40 awards. A magazine article like that counts towards notability, and it should continue to count even if they organize it into a database format.
I wouldn't say that's "decided", but I think there is some support for this view.
Since it's not clear that we'll be making any changes, I think it's premature to plan for enforcement.

To expand a little on that last point: If we don't agree on a definition of "mass creation", then very little can be enforced anyway that isn't already being handled.

I think that we get something on the order of 500 non-redirect articles per day. The current rules suggest that a single editor should not take up more than 5% or 10% of the queue (anything more than that, and you should get bot approval). Only a handful of people have done that during the last year, even for a single day. When you look at the whole year, we don't even see editors taking up 0.5% of the queue. So at some level, as mass creation is currently defined, it's already not happening, and therefore there's nothing to enforce.

If we could get editors to agree to have a clear definition, and it was substantially different from the current one, then there might be some reason to talk about enforcement mechanisms. But as it is, we don't have a new definition, and nobody's "violating" the old one, so there's no need for any additional enforcement mechanisms. WhatamIdoing (talk) 03:08, 16 October 2022 (UTC)

I rather like the idea of defining mass creation in terms of proportion of the average daily non-redirect articles submitted. Are we really running at only 500 a day? That doesn't seem a lot. Espresso Addict (talk) 03:49, 16 October 2022 (UTC)

The top of Wikipedia:Statistics says 566 articles per day, or something above 200K per year. That feels about right, since it took us about four and a half years to get from Wikipedia:Five million articles to Wikipedia:Six million articles. Special:NewPagesFeed shows 530 non-redirect articles from yesterday, of which a little less than half are from auto-patrolled editors and an unknown (to me) fraction are dab pages like Santa Maria Materdomini. Less than 24 hours after creation, only 17 (3.2%) of yesterday's articles remain in the unreviewed queue, and it's very likely that all 17 of those have been "reviewed", in the sense that every one of them, including those created at the end of the day (so we only have a few hours' page view data) has received between 8 and 98 page views, even if nobody was willing to push the approval button.

I'd say that clearing 97% of the queue less than 24 hours after creation is pretty good. WhatamIdoing (talk) 20:39, 16 October 2022 (UTC)
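For anyone who wants to re-check the arithmetic in the figures quoted above, here is a minimal sketch. It only uses numbers already given in this thread (566 per day, 530 new pages, 17 unreviewed, and the rough 500-a-day queue with 5% and 10% shares); the variable and function names are purely illustrative, and nothing here is pulled from a live statistics page.

```python
# Figures quoted in the thread; all names below are illustrative assumptions.
ARTICLES_PER_DAY_STATS = 566   # daily figure cited from Wikipedia:Statistics
NEW_PAGES_YESTERDAY = 530      # non-redirect articles seen in Special:NewPagesFeed
STILL_UNREVIEWED = 17          # articles left in the unreviewed queue
QUEUE_ESTIMATE = 500           # rough daily queue size used earlier in the thread


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Yearly total implied by the daily statistics figure.
per_year = ARTICLES_PER_DAY_STATS * 365

# Share of yesterday's articles still unreviewed, and the share cleared.
unreviewed_pct = share(STILL_UNREVIEWED, NEW_PAGES_YESTERDAY)
reviewed_pct = round(100 - unreviewed_pct, 1)

# Daily article counts corresponding to 5% and 10% of the queue.
five_pct_of_queue = QUEUE_ESTIMATE * 5 // 100
ten_pct_of_queue = QUEUE_ESTIMATE * 10 // 100

print(f"Articles per year at {ARTICLES_PER_DAY_STATS}/day: {per_year}")
print(f"Unreviewed share: {unreviewed_pct}% (reviewed: {reviewed_pct}%)")
print(f"5% of queue: {five_pct_of_queue} articles/day; 10%: {ten_pct_of_queue}")
```

On these inputs the yearly total comes out a little above 200K and the unreviewed share rounds to 3.2%, which matches the figures given above; the 5% and 10% thresholds work out to 25 and 50 articles a day on a 500-article queue.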

Thanks for this, WhatamIdoing. Time has obviously flown, I hadn't realised it had taken as much as 4.5 years to get from 5 to 6 million! The 50% autopatrolled seems correct based on my perceptions from intermittently eyeballing the queue. I suspect the high processed rate might be to do with the current NPP drive? The backlog is at a low at the moment. Espresso Addict (talk) 22:51, 16 October 2022 (UTC)

It might, but it might not. Most articles are pretty easy to process, so under normal circumstances one expects most of them to be reviewed during the first day. I haven't seen any recent numbers, but I understand that the typical time from creation to tagging for {{db-g3}} is normally less than 10 minutes. The problem is that it's too easy to focus on the borderline articles, which hang out in the queue for weeks and months because people are afraid to risk being the person who "approved" something that isn't great. WhatamIdoing (talk) 16:32, 18 October 2022 (UTC)

Re: "Imagine, e.g., the various 40 under 40 awards. A magazine article like that counts towards notability, and it should continue to count even if they organize it into a database format." If it's a database where one can just access whichever magazine article was giving SIGCOV to the subject, then it's operating basically as an archives search and there wouldn't be any need to cite the database (aside from as a ref parameter) rather than the magazine directly. If it's guaranteed that receiving such an honor corresponds to GNG-meeting SIGCOV existing, then an argument can be made to add that criterion to some GNG-predicting SNG. If there is consensus that receipt suffices for ANYBIO, then it's already bypassing GNG coverage requirements. In the latter two cases, we wouldn't be using inclusion in the award database as SIGCOV itself, we would be using it as an indication of or alternative to GNG, which is distinct from the database source contributing to GNG. JoelleJay (talk) 03:53, 16 October 2022 (UTC)

The need to cite the database, rather than the thing the database is pointing you towards, is called WP:SAYWHEREYOUGOTIT. WhatamIdoing (talk) 20:40, 16 October 2022 (UTC)

Context. I said "If it's a database where one can just access whichever magazine article was giving SIGCOV to the subject, then it's operating basically as an archives search and there wouldn't be any need to cite the database (aside from as a ref parameter) rather than the magazine directly." This was an indirect reference to "Note: The advice to "say where you read it" does not mean that you have to give credit to any search engines, websites, libraries, library catalogs, archives, subscription services, bibliographies, or other sources that led you to Smith's book." JoelleJay (talk) 23:00, 18 October 2022 (UTC)

500 a day is a lot, given our ludicrous NPP process and the minuscule number of volunteers trying to implement it.—S Marshall T/C 08:19, 16 October 2022 (UTC)

Attempting to draft a definition of mass creation/article creation at scale

Attempting to kickstart collaborative drafting here... Skimming the discussions, I think we need a definition that (informally):

Allows admins/experienced editors to identify mass creation without endless argument AND allows good-faith content contributors to breathe a sigh of relief, and think "not me".
Contains some numerical guidance BUT is not limited to numerical guidelines that can be gamed.

Thoughts, anyone? Pinging (hopefully) everyone who contributed in this discussion: @Rhododendrites, Vanamonde93, ActivelyDisinterested, Thryduulf, Red-tailed hawk, Paradise Chronicle, Devonian Wombat, Graeme Bartlett, Rlendog, Aquillion, Lurking shadow, WhatamIdoing, Blue Square Thing, Seraphimblade, LessHeard vanU, ONUnicorn, Ovinus, Scolaire, Nabla, Boca Jóvenes, XOR'easter, Steven Walling, and FOARP: Espresso Addict (talk) 01:35, 17 October 2022 (UTC)

Since you pinged me... there is pretty clear consensus above against any kind of numerical threshold, because it's both arbitrary and easily gamed. I would say don't bother. We don't need a definition and policy about how many articles you can make. We need to clarify our sourcing and verifiability requirements for some types of articles—people keep bringing up places for instance. It would be much easier and more targeted to do something like expand WP:NLAND to get specific about what kind of sources are required to make an article about that subject. For other things, like species, it's really obvious that there are some people here who don't like them, but there is very obviously a longstanding consensus for keeping them. Outside of places, I haven't heard a single compelling example where there is clear consensus that there is a pattern of stub creation that lots of editors agree are problematic. Steven Walling • talk 01:55, 17 October 2022 (UTC)

Thanks, Steven -- I pinged everyone, irrespective of whether they supported or opposed creating a definition, to avoid bias. I do think we need to mention some sort of numbers if only to reassure low-output content creators who might otherwise worry that the guideline applies to them and be put off contributing at a level that's entirely harmless. The potential harm of putting off productive editors who don't want to ask permission seems to me serious and hard to measure. I tend to agree with your other point that amending subject-specific guidelines to clarify what sort of sources are appropriate is a more useful way to proceed. Espresso Addict (talk) 02:21, 17 October 2022 (UTC)

I think maybe the larger issue is the creation of articles with poor sourcing to start with, and that doing so en masse is an exacerbation of that issue, not the problem in and of itself. But that may be out of scope to resolve here. (And "species" shouldn't really be an exception, nor should, well, anything. If some species are non-notable, that could be handled in List of species in genus Examplum.) That aside, I would say that at least to start with, "Mass creation" refers to the creation of a large number of articles from a single source or small number of sources, most or all of which are relatively small in size, whether through manual, semi-automated, or automated means. Simply being prolific in article creation is not in and of itself mass creation, though it may indicate it. Would that be useful as a starting point? Seraphimblade (Talk to me) 03:09, 17 October 2022 (UTC)

Seems overly broad and vague. It could easily apply to creation of census designated places based on US census data or other things that are inarguably notable. BTW there is really no such thing as a non-notable species, making it a great example of how a few editors participating here are trying to bother the hell out of helpful editors based on opinions about stubs which aren't founded in policy, guidelines, or consensus. Steven Walling • talk 03:25, 17 October 2022 (UTC)

It would absolutely apply to that. Those aren't "inarguably notable". Schools used to be "inarguably notable" until they, well, weren't. People who had played half a minute in a pro sports game used to be "inarguably notable" until they, well, weren't. Ultimately, all that stuff has to be cleaned up. The same will eventually happen with "census designated places" (I've already seen a lot of discontent there, and that's probably soon), species (that's probably farther out), and so on. But in the meantime, we don't need thousands of permastubs from database entries which ultimately will have to be dealt with. That's exactly what this discussion is about. Seraphimblade ^{Talk to me} 03:58, 17 October 2022 (UTC)

The definition of mass creation should not depend on the subject area. Nor should it depend on the quality of the pages. The definition should stand by itself. And mass creation may not be a problem. So I don't think we have to say whether it applies to people, places, species or chemicals, as it should apply to any topic area. The earlier discussion can decide on what kind of mass creation under what circumstances is acceptable, as it does not have to be totally ruled out. Graeme Bartlett (talk) 04:20, 17 October 2022 (UTC)

They weren't "inarguably notable" until they weren't; they were "presumed to be notable", which is an important difference. ~ ONUnicorn^{(Talk|Contribs)}problem solving 10:47, 17 October 2022 (UTC)

There is no difference in practice; see Wikipedia:Articles for deletion/John Charlton (footballer) and Wikipedia:Articles for deletion/John Charlton (footballer) (2nd nomination) (It was finally deleted at Wikipedia:Articles for deletion/John Charlton (footballer) (3rd nomination), after the NSPORTS RfC removed the presumption of notability). BilledMammal (talk) 11:06, 17 October 2022 (UTC)

That sports example is exactly how things should be working. Instead of writing an overly broad policy to try and restrict all mass creation of articles, we addressed a problem by discussing the notability of footballers and developing a consensus on a specific application of notability. That's a targeted and manageable debate to have, and it avoids enacting a policy that violates our fundamental principles of acting boldly and not assuming you need someone's permission to create articles. Steven Walling • talk 20:30, 17 October 2022 (UTC)

This whole process has been derailed by this issue such that it looks like nothing will be achieved. Catastrophising groundlessly about the consequences of having policies in place to deal with mass creation (which we already have anyway...) has led to people opposing any measure so long as there is not a definition of it that conforms to their preferred one. I'm very happy to simply state examples of what is clearly mass creation:

C46's Iranian census/GNIS "villages".
Lugnuts' Olympian articles based on Olympedia, Turkish neighbourhood articles, all the articles based on sports-reference.com.
Dr. Blofeld's Pakistani/Indian/Bangladeshi bot-created "village" articles based on GEOnet Names Server.
Rambot's US census articles.
The bot/hand-created minor planets articles created by ClueBot II that were cleaned up when WP:NASTRO was amended to remove the automatic notability for minor planets.

And frankly I don't feel any need to go any further than this. WP:MEATBOT already explains why these are bad (i.e., carelessness). All the "But what about the series of well-referenced articles Editor X is writing?" talk really needs to be rebuffed because looking at the above cases it is clear that if care is being taken and the articles are being written individually then there is no way that this is ever going to be a problem.

Obviously I am not opposed to any definition ever being developed, but I think this is just best done organically, through handling cases like the above ones. It quite simply isn't a good reason for obstructing any and all efforts to do anything about mass creation. FOARP (talk) 07:50, 17 October 2022 (UTC)

Point 2 of Espresso Addict's comment is key for me. The last thing we should do is say (as a bad example) that 25 articles a day is mass creation, as inevitably someone will just limit themselves to 24 a day and argue that the rules don't apply to them.
I oppose both definitions, but supported some definition for this exact example. Strict limits would only allow problematic mass creators to wikilawyer their way out of the definition. Instead these should be part of a "may include but is not limited to" kind of statement -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:08, 17 October 2022 (UTC)

With my admin hat on I think what we need is something that will be of help when editor A says, "editor B's contributions are mass creation and should be regulated", and admin C looks at B's contributions and has to decide whether or not to intervene. There's got to be adequate room for admin (or other trusted editor) discretion, but also so that an admin who hasn't participated in these discussions isn't having to re-invent the wheel. Espresso Addict (talk) 23:20, 17 October 2022 (UTC)

How about if one of the top (10?) article creators was not able to show their notability in 10 deletion discussions following which their articles were deleted? Paradise Chronicle (talk) 23:17, 18 October 2022 (UTC)

@ActivelyDisinterested, I'd like to agree with you, but then there's that sprawling discussion about whether someone creating just one or two articles per day, on subjects that are basically guaranteed to be kept at AFD, using sources that contain lists of other reliable sources, is culpable "mass" creation, as opposed to a mere 0.2% of the non-redirect articles created that day – such a tiny volume that no reviewer will notice if the number doubles or halves. We need something that protects the m:100wikidays folks from accusations of "mass creation" even if they happen to write short, sourced articles about notable subjects that certain editors aren't interested in.

I'd be happy to define this in terms of an overall x articles per y days. I'd also be willing to define it in terms of a percentage of the review queue. If you're taking up 10% of reviewers' time and effort each day, we should at least have a chat about Wikipedia:Autopatrolled in advance. When there's no definition, editors make up their own, and suddenly we discover that someone declares #100wikidays to be "illegal". We need a shared understanding, not different editors trying to enforce their own ideas on others. WhatamIdoing (talk) 02:57, 24 October 2022 (UTC)

We absolutely have to protect efforts like #100wikidays and they are obviously not mass creation, but hard limits will only see someone declare their creations "legal" because they only take up 9% of the queue not 10% (or whatever limit is in place). -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 14:15, 24 October 2022 (UTC)

So what if they do? Where to draw the line will always be a bit arbitrary, but if we draw it at 9% of the review queue, or 100 articles a month, or any other place, and people stay just underneath the speed limit, then that is a good thing. Draw the line at the point where it's not going to be a problem if someone stays on the "legal" side of the limit. WhatamIdoing (talk) 21:21, 25 October 2022 (UTC)

@FOARP I'd support finding some sort of definition, to have a shortcut for the rather long discussions we have been having for months. I am just chipping in from time to time, because sincerely it's just too much to read and keep up with the developments. Just check your one edit up there. This is one edit of hundreds in this one discussion, and there have been several discussions on the issue before. I'd like to have a WP:MASSCREATE where anyone can read what is meant by it. Paradise Chronicle (talk) 22:18, 25 October 2022 (UTC)

People driving at the speed limit is a good thing, people always driving at the speed limit regardless is bad. If the definition was that 10% or X per day was too much, but being below this isn't an immediate exclusion, that would be fine. We have the same situation with edit warring, the bright line isn't an excuse to continually edit war up to it. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:35, 26 October 2022 (UTC)

Definition of mass creation

In response to Espresso Addict, if we create a definition of mass creation I suggest it is done as a bright line rule to prevent gaming similar to WP:3RR; actions that meet the definition are mass-creation, but that doesn't mean that actions that don't meet the definition are not.

It may also be appropriate to use multiple definitions as there are multiple types of mass-creation.

For one of these definitions I suggest:

Creating more than ten articles through the use of a clearly identifiable template.

For example Edmund Chadwick and Francis Melhuish share a clearly identifiable template (NAME (BORN – DIED) was an COUNTRY cricketer active from START to FINISH who played for TEAM. He was born in LOCATION and died in LOCATION. He appeared in NUMBER first-class matches as a righthanded batsman, scoring NUMBER runs with a highest score of NUMBER, and held NUMBER catches.) as do Juan Carlos Ruíz and Aleixo Pereira (NAME (born BORN) is a COUNTRY footballer. He played in NUMBER matches for the COUNTRY national football team from YEAR to YEAR. He was also part of COUNTRIES squad for TOURNAMENT.). BilledMammal (talk) 02:00, 17 October 2022 (UTC)

Thanks, BilledMammal. So, to take an example relating to my creations (citations & images removed for clarity), if I were to write:

31 Madingley Road is a Modernist red-brick house in Madingley Road, west Cambridge, England, designed by Marshall Sisson for the classical archaeologist A. W. Lawrence in 1931–32. It is one of the first Modernist-style houses in Cambridge, and is listed at grade II.

White House is a Modernist concrete house in Conduit Head Road, west Cambridge, England, designed by George Checkley for the chemist Hamilton McCombie in 1930–31. It is considered the first Modernist-style house in Cambridge, and is listed at grade II.

and then more articles to a total of >10, with no further content (aside from sources), that would be the kind of thing you were hoping to regulate? Essentially writing out the content of a line of a table in words. (For clarity, I've conflated two houses by Checkley here, to make the template more obvious.) Espresso Addict (talk) 02:42, 17 October 2022 (UTC)

Yes; such mechanical creations can be appropriate, but they should be discussed first to address any issues. I would also suggest that if they are appropriate they should be done with a bot that won't make mistakes rather than by a human who might. BilledMammal (talk) 10:27, 17 October 2022 (UTC)

Where do I discuss them? Fwiw, BlackJack and Lugnuts (and others iirc) discussed the creation of articles such as the two cricketers above before creation. There was consensus that they were appropriate for creation at that time. Iirc only one editor suggested otherwise and there was a clear and obvious consensus. That discussion took place at the obvious wikiproject and reflected the notability standards as they existed at that time. Blue Square Thing (talk) 17:10, 17 October 2022 (UTC)

WP:VPR. WikiProjects are not suitable, per WP:LOCALCONSENSUS. Can you link those discussions? BilledMammal (talk) 22:16, 17 October 2022 (UTC)

I don't really mind where these end up being discussed, but those discussions show why we need a central location for them -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 18:39, 19 October 2022 (UTC)

@BilledMammal: I didn't see the query here - apologies for the delay. There's an argument that the wiki project, certainly at the time as none of this was that controversial then, was exactly the right place to discuss it. I don't have a link handy, but it would have been in March 2017. Blue Square Thing (talk) 08:48, 25 October 2022 (UTC)

10 is far too small a number. 10 a week is, arguably, too small a number - if I went for it I could easily do that using what might appear to other editors to be a "template" - also known as "my style of writing articles about xxx".

Further to that, should we be in any way concerned about editors creating 10 articles? The list FOARP suggests above is more appropriate wrt the sorts of things we probably should be concerned about. Blue Square Thing (talk) 09:07, 17 October 2022 (UTC)

Could you give some examples of articles that you have created manually that could be seen as being created with a template?

I think ten is ideal because once an editor reaches ten "templated creations" they are clearly engaged in mass creation, and we want to review those creations while they are still a manageable number. BilledMammal (talk) 10:27, 17 October 2022 (UTC)

I'd rather see this as part of a list of characteristics of mass creation. e.g. The definition of mass creation may include:

The creation of a set of articles using a template,

Creating more than xx articles in a xx,

Etc,
-- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 11:21, 17 October 2022 (UTC)

Willingham St Mary, Ellough, Weston, Suffolk, Sotherton, Uggeshall, Stoven, Brampton with Stoven, Ringsfield, Redisham, Bixley Heath, Berner's Heath, 2007 Danmark Rundt, 2008 Danmark Rundt, 2009 Danmark Rundt, 2014 Danmark Rundt. For starters - and I don't create many articles at all. Clearly I'd need to ask permission. And that's before we get to list articles, which I've created a few of and are, by definition almost, template-like. Blue Square Thing (talk) 17:27, 17 October 2022 (UTC)

Looking at those examples, I see they started off with a template, but you quickly expanded them beyond that. I think Scolaire's proposal of a grace period to expand template articles is a good idea, and will address your concerns. BilledMammal (talk) 22:16, 17 October 2022 (UTC)

Honestly, not in each case. At least half of those have barely, if at all, been developed. Some have two edits to the entire article. Apparently I'm a mass creator who doesn't develop articles? Yet I barely create any articles. Bizarre. Blue Square Thing (talk) 08:48, 25 October 2022 (UTC)

31 Madingley Road is an excellent example of an article created using a template, or rather, its original version is, but it was soon expanded. That's the difference between it and the mass-created articles we're discussing here. I like BilledMammal's definition, but I would add: "...that have not been significantly expanded within a reasonable time (one week?)". Scolaire (talk) 11:55, 17 October 2022 (UTC)

I'm not knocking articles created using a template; good articles are good articles. The purpose is to head off the argumentative threads that plague this issue. The hope would be that by settling any challenge beforehand we don't end up in the redirects, prods, AfDs, ANI, Arbcom situations that have happened in the past. -- LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 15:32, 17 October 2022 (UTC)

I don't see anything wrong with using a template to start with, and I think we need to distinguish between "prolific creators of reasonable quality articles", obviously something we should want to encourage, and "dumping databases wholesale into Wikipedia as 'articles'", which is really the more questionable practice. Another question may be, though, if an editor wants to initially use a database-populated template as a foundation to later write an actual article, why not use userspace or draft space to host the templated proto-articles, and then move to mainspace once they're ready? That seems a relatively straightforward solution. Seraphimblade ^{Talk to me} 16:09, 17 October 2022 (UTC)

Neither am I knocking articles created using a template. The point I am making is that mass-created stub articles of the kind we're talking about tend to remain just that. Stubs that are expanded are obviously not a problem, hence "that have not been significantly expanded within a reasonable time". Userfying mass-created stubs is something I suggested at the workshop stage. My suggestion didn't make it to the RfC, but I might try again at the AfD one. Scolaire (talk) 16:31, 17 October 2022 (UTC)

What's a reasonable timeframe? And how does that fit against the concept of there being no deadline? I expanded a stub this morning that was created in 2014 and had barely been changed since. Is that too long a time period? Blue Square Thing (talk) 17:10, 17 October 2022 (UTC)

I'm assuming the article you edited this morning wasn't one of a hundred articles mass-created in 2014. For mass-created articles, I'm saying that they should be defined as a number of stubs created following a template, and not expanded within a week. Scolaire (talk) 18:31, 17 October 2022 (UTC)

It was one of the two cricketers that BilledMammal picked out above, so, yes, it absolutely fits the mass created bill. The editor created 26 over a two day period using the same sort of style, and, it seems, 104 in a single month in the same way - probably all Lancashire cricketers by the looks of it. Experience suggests that I would be able to work through all 104 and find references for almost all of them and show clear notability for about 50%, depending exactly on the era involved. I'll see if I can work through that list and confirm the proportions - I've done that before and gotten the 50% figure from sampling similar sorts of groups. Blue Square Thing (talk) 19:11, 17 October 2022 (UTC)

However bot-like my first edits on the listed buildings appear, they actually needed multiple sources, some offline, some paywalled, to just get to that state. I always now create articles in my userspace and don't move them to mainspace until they get to at least a decent start; personally I think this is good practice, but not everyone works the same way, nor should do.

I'd agree that giving editors a reasonable grace period to expand is a good idea. If I want to create 10+ articles on a topic, I always do them one by one, wringing as much as I can out of the sourcing before moving on, but it's perfectly reasonable and almost certainly more ergonomic to do short stubs on all 10+ first, then expand all of them in stages, eg by adding what a particular source says on each of them. A single week seems quite short, especially as many people only edit at weekends.

And I agree with Blue Square Thing above that it's often easy to expand unpromising-looking stubs. I've rarely found a stub on a Brit non-sports bio that I couldn't expand without too much trouble, particularly since the Wikipedia Library started; eg say Marshall Sisson. Espresso Addict (talk) 22:49, 17 October 2022 (UTC)

Based on the above discussion, per Scolaire's and ActivelyDisinterested's suggestion, I would suggest:

Mass creation includes but is not limited to:

Creating articles through the use of a clearly identifiable template and not expanding them beyond the template within a fortnight
Creating more than xx articles in a xx
Etc

BilledMammal (talk) 22:16, 17 October 2022 (UTC)

Broadly agree with this, for given values of xx (50 articles in a week? 10 articles in a day?). I'd suggest adding "in a narrow topic area" to the first bullet. I think we could add a third bullet stating something about sources, eg all using the same or very similar sources. Espresso Addict (talk) 22:57, 17 October 2022 (UTC)

For "in a narrow topic area", I don't think that is necessary; a template suitable for sportspeople isn't suitable for villages. BilledMammal (talk) 23:03, 17 October 2022 (UTC)

With my admin hat on, I think it would make it clearer for an admin to interpret. One article on an Australian cricketer from one source, followed by another article on an English footballer from a different source, followed by a third article on a French Olympic shot-putter from another source might all use a rather similar template, but would presumably not result from a mass run (unless someone had found a sports database and was running purely alphabetically?). Espresso Addict (talk) 23:13, 17 October 2022 (UTC)

(unless someone had found a sports database and was running purely alphabetically?) - Lugnuts did that, as well as using different sources for the same template. I wouldn't want to exclude those situations. BilledMammal (talk) 23:24, 17 October 2022 (UTC)

Lugnuts was a rather extreme case; I don't recall anyone else being quite so prolific after the very early days of the 'pedia. Perhaps "usually in a narrow topic area" ? Espresso Addict (talk) 23:44, 17 October 2022 (UTC)

Using different databases for each type, and a slightly different format, would be a simple way of gaming this definition; the creator simply switching between databases (or downloading out of sequence). As I have said previously, it is intent and not the content that needs to be examined. LessHeard vanU (talk) 16:16, 18 October 2022 (UTC)

I am going to suggest something that may make it more difficult to judge the content, but which should not be easily gamed: The creation of a number of articles, from the same source or using the same format and especially both, without intent to expand or improve the encyclopedia. The kernel of this definition is that anyone so challenged can be considered liable for admin action if they do not answer, or do so in such a way that gives an indication that they do not have the project's best interest at heart. This also allows people to address the issue of the reliability of the source. Good answers and positive discussion will remain on talk pages, so well-intentioned editors will be less liable to be questioned further. LessHeard vanU (talk) 15:56, 18 October 2022 (UTC)

@LessHeard vanU, I think you need a small copyedit. Creating an article is inherently "expanding the encyclopedia", so it's not really possible to have "the creation of a number of articles...without the intent to expand or improve the encyclopedia". See also Wikipedia:Arbitration Committee/Requests for comment/Article creation at scale#Question 12: Editors who mass-create stubs have a duty to expand the articles later, which you opposed. Here, you seem to say that editors must "intend" in advance to expand the articles, whereas there you opposed stopping them from creating more articles until they actually expanded them. WhatamIdoing (talk) 16:37, 18 October 2022 (UTC)

@WhatamIdoing No, I mean without intent to expand... the encyclopedia. While mass creation expands the encyclopedia, it can be argued that vandalism in adding nonsensible or vulgar text does the same. The point is the intent. If it is to increase edit counts, or article creation count, then the effect on increasing the words in the project is secondary or even no consideration - and yes, I do not believe you can insist on requiring an editor to expand a stub per WP:STUB. I am open to a better wording, perhaps dropping the expanding bit, but increasing the number of articles without the intent to improve the project is what I am aiming for. LessHeard vanU (talk) 19:33, 18 October 2022 (UTC)

Purely pragmatically I can't see this sort of intention-based guideline helping admins to differentiate acts of mass creation that require intervention from harmless/tolerated behaviours. I can decide on a speedy for nonsense or vandalism without having to understand the intent of the creator. Espresso Addict (talk) 23:05, 18 October 2022 (UTC)

I'd tend to agree. I think we get into rough waters when we start trying to divine someone's intentions, and it is possible for even people with truly good intentions to nonetheless behave in a very disruptive way. I'm more in favor of a guideline saying that mass creation is creating a substantial number of very short articles with little to no attempt to improve them beyond that. At that point, that needs some community oversight, regardless of the intentions of the individual doing it. Seraphimblade ^{Talk to me} 23:23, 18 October 2022 (UTC)

I'd be satisfied with a definition that says "mass creation is creating a substantial number of articles", regardless of length or quality (a million new FAs overnight would be undeniably "massive", after all), but I would prefer that the "substantial number" was defined as a rate: x articles per y timespan. It should not be left up to the imagination or subjective beliefs of whoever happens to be bothered by it. We had a humongous discussion last month about whether just one or two articles per day counts as "mass creation". I don't think there's any reason to believe that two articles in one day is "mass creation", but if that's the definition we want, then let's have that be the definition, and apply it to everyone. WhatamIdoing (talk) 05:37, 19 October 2022 (UTC)

Number alone won't do it. To start, that's susceptible to gaming (imagine if our only rule on edit warring were 3RR; we'd see tons of cases where people revert exactly three times, then wait 24 hours and one minute to do it again), and secondly, quality matters. While I certainly don't ever expect to see it, if someone actually could crank out 100 FA-quality articles in a week, I don't think anyone would object to that (hell, we'd be giving them all kinds of pats on the back for that). Even if they were above stubs, I think most would be alright with that. The objection is largely to massive creation of stubs, especially when that's done just from some kind of a database dump with little or no attempt to individually evaluate the suitability of each subject for an article. Seraphimblade ^{Talk to me} 07:27, 19 October 2022 (UTC)

I think someone could very easily post 100 top-quality articles in less than an hour, and I'd expect it to trigger a panic about copyvios. ("How else could someone create more than one article like this per minute?" We tend to forget that you can do the writing offline.)

"The objection" doesn't exist. There are instead "two completely unrelated objections":

The people who review articles do not want to have their work massively expanded with no warning and no opportunity to consider giving the editor WP:AUTOPATROLLED in advance. They normally see ~500 new articles per day and need to review only ~250 of them. If you dump 100 top-quality articles in an hour, you're taking up an unfair amount of space in their workload. These people care about mass creation per se. These are the people for whom quantity is its own quality. To use an analogy, they don't mind a bit of snow, but a blizzard causes practical problems for them.
The people who object to the addition of what they call "poor quality" articles. These people don't actually care about mass creation per se, except to the extent that if you hate people creating two-sentence stubs, then you really hate people creating a lot of two-sentence stubs. To repeat that analogy, they dislike any snow at all, and they really, really hate blizzards.

It sounds like you are in the second group. Please try to understand the perspective of the first group. The first group does not want the disruption of high-volume article creation at all, even for amazing articles. The first group is vulnerable to an Email bomb in the form of high-volume article creation. The "mass" part of "mass creation" is their concern. WhatamIdoing (talk) 14:52, 19 October 2022 (UTC)

The phrase "clearly identifiable template" won't do. For one thing, the word template has a specific meaning on Wikipedia but it seems that is not what is meant here. And for another, all our articles are expected to follow the guidance of MOS:LAYOUT which is such a standard structure. Because this is so stereotyped, I usually create a new article by copying the bones of an existing article so that I don't have to retype all the boilerplate. So the idea that our articles are or should be sui generis is mistaken.

As an example, see my latest creation: Ian Hay Davison. I started this as a stub using a familiar source – a Times obituary. I start such articles small because my initial focus is on establishing the core skeleton. Once one has the basic foundation laid down, then it's easy to develop the body incrementally. But before doing that, I add a web of cross links to knit the article into our overall structure. It's important to do that early because sometimes, when you do this, you find that the topic exists already under another title. And it's also good to get the seed planted early to signal to other editors that the topic has been started, because they may have been inspired by the same news item.

When one starts such an article then, even if you're auto-patrolled, it tends to attract gnomes and patrollers. These may butt in and start to change things. Depending on how it goes, I may or may not follow up. If I feel I've done enough or want to avoid conflict then I may walk away for a while. Or I may get busy with something else, on or off the project. Per WP:CHOICE, WP:DEADLINE and WP:OWN, we should not impose a duty to expand on creators as this will tend to discourage them from starting such promising topics. The way to get a task done is to start it.

As another example, consider the current FA, Yusuf I of Granada. This was started in 2007 as a stubby translation from the Spanish equivalent. A year later, it wasn't much bigger and still had no citations. But here we are now, 15 years later and it has been featured on the main page. "Good things come to those who wait..."

Andrew🐉(talk) 09:43, 19 October 2022 (UTC)

Does that mean, though, that if Yusuf I of Granada hadn't existed as a stub, the user that made these edits in 2009, or these in 2020, effectively writing a new article twice, would not have created an article at that time? The "stubs are useful because they encourage editors to expand them" argument is dubious, in my opinion; certainly for the mass-created stubs we're talking about here. Scolaire (talk) 13:19, 19 October 2022 (UTC)

It's hard to predict alternate futures, but we might not have such a good article now if it hadn't existed. Sometimes the existence of the article is what produces improvements. This is the "Oh, I've got to clean up that mess" response, which doesn't happen when there isn't a mess – or when people go to other websites, because their search engine results didn't include a Wikipedia page at all. WhatamIdoing (talk) 15:03, 19 October 2022 (UTC)

I have expanded stubs. I have also created articles. I have also completely re-written fairly lengthy articles. My motivation was always the same: the subject interested me and I wanted to have a half-decent Wikipedia article on it. I have never, for instance, found a stub by clicking "Random article" and expanded it. I can't imagine someone saying "Oh, I've got to clean up that mess" without having gone to that article because they were interested in the topic. Scolaire (talk) 16:02, 19 October 2022 (UTC)

Can you imagine someone being interested in a subject, seeing that Wikipedia doesn't have an article, and concluding from its non-existence that it must not be an appropriate or wanted subject for Wikipedia, and moving along?

Can you imagine someone being interested in a subject, searching for it in their favorite web search engine, never ending up at Wikipedia, and never noticing that we were missing an article on that subject? WhatamIdoing (talk) 18:42, 19 October 2022 (UTC)

In the first instance, that conclusion might be the right one. How does one then determine if it is an appropriate subject? Well, by going and seeing if a substantial quantity of reliable and independent source material is available about it. If not, they're right—it isn't an appropriate subject. Seraphimblade ^{Talk to me} 14:38, 20 October 2022 (UTC)

In the second instance, someone who doesn't even notice that Wikipedia is missing an article on the subject is unlikely to be the person who would significantly expand a stub if it existed. Scolaire (talk) 11:56, 24 October 2022 (UTC)

The Ian Hay Davison article also demonstrates some of the issues here. For example, the article asserts, twice, that he was a "bell ringer". What in the hell is a "bell ringer"? I don't know, nothing in the article explains that (except that maybe it has something to do with a train station, but that increases, not decreases, the confusion), and the only source used for that assertion is paywalled, so I can't find out there either. The references leave notability plausible but questionable (the obituary is just fine, but the next is from an "inside baseball" publication and appears to be largely an interview), and there's just not much there. That would do better to have done a little more "baking" prior to going into mainspace. Seraphimblade ^{Talk to me} 13:27, 19 October 2022 (UTC)

I had wondered whether to wikilink bell ringer but thought it was so commonplace and obvious that it didn't require linking. Perhaps it's a regional thing as, when I wrote St Stephen's Church, I learnt that most churches in Scotland don't have sets of bells, for example. Of course, this was the subject's hobby when he retired but I liked the detail and so made a point of including it at the outset to help give the subject some personality. There's a lot more to say about his storied career, of course, but you have to have a taste for dry subjects like accountancy and corporate governance. Andrew🐉(talk) 14:57, 19 October 2022 (UTC)

Okay, gotcha, so it's meant very literally there. I figured it was something having to do with finance, trains, or maybe both, given the context. Seraphimblade ^{Talk to me} 00:16, 20 October 2022 (UTC)

It even has a 'scientific' name... LessHeard vanU (talk) 16:07, 26 October 2022 (UTC)

Have editors suggest a number and take the geometric mean. Ovinus (talk) 19:34, 19 October 2022 (UTC)

See the changes made in response to Scolaire's suggestions; they should address most of your concerns. The rest, if you believe there is better terminology than "template" then I have no objection to changing it. BilledMammal (talk) 06:59, 24 October 2022 (UTC)

How about "format" or "set format"? Scolaire (talk) 12:08, 24 October 2022 (UTC)

Or "pattern"? Espresso Addict (talk) 17:03, 24 October 2022 (UTC)

Also, I think Ovinus's suggestion is a very clever one. Even BilledMammal's proposed definition includes "Creating more than xx articles in a xx", and people's idea of xx seems to differ by orders of magnitude. Scolaire (talk) 12:32, 24 October 2022 (UTC)
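As a rough illustration of the geometric-mean suggestion above, here is a minimal Python sketch. The suggested values and the function name `consensus_threshold` are hypothetical and chosen only for illustration; they are not results from this RfC. The point is that when suggestions differ by orders of magnitude, the geometric mean is pulled far less toward the largest suggestion than the arithmetic mean is.

```python
from statistics import geometric_mean, mean

# Hypothetical suggested thresholds (articles per day), for illustration only.
suggestions = [2, 25, 100]


def consensus_threshold(values):
    """Return the geometric mean of the positive suggested thresholds."""
    return geometric_mean(values)


# The geometric mean sits well below the arithmetic mean when the
# suggestions span orders of magnitude.
print(round(consensus_threshold(suggestions), 1))
print(round(mean(suggestions), 1))
```

With two suggestions of 1 and 100, for instance, this approach gives 10 rather than the arithmetic mean of 50.5.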

Format for second RfC


I've posted a proposed format at WP:ADAS, feel free to comment at WT:ADAS. Valereee (talk) 17:21, 18 October 2022 (UTC)

Simple proposal regarding questionable databases


This is a bit late and the discussion is bloated, but maybe I'm missing something here. Why not just require consensus that a particular database is reliable enough for mass creation from it? That is, before the creation, other editors opine on whether the database is sufficiently reliable to avoid concerns of WP:V and accuracy? Of course there will always be a contingent of editors who reject mass creation on principle, but these could be weighted lower than arguments which, say, find factual errors or other sloppiness. It's pretty clear (although surprising—I thought this opinion would be widely held) that there isn't consensus to strongly restrict mass creation. So I think this may be a compromise better than nothing. Ovinus (talk) 18:06, 18 October 2022 (UTC)

I'd support that. Paradise Chronicle (talk) 21:22, 18 October 2022 (UTC)

I believe we already have that under the general requirement for sources to be reliable, though I suppose that doesn't require agreement before creation. I agree that the really troublesome mass creations have all involved databases that later turned out not to be sufficiently reliable. Perhaps the realistic way out of this tangle is to bring all the databases that are being used for this sort of creation to the reliable sources noticeboard, and get the truly unreliable ones formally deprecated? Or is the point that there are some that are reliable enough for ordinary creation, but not for mass creation? Espresso Addict (talk) 23:11, 18 October 2022 (UTC)

The problem with sports stub mass creation wasn't that the database sources were unreliable, it was that inclusion in the databases did not predict GNG with anything approaching the accuracy expected by NSPORT and did not itself contribute to SIGCOV at all. So when it was finally acknowledged that many NSPORT sport-specific inclusion criteria were NOT compliant with overall NSPORT requirements, we were left with tens of thousands of near-identical stubs with no way of telling which ones actually were on notable subjects and no way of expanding encyclopedic material from the sources that were present. If the creators (the biggest ones being Lugnuts and Blackjack) had been forced to find GNG-contributing sources for each subject, we would have orders of magnitude fewer stubs going through AfD. This is the whole reason we now require sportsperson bios to include at least one SIGCOV SIRS to benefit from the further coverage presumptions afforded by meeting an SSG criterion. Editors can still make as many stubs as they want, they just have to demonstrate GNG halfway (when unchallenged; if brought to AfD it is still expected that GNG be demonstrated "eventually") so that other editors can feel confident sources do exist and won't waste their time taking the article to AfD. JoelleJay (talk) 00:44, 19 October 2022 (UTC)

Sports stubs, I guess; I'm not familiar with Cricinfo or whatever. But for many of the GEOLAND stubs, not so much, yeah? As Espresso Addict says, it's okay to use meager databases as supplementary sources, but for mass creation we need high standards. Idk, I just feel like we need something, but there doesn't seem to be anything garnering consensus. Ovinus (talk) 01:11, 19 October 2022 (UTC)

I think there's a fundamental divide in the community on whether uncontroversial sourced articles on topics whose notability is unclear are a benefit, a neutral, or a negative. Whereas no-one sensible is going to disagree with not having articles whose content is simply wrong. Espresso Addict (talk) 02:14, 19 October 2022 (UTC)

The weak point in this is that the tendency has been to assume that the databases were reliable enough because they had some official authority behind them, and because a casual glance didn't refute that. Not every "populated place" in GNIS is not a settlement, and in some areas, most of them pan out as representing real towns. There are very few areas where one can guess ahead of time that everything in them is notable and adequately characterized: historic registries are one example, and another would be the old USCG lighthouse historical pages. It's somewhat ironic that in both cases, stub creation was abjured. The presenting problem has always been reviewing these; the expectation that an article will say enough that notability speaks for itself saves a lot of work in that regard. Mangoe (talk) 03:16, 19 October 2022 (UTC)

I don't think anyone really is suggesting that an entry in a database is sufficient to meet our community's notability guidelines. A challenge is that editors can easily transform a database entry into prose and create a short article, and then it takes some effort to delete those articles from the mainspace. However, what I think some editors here want to do is to figure out a way to skip (or prevent) the AfD processes, suggesting that it is inappropriate to only cite or refer to a database (which is just a "collection of discrete values that convey information", which can be expressed as a simple list). However, because the community has determined that, in some instances, any subject that possesses certain characteristics "warrants its own article," an entry on a list that contains those characteristics could be sufficient as a start for an article. And I do not see the harm in creating a stub article on a structure on the national register of historic places, just based on its landmark status, an entry on a member of the 1st Utah House of Representatives, or an Academy Award winning producer. If editors disagree about whether an article meets GNG, that is what AfD is for - the problem isn't the database per se, but whether the community perceives the subject as meeting our (evolving) notability standard - Enos733 (talk) 05:19, 19 October 2022 (UTC)

@Enos733, I am convinced that some entries in some databases are entirely sufficient to demonstrate notability. Take a look at https://omim.org/entry/609423 This "entry in a database" has 62 inline citations and many thousands of words of prose in some 400–500 sentences. For comparison purposes, the median Wikipedia article probably has two or three inline citations and about a hundred words, and the sort of news article we routinely cite has zero inline citations and 600 words.

I agree with you that the problem isn't the database per se. We should spend less time focusing on how the source's content is stored, and more time on what the content is and whether it could be used to produce a decent-ish encyclopedia article. WhatamIdoing (talk) 05:50, 19 October 2022 (UTC)

No disagreement here. - Enos733 (talk) 05:52, 19 October 2022 (UTC)

The taxonomy WikiProjects have had discussions about which databases to follow. As I said in a prior thread, Wikipedia can't really cover species without relying on taxonomic databases; the databases answer the question of "which published species names are accepted as real species by the taxonomic community, and which species names are regarded as synonyms of real species". There are multiple good taxonomic databases for fish, amphibians, birds and plants. There have been discussions in the relevant WikiProjects that have resulted in consensus to base Wikipedia's taxonomy on one of these databases. For reptiles, spiders and gastropods, there is really only one good database, and there has been discussion about following that database. Fungi and moths/butterflies have multiple databases, but haven't had discussions about which to follow. Bacteria and viruses have a clear "best" database, but no discussion about following it.

FishBase is the database to follow according to consensus at WikiProject Fishes. But some editors have expressed opposition to creating articles sourced to FishBase. Plantdrew (talk) 15:56, 19 October 2022 (UTC)

One of the strengths of our WikiProjects is that they assemble editors who know something about the subject they're talking about. I might have been significantly meaner than @Plantdrew, and finished that message a different way: FishBase is the database to follow according to the editors who know what they're talking about, but people who don't know what they're talking about have expressed opposition to it.

This is one of the points the English Wikipedia always has to balance. We want site-wide consensus, with every editor treated equally, but the inescapable fact is that we don't have equal knowledge. We will always have editors who judge sources according to their appearance, because they simply don't have the necessary knowledge to do any better than that. I've seen this happen multiple times with newspaper websites, with editors literally saying that a website is too ugly or flashy to be a "real" newspaper. It happens with databases, too. For example, I know enough to recommend OMIM. I know (barely) enough to recognize FishBase as a good source. I could not make a valid judgment about a similar database for popular music or athletes or motor vehicles or many, many other subjects. WhatamIdoing (talk) 16:23, 19 October 2022 (UTC)

Hence my suggestion for an evidence-based approach. Editors contesting the reliability of a database should find factual errors, sloppiness, errors of omission, etc.—they must prove a positive. If consensus indicates a database is unreliable or otherwise of low-quality, mass creation from that source should be prohibited. Ovinus (talk) 17:19, 19 October 2022 (UTC)

And maybe plain old ordinary "creation" as well. Unreliable sources do not make a sound basis for an article. WhatamIdoing (talk) 18:45, 19 October 2022 (UTC)

Sure, in principle... but we should have zero tolerance on mass creation. We can, and do, tolerate the creation of a few unreliable stubs, which are eventually deleted or fixed. We cannot tolerate tens, hundreds, thousands of them. Ovinus (talk) 18:50, 19 October 2022 (UTC)

Tall poppies


Jesswade88 has been featured in the media again, starting with the Washington Post, it seems: She's made 1,750 Wikipedia bios for women scientists who haven't gotten their due. Now this is obviously article creation at scale but we don't want to be interfering with this, do we?

I've worked with Jess at many editathons here in London and so can testify that she is an exceptional individual – highly competent, energetic and personable. So, one would expect good work from such a person and we should mainly look for ways to help or thank her. But the trouble is that most editors are not so proficient. She represents an exceptional acme or ideal, right?

As a counter-example, consider Nikolai Kurbatov. According to Quarry, this editor has created over 21,000 articles in the last five years – second only to Lugnuts. These seem to be mostly stubs about places in Russia or old Russian movies. I don't get the impression that they do much to expand them and so, if we have a problem, this is it.

Now 21,000 articles is quite a lot, but it's still a drop in the ocean when spread over five years, when we might expect a million or more new articles. So, should we be doing anything about this? I'm not inclined to mess with this because my command of Cyrillic and Russian is weak. I have once created an article about a place in Russia – Ambarnaya – which showed that there's work to be done but it's not a job for me as I'm soon out of my depth. If there's a Russian-speaker who's prepared to tackle it then I'm inclined to let them get on with it.

Now, here's the issue. Nikolai Kurbatov is not a chatty person but does describe himself on his user page:

I am a disabled with bipolar disorder. The last years i have constant depression, which destroys the desire to live, and daily productive work is the only joy in my life. I have created thousands articles about Russian rural localities and 1,195 articles about Russian and Soviet films and filmmakers in English Wikipedia.
— User:Nikolai Kurbatov

So, this editor is sadly suffering from a handicap and they also have the burden of working in a foreign language. So, it seems unrealistic to be loading them up with high expectations about the quality of their prose and other such details. If we start hassling them, then it seems likely that we will mainly make them miserable and eventually drive them off, as happened with Lugnuts.

Perhaps there's ways of helping such a person with training, mentoring, a support group or the like. But, in the meantime, what are the implications for this RfC? The question is whether Wikipedia is actually the encyclopedia that anyone can edit or whether it's only for superstars like Jess Wade?

Andrew🐉(talk) 12:15, 19 October 2022 (UTC)

Since there are two separate, unrelated objections to mass creation of articles, I believe there are two different responses:

The New Page Patrollers don't care about either case. Neither of those editors contributes materially to their workload.
The "high standards" folks love Jess Wade's work but at least some would object to Nikolai Kurbatov's work (examples: 1, 2, 3, 4).

Readers will appreciate both, and probably approximately equally. As an example, I sometimes look up place names while I'm reading, and these stubs are perfectly adequate for that need. Readers, especially in developed countries, are usually looking for "what's the name of the guy who...?" or "where's that place that was in the news?" Most readers don't get past the opening screen's worth of information anyway. They're not really thirsting for a detailed history of a small town in Dagestan. WhatamIdoing (talk) 16:13, 19 October 2022 (UTC)

Jess Wade's work is not mass creation. With regards to Kurbatov, while it's important to be empathetic and appreciative, there is no harm in simply asking him to format his references. Heck, if {{Cite web}} proves to be too confusing (fair enough) I'd be totally willing to help go through his recent creations and tidy them. However, I'm not familiar enough with his case to say anything definitive. Ovinus (talk) 17:34, 19 October 2022 (UTC)

He is formatting the references. Citation templates are not required, or even officially encouraged. But if you like them, then references formatted that way can be converted to citation templates in three clicks in the visual editor (example). If that matters to you, you can be one of the "Others will improve the formatting if needed" that WP:CITE mentions. WhatamIdoing (talk) 18:49, 19 October 2022 (UTC)

Great! Ovinus (talk) 18:58, 19 October 2022 (UTC)

As I said above "All the "But what about the series of well-referenced articles Editor X is writing?" talk really needs to be rebuffed because looking at the above cases [Lugnuts, C46, Dr. Blofeld, Rambot, Cluebot II etc.] it is clear that if care is being taken and the articles are being written individually then there is no way that this is ever going to be a problem.". It's OK to look at a list of potential article subjects and, exercising discretion about which to write about, find other sources to support articles on topics taken from that list. This really isn't what we're talking about when we talk about "Mass creation".

I really hope that this process is not just doomed to come to no conclusions at all, after all the time taken over it, simply because of WP:BURO-style discussions over specific definitions and potential forums. FOARP (talk) 07:52, 20 October 2022 (UTC)

One of the reasons why I have stopped commenting on this RfC. Donald Albury 18:59, 20 October 2022 (UTC)

There's a discussion about draft space at the Village Pump and it seems that some editors use this space for activity which might be considered mass creation.
I would just like to say for the record that I have over a thousand drafts in draft space right now, and I know of other editors who have something in that range. That's down from my high of more than 1,600 drafts, with 600+ having become articles so far...Most of my drafts are one or two line substubs on U.S. state supreme court justices...
— User:BD2412
I'm not sure that draft space helps get the work done but it's interesting that there's this substantial workflow happening there and so it seems a good example to add to this section.

Andrew🐉(talk) 08:31, 26 October 2022 (UTC)

Another different kind of example is the work of the Open Knowledge Association which is sponsoring editors to systematically translate featured articles from other languages. For example, see Kassite dynasty. This is obviously at the other end of the spectrum from the perfunctory stubs but it has some relevance in that there's a semi-manual mechanical process which plans to create lots of articles. And articles like that are so large and complex that they are difficult to review quickly and so that's another issue of scale.

Andrew🐉(talk) 08:39, 26 October 2022 (UTC)

I suppose it depends on what you're reviewing for. If your goal is to discover whether the subject is notable and/or whether the page qualifies for CSD, then I suggest that "translated from a Featured Article at the French Wikipedia" pretty much answers your questions right there. WhatamIdoing (talk) 23:36, 30 October 2022 (UTC)

The crux of it is how do we prevent problematic mass creation (non notable topics, unreliable/misinterpreted sources) without creating big hurdles for legitimate mass creation? I don't think prolific writers like Jess Wade fall under any of these proposals, but it does seem reasonable to ask Nikolai Kurbatov to come to a noticeboard to make sure that writing a brief template-style article about every Russian selo in a certain region is something that would help build a quality encyclopedia. This doesn't mean that their work is suspect or unwanted (just as we're not suspicious of bots even though they're approved through a noticeboard), it just means that we need them to demonstrate the notability and verifiability of these topics before creating thousands of articles. Likewise, asking a group to explain their process for translating and ensuring notability of articles is a simple quality control measure that shouldn't be hard for them to meet and would prevent the possibility of flooding us with machine-translated junk. I would think that editors who want to help build a quality encyclopedia would be glad to cooperate with these small asks. –dlthewave ☎ 12:36, 26 October 2022 (UTC)
@Andrew Davidson: There is some discussion on the project page about requiring mass article creation to be performed in draft space, with moves to mainspace being subject to higher standards. This is precisely the reason why I have created my substantial set of drafts, as I have reasonably high standards for moving my own drafts to mainspace (at least two sources, and for the justices that make up the bulk of them, sourced information on their birth, death, dates of service, predecessors, and successors). BD2412 T 23:26, 26 October 2022 (UTC)

Which project is that, please? Possibilities seem to include WikiProject Missing encyclopedic articles; WikiProject Law; WikiProject United States courts and judges and the 50 projects for each of the states such as WikiProject Maryland. That's an awful lot of projects and this level of organisation seems unfeasible for other parts of the world such as Russia. The proliferation of namespaces, noticeboards, projects and policies seems to be excessive per WP:CREEP and so I'm not convinced we should add to it. This is another issue of scale. Andrew🐉(talk) 11:33, 27 October 2022 (UTC)

Potential for trial periods where a tenuous consensus can be found


It's tempting to see this as a trainwreck. The only question there's a clear consensus for is that we should have a definition, but then we can't agree on the definition... or what that definition should encompass... or what's most important... or how such a definition would apply in practice (permissions/restrictions). I'd like to encourage the closers to attempt a holistic view of the RfC to find threads of consensus for something. If that something feels tenuous, slap a trial period on it and call it an experiment, asking if we want to keep it after a year goes by.

We're heading into an RfC on deletion at scale, and I think many of us will find ourselves very uneasy supporting much on that topic if we don't have some kind of definition and process for mass creation to work with. It would be a difficult and not uncontroversial task, but I suspect you can find something to extract from this. — Rhododendrites ^talk \\ 12:59, 24 October 2022 (UTC)

Unfortunately, I agree with the trainwreck part. I'm not quite sure of the specific cause: From my perspective it's mostly what FOARP mentions, that a certain set of editors oppose on nitpicks, rigid principles, or theoretical questions of implementation that can be hashed out, rather than the broad idea. I also wonder what might help the deletion RfC. I already suggested the use of a talk-page-based evidence section (which seems to be gaining a bit of traction), but also wonder whether a "principle" or "proposed problem" section may be of use. For example: Principle/proposed problem: Mass deletion through many simultaneous AfDs is inefficient and often disruptive. (Support/oppose). This may allow editors to make stronger arguments, i.e., "Consensus is that X is a problem, so we need something to deal with X, and opposers do not propose an alternative." That said, too many principles may lead to bloat. Ovinus (talk) 18:50, 24 October 2022 (UTC)

We certainly need fewer proposals. The one you quote here is utterly reasonable - I don't think all that many people would argue against it. Perhaps we should take this one step at a time - OK, can we agree on that? If so, here are three - and no more - things we might do about it. Can we agree on any of these? Blue Square Thing (talk) 08:28, 25 October 2022 (UTC)

Like Less is more. Ovinus (talk) 21:08, 25 October 2022 (UTC)

Are you thinking of something like this?

Principle/proposed problem: It is a problem when some editors claim 'mass creation' for less than two articles per day, and others claim 'mass creation' doesn't apply until at least 25 articles per day

followed by:

Things we could do about it:
A. Define mass creation in terms of articles per time period.
B. Define mass creation in terms of the percentage of articles requiring review.
C. Eliminate the concept of mass creation entirely, and replace it with a concept such as "Article creations practices that require pre-approval".

I don't think this would work for all problems, but I think it's doable for others. WhatamIdoing (talk) 21:32, 25 October 2022 (UTC)

In terms of creation, probably. Blue Square Thing (talk) 08:10, 26 October 2022 (UTC)

Yeah, it's not necessarily helpful for all problems. But I think it may better guide discussion in most cases. The question is how to avoid the bloat associated with every new section. Ovinus (talk) 17:18, 26 October 2022 (UTC)

I have seen this kind of thing before. There can be mistrust about finding a compromise, and people are afraid that simple things like "define mass creation" will be used in the most bad faith way. I don't think en masse actions are inherently bad. But I think people can at least admit that doing it en masse is more controversial than doing it more incrementally. And we can find a way to make those processes breathe better. Shooterwalker (talk) 00:42, 26 October 2022 (UTC)

You can't invent consensus where there isn't one, even if you call it a trial period. A trainwreck is a trainwreck. Interestingly an earlier version of the arbitration remedy that spawned the RfC that spawned this RfC experimented with making it mandatory that the closers find a consensus one way or the other, but this was rightly rejected as a contradiction in terms that would inevitably lead to supervotes. I don't think the failure of this RfC has anything to do with "nitpicks", which are a feature of any Wikipedia discussion and don't seem more common here than usual. If there's any general conclusion that can be drawn, it's that the project as a whole currently doesn't see "mass creation" as a pressing problem in need of solutions, and that's precisely why this RfC failed. That's always a risk when you approach policy-making as a legislative process based on consultation, rather than a documentation process based on consensus. – Joe (talk) 05:23, 26 October 2022 (UTC)

In hindsight, two reasonable initial questions would have been: "Is mass creation sometimes a problem?" and "Is it often enough a problem to warrant special rules?" Ovinus (talk) 16:49, 26 October 2022 (UTC)

Fwiw: yes and no (well, very occasionally, but these can be sorted at ANI - see below). Does that sound about right? Blue Square Thing (talk) 12:57, 28 October 2022 (UTC)

I have recently discovered two other kinds of mass creation which I have detailed in the previous section, along with the other examples. These indicate that there's a variety of ways of creating articles at scale and that different approaches are being tried. We should be careful not to create some draconian new bureaucracy based on a simplistic or vague definition which might stifle such experiments.

My impression is that what's needed is a guideline which documents best practice and suggests alternatives so that editors have some successful models to follow. Andrew🐉(talk) 09:22, 26 October 2022 (UTC)

That sounds both reasonable and doable. It might need to include the "threat" that an editor's ability to create new articles can be limited via ANI - as has happened in the past: I'm fairly certain that I've seen restrictions on the number of articles that can be created, the ways in which they can be created and a minimum length for articles, all of which seem to have worked to limit issues when they've been identified. Blue Square Thing (talk) 12:57, 28 October 2022 (UTC)

FWIW, I was initially enthusiastic about this RfC, because I think mass creations of stubs generally aren't very productive, and are positively destructive when they use poor sources and add large quantities of misinformation (e.g., the Carlossuarez stubs) that then require the time and energy of useful editors to clean up. After seeing the behavior on display in the Village Pump fish stubs discussion, my opinion altered. I'm now very wary about laying down hard guidelines here, because I think they'll be used in conjunction with "isolated demands for rigor" in applying other, general content policies to cut down on our coverage of subjects that a few people deem overrepresented.

I also think that trying to make an airtight guideline that would have prevented all of the problematic mass creations of the past is probably not an achievable goal. The Lugnuts stubs are probably the most egregious in terms of scale. But there was also an SNG covering them, with ostensible presumptions of general notability, and that SNG had many editors backing it. (Indeed, I wonder if that SNG and its presumptions would have been revised if it hadn't been for the assiduous mass creation of articles for which those presumptions were obviously untrue.) I think it would be better to develop guidance that encourages editors to consult with relevant stakeholders (WikiProjects or otherwise) and generally exert a higher degree of due diligence than we might expect of a casual editor. It's very hard to get users at AN/I to dig in to any dispute that requires significant content-specific understanding, but something like that would make it easier for concerned editors to say "We've tried to engage this creator, but they keep ignoring us and doing things that don't have support from other participants in the discussion, this is becoming disruptive and a behavioral issue" and get some response in those forums. Choess (talk) 14:39, 1 November 2022 (UTC)

New proposal


Xeno, Valereee, and MJL: I have created a new proposal (Prop 18). Thanks. NotReallySoroka (talk) 02:39, 30 October 2022 (UTC)

Thank you for letting people know that a new one has been added. There appears to be a proposal 2A as well, but I'm not sure when it was added and by whom as there's no signing going on.

In general terms, there are so many proposals now that I would actually be concerned if any of these were allowed to "pass" without being confirmed by wider consensus. I wrote somewhere on the project page that we're very much at the stage where people give up trying to unpick what the heck is going on and at that stage, imo, it's dangerous to find a consensus. If something looks like it might pass without overwhelming support, it just needs to be taken as a stand alone, un-editable proposal somewhere to check that there's wider consensus for it. Blue Square Thing (talk) 09:05, 30 October 2022 (UTC)

I stopped looking at the page after a few days because of the churning. I suspect a number of others have also stopped following the discussion. If new proposals have been added after many Wikipedians have expressed an opinion, then I do not see how such proposals have a chance of being accepted as consensus. I pity the closers. - Donald Albury 13:36, 30 October 2022 (UTC)

Agree with Donald Albury. My hope has shifted to the RfC on AfD, and in the expectation that it has opened, I check back at times and leave a vote or comment. Some editors are also way over the 300 word limit, which doesn't really encourage anyone to read their arguments. Paradise Chronicle (talk) 14:18, 30 October 2022 (UTC)

Blue Square Thing is probably right in that the ideal way forward is to identify any proposals that appear to still have legs, and then hold well-scoped single-issue RfCs on those (at a later time). –xeno^talk 16:12, 30 October 2022 (UTC)

How much time do you think we have to spend on this?


Aargh, too many questions! SpinningSpark 18:36, 1 November 2022 (UTC)

Spinningspark: Since it is meant to close soon, it might make sense to focus on the questions / proposals that seem close(r) to acceptance. –xeno^talk 15:12, 2 November 2022 (UTC)

