Wikipedia:Bots/Requests for approval/ScannerBot
Operator: 0xDeadbeef
Time filed: 01:48, Thursday, May 5, 2022 (UTC)
Function overview: Removes tracker tags in Twitter links.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: gist
Links to relevant discussions (where appropriate):
Edit period(s): One time run
Estimated number of pages affected: Probably 10000+
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details: Finds twitter.com URLs and removes the parameters named s or t.
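For illustration, the parameter removal described in the function details could look roughly like the sketch below. It is not the code from the linked gist; the function name `strip_tracking` and the sample URLs are hypothetical, and only standard `urllib.parse` calls are used.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names taken from the function details above.
TRACKING_PARAMS = {"s", "t"}


def strip_tracking(url: str) -> str:
    """Return url with the s and t query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # urlencode re-encodes the remaining pairs, so unusual escaping in
    # other parameters may be normalised; a real bot may want to avoid that.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://twitter.com/example/status/123?s=20&t=abc"))
# https://twitter.com/example/status/123
print(strip_tracking("https://twitter.com/example/status/123?lang=en&s=20"))
# https://twitter.com/example/status/123?lang=en
```

When every parameter is removed, `urlunsplit` drops the now-empty query along with its "?", so no dangling question mark is left behind.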
Discussion
Comments before change
Comment: if a bot account is needed, I will probably use ScannerBot. 0xDEADBEEF (T C) 01:51, 5 May 2022 (UTC)
- Note: The functionality and the scope of the bot were made more specific. See page history for more details. 0xDeadbeef (T C) 06:28, 14 May 2022 (UTC)
- Regex? Primefac (talk) 15:13, 14 May 2022 (UTC)
- @Primefac: You can look at the gist I linked.
https://twitter.com/\w+/status/\d+\?[^\s}<|]+
is used to match the URL; urllib is then used to parse it and remove the parameters. 0xDeadbeef (T C) 15:19, 14 May 2022 (UTC)
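As a rough illustration of what the quoted pattern picks up, the sketch below runs it over a made-up snippet of wikitext. The helper name `find_tweet_urls` and the sample text are assumptions for the example, not taken from the gist.

```python
import re

# Pattern as quoted in the discussion. The dot in "twitter.com" is unescaped
# there, so it matches any character; it is kept unchanged here.
TWEET_URL = re.compile(r"https://twitter.com/\w+/status/\d+\?[^\s}<|]+")


def find_tweet_urls(wikitext: str) -> list[str]:
    """Return every status URL in wikitext that carries a query string."""
    return TWEET_URL.findall(wikitext)


# Hypothetical wikitext: a template parameter, a <ref> tag and a bare URL.
sample = (
    "{{cite tweet|url=https://twitter.com/example/status/123?s=20|title=Hi}} "
    "<ref>https://twitter.com/example/status/456?t=abc&s=21</ref> "
    "https://twitter.com/example/status/789"
)

print(find_tweet_urls(sample))
# ['https://twitter.com/example/status/123?s=20',
#  'https://twitter.com/example/status/456?t=abc&s=21']
```

The character class `[^\s}<|]` ends the match at whitespace, a closing brace, an opening angle bracket or a pipe, which is why the template and `<ref>` delimiters are excluded. The third URL has no query string, so it is not matched at all.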
- You'll want to detect primary URLs, or skip archive URLs; changing those will break them. Archive URLs come in 20+ types, so it's probably easiest to detect whether the twitter URL is preceded by "/" (example in Brandon Clarke). -- GreenC 16:15, 14 May 2022 (UTC)
- Yeah, I should probably match [^/] or [\s=>] for it to be primary. 0xDeadbeef (T C) 02:07, 15 May 2022 (UTC)
- Great, thanks. Also WebCite like https://www.webcitation.org/6d0sXMyOT?url=https://twitter.com .. couple others use ?url= vs. "/" as the break point. -- GreenC 03:12, 15 May 2022 (UTC)
- @GreenC: Hmm, then it would be hard to distinguish a template parameter from a URL parameter in a URL...
{{Foo|1=https://twitter.com}}
https://www.webcitation.org/6d0sXMyOT?url=https://twitter.com
0xDeadbeef (T C) 04:03, 15 May 2022 (UTC)
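One possible way to tell the two "=" cases apart is to look at the token in front of the match: if that token already contains "://", the Twitter URL is probably embedded in another URL. The sketch below is only a heuristic under that assumption, not something agreed in this discussion; `is_primary`, `primary_tweet_urls`, `TOKEN_BREAK` and the sample status URLs are all hypothetical.

```python
import re

# Same pattern as quoted earlier in the discussion.
TWEET_URL = re.compile(r"https://twitter.com/\w+/status/\d+\?[^\s}<|]+")
# Characters assumed to separate tokens in wikitext.
TOKEN_BREAK = re.compile(r"[\s|\[\]{}<>]")


def is_primary(text: str, start: int) -> bool:
    """Guess whether the URL starting at `start` is not inside an archive URL."""
    if start == 0:
        return True
    prev = text[start - 1]
    if prev == "/":
        # e.g. https://web.archive.org/web/.../https://twitter.com/...
        return False
    if prev == "=":
        # Token between the last delimiter and the match: "1=" for a template
        # parameter, "https://...?url=" for a WebCite-style archive link.
        token = TOKEN_BREAK.split(text[:start])[-1]
        return "://" not in token
    return True


def primary_tweet_urls(text: str) -> list[str]:
    """Return matched URLs that the heuristic considers primary."""
    return [m.group(0) for m in TWEET_URL.finditer(text)
            if is_primary(text, m.start())]


sample = (
    "{{Foo|1=https://twitter.com/a/status/1?s=20}} "
    "https://www.webcitation.org/6d0sXMyOT?url=https://twitter.com/a/status/2?s=20 "
    "https://web.archive.org/web/2022/https://twitter.com/a/status/3?s=20"
)
print(primary_tweet_urls(sample))
# ['https://twitter.com/a/status/1?s=20']
```

The heuristic errs on the side of skipping: any match it cannot clearly classify as primary is left untouched, which seems the safer failure mode for a one-time automatic run.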