SchlurcherBot
Joined 11 February 2025
This user account is a bot that uses C#, operated by Schlurcher (talk). It is not a sock puppet, but rather an automated or semi-automated account for making repetitive edits that would be extremely tedious to do manually.

Administrators: if this bot is malfunctioning or causing harm, please block it.

Emergency bot shutoff button: Administrators, use this button if the bot is malfunctioning.
Function overview: Convert links from http:// to https://
Programming language: C#
Source code available: Main C# script: commons:User:SchlurcherBot/LinkChecker
Function details: The link-checking algorithm is as follows:
- The bot extracts all http links from the parsed HTML code of a page (see the extraction sketch after this list).
  - It searches for all href attributes and extracts their links.
  - It does not search the wikitext and therefore does not rely on any regular expressions.
  - This also avoids problems with templates that modify links (such as archiving templates).
  - Links that are substrings of other links are filtered out to minimize search-and-replace errors.
- The bot checks whether the identified http links also occur in the wikitext; links that do not are skipped.
- The bot checks whether both the http link and the corresponding https link are accessible (see the accessibility sketch after this list).
  - This step also uses a blacklist of domains that were previously identified as not accessible.
  - If both links redirect to the same page, the http link is replaced by the https link (the link is not changed to the redirect target; the original link path is kept).
- If both links are accessible and return a success code (2xx), the bot checks whether the returned content is identical.
  - If the content is identical and the link points directly to the host, the http link is replaced by the https link.
  - If the content is identical but the link does not point directly to the host, its content is additionally compared against the host's root page; the http link is replaced by the https link only if the two differ.
  - This check is needed because some hosts return the same content for all their pages (such as most ___domain sellers, some news sites, or pages under ongoing maintenance).
- If the content is not identical, the bot checks whether it is at least 99.9% identical, calculated via the en:Levenshtein distance (see the similarity sketch after this list).
  - This check is needed because many homepages use dynamic IDs for certain elements, for example ad containers, to circumvent ad blockers.
  - If the content is at least 99.9% identical, the same host check as above is performed.
- If any of the checked links fails (for example with a 404 status code), nothing happens.
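The extraction step could look roughly like the following sketch. The authoritative implementation is the on-wiki script linked above; this version assumes HtmlAgilityPack for DOM parsing, and the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Extracts candidate http:// links from the parsed HTML of a page.
    // Working on the rendered DOM (not the wikitext) avoids regex parsing
    // and sees links in their final form, after templates have run.
    public static List<string> ExtractHttpLinks(string html, string wikitext)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes("//*[@href]");
        if (nodes == null) return new List<string>();

        var links = nodes
            .Select(n => n.GetAttributeValue("href", ""))
            .Where(h => h.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
            .Distinct()
            .ToList();

        return links
            // Drop links that are substrings of other links, so a plain
            // search-and-replace cannot clip a longer URL.
            .Where(l => !links.Any(other => other != l && other.Contains(l)))
            // Keep only links that literally occur in the wikitext;
            // template-generated links that never appear there are skipped.
            .Where(wikitext.Contains)
            .ToList();
    }
}
```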
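The accessibility and redirect check might be sketched with HttpClient as below. This is a simplified illustration: it omits the ___domain blacklist and the same-host content checks described above, and CanUpgradeAsync is a hypothetical helper, not the bot's actual code.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class LinkChecker
{
    static readonly HttpClient Client = new HttpClient();

    // Decides whether an http:// link can safely be upgraded to https://.
    public static async Task<bool> CanUpgradeAsync(string httpUrl)
    {
        string httpsUrl = "https://" + httpUrl.Substring("http://".Length);

        using HttpResponseMessage httpResp = await Client.GetAsync(httpUrl);
        using HttpResponseMessage httpsResp = await Client.GetAsync(httpsUrl);

        // Both requests must succeed with a 2xx status code;
        // on any failure (e.g. 404) nothing happens.
        if (!httpResp.IsSuccessStatusCode || !httpsResp.IsSuccessStatusCode)
            return false;

        // HttpClient follows redirects by default, so RequestMessage.RequestUri
        // holds the final URI. If both schemes end up at the same page, the
        // upgrade is safe (the original link path in the wikitext is kept).
        string finalHttp = httpResp.RequestMessage?.RequestUri?.AbsoluteUri;
        string finalHttps = httpsResp.RequestMessage?.RequestUri?.AbsoluteUri;
        if (finalHttp != null && finalHttp == finalHttps)
            return true;

        // Otherwise compare the returned content: exact equality here,
        // with the 99.9% similarity fallback shown in the next sketch.
        string httpBody = await httpResp.Content.ReadAsStringAsync();
        string httpsBody = await httpsResp.Content.ReadAsStringAsync();
        return httpBody == httpsBody;
    }
}
```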
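The 99.9% similarity threshold can be computed from the Levenshtein distance as in this sketch (textbook two-row dynamic programming; the method names are illustrative):

```csharp
using System;

static class Similarity
{
    // Classic dynamic-programming Levenshtein distance (two-row variant).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // Two pages count as "identical enough" when they differ in at most
    // 0.1% of the longer page's characters.
    public static bool AtLeast999PerMille(string a, string b)
    {
        int max = Math.Max(a.Length, b.Length);
        if (max == 0) return true;
        double similarity = 1.0 - (double)Levenshtein(a, b) / max;
        return similarity >= 0.999;
    }
}
```

Levenshtein distance is quadratic in page length, so a production bot would likely cap or chunk the compared content; the sketch keeps the textbook version for clarity.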
Source for pages: The bot works on the list of pages identified through the external-links SQL dump. The list was scrambled to ensure that subsequent edits are not clustered in a specific area.
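The page does not say how the list was scrambled; a Fisher–Yates shuffle is one standard way to remove such clustering, sketched below under that assumption.

```csharp
using System;
using System.Collections.Generic;

static class PageListShuffler
{
    // In-place Fisher–Yates shuffle: every ordering of the page list
    // is equally likely, so consecutive edits land on unrelated pages.
    public static void Shuffle<T>(IList<T> pages, Random rng)
    {
        for (int i = pages.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (pages[i], pages[j]) = (pages[j], pages[i]);
        }
    }
}
```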
| Area | Language | Request | Pages |
|---|---|---|---|
| Commons | Commons | Approved | 31'145'089 |
| Wikipedia | De | Approved | 1'888'381 |
| Wikipedia | En | Approved | 8'570'327 |
| Wikipedia | Es | Pending | 2'191'542 |
| Wikipedia | Fr | Approved | 2'970'187 |
| Wikipedia | It | Approved | 2'359'233 |
| Wikipedia | Ja | Allows global bots | 994'375 |
| Wikipedia | Pl | Approved | 1'527'763 |
| Wikipedia | Pt | Pending | 1'214'889 |
| Wikipedia | Ru | Allows global bots | 1'797'992 |
| Wikipedia | Zh | Allows global bots | 1'105'051 |