Talk:Spam blacklist
- Proposed additions
- Please provide evidence of spamming on several wikis and prior blacklisting on at least one. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there could be questions regarding your request.
- Proposed removals
- Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
- Other discussion
- Troubleshooting and problems - If there is an error in the blacklist (e.g. a regex error) which is causing problems, please raise the issue here.
- Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
- #wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
- Whitelists - There is no global whitelist. If you are seeking whitelisting of a URL on a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page, and consider using the template {{edit protected}} or its local equivalent to draw attention to your edit.
Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.
Completed requests are marked as {{added}}, {{removed}}, or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2025/05.
- snippet for logging
- {{sbl-log|12940357#{{subst:anchorencode:SectionNameHere}}}}
Proposed additions
This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.
edcoatescollection.com
Link is located on the SURBL list. Take note: SURBL Blacklist lookup and WOT Scorepage. Anarchyte 10:28, 7 July 2015 (UTC)
- Copied from enWP blacklist requests: site is linked from multiple projects. JzG (talk) 12:32, 16 July 2015 (UTC)
la-alopecia-areata.com
See here for IPs. Used cross-wiki. Jr Mime (talk) 00:16, 19 July 2015 (UTC)
- Added; reverted once, not losing time reverting drug spam over and over again. Thanks. —MarcoAurelio 14:11, 21 July 2015 (UTC)
megavision.co.in
User:NotebookSpares
Waiting until report is generated to decide what to do with it. —MarcoAurelio 14:08, 21 July 2015 (UTC)
nflhistory.net & mediaupdate19.com
nflhistory.net
mediaupdate19.com
The first ___domain redirects to the second, which is a malicious site containing malware. I removed 140 links from articles yesterday. Please see this thread at WP:AN on en.wiki.
⋙–Berean–Hunter—► ((⊕)) 12:48, 22 July 2015 (UTC)
- Added --Herby talk thyme 12:55, 31 July 2015 (UTC)
url2it.com
URL shortener. MER-C (talk) 12:42, 31 July 2015 (UTC)
- Added --Herby talk thyme 12:50, 31 July 2015 (UTC)
Proposed additions (Bot reported)
This section is for domains which have been added to multiple wikis as observed by a bot.
These are automated reports; please check the records and the link thoroughly, as the bot may report good links! For more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they become stale (fewer than 5 links reported, none edited in the last 7 days, and the last editor is COIBot).
The LinkWatchers report domains meeting the following criteria (a rough code sketch follows the list):
- When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
- When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
- If ALL links are added by IPs, and the link is added to more than 1 wiki
- If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
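A rough Python sketch of how criteria 1 and 3 might be expressed; the counter names and the "not used too much" threshold here are invented, as the real LinkWatcher cut-offs are not documented on this page:

from dataclasses import dataclass

# Hypothetical per-(user, link) counters; the real LinkWatcher data model
# is richer than this sketch.
@dataclass
class LinkStats:
    user_additions: int    # additions of this link by the user in question
    total_additions: int   # additions of this link by anyone
    user_wikis: int        # distinct wikis where this user added the link
    link_wikis: int        # distinct wikis where the link was added at all
    all_by_ips: bool       # True if every addition came from an IP

def should_report(s: LinkStats) -> bool:
    mainly_this_user = s.user_additions > s.total_additions / 2
    not_used_too_much = s.total_additions < 25  # invented threshold
    if mainly_this_user and not_used_too_much and s.user_wikis > 2:
        return True  # criterion 1
    if s.all_by_ips and s.link_wikis > 1:
        return True  # criterion 3
    return False     # criteria 2 (per-server) and 4 (IP range) omitted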
COIBot's currently open XWiki reports
Proposed removals
This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.
Remember to provide the specific ___domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived. See also /recurring requests for repeatedly proposed (and refused) removals.
syriadirect.org
This site offers good independent information about the Syrian civil war and it doesn't contain spam or anything like that. Honestly, I don't even know why it was put on the list in the first place, but I would like to see it in the green so people can use it as a reference in their articles about this conflict.
- Seems to be collateral damage for a regex response to spam. We can probably do a lookbehind regex fix for this. — billinghurst sDrewth 09:01, 18 December 2014 (UTC)
jerseyusa.net
Collateral damage from some of the Chinese knockoff regexes -- whitelisting request. I'd rather have this addressed here. MER-C (talk) 12:29, 27 January 2015 (UTC)
- The rule is 'jerseys?(mvp|-)?(nba|shops?|goods|whole|wholesale|soho|release|zones|sale|com|pick|cn|export|supply|trade|site|warehouse|stop|faves|4u|kk|cc|ab|usa|outlets?|clubhouse|only|buy|planet|911)\.(com|us|org|net)\b' (my bolding) - not sure how to exclude jerseyusa.net from such a complicated rule - except if we split it into 'jerseys?(mvp|-)?(nba|shops?|goods|whole|wholesale|soho|release|zones|sale|com|pick|cn|export|supply|trade|site|warehouse|stop|faves|4u|kk|cc|ab|outlets?|clubhouse|only|buy|planet|911)\.(com|us|org|net)\b' and 'jerseys?(mvp|-)?(nba|shops?|goods|whole|wholesale|soho|release|zones|sale|com|pick|cn|export|supply|trade|site|warehouse|stop|faves|4u|kk|cc|ab|usa|outlets?|clubhouse|only|buy|planet|911)\.(com|us|org)\b'. --Dirk Beetstra T C (en: U, T) 08:48, 28 January 2015 (UTC)
- The whole thing gets regex'd anyway when the blacklist is applied, so just pull it out and do it separately. — billinghurst sDrewth 05:37, 1 February 2015 (UTC)
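A minimal Python sketch of the split described above, with the alternation lists abridged for readability; the point is that neither split rule matches jerseyusa.net, while the knockoff combinations stay blocked:

import re

# The two split rules: one keeps "usa" but drops ".net" from the TLD group,
# the other keeps ".net" but drops "usa" (alternations abridged here).
RULES = [
    r'jerseys?(mvp|-)?(nba|shops?|usa|outlets?)\.(com|us|org)\b',
    r'jerseys?(mvp|-)?(nba|shops?|outlets?)\.(com|us|org|net)\b',
]

def blacklisted(url: str) -> bool:
    return any(re.search(rule, url) for rule in RULES)

assert blacklisted('jerseynba.net')      # knockoff pattern still caught
assert blacklisted('jerseyusa.com')      # "usa" with .com still caught
assert not blacklisted('jerseyusa.net')  # the collateral ___domain now passes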
effd39.org
No idea why this website is blacklisted and I am new to this process but the link in question is the official website for the en:East Fishkill Fire District.--Zackmann08 (talk) 21:30, 16 June 2015 (UTC)
- This is caught by a complex regex, I'll work this one out. Approved. --Dirk Beetstra T C (en: U, T) 03:32, 17 June 2015 (UTC)
sysoon.com
The website was blacklisted a few years ago, including the regex term "sysoon" (globally blacklisted by \bsysoon\b), because there are more international websites worldwide: sysoon.com, sysoon.uk, sysoon.be, sysoon.de, etc. Please check whether the blacklisting is still necessary, because there is much useful information to use (funeral and cemeteries resource, multi-language support, easy and fast research). My research shows that the new owner is not using any bad practices. In 2012 and 2015 it was a Webby Awards honoree; see also the article "The rise of the e-funeral". —The preceding unsigned comment was added by 88.212.54.54 (talk) 11:58, 11 July 2015
- @88.212.54.54: We are not going to blanket unlist, one or several of these were spammed, hence the blanket blacklisting. It is possible to exclude one, or two, but there should be evidence that the sites are of use for Wikipedia, not whether blacklisting is still necessary - that is very difficult evidence to gather (if even possible): we've had spammers return shortly after blacklisting rules were removed from the list. --Dirk Beetstra T C (en: U, T) 03:20, 12 July 2015 (UTC)
- Declined Not actually required on projects as far as I can see - could always be whitelisted locally if a project had a need for a link. Herby talk thyme 13:20, 31 July 2015 (UTC)
Troubleshooting and problems
SBHandler broken
SBHandler seems to be broken - both Glaisher and I had the problem that it stops after the closing of the thread on this page, but before the actual blacklisting. Do we have someone knowledgeable who can look into why this does not work? --Dirk Beetstra T C (en: U, T) 04:08, 30 April 2014 (UTC)
User:Erwin - pinging you as the developer. --Dirk Beetstra T C (en: U, T) 04:16, 30 April 2014 (UTC)
- FYI when you created this section with the name "SBHandler", you prevented SBHandler from being loaded at all (see MediaWiki:Gadget-SBHandler.js "Guard against double inclusions"). Of course, changing the heading won't fix the original issue you mentioned. But at least it will load now. PiRSquared17 (talk) 15:30, 18 June 2014 (UTC)
- Another issue is that there's a bogus "undefined" edit summary when editing the SBL log. The customization of the script via our monobooks also looks broken. Thanks. — M 10:57, 06 December 2014 (UTC)
error block
jxlalk.com
www.jxlalk.com/888/d17/2013-11-06/1349.html - "New research on the eight surnames of Zhurong"
- I do not find jxlalk.com in our global blacklist. — billinghurst sDrewth 12:45, 7 January 2015 (UTC)
- Test the link, it's matched by xlalk.com (without j). VT offers "clean". –Be..anyone (talk) 08:18, 15 January 2015 (UTC)
- The rule is 'xlal[0-9a-z-]*\.com' .. that is going to be a very difficult one to whitelist, as I don't know what the original intent was to block (which may very well have included 'jxlala.com', so excluding only the 'j' is not the way forward). I would suggest that you get '\bjxlalk\.com\b' added to your local whitelist. --Dirk Beetstra T C (en: U, T) 08:38, 15 January 2015 (UTC)
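To illustrate the breadth of that rule (a minimal sketch, assuming Python's re): with nothing anchoring the front of "xlal", the rule also fires inside longer host names such as jxlalk.com, which is the collateral damage reported here.

import re

RULE = r'xlal[0-9a-z-]*\.com'  # the blanket rule quoted above

for ___domain in ('xlale.com', 'xlalu.com', 'xlala.com', 'jxlalk.com'):
    print(___domain, bool(re.search(RULE, ___domain)))
# all four print True: 'jxlalk.com' matches via the embedded 'xlalk.com'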
- Closed Please seek local whitelisting for the ___domain in question. — billinghurst sDrewth 11:26, 16 January 2015 (UTC)
Bit of reference:
xlale.com
This was a rule that was removed after the 'blanket' xlal[0-9a-z-]*\.com was imposed. None of the additions are logged, but the '*' in the xlal-rule suggests that the string occurred in multiple links with multiple characters after the xlal. --Dirk Beetstra T C (en: U, T) 04:31, 20 January 2015 (UTC)
xlalu.com
Another one; multiple-___domain spamming using open proxies back in the day. The problem with these wide rules is that it is difficult to see what should be covered and what we can 'let through'. Xlalu.com is defined as a malicious site by my internet filter. --Dirk Beetstra T C (en: U, T) 05:33, 20 January 2015 (UTC)
- I reviewed the history of the blacklisting. See Talk:Spam_blacklist/About#Old_blacklisting_with_scanty_history. Xlalu.com is not registered at this time. Xlale.com is registered in China, and may not be connected with the original spammer, but the site is not functional anyway; it does not respond to pings.
- jxlalk.com shows no sign of being connected. The original spam was in a short period over 8 years ago, and was apparently not cross-wiki. This was before en.wiki had a spam blacklist, so all blacklisting was done here. This should not be on the spam blacklist. In 2008, the admin who added it attempted to remove all his listings, and that was reverted. Maybe it's time to clean this one up. If nothing else, the original regex could be eliminated and the specific sites added, so that jxlalk.com is not caught. But simply removing it is easiest and harmless. --Abd (talk) 01:35, 21 January 2015 (UTC)
- xlalu is defined as a malicious site by my internet filter, and I am still not sure if it is only xlale and xlalu, it may have also been xlalk and xlulabd that have been spammed. I AGF on the admin who added the rules that there was a reason for its broadness. You may be very right that the current owner of the site may have nothing to do with the spam, it may also be that the past abuser did not have anything to do with the site, but was abusing the site to make some point here on Wikipedia. It is not always a site-owner that is the spammer, and for the more sophisticated spammers, it is likely not the site-owner that spams.
- I know that jxlalk.com does not show any sign of being connected; what I do see is just ONE wiki where ONE link is needed, and here a regex where excluding jxlalk.com specifically gives too difficult a rule. And excluding specific domains from a rule is exactly why wikis have a whitelist. --Dirk Beetstra T C (en: U, T) 05:53, 21 January 2015 (UTC)
xlala.com
... --Dirk Beetstra T C (en: U, T) 05:58, 21 January 2015 (UTC)
This is a can of worms; I now find dozens of 'domains' spammed by dozens of SPAs (open proxies?) - both deleted and existing diffs on many different pages. This shows that the IPs were active globally (this edit is clearly of the same type as the spamming edits on en.wikipedia, though here no links were included). My advice stays: whitelist it locally, or show me exactly what was abused and we might consider a narrowing-down, excluding some collateral damage (as I said earlier, xlulu.com gives me a 'malicious site' warning; I don't think that wholesale removal is a good idea). --Dirk Beetstra T C (en: U, T) 06:13, 21 January 2015 (UTC)
- Thanks, Beetstra. xlala.com would fit a reasonable pattern used by the spammers. It's a "___domain for sale." Dropped twice. However, xlalu.com is dead, and was dropped four times (nobody owns the ___domain, so your browser is relying on something old, that probably was not the original spammed ___domain owner. After all, it's been over eight years). xlale.com is not an active site, and is unlikely to be the original owner either. It was dropped four times as well.
- The wikibooks link cited shows 61.64.115.253 as editing Wikibooks once, no spam. Global contributions show 5 edits with no visible relation to the xlale, xlalu, and xlala spammers. No spam, except one link [1] adding ひきこもり. How is this edit "of the same type of spamming edits on en.wikipedia"? We only have seen Contributions/71.207.33.167. The IP on enwiki added "Cool site!" followed by a series of links that could be highly deceptive, i.e., imitating real insurance company pages. Obviously, you may have seen more than that, but the Wikibooks IP added "Hi, you have very good site! Thank you!" This IP shows zero sign of being a spammer. The single link added is not blacklisted.
- Yes, if there were a substantial reason to continue the blacklisting, then the whitelisting solution would make some sense. However, whitelisting is quite impractical for most users. I've researched problem blacklistings, and saw what happened with users. Very few will request a whitelisting, they just give up. And for some of those who do, the request sits for months. Many admins are clueless about the whole process, not to mention users.
- Beetstra, I think you have never understood this. You see one request from one wiki. For every one request, there are likely dozens or more users frustrated who just give up. Ordinary users do not come to meta for delisting requests, especially if English is not their language. --Abd (talk) 21:11, 21 January 2015 (UTC)
- The diff I gave uses the same language, same edit summary as the spammers, that editor is the cross-wiki spammer, the same person. The diff you gave on ja.wikipedia is not the spammer, it is a physically different person. --Dirk Beetstra T C (en: U, T) 03:22, 22 January 2015 (UTC)
- "I've researched problem blacklistings, and saw what happened with users. Very few will request a whitelisting, they just give up." .. HOW do you know that a user gives up instead of requesting whitelisting/deblacklisting? If you know it is a 'problem blacklisting' (as you call it), someone must have .. gone through the point of mentioning that it was a problem (e.g. asked for de-listing/whitelisting); otherwise you do not know what the intention was of the editor who ran into the blacklisting, you don't even know whether users ran into the rule. "For every one request, there are likely dozens or more users frustrated who just give up." Yet, ZERO evidence that even one user ran into the rule, or, if users ran into the rule, that they were not actually spammers trying to add the link. --Dirk Beetstra T C (en: U, T) 05:22, 22 January 2015 (UTC)
- We're done again, you found it necessary to make it personal again (as usual defense in these threads), and make yet more sweeping statements without evidence. If someone else has something substantial to add we can discuss further. --Dirk Beetstra T C (en: U, T) 03:22, 22 January 2015 (UTC)
- Removed, per provided evidence of no 'spammy hits'. --Dirk Beetstra T C (en: U, T) 07:06, 22 February 2015 (UTC)
palace.com
- For reference: m:Talk:Spam_blacklist/Archive/Ukrainian_paper-writing_spam. That rule may indeed be too broad, maybe the two requests below should go on meta for an exclusion onto the rule. --Dirk Beetstra T C 17:04, 26 April 2015 (UTC)
www.lausanne-palace.com
lausanne-palace.com
For the article Lausanne Palace, I would like to use the official website www.lausanne-palace.com which is blocked because it contains "palace.com". Can you please allow this page? Johndrew Andson (talk) 18:11, 15 April 2015 (UTC).
www.host-palace.com
host-palace.com
For the article List of Internet exchange points, I would like to use the official website www.host-palace.com which is blocked because it contains "palace.com". Can you please allow this page? --Never stop exploring (talk) 01:53, 26 April 2015 (UTC)
Discussion regarding palace.com
Above copied from en:MediaWiki talk:Spam-whitelist#palace.com. Suggest to 'whitelist' these through exclusion of the two prefixes 'lausanne-' and 'host-' onto \bpalace\.com\b. Posting here to get reports for analysis. --Dirk Beetstra T C (en: U, T) 17:13, 26 April 2015 (UTC)
- I think that we would do well to either just remove the term or make it so that it is the whole term, i.e. look to better blacklisting if required. It looks to be a pretty crude attempt to manage a dictionary word that could be prepended with many terms. — billinghurst sDrewth 03:28, 30 April 2015 (UTC)
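A sketch of the exclusion suggested above, illustrated here with Python's re (the live blacklist syntax may differ): negative lookbehinds carve the two legitimate prefixes out of \bpalace\.com\b.

import re

RULE = r'(?<!lausanne-)(?<!host-)\bpalace\.com\b'

assert re.search(RULE, 'http://palace.com/offer')               # still blocked
assert not re.search(RULE, 'http://www.lausanne-palace.com/')   # now allowed
assert not re.search(RULE, 'http://www.host-palace.com/about')  # now allowed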
"4gk.com"'s regexp to be clarified
4gk.com
4gk.com.ua
Hi. Please clarify \b4gk\.com\b to something which won't be triggered by "4gk.com.ua". Similar to this case. Thanks. --Base (talk) 12:49, 17 May 2015 (UTC)
- Any progress? Come on, guys, it's just an edit with an example given, and you've been at it for over 2 weeks. --Base (talk) 16:00, 3 June 2015 (UTC)
- Done Negative lookahead added to the regex. — billinghurst sDrewth 13:11, 18 June 2015 (UTC)
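The exact pattern applied isn't quoted in the thread, but a negative lookahead along these lines (a hedged sketch in Python) stops the rule from firing inside "4gk.com.ua" while still catching "4gk.com":

import re

OLD = r'\b4gk\.com\b'        # \b matches between 'm' and '.', so .com.ua hits
NEW = r'\b4gk\.com\b(?!\.)'  # hypothetical fix: refuse a following dot

assert re.search(OLD, 'http://4gk.com.ua/page')      # old rule: false positive
assert not re.search(NEW, 'http://4gk.com.ua/page')  # lookahead suppresses it
assert re.search(NEW, 'http://4gk.com/page')         # target ___domain still hits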
derefer.unbubble.eu deblock
This service is used 24,923 times in the main namespace on dewiki! It is used to clean up Special:Linksearch from known dead links, by redirecting them through this service. It is hard to find a better solution for this task. --Boshomi (talk) 16:38, 24 July 2015 (UTC) Ping: User:Billinghurst. Boshomi (talk) 16:49, 24 July 2015 (UTC)
- Please note Phab:T89586; while it is not fixed, it is not possible to find the links with the standard Special:LinkSearch. On dewiki we can use giftbot/Weblinksuche instead. --Boshomi (talk) 18:04, 24 July 2015 (UTC)
- afaics derefer.unbubble.eu could be used to circumvent the SBL, is that correct? -- seth (talk) 21:30, 24 July 2015 (UTC)
- I don't think so; the redirected URL is unchanged, so the SBL works as it does for archive URLs to the Internet Archive. --Boshomi (talk) 07:44, 25 July 2015 (UTC)
- It is not a stored/archived page at archive.org, it is a redirect service as clearly stated at the URL and in that it obfuscates links. To describe it in any other way misrepresents the case, whether deWP uses it for good or not. We prevent abuseable redirects from other services due to the potential for abuse. You can consider whitelisting the URL in w:de:MediaWiki:spam-whitelist if it is a specific issue for your wiki. — billinghurst sDrewth 10:09, 25 July 2015 (UTC)
- What I wanted to say is that the SBL mechanism works in the same way as with web.archive.org/web: a URL that is blocked stays blocked with the unbubble prefix in front of it. --Boshomi (talk) 12:54, 25 July 2015 (UTC)
Discussion
This section is for discussion of Spam blacklist issues among other users.
Expert maintenance
One (soon to be archived) rejected removal suggestion was about jxlalk.com, matched by a filter intended to block xlalk.com. One user suggested that this side-effect might be as it should be, another user suggested that regular expressions are unable to distinguish these cases, and nobody has a clue when and why xlalk.com was blocked. I suggest finding an expert maintainer for this list, and removing all blocks older than 2010. The bots identifying abuse will restore still-needed ancient blocks soon enough, hopefully without any 'oogle matching google' cases. –Be..anyone (talk) 00:50, 20 January 2015 (UTC)
- No, removing some of the old rules, before 2010 or even before 2007, will result in further abuse, some of the rules are intentionally wide as to stop a wide range of spamming behaviour, and as I have argued as well, I have 2 cases on my en.wikipedia list where companies have been spamming for over 7 years, have some of their domains blacklisted, and are still actively spamming related domains. Every single removal should be considered on a case-by-case basis. --Dirk Beetstra T C (en: U, T) 03:42, 20 January 2015 (UTC)
- Just to give an example of this - redirect sites have been, and are, actively abused to circumvent the blacklist. Some of those were added before the arbitrary date of 2010. We are not going to remove those under the blanket of 'having been added before 2010'; they will stay blacklisted. Some other domains are of similar gravity such that they should never be removed. How are you, reasonably, going to filter out the rules that never should be removed? --Dirk Beetstra T C (en: U, T) 03:52, 20 January 2015 (UTC)
- By the way, you say ".. intended to block xlalk.com .." .. how do you know? --Dirk Beetstra T C (en: U, T) 03:46, 20 January 2015 (UTC)
- I know that nobody would block 'icrosoft.com' if what they mean is 'microsoft.com', or vice versa. It's no shame to have no clue about regular expressions, a deficit we apparently share. –Be..anyone (talk) 06:14, 20 January 2015 (UTC)
- I am not sure what you are referring to - I am not native in regex, but proficient enough. The rule was added to block, at least, xlale.com and xlalu.com (if it were ONLY these two, \bxlal(u|e)\.com\b or \bxlal[ue]\.com\b would have been sufficient), but it is impossible to find this far back what all was spammed; possibly xlali.com, xlalabc.com and abcxlale.com were abused by these proxy-spammers. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
- xlalk.com may have been one of the cases, but one rule that was blacklisted before this blanket was imposed was 'xlale.com' (xlale.com rule was removed in a cleanout-session, after the blanket was added). --Dirk Beetstra T C (en: U, T) 04:45, 20 January 2015 (UTC)
- The dots in administrative domains and DNS mean something; notably 'foo.bar.example' is typically related to an administrative 'bar.example' ___domain (ignoring well-known exceptions like 'co.uk' etc.; Mozilla+SURBL have lists for this), while 'foobar.example' has nothing to do with 'bar.example'. –Be..anyone (talk) 06:23, 20 January 2015 (UTC)
- I know, but I am not sure how this relates to this suggested cleanup. --Dirk Beetstra T C (en: U, T) 08:50, 20 January 2015 (UTC)
- If your suggested clean-ups at some point don't match jxlalk.com, the request by the Chinese user would be satisfied. As noted, all I found out is a VirusTotal 'clean'; it could still be a spam site, if it ever was one.
- The regexp could begin with 'optionally any string ending with a dot' or similar before xlalk. There are 'host name' RFCs (LDH: letter digit hyphen) up to IDNAbis (i18n domains); they might contain recipes. –Be..anyone (talk) 16:56, 20 January 2015 (UTC)
- What suggested cleanups? I am not suggesting any cleanup or blanket removal of old rules. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
- I have supported delisting above, having researched the history, posted at Talk:Spam_blacklist/About#Old_blacklisting_with_scanty_history. If it desired to keep xlale.com and xlalu.com on the blacklist (though it's useless at this point), the shotgun regex could be replaced with two listings, easy peasy. --Abd (talk) 01:42, 21 January 2015 (UTC)
- As I said earlier, are you sure that it is only xlale and xlalu? Those were the two I found quickly; there may have been more. I do AGF that the admin who added the rule had reason to blanket it like this. --Dirk Beetstra T C (en: U, T) 03:50, 21 January 2015 (UTC)
- Of course I'm not sure. There is no issue of bad faith. He had reason to use regex, for two sites, and possibly suspected additional minor changes would be made. But he only cited two sites. One of the pages was deleted, and has IP evidence on it, apparently, which might lead to other evidence from other pages, including cross-wiki. But the blacklistings themselves were clearly based on enwiki spam and nothing else was mentioned. This blacklist was the enwiki blacklist at that time. After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings. This is really old and likely obsolete stuff. --Abd (talk) 20:07, 21 January 2015 (UTC)
- 3 at least. And we do not have to present a full case for blacklisting (we often don't, per en:WP:BEANS and sometimes privacy concerns), we have to show sufficient abuse that needs to be stopped. And if that deleted page was mentioned, then certainly there was reason to believe that there were cross-wiki concerns.
- Obsolete, how do you know? Did you go through the cross-wiki logs of what was attempted to be spammed? Do you know how often some of the people active here are still blacklisting spambots using open proxies? Please stop with these sweeping statements until you have fully searched for all evidence. 'After enwiki got its own blacklist, the admin who blacklisted here attempted to remove all his listings.' - no, that was not what happened. --Dirk Beetstra T C (en: U, T) 03:16, 22 January 2015 (UTC)
- Hi!
- I searched all the logs (Special:Log/spamblacklist) of several wikis using the regexp entry /xlal[0-9a-z-]*\.com/.
- There were almost no hits:
w:ca: 0
w:ceb: 0
w:de: 0
w:en: 1: 20131030185954, xlalliance.com
w:es: 1: 20140917232510, xlalibre.com
w:fr: 0
w:it: 0
w:ja: 0
w:nl: 0
w:no: 0
w:pl: 0
w:pt: 0
w:ru: 0
w:sv: 0
w:uk: 0
w:vi: 0
w:war: 0
w:zh: 1: 20150107083744, www.jxlalk.com
- So there was just one single hit at w:en (not even in the main namespace, but in the user namespace), one in w:es, and one in w:zh (probably a false positive). So I agree with user:Abd that removing of this entry from the sbl would be the best solution. -- seth (talk) 18:47, 21 February 2015 (UTC)
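The tally above could be reproduced with something as simple as the following sketch, assuming the (admin-only) Special:Log/spamblacklist entries were exported as plain text lines per wiki (hypothetical input format):

import re
from collections import Counter

RULE = re.compile(r'xlal[0-9a-z-]*\.com')

def tally(log_lines_by_wiki):
    # log_lines_by_wiki: dict mapping wiki name -> list of raw log lines
    return Counter({wiki: sum(1 for line in lines if RULE.search(line))
                    for wiki, lines in log_lines_by_wiki.items()})

print(tally({'w:en': ['... xlalliance.com ...'], 'w:fr': []}))
# -> Counter({'w:en': 1, 'w:fr': 0})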
- Finally, an argument based on evidence (these logs should be public, not admin-only - can we have something like this in a search engine? It may come in handy in some cases!). Consider it removed. --Dirk Beetstra T C (en: U, T) 06:59, 22 February 2015 (UTC)
- By the way, Seth, this is actually zero hits - all three you show here are collateral. Thanks for this evidence; this information would be useful on more occasions to make an informed decision (also, vide infra). --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
- I am not sure that we want the Special page to be public, though I can see some value in being able to have something at ToolLabs to be available to run queries, or something available to be run through quarry. — billinghurst sDrewth 10:57, 22 February 2015 (UTC)
- Why not public? There is no reason to hide this, this is not BLP or COPYVIO sensitive information in 99.99% of the hits. The chance that this is non-public information is just as big as for certain blocks to be BLP violations (and those are visible) ... --Dirk Beetstra T C (en: U, T) 04:40, 23 February 2015 (UTC)
Now restarting the original debate
As the blacklist is long, and likely contains rules that cast too wide a net and are so old that they are utterly obsolete (or may even be causing collateral damage on a regular basis), can we see whether we can set up some criteria (that can be 'bot tested'):
- Rule added > 5 years ago.
- All hits (determined on a significant number of wikis), over the last 2 years (for now: since the beginning of the log = ~1.5 years) are collateral damage - NO real hits.
- Site is not a redirect site (should not be removed, even if not abused), is not a known phishing/malware site (to protect others), and is not a true copyright-violating site. (This is hard to bot-test; we may need someone to look over the list and take out the obvious ones.)
We can make some mistakes on old rules if they are not abused (remove some that actually fail #3) - if they become a nuisance/problem again, we will see them again, and they can be speedily re-added .. thoughts? --Dirk Beetstra T C (en: U, T) 07:25, 22 February 2015 (UTC)
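For concreteness, a bot-test for criteria 1 and 2 could look roughly like this Python sketch; the record layout is invented here, and criterion 3 still needs a human pass:

from datetime import date, timedelta

# Hypothetical records: (rule, date added, real hits, collateral hits,
# redirect/phishing/copyvio flag), e.g. assembled from the blacklist log
# and seth's per-rule hit statistics.
RULES = [
    (r'\bexample-spam\.com\b', date(2006, 3, 1), 0, 2, False),
    (r'\bsome-redirector\.example\b', date(2005, 6, 1), 0, 0, True),
    (r'\bactive-spam\.example\b', date(2012, 1, 1), 14, 0, False),
]

def removal_candidates(rules, today=date(2015, 2, 22)):
    cutoff = today - timedelta(days=5 * 365)
    for rule, added, real_hits, collateral_hits, protected in rules:
        if added <= cutoff and real_hits == 0 and not protected:
            yield rule  # old, only collateral damage, not a no-no site

print(list(removal_candidates(RULES)))
# -> ['\\bexample-spam\\.com\\b']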
- @Hoo man: you have worked on clean-up before; some of your thoughts would be welcomed. — billinghurst sDrewth 10:53, 22 February 2015 (UTC)
- Doing this kind of clean-up is rather hard to automate. What might work better for starters could be removing rules that didn't match anything since we started logging hits. That would presumably cut down the whole blacklist considerably. After that we could re-evaluate the rest of the blacklist, maybe following the steps outlined above. - Hoo man (talk) 13:33, 22 February 2015 (UTC)
- Not hitting anything is dangerous .. there are likely some somewhat obscure redirect sites on it which may not have been attempted to be abused (though, also those could be re-added). But we could do test-runs easily - just save a cleaned up copy of the blacklist elsewhere, and diff them against the current list, and see what would get removed.
- Man, I want this showing up in the RC-feeds, then LiWa3 could store them in the database (and follow redirects to show what people wanted to link to ..). --Dirk Beetstra T C (en: U, T) 03:30, 23 February 2015 (UTC)
- Hi!
- I created a table of hits of blocked link additions. Maybe it's of use for the discussion: User:lustiger_seth/sbl_log_stats (1.8 MB wiki table).
- I'd appreciate, if we deleted old entries. -- seth (talk) 22:12, 26 February 2015 (UTC)
- Hi, thank you for this, it gives a reasonable idea. Do you know if the rule-hits were all 'correct' (for those that do show that they were hit), or mainly/all false positives? (If they are false-positive hits, we could, based on this, also decide to tighten the rule to avoid the false positives.) Rules with all-0 (can you include a 'total' score?) would certainly be candidates for removal (though still determine first whether they are 'old' and/or are no-no sites before removal). I am also concerned that this is not including other wikifarms - some sites may be problematic on other wikifarms, or hitting a large number of smaller wikis (which have less control due to low admin numbers). --Dirk Beetstra T C (en: U, T) 03:36, 8 March 2015 (UTC)
- Hi!
- We probably can't get information about false positives automatically. I added a 'sum' column.
- Small wikis: If you give me a list of the relevant ones, I can create another list. -- seth (talk) 10:57, 8 March 2015 (UTC)
- Thanks for the sum column. Regarding the false positives, it would be nice to be able to quickly see what actually got blocked by a certain rule; I agree that that then needs a manual inspection, but the actual number of rules with zero hits on the intended stuff to be blocked is likely way bigger than what we see.
- How would you define the relevant small wikis? That depends on the link that was spammed. Probably the best is to parse all ~750 wikis, make a list of rules with 0 hits and a separate list of rules with <10 hits (including there the links that were blocked), and exclude everything above that. Then these resulting rules should be filtered to those which were added >5 years ago. That narrows down the list for now, and after a check for obvious no-no links, those could almost be blanket-removed (just excluding the ones with real hits, the obvious redirect sites and others - which needs a manual check). --Dirk Beetstra T C (en: U, T) 06:59, 9 March 2015 (UTC)
- Hi!
- At User:Lustiger_seth/sbl_log_stats/all_wikis_no_hits there's a list containing ~10k entries that never triggered the sbl anywhere between September 2013 and February 2015 (if my algorithm is correct).
- If you want to get all entries older than 5 years, then it should be sufficient to use only the entries in that list up to (and including) '\bbudgetgardening\.co\.uk\b'.
- So we could delete ~5766 entries. What do you think? Shall we give it a try? -- seth (talk) 17:06, 18 April 2015 (UTC)
- The question is, how many of those are still existing redirect sites etc. Checking 5800 is quite a job. On the other hand, with LiWa3/COIBot detecting - it is quite easy to re-add them. --Dirk Beetstra T C (en: U, T) 19:28, 21 April 2015 (UTC)
COIBot Report Saving update
After another schema update in the API, the bot started throwing a double error in saving, invalidating/blanking reports and not being able to fill the XWiki table. Upon finding the mistakenly closed reports, I noticed that that also happened back in 2010. I have therefore reverted a good handful of XWiki reports to the pre-error state (effectively reopening them) and had the bot reparse them (most will automatically close). In repairing the problem I also noted another 'error': the bot would close reports for redirect sites if autoclose criteria were met, while some of those should simply be blacklisted anyway (real redirect sites), or LiWa3 should be told on IRC to ignore the redirect detection for these domains ('link nr add ..' for the 'official domains' which are nonetheless redirected, like 'facebook.de' to 'facebook.com') - in other words, they always need human interaction. I think I have solved these problems now.
There is now a category Category:Problematic XWiki reports, where the {{LinkStatus}} is 'unset'. For the reports in there, I look at the history and revert to the last 'open' state (having the bot close it; reverting reinstates some old data, like users who added the link, that the bot will retain but which otherwise would be lost, and it adds blacklist rules and advertising IDs for the old reports), or, if there is just one revid, I manually decide whether to open or immediately close it.
Please consider to have COIBot clear out its XWiki backlog, it will probably autoclose most of the ~95 open records (though there is no real harm in closing them manually). --Dirk Beetstra T C (en: U, T) 19:44, 21 April 2015 (UTC)
- The bot is doing its job correctly now I think, so feel free to close the reports, especially the ones where COIBot is the last editor (it does leave more open now than usual). --Dirk Beetstra T C (en: U, T) 06:28, 24 April 2015 (UTC)
Blacklist on topic
Hi. I am not sure about all the rules across this ___domain; however, we would like to link some external videos (https://www.youtube.com/watch?v=sWo-hKTQ1wE) on a specific page to show hydrographics techniques and products that can be found in Australia. It seems similar companies around the world have done the same with no issues, and somehow we are not able to. Can we please discuss this request with an admin so we can submit the information and links?
- You should seek local whitelisting at English Wikipedia w:Mediawiki talk:Spam-whitelist for these urls to resolve your issue. The filter is to stop the gross and continual abuse that occurs, not to prevent the more limited educational material. — billinghurst sDrewth 00:27, 12 April 2015 (UTC)