User:AnomieBOT/source/tasks/TaskRedirectChecker.pm: Difference between revisions

AnomieBOT (talk | contribs)
Updating published sources: TaskRedirectChecker: * New task, no BRFA required as it only affects the bot's own userspace.
 
AnomieBOT (talk | contribs)
Updating published sources: General: * Update for the addition of 'rvslots'. DatedCategoryDeleterTest: * Disable. It's clear that task won't be needed. BrokenRedirectDeleter: * Handle pages with newlines before the <code>#REDIRECT</code>.
 
(11 intermediate revisions by 3 users not shown)
Line 1:
{{ombox|type=notice|text= Per [[WP:BOT#Approval]], any bot or automated editing process that affects only the operators' user and talk pages (or subpages thereof), and which is not otherwise disruptive, may be run without prior approval.}}
<syntaxhighlight lang="perl">
package tasks::TaskRedirectChecker;
 
Line 11:
BRFA: N/A
Status: Begun 2010-06-16
Rate: Once per day, as needed
Created: 2010-06-16
 
Check the permanent redirects under [[Special:PrefixIndex/User:AnomieBOT/req/|User:AnomieBOT/req/]] to verify that the anchor still exists in the target page. If the anchor can be found in an archive subpage, the redirect will be updated. Otherwise, the bot will ask for help on its talk page.
 
Note this doesn't handle {{tl|archiveanchor}} or the like, just TOC headers.
 
=end metadata
Line 29 ⟶ 28:
@ISA=qw/AnomieBOT::Task/;
 
use POSIX qw/strftime/;
use Data::Dumper;
 
Line 55 ⟶ 53:
sub run {
my ($self, $api)=@_;
my $res;
 
$api->task('TaskRedirectChecker', 0, 10, qw(d::Timestamp d::Redirects d::Talk));
Line 65 ⟶ 64:
my $re=$api->redirect_regex();
my $base=$api->user.'/req/';
my $iter=$api->iterator(generator=>'allpages',gapprefix=>$base,gapnamespace=>2,gapfilterredir=>'redirects',prop=>'info|revisions',rvprop=>'content',rvslots=>'main');
my @whine=();
while(my $page=$iter->next){
Line 73 ⟶ 72:
}
 
my $txt=$page->{'revisions'}[0]{'slots'}{'main'}{'*'};
next unless $txt=~/$re\[\[([^]#]+)#([^]]+)\]\]/;
my ($title,$anchor)=($1,$2);
 
# Ask MediaWiki to canonicalize the title for us, because the actual
# normalization can depend on various factors.
$res=$api->query(titles=>$title);
if($res->{'code'} ne 'success'){
$api->warn("Failed to get canonical name for $title: ".$res->{'error'}."\n");
return 60;
}
$title=$res->{'query'}{'normalized'}[0]{'to'} // $title;
 
# Add a "dummy" section for the anchor we're actually looking for,
# because the encoded anchors returned in "sections" vary based on
# server settings. Note this doesn't support {{anchor}} or the like.
$anchor =~ s/\{/&#x7B;/g;
$res=$api->query(action=>'parse',title=>$title,text=>"__TOC__\n== XXX $anchor ==\n\n{{:$title}}",prop=>'sections');
if($res->{'code'} ne 'success'){
$api->warn("Failed to retrieve section list for $title: ".$res->{'error'});
Line 87 ⟶ 96:
my @s=map $_->{'anchor'}, @{$res->{'parse'}{'sections'}};
my $anchorenc=shift @s; # Pull out the dummy
$anchorenc=~s/^XXX_//;
next if grep($_ eq $anchorenc, @s);
 
Line 102 ⟶ 110:
}
if(exists($res->{'query-continue'})){
$q{'rvcontinue'}=$res->{'query-continue'}{'revisions'}{'rvcontinue'};
} else {
delete $q{'rvcontinue'};
}

Line 132 ⟶ 140:
last if defined($newtitle);
}
} while(!defined($newtitle) && exists($q{'rvcontinue'}));
 
if(defined($newtitle)){
Line 181 ⟶ 189:
1;
 
</syntaxhighlight>
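
The redirect parsing and the dummy-section check used in the listing above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the bot's own code: the hard-coded <code>$re</code> stands in for <code>$api->redirect_regex()</code>, the helper names <code>parse_redirect</code> and <code>anchor_exists</code> are hypothetical, and the anchor lists passed in are expected to be hand-written samples rather than real <code>action=parse</code> output.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# ASSUMPTION: a simplified, English-only stand-in for $api->redirect_regex().
# The real bot obtains the redirect pattern from the API.
my $re = qr/^\s*#REDIRECT\s*:?\s*/i;

# Split redirect wikitext into (title, anchor).
# Returns an empty list when the text is not a redirect to a section.
sub parse_redirect {
    my ($txt) = @_;
    return unless $txt =~ /$re\[\[([^]#]+)#([^]]+)\]\]/;
    return ($1, $2);
}

# Takes the list of encoded anchors produced by parsing
# "== XXX $anchor ==" followed by the transcluded target page.
# The first entry belongs to the dummy section; the rest come from the
# real page. Returns 1 if the wanted anchor is among the real sections.
sub anchor_exists {
    my @s = @_;
    my $anchorenc = shift @s;       # Pull out the dummy
    return 0 unless defined $anchorenc;
    $anchorenc =~ s/^XXX_//;        # Strip the dummy prefix
    return scalar(grep { $_ eq $anchorenc } @s) ? 1 : 0;
}
</syntaxhighlight>

Because the dummy heading passes through the same parser as the target page, the comparison never needs to know which anchor-encoding scheme the server uses. For example, <code>anchor_exists('XXX_Some_request', 'Intro', 'Some_request')</code> reports a match, while a list that lacks <code>Some_request</code> among the real sections does not.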