Wikipedia:Database queries: Difference between revisions

Content deleted Content added
log
m // Edit via WikiMirror
 
(134 intermediate revisions by 76 users not shown)
Line 1:
{{nutshell
[[fr:Wikipédia:Requêtes SQL]] [[nl:Wikipedia:SQL opdrachten]] [[de:Wikipedia:Datenbank-Abfragen]]
|shortcut1=WP:DQ
As an [[Wikipedia:Administrators|Administrator]], you can directly query the [[MySQL]] database. <font color="red">'''Please be careful with this; our database is very large, and overly complex queries can seriously bog down the server.'''</font>
|shortcut2=WP:SQL
|Queries can be run against the Wikipedia database in a variety of ways.}}
 
== Overview ==
Queries are entered on the [[special:Asksql]] page.
 
Much of the data that makes up the Wikipedia encyclopedia is stored in a [[SQL database]]. It can sometimes be useful to run queries against this database to extract information that is otherwise hard to find. For example:
* When copying a query from this page, some browsers copy a '*' in front of the SELECT statement. This has to be removed!
* Most of these queries return article titles. These do ''not'' contain the namespace!
**The namespaces have numeric codes in the separate cur_namespace field (0 is the regular article namespace):
**#Talk
**#User
**#User talk
**#Wikipedia
**#Wikipedia talk
**#Image
**#Image talk
 
* Articles with ''H.M.S.'' in their title that have not been edited for 12 months.
* [[WP:Redirect|Redirects]] with fewer than 20 incoming links that redirect to [[WP:Category|categories]]
* All [[WP:Red link|red links]] on pages within the scope of a particular [[WP:WikiProject|WikiProject]]
 
Many simple queries can be run directly using the tools built into the [[MediaWiki]] package used by Wikipedia. Some of the most popular and useful queries are run regularly and can be found at [[Wikipedia:Database reports]]. If neither of these suits your query, you can [[Wikipedia:Request a query|request]] that someone run a query for you, or download your own copy of the database to work on.
To avoid time-consuming or malformed queries, here are some useful suggestions (''please add more'' - some requests might be found on this page's talk page):
 
== Using the MediaWiki tools ==
=== Modifiers ===
Some queries can be satisfied through the MediaWiki interface:
To limit results:
: "SELECT ... LIMIT 20".
You can also add an offset;
: "SELECT ... LIMIT 100, 20" will give you 20 records starting at the 101st.
 
* [http://en.wikipedia.org/w/index.php?title=Special:Search&search=&fulltext=Search&profile=advanced&redirs=1 Search in advanced mode]
=== Queries to find new users ===
* [[Special:PrefixIndex|All pages with prefix]]
* [[Special:CategoryTree|Category tree]]
* [[mw:API|MediaWiki API]] (see [[Special:ApiSandbox]])
 
== Queries that are already run regularly ==
Here's a couple of queries to find the 20 most newly created users - useful
Useful queries that are regularly run can be found at:
to find people to give welcome messages. It only counts users that have made edits (most users
who create a login never make a single contribution and are just passing by,
so it's not really worth welcoming them).
 
* [[Special:SpecialPages]]
*<nowiki>SELECT user_name, COUNT(*) FROM user, cur WHERE user_id=cur_user
* [[Wikipedia:Database reports]]
GROUP BY user_id ORDER BY user_id DESC LIMIT 20</nowiki>
* [[Wikipedia:Statistics]]
 
And many other places.
The above query gives the user names of the 20 most recently created users
and the number of edits they have made. The query is not perfect - only
edits that have not been 'overwritten' by other edits are counted here. To
do the same thing looking only at 'overwritten' edits use:
 
== Run a query ==
*<nowiki>SELECT user_name, COUNT(*) FROM user, old WHERE user_id=old_user
All Wikipedians have the ability to run queries on a copy of the live database on the [[WP:Wikimedia Cloud Services|Wikimedia Cloud VPS]] called [//quarry.wmflabs.org/ Quarry].
GROUP BY user_id ORDER BY user_id DESC LIMIT 20</nowiki>
 
=== QueriesRequest toa findquery stub articles ===
* [[Wikipedia:Request a query]]
 
== See also ==
For short pages containing "see" (takes about 2 seconds):
*[[wikitech:Nova Resource:Tools/Tools/Query service]]
*<nowiki>SELECT cur_title FROM cur WHERE cur_is_redirect=0 AND LENGTH(cur_text)<50 AND cur_text LIKE '%see%'</nowiki>
finds all pages with less than 50 chars.
 
=== Queries to find inappropriate links ===
 
For "double redirects", try this (takes about 4 seconds):
*<nowiki>SELECT l_from,l_to,cb.cur_text AS rt,cb.cur_title AS ti FROM links,cur AS ca, cur AS cb WHERE ca.cur_is_redirect=1 AND cb.cur_is_redirect=1 AND l_to=cb.cur_id AND l_from=ca.cur_title LIMIT 250</nowiki>
where l_from is the first, ti is the second redirect; rt shows the content of ti.
 
List all articles that link to a file on http://meta.wikipedia.com (takes 2-3 seconds):
*<nowiki>SELECT cur_title FROM cur WHERE cur_text LIKE "%http://meta.wikipedia.com/upload/%" AND cur_namespace=0</nowiki>
 
To find articles that link to jpegs, both external and internal, with the old non-<nowiki>[[Image:]]</nowiki> style (this isn't perfect, it also finds a few other pages):
*<nowiki>SELECT cur_title FROM cur WHERE cur_text LIKE "%http://%.jpg%" AND cur_namespace=0</nowiki>
 
For articles containing a link to itself (!!24 seconds!!)
*<nowiki>SELECT cur_title FROM cur,links WHERE cur_is_redirect=0 AND cur_namespace=0 AND l_from=cur_title AND l_to=cur_id LIMIT 20</nowiki>
 
For redirects to non-existing pages (note: when a page is found that is redirecting correctly, it usually means that there is text going with the redirect, 2-4 seconds):
*<nowiki>SELECT cur_title FROM cur,brokenlinks WHERE bl_from=cur_id AND cur_is_redirect=1</nowiki>
 
=== Query to find articles without bold markings ===
Which means they don't have bold titles in the first sentence!
:<nowiki>SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_text NOT LIKE "%'''%" AND cur_text NOT LIKE "%<b>%" LIMIT 50</nowiki>
 
=== Query to find articles without bold markings, that are not disambiguation pages ===
Same as above only removes disambiguation pages which don't need bolds.<nowiki>
SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_text NOT LIKE "%'''%" AND cur_text NOT LIKE "%<b>%" AND cur_text NOT LIKE "%disambiguation%" LIMIT 50</nowiki>
 
=== Query to find sub-page articles ===
 
This finds all articles containing a "/" character; most of these are subpages.
 
* <nowiki>SELECT cur_title FROM cur WHERE cur_namespace=0 and cur_title like "%/%" and cur_is_redirect=0</nowiki>
 
=== Queries to give statistics ===
 
For a count of how many articles are more than 1500 characters long, use:
*<nowiki>SELECT COUNT(*) FROM cur WHERE LENGTH(cur_text)>1500 AND cur_namespace=0</nowiki>
 
=== Queries about blocked users ===
 
* <nowiki>SELECT DISTINCT ipb_address, COUNT(*) AS times, MIN(ipb_reason) AS reason, user_name AS blocker, DATE_FORMAT(MIN(ipb_timestamp), '%b %e %Y %k:%i') AS date FROM ipblocks, user WHERE user_id=ipb_by GROUP BY ipb_address ORDER BY ipb_timestamp</nowiki>
 
returns the blocked IP addresses, and the first entry for the blocking (some IP addresses have been blocked by multiple people), ordered by date.
 
This is mostly unnecessary (but may be of interest as an example query), as the [[Special:Ipblocklist]] returns similar information.
 
=== it's... ===
Find all pages with "it's" for spellchecking purposes. WARNING: takes a long time. May want to apply a limit (see above)
 
 
* <nowiki>SELECT cur_title from cur where cur_namespace=0 and cur_text regexp "[[:<:]]it's[[:>:]]" order by cur_title</nowiki>
 
=== Vandalism ===
When a certain IP-number or user has been found to be vandalising, it might be that some of their problems may already have been solved by someone else and others not. The following query shows those pages which the vandal (or otherwise problematic user) was the '''last''' to edit:
 
* <nowiki>SELECT cur_title FROM cur WHERE cur_user_text="name" LIMIT 100</nowiki>
 
If the vandal is a registered user, use their Username for name, if not, use their IP-number.
 
=== Query to find sysops/developers ===
 
* <nowiki>SELECT CONCAT("*[[User:",user_name,"|",user_name,"]]") from user where user_rights != "" ORDER BY user_name LIMIT 100</nowiki>
 
----
 
A detailed description of the database schema and the meaning of various fields is available [http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/%2Acheckout%2A/wikipedia/phpwiki/newcodebase/docs/schema.doc?rev=HEAD&content-type=text/plain here].
 
==Log==
 
http://www.wikipedia.org/upload/sqllog_mFhyRe6