User talk:Dr pda/generatestats.js

This is an old revision of this page, as edited by Dr pda at 01:42, 24 November 2007 (creating documentation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The purpose of this script is to generate statistics about the articles which transclude a given template: a list of the ten longest and ten shortest articles, the mean and median length, and a histogram of the article lengths. The original motivation was to find out which were the longest and shortest {{featured articles}}, but it can also be used for your favourite stub or infobox.
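As a rough illustration of the statistics listed above, here is a minimal sketch in modern JavaScript. It is not taken from the script's source: the function name `summarize` and the `{ title, size }` record shape are assumptions made for this example.

```javascript
// Hypothetical input: one record per article, e.g. { title: "ROT13", size: 12000 },
// where size is the length of the wiki text in bytes.
function summarize(articles, n = 10) {
  // Sort a copy by size, smallest first, so the caller's array is left untouched.
  const sorted = [...articles].sort((a, b) => a.size - b.size);
  const sizes = sorted.map((a) => a.size);
  const count = sizes.length;
  if (count === 0) {
    return { count: 0, mean: NaN, median: NaN, shortest: [], longest: [] };
  }
  const mean = sizes.reduce((sum, s) => sum + s, 0) / count;
  // Median: middle value for an odd count, average of the two middle values otherwise.
  const mid = Math.floor(count / 2);
  const median = count % 2 === 1 ? sizes[mid] : (sizes[mid - 1] + sizes[mid]) / 2;
  return {
    count,
    mean,
    median,
    shortest: sorted.slice(0, n),          // smallest first
    longest: sorted.slice(-n).reverse(),   // largest first
  };
}
```

Calling `summarize(articles)` with the default `n` gives the ten shortest and ten longest entries alongside the mean and median.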


Installation

Add {{subst:js|User:Dr_pda/generatestats.js}} to your monobook.js, and save it.

After saving, you have to bypass your browser's cache to see the changes. Mozilla/Safari: hold down Shift while clicking Reload (or press Ctrl-Shift-R); Internet Explorer: press Ctrl-F5; Opera/Konqueror: press F5.

Usage

Once you have installed the script, go to http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit. A dialog box will pop up asking you to enter the name of the template without the "Template:" prefix, e.g. featured article rather than Template:featured article. The script will then retrieve the necessary information, 500 pages at a time, showing its progress in the edit window on that page. You can stop it at any time by navigating away from the page (e.g. by clicking the back button in your browser). Once it is done, the script will copy the output into the edit window and preview the page. If you wish, you can then copy the wiki-text and save it somewhere else.
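The batch-by-batch retrieval described above can be pictured with the following sketch. It is only an illustration in modern JavaScript, not the script's actual code: `collectAll`, `fetchBatch`, `onProgress` and the continuation token are all hypothetical names, and the request that would really fetch each batch of up to 500 pages is deliberately left to the caller.

```javascript
// fetchBatch is a hypothetical callback supplied by the caller. Given a
// continuation token (null for the first request) it resolves to
// { pages: [...], next: token } where next is null once nothing is left.
async function collectAll(fetchBatch, onProgress = () => {}) {
  const all = [];
  let token = null;
  do {
    const { pages, next } = await fetchBatch(token);
    all.push(...pages);
    onProgress(all.length); // e.g. write the running total into the edit window
    token = next;
  } while (token !== null);
  return all;
}
```

Because each batch is requested only after the previous one has arrived, navigating away from the page simply abandons the loop between requests.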

Example output

Ten longest articles

  1. Intelligent design (163 kB)
  2. 2005 Texas Longhorn football team (146 kB)
  3. Byzantine Empire (129 kB)
  4. Che Guevara (125 kB)
  5. Campaign history of the Roman military (125 kB)
  6. Bob Dylan (125 kB)
  7. Belgium (124 kB)
  8. Sound film (118 kB)
  9. AIDS (117 kB)
  10. Ronald Reagan (116 kB)

Ten shortest articles

  1. John Day (printer) (8 kB)
  2. Hurricane Irene (2005) (9 kB)
  3. Bam Thwok (11 kB)
  4. Pilot (House) (11 kB)
  5. Warren County Canal (12 kB)
  6. "She Shoulda Said 'No'!" (12 kB)
  7. Common scold (12 kB)
  8. ROT13 (12 kB)
  9. 2000 Sri Lanka cyclone (13 kB)
  10. Cincinnati, Lebanon and Northern Railway (14 kB)

Statistics

  • Number of articles: 1704
  • Mean: 46.772 kB
  • Median: 43.444 kB

Chart

Notes

  • The size of the article is that of the wiki text, i.e. what appears in the edit window. It is NOT the readable prose size. To calculate that you will need to use some other method, such as this script. (If I have time I might try to interface these two scripts; however, the prose size script requires loading the HTML version of each page, which is resource intensive.)
  • This script only counts pages which are in the article namespace, so it won't work for talk page templates (e.g. wikiproject banners).
  • The script chooses the bin size on the horizontal axis such that there are approximately 15 bins, but with a width taken from a sensible scale (1, 2, 5, 10, 20, 50 etc). Due to the limitations of the code used to generate the chart, the labels are in the middle of each bin, rather than at the left-hand edge. Thus in the example above, the first bin contains articles between 0 and 20 kB, the second bin between 20 and 40 kB, and so on. Note that the upper edge of the last bin is not marked; here it contains articles between 160 and 180 kB.
  • You can see the numbers for the histogram by looking in the edit window.
  • Sometimes the chart doesn't show up in the preview. I'm not sure why; sometimes adding/removing a blank line, or inserting an error then correcting it, made it show up. Maybe it just needs to be saved.
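The bin-size rule in the notes above can be sketched as follows. The exact rule the script uses is not documented here, so this assumes "the smallest width from the 1, 2, 5, 10, 20, 50 series that needs at most 15 bins", which happens to give the 20 kB bins of the example for a longest article of 163 kB. The function names `niceBinWidth` and `histogram` are hypothetical.

```javascript
// Assumed rule: smallest width from 1, 2, 5, 10, 20, 50, ... such that the
// range 0..maxValue needs no more than targetBins bins.
function niceBinWidth(maxValue, targetBins = 15) {
  if (!Number.isFinite(maxValue)) throw new RangeError("maxValue must be finite");
  for (let decade = 1; ; decade *= 10) {
    for (const step of [1, 2, 5]) {
      const width = step * decade;
      if (maxValue / width <= targetBins) return width;
    }
  }
}

// Count how many non-negative values fall in each bin [i * width, (i + 1) * width).
function histogram(values, width) {
  if (values.length === 0) return [];
  let max = 0;
  for (const v of values) if (v > max) max = v;
  const bins = new Array(Math.floor(max / width) + 1).fill(0);
  for (const v of values) bins[Math.floor(v / width)] += 1;
  return bins;
}
```

With sizes in kB, `niceBinWidth(163)` returns 20, and a chart label placed at the middle of bin `i` would sit at `(i + 0.5) * width`.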