Wikipedia:Bots/Requests for approval/ImageResizeBot

This is an old revision of this page, as edited by Nixeagle (talk | contribs) at 00:13, 18 March 2008 (ImageResizeBot: clearify). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This is a proposal for an automatic process run by Eagle 101 to resize oversized non-free images according to WP:NFC#3b

The concept is very simple. We have roughly 8,000 to 9,000+ oversized non-free images (using higher thresholds, we still get 5,000+ to clean up). By this I mean the images are obviously too large, and we can make these images smaller by resizing them down to the size of the thumbnail on the image page. For example, let's consider Image:Logchart.jpg. It's 7,808 × 4,448 pixels, very large for an unfree image. As such, it should be resized to something smaller. I've arbitrarily chosen the size of the thumbnail itself to be the new size, which would be 800 × 456 pixels. Quite a difference in size, eh? :)

The bot will do all the work of resizing the image to the correct size (a ratio from the thumbnail size) and uploading this new revised image to the encyclopedia. The only tag that I can foresee the bot needing to apply is directly to the image page itself, so that an administrator can come along and delete the old oversized revisions. This tag is {{furd}}, fair use reduced. If we desire, we can have the bot notify the uploaders, but in this case I feel this is overkill, as there will be no noticeable difference to the image in the article.

The bot will resize all images larger than 500,000 square pixels (raised from the 360,000 originally proposed); this can be changed and adjusted if need be. 500,000 pixels is roughly equivalent to a 700x700 square image.
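As a rough illustration of the selection rule and the thumbnail-sized target described above, here is a minimal Python sketch. The 800 × 600 bounding box is an assumption taken from the image-page preview size mentioned in the discussion; the function names and the rounding behaviour are hypothetical and are not the bot's actual Perl code.

```python
AREA_THRESHOLD = 500_000  # square pixels; the proposal says this is adjustable


def needs_resize(width, height, threshold=AREA_THRESHOLD):
    """Return True if the image area exceeds the threshold."""
    return width * height > threshold


def thumbnail_size(width, height, box_w=800, box_h=600):
    """Fit (width, height) inside an assumed box_w x box_h box, keeping the aspect ratio.

    Images that already fit inside the box are returned unchanged.
    """
    if width <= box_w and height <= box_h:
        return width, height
    scale = min(box_w / width, box_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Image:Logchart.jpg from the proposal: 7,808 x 4,448 pixels.
print(needs_resize(7808, 4448))    # True
print(thumbnail_size(7808, 4448))  # (800, 456)
```

Under these assumptions the sketch reproduces the 800 × 456 figure quoted for Image:Logchart.jpg.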

Again, the bot should have no effect on how existing images render in the articles themselves; only the full-sized image will be reduced. As such, I ask that this bot be allowed to operate at about 3 uploads per minute until the backlog is cleared. This will require about 40-50 hours of operation.
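The running-time estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a constant rate of 3 uploads per minute with no pauses, and the function name is hypothetical.

```python
def hours_to_clear(image_count, uploads_per_minute=3):
    """Hours needed to process image_count images at a constant upload rate."""
    return image_count / uploads_per_minute / 60


for count in (8000, 9000):
    print(f"{count} images: about {hours_to_clear(count):.1f} hours")
```

A backlog of 8,000 to 9,000 images works out to roughly 44 to 50 hours, consistent with the 40-50 hour figure above.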

I do request that, as soon as we agree this task is desired, even if we don't agree on all the parameters, I be allowed to test the bot on a few fake images. :) In addition, I'll have the Perl source released as soon as I finish basic testing (making sure we can upload etc). —— Eagle101Need help? 18:57, 17 March 2008 (UTC)

Discussion

  • Sounds like a great idea. What program / software will be used to do the reductions? --MZMcBride (talk) 19:23, 17 March 2008 (UTC)
  • I'd say that you should not apply {{furd}}. First, a human should compare the images to make sure that no technical bug caused a significant change to the image as displayed. Second, taking your example image above, the original uploader already made an editorial decision to use only a fragment of the true original for the image. That editorial decision is a creative act, so it doesn't seem appropriate to eliminate that authorship history. Is there already a better template that indicates that human evaluation is required? If not, I suggest creating one and using it instead. I think the main thrust of the bot task is reasonable. GRBerry 19:29, 17 March 2008 (UTC)
    • OK, that's reasonable. I would hope the admin would confirm with the furd, but I can see your point. If someone wants to make an alternate template, please do so, as I suck at templates. XD As far as the reduction goes, that image is clearly too large. We can easily reduce the image size with no effect at all on the article proper. Unless I'm missing something of course. :) Edit: I should be a little clearer: the program's reduction of the image size in this case will not affect the article at all; see Logarithm#See_also, look to the right. —— Eagle101Need help? 19:33, 17 March 2008 (UTC)
      • Ok, replying to self, I think I picked up why removing the original is bad. We lose the authorship history. A good way to prevent this would be to perhaps have the bot put those details in the description. Is this a possible answer to this concern? Sorry that it took me a second to realize what you were getting at ;). —— Eagle101Need help? 19:41, 17 March 2008 (UTC)
        • That is indeed the thrust of my second reason, and that solution would satisfy the need to preserve authorship data. We still need a human to compare the two images prior to deletion, just to make sure the bot didn't change the image. Imagine a technical bug that processed a run one file off, so every small image got uploaded on the page for the prior/later image in the run. No article pages would have been updated - but all the images would be wrong. Or a Perl bug that fed the wrong parameters to ImageMagick and the images came out rotated, or ... Those are the class of bugs I'm still worried about having a human check for prior to deletion of the original. GRBerry 19:50, 17 March 2008 (UTC)
          • Ok, sounds good, we need a new template. Does someone want to offer their skills? It would be greatly appreciated. As far as that bug you mention, it's not possible due to the way things are processed :). However, before this does anything serious, as in 1,000's of changes, the source code will be made public under the GPL. —— Eagle101Need help? 19:54, 17 March 2008 (UTC)
  • I don't see any reason why we'd need a huge image, but I would suggest creating a template that you make known to existing users to include on an image page as part of the rationale of why they need an image that large for non-free works. (Technically, this would help us further machine-readable-ize our NFCs, by saying that any image over a certain number of pixels will have the potential to be reduced unless this template, which should include a more descriptive reason for the large size, is included.) Then the bot should ignore such images, though I'd recommend that any image that is tagged as such should be looked at closely by an admin and evaluated as to whether the rationale is sane. --MASEM 19:39, 17 March 2008 (UTC)
    • Yes, that is a possibility. However, I'm hoping to stick to images that are large enough that there really is no reason for a larger image. Remember, fair use images only need to be good enough for the article; we don't have to have a higher resolution available. As far as I understand, it's better that we don't. If you can elaborate, please do. I also welcome someone actually creating this template and linking that template :). —— Eagle101Need help? 19:43, 17 March 2008 (UTC)
      • There is a Low_resolution parameter in many existing FUR templates that allows an author to input their reasoning. I am not sure creating another template is the solution here (simply because it would have to be put into common use, which would take a while). I am not sure 600 x 600 is large enough to be definitely too big, myself. - AWeenieMan (talk) 19:50, 17 March 2008 (UTC)
  • (after many edit conflicts) I am not convinced doing this without an editor request to shrink the image beforehand is a good idea. What about a large non-free image with a very specific rationale that explains why that level of detail is necessary (I could imagine there are pictures over 600 x 600 that include such a rationale)? Are you prepared to dissect that information in the multitude of ways it could be presented (templates, non-templated, etc.)? I would be more comfortable with you creating a template (or using a variation of {{Non-free reduce}}) that would basically have a human say, this needs to be reduced (maybe even input a size) and have a bot just do the dirty work (you could also add the {{furd}} when you are done, because an admin will have to visit the page to delete the image anyway). Also, what size are you planning on uploading the images at exactly (are you always using the thumbnail size)? For example, I would reckon that you could scale down any album cover to say 400px by 400px (probably even smaller). But if I am reading your proposal correctly, they would most likely end up at varying sizes. Also, is there a reason you have chosen not to use the standard BRFA template? I think the bot task has a lot of merit, I just think some of the details should be hammered out a little more before letting it loose.
    • I did not use the BRFA template, because I felt that writing it as prose was better. Wikipedia is not a bureaucracy. If someone insists on the template, they may go ahead and add it. I figured there would be loose ends, hence the reason why I've cross-posted this to 3 different locations :). As far as the size issues, the idea is to nab those that are way too large. If you can show me one example of an image that is greater than 360,000 pixels in size that should not be reduced, we can consider upping the threshold. If you would like, I can place a list of all affected images on a subpage of the bot's userspace if this would assist you and others in evaluating the task at hand. —— Eagle101Need help? 19:52, 17 March 2008 (UTC)
      • I only asked about the BRFA template out of curiosity. I don't really care, myself (as I have read your prose and gathered most of the answers for myself). The template simply makes it easier to grab some of the facts quickly. I would be interested to see a list of affected images, but I am not going to lie to you, I have no intention of going through them all. Out of curiosity, how many are there in total? My only concern is that some editor may have felt the need to have a larger image (even explained why), and they may be upset to find a bot reduced it without discussion. - AWeenieMan (talk) 20:06, 17 March 2008 (UTC)
      • (ec) Musing on the image size issue. I'm uncertain how the MediaWiki software handles image sizing for display on monitors. But I know that available monitor resolutions are growing over time. Will there come a point where monitor resolutions will have grown to a size that a 600x600 image is a standard size thumbnail? If so, we'll want large images again then. I know current top of the line digital monitors display by default in a resolution that is a multiple of the best I can achieve on my home CRT monitor, and it gets a multiple of the monitor resolutions available when I started using computers (an Apple II). On the other hand, I don't know of a Moore's Law for monitor resolutions, but our article says it applies to digital camera resolutions. GRBerry 20:12, 17 March 2008 (UTC)
        • Well... you have to remember we are dealing with non-free images here. By that I mean, the goal is not super high resolution here, just what is currently needed. :S Discuss :) There is a reason we are asked to keep non-free images small; we really don't need to display much larger than the size required by the article currently. P.S. I can explain to you how MediaWiki handles it, but I don't want to go too far off topic. To put it short, it stores thumbnails that are displayed on the article, and stores a full sized version if someone clicks the image. For non-free images we only need the thumbnail. These are usually kept fairly small due to bandwidth usage. You really don't want to load 10 1MB images to read an article :) —— Eagle101Need help? 20:23, 17 March 2008 (UTC)
  • Question: What do you think about an image such as Image:Cannibalised.jpg? As it stands, it is just above your mark (600 × 601) (I agree, it's much too big in its current state), but shrinking it to thumbnail size makes it just below your mark (599 × 600), which seems like a minor difference. And then there are images like Image:Aroundtheworld.jpg where the thumbnail seems to be the full size of the image. These are really just test cases to me, as I am just wondering how you plan on handling them (not an argument to ignore them in any way). - AWeenieMan (talk) 21:31, 17 March 2008 (UTC)
    • There is probably some marginal difference in size where the costs exceed the minimal benefits. That particular example would shrink 1 pixel in each dimension, and that is a 0.33% change in total pixel count. If the images are to be reviewed for deletion of older revisions, the marginal difference that is appropriate is higher than otherwise. Does the software to be used indicate any level of noise/deterioration that it might produce by resizing? I'll be shocked if it does (certainly not in the marketing materials) but if so that could inform a benchmark. What are the other costs - data storage of additional versions (deleted versions presumably remaining in the deleted history), download/upload bandwidth, bot edits in history, possible human review... Being over the 360K limit by 25% would when shrunk to 360K pixels shrink each dimension 10.56%. Being over by 10% would when shrunk to 360K pixels shrink each dimension 4.65%. (1 - 1/sqrt(1+%over360K)) So 10% to 25% pixel count margins seem plausible to me - and the bot could flag these as "current version too large, please shrink to X by Y when the image is edited for another reason". GRBerry 22:14, 17 March 2008 (UTC)
      • (ec) Well, seeing those, I'll probably increase the minimal size to 400,000 square pixels. Basically that gives a bit of leeway, and gets rid of the problems you mention. Now, the idea from here would be to do the resizing of the obvious cases, get those down to at least thumbnail size. As far as the costs, this bot is actually operating from the Wikimedia toolservers, so all the queries are direct database queries, as far as getting that list. The actual resize requires that the bot download the image (to memory, not to hard disk), change the size, and upload the new image size. This is all done inside of RAM. The point at this stage is really to get the obvious violators, not split hairs; I did not realize that the smaller ones would result in a 1 pixel change. —— Eagle101Need help? 23:20, 17 March 2008 (UTC)
        • Well, I fear that you might eliminate a lot of the issues, but you will still have borderline cases. For example, Image:ChoosecologotypeTM.jpg will be resized to 410k (and Image:ShereKhanJBSM.jpg will be resized minimally). This is all a factor of the size of the thumbnail being as large as possible to fit inside an 800x600 box (this is configurable in preferences, too, so there are multiple thumbnail sizes on the server, it would seem). I see two potential ways to handle this. One would be to only work with images above size X (e.g. 400k) but make sure to size them below size Y (e.g. 360k). The other would be to just define the size you are going to make them (i.e. define a maximum dimension). Am I missing the reason that you are tying the new size to the thumbnail? - AWeenieMan (talk) 23:53, 17 March 2008 (UTC)
          • Not a very large reason, no, but it does give us a baseline for what can reasonably be considered a thumbnail. I do understand there are going to be a few edge cases. Not everything will be able to be machine resized; that never was and never will be the goal. However, both images can use some resizing, especially the second one you mentioned. I'm starting to think a broad first pass on the larger cases will do us best. Perhaps we should start with images with areas larger than 500,000 pixels. At least get those down to an acceptable size, then work on a category-by-category basis getting things down to an acceptable size. All this will have to be debated as to what the max size is, and there will be exceptions to any rule we put up; these will have to be checked by admins before they delete the old revision containing the larger image. It's really easy to undo a resize; I think non-admins can undo the bot as well. We just need to come up with a way that is acceptable to everyone as to how to mark things as an exception to the general rules. Using 500,000 as the minimum size, we still have 5,000+ images to resize. —— Eagle101Need help? 00:07, 18 March 2008 (UTC)
  • From User:David Shankbone, I understand that deleting the oversize version doesn't actually save any space on the servers, since the oversize version is still kept around (like a deleted article). If this theory is correct, has it been factored into the calculation of benefits? EdJohnston (talk) 23:16, 17 March 2008 (UTC)
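The shrink percentages quoted in the discussion above, and the suggestion to work only on images above one size while bringing them below another, can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the 360,000-pixel target is taken from the figures in the thread, and rounding each side down is an assumption made so the result never exceeds the target area.

```python
import math


def linear_shrink(over_fraction):
    """Fractional reduction of each side when an image whose area is
    (1 + over_fraction) times the limit is shrunk to exactly the limit."""
    return 1 - 1 / math.sqrt(1 + over_fraction)


def area_change(old_size, new_size):
    """Fractional reduction in total pixel count between two (width, height) sizes."""
    return 1 - (new_size[0] * new_size[1]) / (old_size[0] * old_size[1])


def fit_to_area(width, height, max_area=360_000):
    """Scale an image down, keeping its aspect ratio, so its area is at most max_area."""
    area = width * height
    if area <= max_area:
        return width, height
    scale = math.sqrt(max_area / area)
    # Round down so the resulting area cannot exceed max_area.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


print(f"{linear_shrink(0.25):.2%}")                     # 10.56%
print(f"{linear_shrink(0.10):.2%}")                     # 4.65%
print(f"{area_change((600, 601), (599, 600)):.2%}")     # 0.33%
```

With a fixed target area, every resized image lands at or just under the same pixel count regardless of its aspect ratio, which avoids the one-pixel changes seen when the target is tied to the thumbnail size.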

Logos

I've recently discovered that when the Cat:Logos was converted to Cat:Non-Free Logos, the appropriate counter-cat Cat:Free Logos, for simple geometric shapes and words that can't be copyrighted, was not created. One estimate is that about 10% of the 70,000 logo images are actually free. Is there some way you could exclude this cat at first while I try and figure out how to sort and re-tag the logos? Could you give me an intersection of Large Images and Non-Free Logos to see how big an issue this is? MBisanz talk 23:18, 17 March 2008 (UTC)

About 800 logos will be affected, you can see the full list at User:ImageResizeBot/List2. If needed we can delay actions on this category if what you mention is a real concern. Check the list and see if anything in there should not be resized because the image is possibly free. I appreciate anything you can do to assist. —— Eagle101Need help? 23:31, 17 March 2008 (UTC)
Bah! Image:Five.png. :S How long do ya think it will take to sort this category out? At least the 800 items I've listed? I'm willing to delay actions on this category, provided that folks fix the issues with the category so we can do it at a later time. Cheers! —— Eagle101Need help? 23:40, 17 March 2008 (UTC)
Went through the list. These 10 are the only ones I'd say could be free, and even then shrinking wouldn't destroy them: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Given that this would be a rate of error of .01%, I'm not concerned.

How would the bot handle images with 2 license tags, like a Non-free historic image and an OTRS permission? MBisanz talk 23:57, 17 March 2008 (UTC)
How should it handle it? —— Eagle101Need help? 00:08, 18 March 2008 (UTC)
P.S. Lest someone else get fooled by your extremely small percentage, the correct error rate here is 1.25%. (10/800). Is that acceptable to folks? Remember this bot will be fairly easy to undo. :) —— Eagle101Need help? 00:10, 18 March 2008 (UTC)
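For anyone checking the arithmetic, the corrected error rate is simply the number of possibly free logos divided by the number of affected logos, expressed as a percentage. A tiny Python sketch (the function name is hypothetical):

```python
def error_rate_percent(errors, total):
    """Error rate as a percentage of the total."""
    return 100 * errors / total


print(error_rate_percent(10, 800))  # 1.25
```

The raw fraction 10/800 is 0.0125, which is 1.25% once multiplied by 100.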