Wikipedia:Reference desk/Science: Difference between revisions

 
:I believe most scanners come with their own OCR (Optical Character Recognition) software, so you could use that for free. With newspaper clippings, you will need to adjust the threshold carefully to avoid "bleed-through" from the other side of the page, which will throw off the OCR software. The indexing is a more difficult issue. Just looking for keywords doesn't work very well. If you've used a Google search, you've seen how many unrelated things turn up in any search. For eggsample, if you were trying to list "egg dishes" as a category, you would also get things which are not egg dishes, but only contain small amounts of egg as part of the recipe. I think you would do better to index and categorize them manually, say by dragging them into folders for each type of recipe, and then adding indexes for other ways to organize recipes, like low-fat, low-carb, low-protein (for those cult leaders interested in brainwashing their followers), etc. [[User:StuRat|StuRat]] 18:37, 21 November 2005 (UTC)
::[http://jocr.sourceforge.net/ GOCR] is an open-source [[optical character recognition|OCR]] program. –[[User:Mysid|Mysid]] 08:01, 22 November 2005 (UTC)
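:::To illustrate the threshold adjustment mentioned above, here is a minimal sketch in plain Python. It is not how any particular scanner or OCR package does it; the function name <code>binarize</code> and the pixel values are made up for illustration, assuming an 8-bit grayscale scan where 0 is black and 255 is white. The idea is that faint bleed-through is lighter than real ink, so a suitably low cutoff turns it white while the printed text stays black.

```python
def binarize(pixels, threshold=128):
    """Map each grayscale value (0 = black, 255 = white) to pure black or white.

    Values darker than `threshold` become black (0); everything else becomes
    white (255). `pixels` is a list of rows, each a list of ints.
    """
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


# Hypothetical scan fragment (illustrative values only):
# 40 = printed ink, 180 = faint bleed-through, 240 = blank paper.
scan = [
    [240, 40, 180, 240],
    [240, 40, 180, 240],
]

# A cutoff between the ink and the bleed-through keeps only the ink.
cleaned = binarize(scan, threshold=128)

# A cutoff set too high also turns the bleed-through black.
too_high = binarize(scan, threshold=200)
```

:::With the cutoff at 128 only the ink column survives; at 200 the bleed-through column turns black as well, which is the kind of noise that confuses OCR. On a real clipping the right value has to be found by trial, since it depends on the paper and the scanner.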
 
=November 22=