Revision as of 07:55, 22 November 2005 by Daycd (talk | contribs) (7,074 edits): →Need help about pig's behaviour: delete duplication
Revision as of 08:01, 22 November 2005 by Mysid (talk | contribs) (13,497 edits): →Scanning and filing recipes: Re
Line 1,640:
:I believe most scanners come with their own OCR (optical character recognition) software, so you could use that for free. With newspaper clippings, you will need to adjust the threshold carefully to avoid "bleed-through" from the other side of the page, which will throw off the OCR software. Indexing is a more difficult issue. Just looking for keywords doesn't work very well. If you've used a Google search, you've seen how many unrelated things turn up in any search. For eggsample, if you were trying to list "egg dishes" as a category, you would also get things that are not egg dishes but only contain small amounts of egg as part of the recipe. I think you would do better to index and categorize them manually, say by dragging them into folders for each type of recipe, and then adding indexes for other ways to organize recipes, like low-fat, low-carb, low-protein (for those cult leaders interested in brainwashing their followers), etc. [[User:StuRat|StuRat]] 18:37, 21 November 2005 (UTC)
::[http://jocr.sourceforge.net/ GOCR] is an open-source [[optical character recognition|OCR]] program. –[[User:Mysid|Mysid]] 08:01, 22 November 2005 (UTC)
=November 22=

Wikipedia:Reference desk/Science: Difference between revisions