==Manual data entry==
This method of [[data processing]] involves human operators keying in the data found on the form. Manual data entry has many disadvantages in speed, accuracy, and cost.<ref>{{cite web | url = https://www.formhero.com/paperwork | title = Paperwork: The Ultimate Guide | publisher = FormHero}}</ref> Based on average professional [[Data entry clerk|typist]] speeds of 50 to 80 wpm,<ref>{{Citation|author=Teresia R. Ostrach|year=1997|title=Typing Speed: How Fast is Average
==Automated forms processing==
The first step in understanding automated forms processing is to analyze the type of form from which data is to be extracted. For the purpose of data extraction, forms can be classified into one of two high-level categories. Four categories have been proposed,<ref>{{Cite book|url=https://books.google.com/books?id=44arCAAAQBAJ&q=example+of+a+fixed+form+for+extraction&pg=PA425|title=Pattern Recognition and Machine Intelligence: 4th International Conference, PReMI 2011, Moscow, Russia, June 27 - July 1, 2011, Proceedings|last1=Kuznetsov|first1=Sergei O.|last2=Mandal|first2=Deba P.|last3=Kundu|first3=Malay K.|last4=Pal|first4=Sankar Kumar|date=2011-06-25|publisher=Springer|isbn=9783642217869|language=en}}</ref> but the document capture industry has settled on these two:
# Fixed forms. This type of form is defined as one in which the data to be extracted is always found in the same absolute position on a page. This allows a template, or lens grid, to be applied to the document, and to every subsequent instance of it, to extract the data. An example of a fixed form is a typical credit application form.<ref>{{Cite web|url=http://www.bfma.org/resource/resmgr/articles/05_04.pdf|title=CAPTURING SEMI-STRUCTURED FORMS AND DOCUMENTS: CHALLENGES AND AVAILABLE TECHNOLOGIES|last=Vassylyev|first=Artur|date=10 June 2008|archive-url=https://web.archive.org/web/20170428144034/http://www.bfma.org/resource/resmgr/articles/05_04.pdf|archive-date=2017-04-28|url-status=dead|access-date=4 April 2017}}</ref>
# Semi-structured (or unstructured) form. This form is one in which the location of the data and fields holding the data vary from document to document. This type of document is perhaps most easily defined by the fact that it is not a fixed form. In the document capture industry, a semi-structured form is also called an unstructured form. Examples of these types of forms include letters, contracts, and invoices. According to a study by AIIM, about 80% of the documents in an organization fall under the semi-structured definition.<ref>{{Cite web|url=https://www.aiim.org/pdfdocuments/MIWP_Forms-Processing_2012.pdf|title=Forms Processing- user experiences of text and handwriting recognition (OCR/ICR)
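The fixed-form approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product. It assumes that an OCR step has already produced a list of recognized words, each with the pixel coordinates of its bounding box. The <code>Word</code> and <code>Zone</code> types, the <code>extract_fixed</code> function, and the credit-application field names and coordinates are all hypothetical. Because every instance of a fixed form places each field at the same absolute position, one set of rectangular zones (the lens grid) can be reused unchanged for every page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word produced by OCR, positioned by the top-left corner of its bounding box (pixels)."""
    text: str
    x: int
    y: int


@dataclass(frozen=True)
class Zone:
    """A rectangular field region on the page: left <= x < right and top <= y < bottom."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        return self.left <= word.x < self.right and self.top <= word.y < self.bottom


# Hypothetical template for a fixed credit application form; names and coordinates are invented.
CREDIT_APP_TEMPLATE = {
    "applicant_name": Zone(100, 200, 600, 240),
    "date_of_birth": Zone(100, 260, 300, 300),
    "annual_income": Zone(400, 260, 600, 300),
}


def extract_fixed(words, template):
    """Apply the same zones to every page: a field's value is the text that falls inside its zone."""
    result = {}
    for field, zone in template.items():
        inside = [w for w in words if zone.contains(w)]
        # Approximate reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in inside)
    return result


if __name__ == "__main__":
    page = [
        Word("Jane", 110, 210),
        Word("Doe", 180, 210),
        Word("1980-04-02", 110, 270),
        Word("52000", 410, 270),
        Word("Signature", 100, 900),  # outside every zone, so it is ignored
    ]
    # prints {'applicant_name': 'Jane Doe', 'date_of_birth': '1980-04-02', 'annual_income': '52000'}
    print(extract_fixed(page, CREDIT_APP_TEMPLATE))
```

The design relies entirely on the form never moving. If a page is scanned with a shift or skew, words drift out of their zones. This is why fixed-form extraction normally depends on image registration or deskewing before the grid is applied.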
Although the components (described below) used to extract data are the same for both types of form, the way in which they are applied varies considerably with the type of document.
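For a semi-structured document such as an invoice, absolute coordinates cannot be reused, so extraction has to be driven by the content of the page instead. The Python sketch below shows one simple anchor-based strategy, chosen here only for illustration. It searches the page for a label word and takes the nearest word to its right on the same line. The <code>Word</code> model, the <code>value_right_of</code> and <code>extract_semi_structured</code> functions, the label strings, and the 10-pixel line tolerance are all hypothetical assumptions. A production system would need considerably more robust rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Word:
    """A word produced by OCR, positioned by the top-left corner of its bounding box (pixels)."""
    text: str
    x: int
    y: int


def value_right_of(words: List[Word], label: str, line_tolerance: int = 10) -> Optional[str]:
    """Locate `label` anywhere on the page and return the nearest word to its right on the same line."""
    for anchor in words:
        if anchor.text.lower() != label.lower():
            continue
        # Words whose vertical position is within the tolerance count as the same line.
        same_line = [
            w for w in words
            if abs(w.y - anchor.y) <= line_tolerance and w.x > anchor.x
        ]
        if same_line:
            return min(same_line, key=lambda w: w.x).text
    return None  # label not found, or nothing to its right


# Hypothetical label table for invoices; the label strings are invented for the example.
INVOICE_LABELS = {"invoice_number": "Invoice#", "total": "Total:"}


def extract_semi_structured(words: List[Word], labels: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Extract each field by its label, wherever the label happens to sit on the page."""
    return {field: value_right_of(words, label) for field, label in labels.items()}


if __name__ == "__main__":
    # Two invoices with different layouts: the same labels appear in different places.
    invoice_a = [Word("Invoice#", 50, 40), Word("A-1001", 150, 42),
                 Word("Total:", 400, 700), Word("$1,250.00", 480, 698)]
    invoice_b = [Word("Total:", 60, 300), Word("$89.99", 130, 303),
                 Word("Invoice#", 300, 80), Word("B-77", 390, 80)]
    print(extract_semi_structured(invoice_a, INVOICE_LABELS))
    print(extract_semi_structured(invoice_b, INVOICE_LABELS))
```

Unlike a fixed template, the same label table works for both sample invoices even though their fields sit in different places. The trade-off is that extraction now depends on recognizing the labels correctly and on the value appearing where the rule expects it.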