Forms processing: Difference between revisions

NeilN (talk | contribs)
rm per WP:EL
Changing short description from "Process of converting data from written forms into electronic form" to "Process of converting data from written forms into electronic format"
 
(81 intermediate revisions by 49 users not shown)
Line 1:
{{Short description|Process of converting data from written forms into electronic format}}
'''Forms processing''' is a process by which one can capture information entered into data fields and convert it into an electronic format. This can be done manually or automatically, but the general process is that [[hard copy]] data is filled out by humans and then "captured" from their respective fields and entered into a database or other electronic format.
 
==Overview==
In the broadest sense, forms processing systems can range from the processing of small application forms to large-scale survey forms with multiple pages. Manual forms processing raises several common issues: it demands a lot of tedious human effort, the data keyed in by the user may contain typos, and the lengthy process consumes many hours of labor. If the forms are processed using [[computer software]] driven applications, these common issues can be resolved or minimized to a great extent. Most methods for forms processing address the following areas.
In this process:
*Entered data is “captured” from its respective fields;
*Forms themselves are digitized and saved as images.
 
==Manual data entry==
This method of [[data processing]] involves human operators keying in data found on the form. The manual process of data entry has many disadvantages in speed, accuracy and cost. Based on average professional [[Data entry clerk|typist]] speeds of 50 to 80 wpm,<ref>{{Citation|author=Teresia R. Ostrach|year=1997|title=Typing Speed: How Fast is Average|url=http://onlinestudentreadiness.org/documents/TypingSpeed.pdf|archive-url=https://web.archive.org/web/20120502164156/http://onlinestudentreadiness.org/documents/TypingSpeed.pdf|archive-date=2012-05-02|url-status=dead}}</ref> one could generously estimate about two hundred pages per hour for forms with fifteen one-word fields (not counting the time for reading and sorting pages). In contrast, modern [[Image scanner#Document processing|commercial scanners]] can [[Document imaging|scan and digitize]] up to 200 pages per ''minute''.<ref>{{cite web | url = https://www.engadget.com/2006/11/03/kodak-intros-200-page-per-minute-i1860-commercial-scanner/ | title = Kodak intros 200 page-per-minute i1860 commercial scanner | access-date = 2011-11-04 | publisher = [[Engadget]]}}</ref> The second major disadvantage to manual data entry is the likelihood of [[typographical errors]]. When factoring in the cost of labor and working space, manual data entry is a very inefficient process.
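The estimate above can be checked with a little arithmetic. The following Python sketch assumes one word per field and ignores reading and sorting time, as the text does; the function and constant names are illustrative only. It compares page throughput alone, since scanning a page is not the same as recognizing its contents.

```python
def manual_pages_per_hour(words_per_minute: float, fields_per_page: int) -> float:
    """Pages a typist could key in per hour, assuming one word per field."""
    words_per_hour = words_per_minute * 60
    return words_per_hour / fields_per_page


FIELDS_PER_PAGE = 15            # fifteen one-word fields, as in the text
SCANNER_PAGES_PER_MINUTE = 200  # figure cited for a commercial scanner

slow = manual_pages_per_hour(50, FIELDS_PER_PAGE)  # 200.0 pages per hour
fast = manual_pages_per_hour(80, FIELDS_PER_PAGE)  # 320.0 pages per hour
scanner_per_hour = SCANNER_PAGES_PER_MINUTE * 60   # 12000 pages per hour

print(f"Manual entry: {slow:.0f} to {fast:.0f} pages per hour")
print(f"Scanner: {scanner_per_hour} pages per hour")
```

Under these assumptions the scanner moves between 37.5 and 60 times as many pages per hour as a typist.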
 
==Automated forms processing==
This method can automate data processing by using pre-defined templates and configurations. A template, in this case, is a ''map'' of the document, detailing where the data fields are located within the form or document. Compared to manual data entry, automatic form input systems are preferable, since they help reduce the problems faced during manual data processing.
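The idea of a template as a map can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the page is modeled as rows of text rather than pixels, and the `Zone` class, the `TEMPLATE` dictionary and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Position of one field: a text row and a half-open column range."""
    row: int
    col_start: int
    col_end: int


# Hypothetical template: a "map" of where each field sits on the form.
TEMPLATE = {
    "name": Zone(row=0, col_start=6, col_end=26),
    "date": Zone(row=1, col_start=6, col_end=16),
}


def extract_fields(page: list[str], template: dict[str, Zone]) -> dict[str, str]:
    """Read every field from its fixed position on the page."""
    result = {}
    for field, zone in template.items():
        # A missing row yields an empty value instead of an error.
        line = page[zone.row] if zone.row < len(page) else ""
        result[field] = line[zone.col_start:zone.col_end].strip()
    return result


# Stand-in for a scanned page; a real system would crop pixel regions
# from the image and pass each one to a recognition engine.
page = [
    "Name: Ada Lovelace",
    "Date: 1843-09-01",
]

print(extract_fields(page, TEMPLATE))
```

Because the template is plain data, supporting a new form layout means adding a new map rather than new extraction code.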
 
Automatic form input systems use different types of recognition methods such as [[optical character recognition]] (OCR) for machine print, [[optical mark recognition|optical mark reading]] (OMR) for check/mark sense boxes, [[bar code]] recognition (BCR) for barcodes, and [[intelligent character recognition]] (ICR) for hand print.
 
With automated form processing system technology, users are able to process documents from their scanned images into a [[Machine-readable data|computer readable]] format such as ANSI, XML, CSV or PDF, or to input the data directly into a database.
 
Forms processing has developed beyond basic capture of the data. Forms processing not only encompasses a recognition process but also helps manage the complete [[:wikt:life cycle|life cycle]] of the documents, which runs from scanning of the document to the extraction of the data, and often to delivery into a back-end system. In some cases it may also include processing or generating well-formatted results through calculations and analysis. An automated forms processing system can be valuable if there is a need to process hundreds or thousands of images every day.
 
=== First Step: Assessment of the form structure ===
The first step in understanding automated forms processing is to analyze the type of form from which the extraction of data is desired. Forms can be classified as one of two high-level categories for the purpose of extracting data. Four categories have been proposed;<ref>{{Cite book|url=https://books.google.com/books?id=44arCAAAQBAJ&q=example+of+a+fixed+form+for+extraction&pg=PA425|title=Pattern Recognition and Machine Intelligence: 4th International Conference, PReMI 2011, Moscow, Russia, June 27 - July 1, 2011, Proceedings|last1=Kuznetsov|first1=Sergei O.|last2=Mandal|first2=Deba P.|last3=Kundu|first3=Malay K.|last4=Pal|first4=Sankar Kumar|date=2011-06-25|publisher=Springer|isbn=9783642217869|language=en}}</ref> however, the document capture industry has settled on these two:
# Fixed forms. This type of form is defined as one in which the data to be extracted is always found in the same absolute position on a page. This allows a type of lens grid to be applied to the document and every subsequent occurrence of this document in order to extract the data. An example of a fixed form is a typical credit application form.<ref>{{Cite web|url=http://www.bfma.org/resource/resmgr/articles/05_04.pdf|title=CAPTURING SEMI-STRUCTURED FORMS AND DOCUMENTS: CHALLENGES AND AVAILABLE TECHNOLOGIES|last=Vassylyev|first=Artur|date=10 June 2008|archive-url=https://web.archive.org/web/20170428144034/http://www.bfma.org/resource/resmgr/articles/05_04.pdf|archive-date=2017-04-28|url-status=dead|access-date=4 April 2017}}</ref>
# Semi-structured (or unstructured) forms. This form is one in which the location of the data and fields holding the data vary from document to document. This type of document is perhaps most easily defined by the fact that it is not a fixed form. In the document capture industry, a semi-structured form is also called an unstructured form. Examples of these types of forms include letters, contracts, and invoices. According to a study by AIIM, about 80% of the documents in an organization fall under the semi-structured definition.<ref>{{Cite web|url=https://www.aiim.org/pdfdocuments/MIWP_Forms-Processing_2012.pdf|title=Forms Processing- user experiences of text and handwriting recognition (OCR/ICR)|access-date=4 April 2017|archive-date=28 April 2017|archive-url=https://web.archive.org/web/20170428142430/http://www.aiim.org/pdfdocuments/MIWP_Forms-Processing_2012.pdf|url-status=dead}}</ref>
Although the components (described below) used for the extraction of data from either type of form are the same, the way in which these are applied varies considerably based upon the type of document.
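When field positions vary from document to document, a fixed position cannot be used. One possible approach, shown here purely as an assumption and not taken from the cited sources, is to locate a value by the label printed next to it. The patterns and field names in this Python sketch are hypothetical and far simpler than what a production system would need.

```python
import re
from typing import Optional

# Hypothetical label patterns; each captures the value that follows a label.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*:?\s*(\S+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"total\s*(?:due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_by_label(text: str) -> dict[str, Optional[str]]:
    """Find each field wherever its label occurs in the recognized text."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        found[field] = match.group(1) if match else None
    return found


# Two layouts carrying the same fields in different places.
invoice_a = "ACME Ltd\nInvoice No: INV-1001\nWidgets x 5\nTotal due: $1,250.00"
invoice_b = "Total: 99.50\nThanks for your business\nInvoice # 77-B"

print(extract_by_label(invoice_a))
print(extract_by_label(invoice_b))
```

Both layouts yield the same two fields even though the values appear in a different order and under different label wording.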
 
===Components===
Various components used in data processing by an automatic form-input system include:
#OCR – [[Optical character recognition]]
#OMR – [[Optical mark recognition]]
#ICR – [[Intelligent character recognition]]
#BCR – [[Barcode]] recognition
#MICR – [[Magnetic ink character recognition]]
 
Optical Character Recognition (OCR) recognizes machine-printed uppercase/lowercase alphabetic, numeric, accented characters, many [[Currency sign|currency symbols]], digits, arithmetic symbols, expanded punctuation characters and more.

Intelligent Character Recognition (ICR) recognizes hand-printed American and [[European English (disambiguation)|European English]] characters using pre-defined character sets: uppercase, lowercase, [[mixed case]] alphabetic, digits, currency (including $ (dollar), ¢ (cent), € (Euro), £ (pound), ¥ (Yen)), arithmetic and punctuation characters (including period, comma, [[Quotation mark|single quote]], double quote, ! & ( ) ? @ { } \ # % * + - / : ; < = >).
Magnetic Ink Character Recognition (MICR) is a recognition technology that facilitates the processing of the MICR fonts printed on cheques. This minimizes the chance of errors in the clearing of cheques. It is also useful for easier and faster transfer of funds. MICR provides a secure, high-speed method of scanning and processing information.
Optical Mark Recognition (OMR) identifies bubbles filled in by hand or check boxes on printed forms. Usually OMR supports single and multiple mark recognition. The fields to be recognized can be specified as grids (rows by columns) or single bubbles.
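Reading a grid of bubbles can be sketched as follows. This Python example assumes each bubble has already been reduced to a darkness value between 0 and 1 (the fraction of dark pixels inside it); the `FILL_THRESHOLD` value and the function name are illustrative assumptions, not part of any specific OMR product.

```python
FILL_THRESHOLD = 0.5  # assumed cutoff; real systems tune this per form and scanner


def read_marks(grid, choices, allow_multiple=False):
    """Return the marked choice(s) for each row of a rows-by-columns bubble grid.

    grid[r][c] is the fraction of dark pixels inside bubble (r, c).
    """
    answers = []
    for row in grid:
        marked = [
            choices[c] for c, darkness in enumerate(row) if darkness >= FILL_THRESHOLD
        ]
        if allow_multiple:
            answers.append(marked)
        elif len(marked) == 1:
            answers.append(marked[0])
        else:
            answers.append(None)  # blank or ambiguous row: leave for human review
    return answers


grid = [
    [0.05, 0.91, 0.08, 0.03],  # only B filled
    [0.88, 0.04, 0.07, 0.85],  # A and D both filled
    [0.02, 0.06, 0.04, 0.09],  # nothing filled
]

print(read_marks(grid, "ABCD"))                       # ['B', None, None]
print(read_marks(grid, "ABCD", allow_multiple=True))  # [['B'], ['A', 'D'], []]
```

In single-mark mode a row with zero or several filled bubbles returns `None` rather than a guess, which keeps doubtful answers out of the captured data.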
 
Barcode Recognition can read more than 20 industry 1D and 2D barcodes including Code39, CODABAR, [[Interleaved 2 of 5]], Code93 and more. It automatically detects all barcodes in an image or specified area within the image.
 
===Process===
The process of automated forms processing typically includes the following steps:
#A batch of completed forms is scanned using a high-speed scanner
#Images are cleaned with document image processing algorithms to improve accuracy
#Forms are classified based on original template forms and the fields are extracted using the appropriate recognition components
#Fields which the system flagged with a low confidence are queued for verification by a human operator
#Verified data is saved into a database or exported to a searchable text format such as CSV, XML or PDF
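The last two steps above, queuing low-confidence fields for an operator and exporting the verified data, can be sketched in Python. The sketch assumes that a recognition engine has already returned a value and a confidence score for each field; the `CONFIDENCE_THRESHOLD` value, the function names and the sample data are hypothetical.

```python
import csv
import io

CONFIDENCE_THRESHOLD = 0.90  # assumed cutoff for sending a field to an operator


def split_by_confidence(fields):
    """Separate recognized fields into accepted values and a review queue.

    fields maps a field name to a (value, confidence) pair.
    """
    accepted, review_queue = {}, []
    for name, (value, confidence) in fields.items():
        if confidence >= CONFIDENCE_THRESHOLD:
            accepted[name] = value
        else:
            review_queue.append(name)
    return accepted, review_queue


def export_csv(records):
    """Serialize verified records (dicts sharing the same keys) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


recognized = {
    "name": ("Ada Lovelace", 0.98),
    "date": ("1843-09-O1", 0.61),  # letter O misread for a zero: low confidence
}
accepted, review_queue = split_by_confidence(recognized)

# Stand-in for the human operator correcting the queued fields.
corrections = {"date": "1843-09-01"}
verified = {**accepted, **{name: corrections[name] for name in review_queue}}

print(export_csv([verified]))
```

Only the doubtful field reaches the operator, so the amount of manual work scales with recognition errors rather than with the number of forms.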
 
===Prerequisites===
Though automated forms processing has many great advantages over manual data entry, it still comes with some limitations. To achieve the best accuracy, some prerequisites should be followed.
#Scan format: the format of the scanned file, its resolution (DPI) and its color mode
#Configuration: the scanned image layout needs to be configured for this automation
#Recognition: the predefined output formats
#Result / analyze: any specific format for presenting the captured data values
 
One very important consideration is indexing, determining the [[metadata]] that will be used to describe the data contained within the documents. This attribute perhaps drives the forms processing solution more than any other.
 
==External links==
{{wikiquote}}
* [https://web.archive.org/web/20100529053053/http://www.aiim.org.uk/industrywatch/surveys.asp AIIM market intelligence reports]
 
==References==
{{reflist}}
 
[[Category:Automatic identification and data capture]]