Forms processing

This is an old revision of this page, as edited by Brandon.degraaf (talk | contribs) at 23:51, 8 May 2011 (Added a few internal links). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


In simple terms, forms processing is a process by which information entered into data fields is captured and converted into an electronic format.
In this process:

  • Entered data is “captured” from its respective fields;
  • Forms themselves are digitized and saved as images.

In practice, the hard copy of a document is scanned in as an image using a scanner. The image is then recognized based on a pre-defined configuration, and the data is captured from particular zones and stored in an electronic format.


Overview:
In the broadest sense, forms processing systems can handle anything from a small application form to a large-scale survey form. Manual forms processing involves several common problems: it requires a great deal of tedious human effort, data keyed in by the operator may contain typos, and the lengthy process consumes many hours of labor. Processing the forms with computer software can resolve or greatly reduce these problems. Most methods for forms processing fall into the following areas:
1. Manual data entry
This method of data processing involves human operators keying in the data found on the form. Manual data entry has several drawbacks: delays in data capture, because every single data field has to be keyed in by hand; a high rate of operator misprints or typos; and high labor costs from the amount of manual work required. Manual processing also implies higher expenses for equipment and supplies, rent, etc.

2. Automatic form input system
This method automates data processing by using pre-defined templates and configurations. A template, in this case, is a map of the document detailing where the data fields are located within the form or document. Compared to manual data entry, automatic form input systems are preferable because they help eliminate the problems of manual data processing described above.
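The idea of a template as a map of the document can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Zone` class, the field names, the pixel coordinates and the list-of-lists "page" are all hypothetical and do not come from any particular forms processing product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Rectangular region of a scanned page, in pixel coordinates."""
    left: int
    top: int
    width: int
    height: int


# Hypothetical template: maps each field name to the zone where it is printed.
APPLICATION_FORM_TEMPLATE = {
    "surname": Zone(left=2, top=0, width=3, height=1),
    "date_of_birth": Zone(left=0, top=2, width=4, height=2),
}


def crop(page, zone):
    """Return the rows and columns of `page` that fall inside `zone`."""
    rows = page[zone.top : zone.top + zone.height]
    return [row[zone.left : zone.left + zone.width] for row in rows]


def extract_zones(page, template):
    """Cut every templated zone out of the page, keyed by field name."""
    return {name: crop(page, zone) for name, zone in template.items()}


# A tiny 4-row by 6-column "page" whose pixel values are just numbers.
page = [[r * 10 + c for c in range(6)] for r in range(4)]
zones = extract_zones(page, APPLICATION_FORM_TEMPLATE)
print(zones["surname"])  # [[2, 3, 4]]
```

In a real system each cropped zone would then be handed to a recognition engine; here the crop itself is the point.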

An automatic form input system uses different recognition methods: Optical Character Recognition (OCR) for machine print, Optical Mark Recognition (OMR) for check/mark sense boxes, Bar Code Recognition (BCR) for barcodes, and Intelligent Character Recognition (ICR) for hand print. ICR accuracy depends on the writer's handwriting, but certain recognition engines have been designed specifically for this purpose.
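The pairing of content types with recognition methods described above can be written down as a simple lookup. The sketch below only encodes that pairing; the `ContentType` enum, the `methods_for` helper and the field names are hypothetical, and no actual recognition is performed.

```python
from enum import Enum


class ContentType(Enum):
    MACHINE_PRINT = "machine print"
    HAND_PRINT = "hand print"
    MARK_BOX = "check/mark sense box"
    BARCODE = "barcode"


# Pairing of content type and recognition method, as listed in the text.
RECOGNITION_METHOD = {
    ContentType.MACHINE_PRINT: "OCR",
    ContentType.HAND_PRINT: "ICR",
    ContentType.MARK_BOX: "OMR",
    ContentType.BARCODE: "BCR",
}


def methods_for(fields):
    """Map each field name to the recognition method for its content type."""
    return {name: RECOGNITION_METHOD[kind] for name, kind in fields.items()}


# Hypothetical fields of a form and the kind of content each one holds.
form_fields = {
    "form_number": ContentType.BARCODE,
    "printed_title": ContentType.MACHINE_PRINT,
    "applicant_name": ContentType.HAND_PRINT,
    "agree_to_terms": ContentType.MARK_BOX,
}
print(methods_for(form_fields))
```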

With automated forms input technology, users are able to process documents from their scanned images into a computer-readable format such as ANSI, XML or CSV.
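As a rough illustration of the export step, the following Python sketch writes already-captured field values to CSV and XML using only the standard library. The records, the field names and the XML element names (`forms`, `form`) are made up for the example; a real system would use whatever schema its back end expects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical captured data: one dictionary per processed form.
records = [
    {"form_id": "0001", "surname": "Smith", "age": "42"},
    {"form_id": "0002", "surname": "O'Neil, Jr.", "age": "37"},
]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize the records as XML, one <form> element per record."""
    root = ET.Element("forms")
    for row in rows:
        form = ET.SubElement(root, "form", id=row["form_id"])
        for name, value in row.items():
            if name != "form_id":
                ET.SubElement(form, name).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(records)
xml_text = to_xml(records)
print(csv_text)
print(xml_text)
```

The `csv` module quotes the surname that contains a comma, so the value survives a round trip through a spreadsheet or another CSV reader.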

Forms processing has developed beyond simple data capture. Recognition using OCR/ICR/OMR/BCR captures data in an electronic format, but forms processing not only encompasses the recognition process; it also helps manage the complete life cycle of documents, from scanning the document to extracting the data and, often, delivering it into a back-end system. In some cases it may also include processing or generating well-formatted results through calculations and analysis. An automated forms processing system can be valuable if there is a need to process hundreds or thousands of images every day.

Components:
The components of data processing using an automatic form input system include:
1. OCR - Optical Character Recognition
2. OMR - Optical Mark Recognition
3. ICR - Intelligent Character Recognition
4. BCR - Bar Code Recognition
5. MICR - Magnetic Ink Character Recognition

Optical Character Recognition (OCR) recognizes machine-printed uppercase and lowercase alphabetic characters, accented characters, digits, many currency symbols, arithmetic symbols, expanded punctuation characters and more.

Intelligent Character Recognition (ICR) recognizes hand-printed American and European English characters using pre-defined character sets: uppercase, lowercase and mixed-case alphabetic characters, digits, currency symbols (including $ (dollar), ¢ (cent), € (euro), £ (pound) and ¥ (yen)), and arithmetic and punctuation characters (including period, comma, single quote, double quote, ! & ( ) ? @ { } \ # % * + - / : ; < = >).

Magnetic Ink Character Recognition (MICR) is a recognition technology that facilitates the processing of the MICR fonts printed on cheques. This minimizes the chance of errors in the clearing of cheques and makes the transfer of funds easier and faster. MICR provides a secure, high-speed method of scanning and processing information.

Optical Mark Recognition (OMR) identifies hand-filled bubbles or check boxes on printed forms. OMR usually supports both single and multiple mark recognition. The fields to be recognized can be specified as grids (rows by columns) or as single bubbles.
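A minimal Python sketch of grid-based mark reading follows. It assumes that an earlier step has already measured, for every bubble, the fraction of dark pixels inside it (the "fill ratio"); that measurement, the 0.5 threshold and the function names are assumptions made for the example, not the behavior of a specific OMR engine.

```python
def read_marks(fill_ratios, threshold=0.5):
    """Return, for each grid row, the column indices of the marked bubbles.

    `fill_ratios` is a rows-by-columns grid of values between 0.0 and 1.0.
    A bubble counts as marked when its ratio reaches the (assumed) threshold.
    """
    return [
        [col for col, ratio in enumerate(row) if ratio >= threshold]
        for row in fill_ratios
    ]


def single_mark(marked_columns):
    """Return the one marked column, or None if zero or several are marked."""
    return marked_columns[0] if len(marked_columns) == 1 else None


# Hypothetical grid: three questions (rows) with four bubbles (columns) each.
grid = [
    [0.05, 0.91, 0.08, 0.03],  # question 1: one clear mark
    [0.72, 0.04, 0.66, 0.02],  # question 2: two bubbles filled in
    [0.06, 0.07, 0.04, 0.09],  # question 3: left blank
]

marks = read_marks(grid)                       # multiple mark recognition
answers = [single_mark(row) for row in marks]  # single mark recognition
print(marks)    # [[1], [0, 2], []]
print(answers)  # [1, None, None]
```

In single-mark mode a row with no mark or with several marks yields `None`, which a real system would typically flag for review rather than guess.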

Barcode Recognition (BCR) can read more than 20 industry-standard 1D and 2D barcode types, including Code39, CODABAR, Interleaved 2 of 5, Code93 and more. It automatically detects all barcodes in an image or in a specified area within the image.

Process:
The process of automated forms processing includes the following steps:
1. A batch of completed forms is scanned using a high-speed scanner (usually one that scans at least 10 pages per minute); scanners from manufacturers such as Kodak, Canon and HP are commonly recommended.
2. Most of the data is recognized automatically, using the prerequisites.
3. The few characters about which the program is uncertain are passed on to a human operator.
4. Verified data is saved into a database or exported as CSV or XML.
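The hand-off between automatic recognition and the human operator can be sketched as confidence-based routing. The example below is a simplification under several assumptions: it works per field rather than per character, the confidence scores are imagined to come from a recognition engine, and `FieldResult`, `REVIEW_THRESHOLD`, `route` and `verify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical engine


REVIEW_THRESHOLD = 0.80  # assumed cut-off; real systems tune this value


def route(results, threshold=REVIEW_THRESHOLD):
    """Split results into those accepted automatically and those needing review."""
    accepted, needs_review = [], []
    for result in results:
        target = accepted if result.confidence >= threshold else needs_review
        target.append(result)
    return accepted, needs_review


def verify(needs_review, corrections):
    """Apply operator input keyed by field name; unlisted fields keep their text."""
    return [
        FieldResult(r.name, corrections.get(r.name, r.text), 1.0)
        for r in needs_review
    ]


# Hypothetical recognition output for one form.
results = [
    FieldResult("surname", "Smith", 0.97),
    FieldResult("age", "4Z", 0.41),
]

accepted, needs_review = route(results)
verified = verify(needs_review, corrections={"age": "42"})
record = {r.name: r.text for r in accepted + verified}
print(record)  # {'surname': 'Smith', 'age': '42'}
```

The resulting `record` is what would then be saved to a database or exported as CSV or XML.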


Prerequisites:
Automatic forms processing works well only if its prerequisites are maintained. A few of the prerequisites are:
1. Scan format: the format of the scanned file, the resolution (DPI) and the color mode.
2. Configuration: the layout of the scanned image needs to be configured for automation.
3. Recognition: the pre-defined output formats.
4. Result / analysis: any specific format for presenting the captured values.
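The scan format prerequisite could be checked before a batch enters the pipeline. The sketch below is illustrative only: the allowed file formats, the minimum resolution and the color modes are assumed project requirements invented for the example, not values given in this article or mandated by any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    file_format: str
    dpi: int
    color_mode: str


# Assumed project requirements, for illustration only.
ALLOWED_FORMATS = {"tiff", "png"}
ALLOWED_COLOR_MODES = {"bitonal", "grayscale"}
MIN_DPI = 200


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if settings.file_format.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported file format: {settings.file_format}")
    if settings.dpi < MIN_DPI:
        problems.append(f"resolution {settings.dpi} DPI is below {MIN_DPI} DPI")
    if settings.color_mode.lower() not in ALLOWED_COLOR_MODES:
        problems.append(f"unsupported color mode: {settings.color_mode}")
    return problems


good = check_settings(ScanSettings("TIFF", 300, "bitonal"))
bad = check_settings(ScanSettings("jpeg", 150, "color"))
print(good)  # []
print(bad)
```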