→Web scraping: Added a citation |
Rescuing 1 sources and tagging 0 as dead.) #IABot (v2.0.9.2 |
Line 27:
In the 1980s, financial data providers such as [[Reuters]], [[Dow Jones & Company|Telerate]], and [[Quotron]] displayed data in a 24×80 character format intended for a human reader. Users of this data, particularly [[Investment banking|investment banks]], wrote applications to capture this character data and convert it into numeric data for use in trading calculations, without [[data entry clerk|re-keying]] it. The common term for this practice, especially in the [[United Kingdom]], was ''page shredding'', since the results could be imagined to have passed through a [[paper shredder]]. Internally, Reuters used the term 'logicized' for this conversion process, running a sophisticated computer system on [[VAX/VMS]] called the Logicizer.<ref>[http://www.fxweek.com/fx-week/news/1539599/contributors-fret-about-reuters-plan-to-switch-from-monitor-network-to-idn Contributors Fret About Reuters' Plan To Switch From Monitor Network To IDN], ''FX Week'', 02 Nov 1990</ref>
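The conversion of fixed-position character data into numbers can be illustrated with a minimal Python sketch. The page layout, the field names (<code>bid</code>, <code>ask</code>), the column positions, and the sample values are all hypothetical and chosen only for illustration; they do not reproduce any real vendor's page format or the Logicizer's actual design.

```python
from typing import Dict, List, Tuple

ROWS, COLS = 24, 80  # classic terminal page size

# Hypothetical field layout: name -> (row, start column, end column).
FIELDS: Dict[str, Tuple[int, int, int]] = {
    "bid": (2, 10, 18),
    "ask": (2, 20, 28),
}


def make_screen(lines: List[str]) -> List[str]:
    """Pad or truncate text lines to a fixed 24x80 character grid."""
    padded = [line[:COLS].ljust(COLS) for line in lines[:ROWS]]
    padded += [" " * COLS] * (ROWS - len(padded))
    return padded


def shred(screen: List[str],
          fields: Dict[str, Tuple[int, int, int]] = FIELDS) -> Dict[str, float]:
    """Cut fixed-position fields out of the page and convert them to floats."""
    values = {}
    for name, (row, start, end) in fields.items():
        text = screen[row][start:end].strip()
        values[name] = float(text)  # raises ValueError if the cell is not numeric
    return values


# Made-up sample page; the numbers are illustrative, not real quotes.
sample = make_screen([
    "EXAMPLE QUOTE PAGE",
    "",
    "GBP/USD   1.9345    1.9355",
])
quotes = shred(sample)
```

In this sketch the extraction depends entirely on the fields staying at fixed screen coordinates, so any change to the page layout would require updating the <code>FIELDS</code> table.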
More modern screen scraping techniques include capturing the bitmap data from the screen and running it through an [[Optical character recognition|OCR]] engine, or, for some specialised automated testing systems, matching the screen's bitmap data against expected results.<ref>{{Cite journal|url = http://groups.csail.mit.edu/uid/projects/sikuli/sikuli-uist2009.pdf|title = Sikuli: Using GUI Screenshots for Search and Automation|last = Yeh|first = Tom|date = 2009|journal = UIST|access-date = 2015-02-16|archive-date = 2010-02-14|archive-url = https://web.archive.org/web/20100214184939/http://groups.csail.mit.edu/uid/projects/sikuli/sikuli-uist2009.pdf|url-status = dead}}</ref> In the case of [[GUI]] applications, this can be combined with querying the graphical controls by programmatically obtaining references to their underlying [[Object-oriented programming|programming objects]]. A sequence of screens is automatically captured and converted into a database.
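Matching screen bitmap data against an expected result can be sketched as a search for a small pixel pattern inside a larger captured image. The following minimal Python example assumes that bitmaps are represented as lists of rows of integer pixel values and that an exact match is required; the function name <code>find_template</code> and the sample data are hypothetical. Practical tools generally tolerate small pixel differences rather than requiring exact equality.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of pixel values (assumed representation)


def find_template(screen: Bitmap, template: Bitmap) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first exact match of template in screen, or None."""
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    for r in range(screen_h - tmpl_h + 1):
        for c in range(screen_w - tmpl_w + 1):
            # Compare every template row with the matching slice of the screen.
            if all(screen[r + i][c:c + tmpl_w] == template[i]
                   for i in range(tmpl_h)):
                return (r, c)
    return None


# Made-up 3x4 "screenshot" and a 2x2 pattern expected to appear in it.
screen = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
button = [
    [1, 1],
    [1, 0],
]
location = find_template(screen, button)
```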
Another modern adaptation of these techniques is to use a set of images or PDF files as input instead of a sequence of screens, so there is some overlap with generic "document scraping" and [[#Report mining|report mining]] techniques.