In the 1980s, financial data providers such as [[Reuters]], [[Dow Jones & Company|Telerate]], and [[Quotron]] displayed data in 24×80 format intended for a human reader. Users of this data, particularly [[Investment banking|investment banks]], wrote applications to capture this character data and convert it to numeric data for use in calculations for trading decisions without [[data entry clerk|re-keying]] the data. The common term for this practice, especially in the [[United Kingdom]], was ''page shredding'', since the results could be imagined to have passed through a [[paper shredder]]. Internally, Reuters used the term 'logicized' for this conversion process, running a sophisticated computer system on [[VAX/VMS]] called the Logicizer.<ref>[http://www.fxweek.com/fx-week/news/1539599/contributors-fret-about-reuters-plan-to-switch-from-monitor-network-to-idn Contributors Fret About Reuters' Plan To Switch From Monitor Network To IDN], ''FX Week'', 02 Nov 1990</ref> More modern screen scraping techniques include capturing the bitmap data from the screen and running it through an [[Optical character recognition|OCR]] engine, or, for some specialised automated testing systems, matching the screen's bitmap data against expected results.<ref>{{Cite journal|url = http://groups.csail.mit.edu/uid/projects/sikuli/sikuli-uist2009.pdf|title = Sikuli: Using GUI Screenshots for Search and Automation|last = Yeh|first = Tom|date = 2009|journal = UIST|access-date = 2015-02-16|archive-date = 2010-02-14|archive-url = https://web.archive.org/web/20100214184939/http://groups.csail.mit.edu/uid/projects/sikuli/sikuli-uist2009.pdf|url-status = dead}}</ref> In the case of [[GUI]] applications, this can be combined with querying the graphical controls by programmatically obtaining references to their underlying [[Object-oriented programming|programming objects]]. A sequence of screens is automatically captured and converted into a database.
Another modern adaptation of these techniques is to use a set of images or PDF files as input instead of a sequence of screens, so there is some overlap with generic "document scraping" and [[#Report mining|report mining]] techniques.
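Page shredding of a 24×80 character display can be illustrated with a minimal sketch. It assumes a hypothetical screen layout in which each quote row holds an instrument name in columns 0–9, a bid price in columns 10–19 and an ask price in columns 20–29; the field positions, the <code>shred_page</code> and <code>normalise_screen</code> names and the sample quotes are illustrative only and do not describe any real Reuters, Telerate or Quotron page format. Rows whose price fields do not parse as numbers, such as headers, are skipped.

```python
import re

# Hypothetical fixed-width layout (illustrative, not a real vendor format):
# field name -> (start column, end column) within an 80-character row.
FIELDS = {"ticker": (0, 10), "bid": (10, 20), "ask": (20, 30)}
WIDTH, HEIGHT = 80, 24
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def normalise_screen(text: str) -> list[str]:
    """Pad or truncate captured text to exactly 24 rows of 80 characters."""
    rows = text.splitlines()[:HEIGHT]
    rows += [""] * (HEIGHT - len(rows))
    return [row[:WIDTH].ljust(WIDTH) for row in rows]


def shred_page(text: str) -> list[dict]:
    """Convert rows whose bid/ask fields are numeric into quote records."""
    quotes = []
    for row in normalise_screen(text):
        # Slice each field out of the fixed-width row and trim padding.
        cells = {name: row[start:end].strip() for name, (start, end) in FIELDS.items()}
        if cells["ticker"] and NUMBER.match(cells["bid"]) and NUMBER.match(cells["ask"]):
            quotes.append({
                "ticker": cells["ticker"],
                "bid": float(cells["bid"]),
                "ask": float(cells["ask"]),
            })
    return quotes


if __name__ == "__main__":
    screen = "\n".join([
        f"{'GBP/USD':<10}{'1.4712':<10}{'1.4717':<10}",
        "REUTERS MONITOR PAGE",  # header row: price fields are not numeric
        f"{'DEM/USD':<10}{'0.6021':<10}{'0.6025':<10}",
    ])
    for quote in shred_page(screen):
        print(quote)
```

Because the layout is fixed-width, positional slicing is sufficient here; a real page would need its own field map, and any change to the provider's display format would break the extraction, which is a known fragility of screen scraping.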
