As a concrete example of a classic screen scraper, consider a hypothetical legacy system dating from the 1960s, the dawn of computerized [[data processing]]. Computer-to-user interfaces from that era were often simply text-based [[dumb terminal]]s that were little more than virtual [[teleprinter]]s (such systems are still in use {{As of|2007|alt=today}}, for various reasons). The desire to interface such a system to more modern systems is common. A [[Robustness (computer science)|robust]] solution will often require things that are no longer available, such as [[source code]], system [[documentation]], [[Application programming interface|API]]s, or [[programmers]] with experience in a 50-year-old computer system. In such cases, the only feasible solution may be to write a screen scraper that "pretends" to be a user at a terminal. The screen scraper might connect to the legacy system via [[Telnet]], [[emulator|emulate]] the keystrokes needed to navigate the old user interface, process the resulting display output, extract the desired data, and pass it on to the modern system. A sophisticated and resilient implementation of this kind, built on a platform providing the governance and control required by a major enterprise (e.g. change control, security, user management, data protection, operational audit, load balancing, and queue management), could be said to be an example of [[robotic process automation]] software, called RPA, or RPAAI for self-guided RPA 2.0 based on [[artificial intelligence]].
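The extraction step described above can be sketched in a few lines. In this sketch the 24×80 screen dump, the field labels ("ACCT NO", "BALANCE"), and the account-inquiry layout are all hypothetical, invented for illustration; a real scraper would first emulate the keystrokes needed to reach such a screen over Telnet and capture its text.

```python
import re

# Hypothetical terminal screen text, as a scraper might capture it after
# navigating a legacy account-inquiry screen. Layout and labels are
# illustrative only, not taken from any real system.
SCREEN = (
    "ACCOUNT INQUIRY                                    PAGE 01\n"
    "ACCT NO: 0012345    NAME: DOE, JANE\n"
    "BALANCE:   1,234.56 CR\n"
)

def extract_fields(screen: str) -> dict:
    """Pull labelled fields out of captured terminal text."""
    acct = re.search(r"ACCT NO:\s*(\d+)", screen)
    bal = re.search(r"BALANCE:\s*([\d,]+\.\d{2})", screen)
    return {
        "account": acct.group(1) if acct else None,
        # Strip thousands separators so the value is usable numerically.
        "balance": float(bal.group(1).replace(",", "")) if bal else None,
    }

record = extract_fields(SCREEN)
```

The parsed record could then be handed to the modern system in whatever structured form it expects.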
In the 1980s, financial data providers such as [[Reuters]], [[Dow Jones & Company|Telerate]], and [[Quotron]] displayed data in 24×80 format intended for a human reader. Users of this data, particularly [[Investment banking|investment banks]], wrote applications to capture and convert this character data into numeric data for inclusion in calculations for trading decisions without [[data entry clerk|re-keying]] the data. The common term for this practice, especially in the [[United Kingdom]], was ''page shredding'', since the results could be imagined to have passed through a [[paper shredder]]. Internally, Reuters used the term ''logicized'' for this conversion process, running a sophisticated computer system.
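Since those vendor pages put each value in a fixed column position, "shredding" a page amounts to slicing known column ranges out of each 80-character row and converting the characters to numbers. The quote line and column layout below are invented for illustration; no real vendor format is implied.

```python
# A hypothetical 80-column quote line as a 1980s vendor page might show it.
LINE = "IBM       123.50  124.25  -0.75 "

# Assumed (start, end) column slices for each field -- illustrative only.
FIELDS = {
    "bid":    (10, 18),
    "ask":    (18, 26),
    "change": (26, 32),
}

def shred(line: str) -> dict:
    """Convert one fixed-width character row into numeric fields."""
    symbol = line[:10].strip()
    values = {name: float(line[a:b]) for name, (a, b) in FIELDS.items()}
    return {"symbol": symbol, **values}

quote = shred(LINE)
```

Fixed-width slicing like this is fragile: any change to the page layout silently shifts every field, which is one reason such scrapers needed constant maintenance.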
More modern screen scraping techniques include capturing the bitmap data from the screen and running it through an [[Optical character recognition|OCR]] engine, or, for some specialised automated testing systems, matching the screen's bitmap data against expected results.<ref>{{Cite journal|url = http://groups.csail.mit.edu/uid/projects/sikuli/sikuli-uist2009.pdf|title = Sikuli: Using GUI Screenshots for Search and Automation|last = Yeh|first = Tom|date = 2009|journal = UIST}}</ref> This can be combined, in the case of [[GUI]] applications, with querying the graphical controls by programmatically obtaining references to their underlying [[Object-oriented programming|programming objects]]. A sequence of screens is automatically captured and converted into a database.
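The bitmap-matching idea can be sketched as a tolerance-based pixel comparison. Here bitmaps are modelled simply as rows of greyscale pixel values; real testing tools compare actual screenshots, often with more sophisticated matching than this minimal sketch assumes.

```python
# Sketch of matching captured screen bitmap data against an expected
# reference image, as some automated-testing systems do.
def bitmaps_match(captured, expected, tolerance=8):
    """True if every pixel differs by no more than `tolerance` levels."""
    if len(captured) != len(expected):
        return False
    for cap_row, exp_row in zip(captured, expected):
        if len(cap_row) != len(exp_row):
            return False
        if any(abs(c - e) > tolerance for c, e in zip(cap_row, exp_row)):
            return False
    return True

expected = [[0, 0, 255], [0, 255, 0]]
captured = [[2, 1, 250], [0, 251, 3]]  # slight rendering noise
```

The tolerance allows for anti-aliasing and rendering noise, at the cost of occasionally accepting a genuinely different screen.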