Revision as of 09:57, 6 October 2023 by Bigeshjen (talk | contribs): Reverted 3 edits by 136.185.189.153 (talk): The user has added a name that has nothing to do with the article. It's definitely not in good faith, though. Tags: Twinkle, Undo
Revision as of 23:05, 13 October 2023 by Wiki.0905 (talk | contribs): m Added hyperlink. Tag: Visual edit
Line 9:

Thus, the key element that distinguishes data scraping from regular [[parsing]] is that the output being scraped is intended for display to an [[End-user (computer science)|end-user]], rather than as input to another program. It is therefore usually neither documented nor structured for convenient parsing. Data scraping often involves ignoring [[binary data]] (usually images or multimedia data), [[Display device|display]] formatting, redundant labels, superfluous commentary, and other information that is either irrelevant or hinders automated processing.

Data scraping is most often done either to [[Interface (computing)|interface]] to a [[legacy system]] that has no other mechanism compatible with current [[computer hardware|hardware]], or to interface to a third-party system that does not provide a more convenient [[Application programming interface|API]]. In the second case, the operator of the third-party system will often see [[screen scraping]] as unwanted, for reasons such as increased system [[load (computing)|load]], the loss of [[advertisement]] [[revenue]], or the loss of control of the information content.

Data scraping is generally considered an ''[[ad hoc]]'', inelegant technique, often used only as a "last resort" when no other mechanism for data interchange is available. Aside from the higher [[computer programming|programming]] and processing overhead, output displays intended for human consumption often change structure frequently. Humans can cope with this easily, but a computer program that depends on the old structure will typically fail. Depending on the quality and extent of the [[error handling]] logic present in the program, this failure can result in error messages, corrupted output, or even [[program crash]]es.
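The ideas in the passage above can be illustrated with a minimal Python sketch. It is only one possible approach, and everything in it is an assumption made for illustration: the sample screen text, the field labels, and the names `SCREEN`, `FIELD_PATTERNS`, `ScrapeError` and `scrape_account` are hypothetical and do not come from any real system. The sketch picks two values out of output meant for a human reader, ignores the decorative lines around them, and raises an explicit error when the layout no longer matches instead of returning corrupted output.

```python
import re

# Hypothetical human-readable output from a legacy system (invented for illustration).
SCREEN = """\
=== ACCOUNT SUMMARY ===
Customer Name : Jane Doe
Balance (USD) : 1,234.50
--- press F3 to exit ---
"""

# One pattern per field of interest; banners, prompts and other display
# formatting on the screen are simply never matched.
FIELD_PATTERNS = {
    "name": re.compile(r"^Customer Name\s*:\s*(.+?)\s*$", re.MULTILINE),
    "balance": re.compile(r"^Balance \(USD\)\s*:\s*([\d,]+\.\d{2})\s*$", re.MULTILINE),
}


class ScrapeError(Exception):
    """Raised when the display no longer matches the expected layout."""


def scrape_account(screen: str) -> dict:
    """Extract the name and balance from one screen of text."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(screen)
        if match is None:
            # Fail loudly rather than guess: the layout may have changed.
            raise ScrapeError(f"field {field!r} not found; layout may have changed")
        record[field] = match.group(1)
    # Strip the thousands separator that exists only for human readers.
    record["balance"] = float(record["balance"].replace(",", ""))
    return record
```

Calling `scrape_account(SCREEN)` returns the two fields as a dictionary. If the label `Balance (USD) :` were renamed on the display, the same call would raise `ScrapeError`, which is the kind of breakage the text describes when output intended for people changes structure.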

Data scraping: Difference between revisions