Revision as of 23:06, 18 March 2021 by Omnipaedista: boldface per WP:R#PLA
Revision as of 22:02, 24 April 2021 by ColdRainyDay45: edited phrasing (Tag: Visual edit)
==Description==
Normally, [[Data transmission|data transfer]] between programs is accomplished using [[data structures]] suited for [[Automation|automated]] processing by [[computers]], not people. Such interchange [[File format|formats]] and [[Protocol (computing)|protocols]] are typically rigidly structured, well-documented, easily [[parsing|parsed]], and minimize ambiguity. Very often, these transmissions are not [[human-readable]] at all.

Thus, the key element that distinguishes data scraping from regular [[parsing]] is that the output being scraped is intended for display to an [[End-user (computer science)|end-user]], rather than as an input to another program. It is therefore usually neither documented nor structured for convenient parsing. Data scraping often involves ignoring [[binary data]] (usually images or multimedia data), [[Display device|display]] formatting, redundant labels, superfluous commentary, and other information that is either irrelevant or hinders automated processing.

Data scraping is most often done either to interface with a [[legacy system]] that has no other mechanism compatible with current [[computer hardware|hardware]], or to interface with a third-party system that does not provide a more convenient [[Application programming interface|API]]. In the second case, the operator of the third-party system will often see [[screen scraping]] as unwanted, for reasons such as increased system [[load (computing)|load]], the loss of [[advertisement]] [[revenue]], or the loss of control of the information content.
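The filtering described above can be illustrated with a minimal Python sketch. It is not taken from any particular system: the `SCREEN` text, the field layout, the `ROW` pattern, and the `scrape` function are all hypothetical, chosen only to show how output meant for a person (borders, page headers, help prompts) is discarded while the data rows are recovered.

```python
import re

# Hypothetical screen output from a legacy terminal application.
# Everything except the two "Item:" rows is display formatting or commentary.
SCREEN = """\
+--------------------------------------+
|  ACME INVENTORY SYSTEM   Page 1 of 1 |
+--------------------------------------+
| Item: Widget      Qty:   12  $  3.50 |
| Item: Gadget      Qty:    7  $ 10.00 |
| Press F1 for help                    |
+--------------------------------------+
"""

# Assumed row layout: a label, a one-word name, a quantity, and a price.
ROW = re.compile(
    r"Item:\s+(?P<name>\S+)\s+Qty:\s+(?P<qty>\d+)\s+\$\s*(?P<price>\d+\.\d{2})"
)


def scrape(screen: str) -> list[dict]:
    """Return one record per data row, skipping borders, headers, and prompts."""
    records = []
    for line in screen.splitlines():
        match = ROW.search(line)
        if match is None:
            continue  # decoration or commentary, not data
        records.append(
            {
                "name": match["name"],
                "qty": int(match["qty"]),
                "price": float(match["price"]),
            }
        )
    return records


if __name__ == "__main__":
    for record in scrape(SCREEN):
        print(record)
```

Because the pattern depends on the exact on-screen layout, a sketch like this would break if the display changed, which is one reason scraping tends to be a last resort when no API is available.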

Data scraping: Difference between revisions