Data scraping: Difference between revisions

Line 48:
'''Report mining'''<!--boldface per WP:R#PLA--> is the extraction of data from human-readable computer reports. Conventional [[data extraction]] requires a connection to a working source system, suitable [[Database connection|connectivity]] standards or an [[Application programming interface|API]], and usually complex querying. By using the source system's standard reporting options, and directing the output to a [[Spooling|spool file]] instead of to a [[printer (computing)|printer]], static reports can be generated suitable for offline analysis via report mining.<ref>Scott Steinacher, [https://web.archive.org/web/20160304205109/http://connection.ebscohost.com/c/product-reviews/2235513/data-pump-transforms-host-data "Data Pump transforms host data"], ''[[InfoWorld]]'', 30 August 1999, p55</ref> This approach can avoid intensive [[Central processing unit|CPU]] usage during business hours, can minimise [[end-user]] licence costs for [[Enterprise resource planning|ERP]] customers, and can offer very rapid prototyping and development of custom reports. Whereas data scraping and web scraping involve interacting with dynamic output, report mining involves extracting data from files in a human-readable format, such as [[HTML]], PDF, or text. These can be easily generated from almost any system by intercepting the data feed to a printer. This approach can provide a quick and simple route to obtaining data without the need to program an API to the source system.
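
As a minimal sketch of the idea, the following Python example parses a spooled plain-text report into structured rows. The report layout, the column names, and the helper name <code>mine_report</code> are assumptions made for illustration; real reports vary widely and usually need a pattern tailored to their layout.

```python
import re

# Hypothetical spooled report text; the layout and column names are assumptions.
SAMPLE_REPORT = """\
ACME LTD            SALES BY REGION            PAGE 1
REGION        UNITS      REVENUE
------------------------------------
North           120     1,450.00
South            85       990.50
------------------------------------
TOTAL           205     2,440.50
"""

# A detail row is assumed to be: a name, an integer count, and an amount
# with thousands separators and two decimal places.
ROW = re.compile(
    r"^(?P<region>[A-Za-z]+)\s+(?P<units>\d+)\s+(?P<revenue>[\d,]+\.\d{2})$"
)


def mine_report(text):
    """Return the detail rows of the report as a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        # Skip page headers, column headings, rules, and the summary line.
        if match is None or match.group("region") == "TOTAL":
            continue
        rows.append(
            {
                "region": match.group("region"),
                "units": int(match.group("units")),
                "revenue": float(match.group("revenue").replace(",", "")),
            }
        )
    return rows


if __name__ == "__main__":
    for row in mine_report(SAMPLE_REPORT):
        print(row)
```

Because the input is a static file, the extraction runs offline and places no load on the source system.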
 
'''Legal and Ethical Considerations'''
 
The legality and ethics of data scraping are frequently debated. Scraping publicly accessible data is generally legal; however, scraping in a manner that violates a website's terms of service, circumvents security measures, or invades user privacy can lead to legal action. Moreover, some websites specifically prohibit data scraping in their robots.txt file.
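
A scraper can check a site's published crawling rules before requesting a page. The sketch below uses Python's standard <code>urllib.robotparser</code> module on an inline rule set, so it runs without network access. The rules, the user agent name <code>ExampleBot</code>, the URLs, and the helper name <code>is_allowed</code> are hypothetical. Passing this check indicates only what the site operator has requested of crawlers; it does not by itself establish that scraping is lawful or permitted by the terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/data"))
```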
 
'''Data Scraping and Anti-Detect Browsers'''
 
Anti-detect browsers have emerged as tools closely associated with data scraping, especially for users who need to run multiple scraping tasks simultaneously or avoid detection. Such browsers allow users to create multiple virtual browser profiles that mimic different devices and locations, which reduces the likelihood of being blocked or flagged by websites.
Line 73:
==References==
{{reflist}}
<ref>12. Multilogin. (n.d.). Multilogin | Prevent account bans and enables scaling. https://multilogin.com/blog/how-to-scrape-data-on-google/</ref>
 
<ref>13. Mitchell, R. (2022). "The Ethics of Data Scraping." Journal of Information Ethics, 31(2), 45-61.</ref>
 
<ref>14. Kavanagh, D. (2021). "Anti-Detect Browsers: The Next Frontier in Web Scraping." Web Security Review, 19(4), 33-48.</ref>
 
<ref>15. Walker, J. (2020). "Legal Implications of Data Scraping." Tech Law Journal, 22(3), 109-126.</ref>
 
==Further reading==