Search engine scraping

* [[cURL]] – a command-line tool for automation and testing, as well as libcurl, a powerful open source HTTP interaction library available for a large range of programming languages.<ref>{{cite web|url=https://curl.haxx.se/libcurl/|title=libcurl - the multiprotocol file transfer library|website=curl.haxx.se}}</ref>
* Google-search – a Go package to scrape Google.<ref>{{cite web|url=https://github.com/rocketlaunchr/google-search|title=A Go package to scrape Google.|via=GitHub}}</ref>
* [https://seotoolskit.co/ SEO Tools Kit] – free online tools that scrape search engines (such as DuckDuckGo, Baidu, [[Petal Search|Petal]] and [[Sogou]]) by using proxies (SOCKS4/5, HTTP proxy). The tool includes asynchronous networking support and can control real browsers to mitigate detection.<ref>{{cite web|url=https://seotoolskit.co/|title=Free online SEO Tools (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.: NikolaiT/SEO Tools Kit|date=15 January 2019|publisher=|via=GitHub}}</ref>
* se-scraper – successor to SEO Tools Kit. Scrapes search engines concurrently using different proxies.<ref>{{Citation|last=Tschacher|first=Nikolai|title=NikolaiT/se-scraper|date=2020-11-17|url=https://github.com/NikolaiT/se-scraper|access-date=2020-11-19}}</ref>
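
The proxy-based approach described for SEO Tools Kit and se-scraper, spreading concurrent requests over several proxies, can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not code taken from any of the tools listed: the function names (<code>fetch</code>, <code>assign_proxies</code>, <code>fetch_all</code>) are hypothetical, proxies are assigned round-robin, and parsing of result pages is omitted.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, proxy_url, timeout=10.0):
    """Hypothetical helper: download one page through a single HTTP(S) proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy, cycling through the proxy list (round-robin assumption)."""
    cycle = itertools.cycle(proxies)
    return [(url, next(cycle)) for url in urls]


def fetch_all(urls, proxies, fetcher=fetch, max_workers=4):
    """Fetch all URLs concurrently; results are returned in the same order as `urls`."""
    jobs = assign_proxies(urls, proxies)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though requests run in parallel
        return list(pool.map(lambda job: fetcher(*job), jobs))
```

Passing a stub <code>fetcher</code> lets the round-robin assignment and the concurrency be checked without network access: <code>fetch_all(["a", "b", "c"], ["p1", "p2"], fetcher=lambda u, p: u + "@" + p)</code> returns <code>["a@p1", "b@p2", "c@p1"]</code>.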
 
The largest publicly known incident of a search engine being scraped happened in 2011, when Microsoft was caught scraping Google's results for unusual keywords for its own, then relatively new, Bing service,<ref>{{cite web|url=https://www.wired.com/2011/02/bing-copies-google/|title=Google Catches Bing Copying; Microsoft Says ‘So What?’|first=Ryan|last=Singel|work=Wired}}</ref> but even this incident did not result in a court case.
 
One possible reason is that search engines such as Google, [[Petal Search|Petal]] and [[Sogou]] obtain almost all of their data by scraping millions of publicly reachable websites, likewise without reading and accepting those terms.
 
==See also==