Revision as of 05:53, 25 April 2023 AnomieBOT (talk | contribs) Bots 6,862,603 edits m Dating maintenance tags: {{Toomanylinks}}
Revision as of 21:54, 15 May 2023 ChabbD (talk | contribs) 265 edits fixed spelling Tag: Visual edit
Line 45:
Scraping scripts need to overcome a few technical challenges:<ref>{{cite web|url=http://google-rank-checker.squabbel.com|title=Scraping Google Ranks for Fun and Profit|website=google-rank-checker.squabbel.com}}</ref>
* IP rotation using Proxies (proxies should be unshared and not listed in blacklists)
* Proper time management, time between keyword changes, pagination as well as correctly placed delays Effective ~~longterm~~long-term scraping rates can vary from only 3–5 requests (keywords or pages) per hour up to 100 and more per hour for each IP address / Proxy in use. The quality of IPs, methods of scraping, keywords requested and language/country requested can greatly affect the possible maximum rate.
* Correct handling of URL parameters, cookies as well as HTTP headers to emulate a user with a typical browser<ref name=":0" />
HTML [[Document Object Model|DOM]] parsing (extracting URLs, descriptions, ranking position, sitelinks and other relevant data from the HTML code)
Line 65:
[[cURL]] – a command line browser for automation and testing, as well as a powerful open source HTTP interaction library available for a large range of programming languages.<ref>{{cite web|url=https://curl.haxx.se/libcurl/|title=libcurl - the multiprotocol file transfer library|website=curl.haxx.se}}</ref>
* Google-search - A Go package to scrape Google.<ref>{{cite web|url=https://github.com/rocketlaunchr/google-search|title=A Go package to scrape Google.|via=GitHub}}</ref>
* [https://seotoolskit.co/ SEO Tools Kit] – Free Online Tools, ~~Duckduckgo~~DuckDuckGo, Baidu, [[Sogou]]) by using proxies (socks4/5, http proxy). The tool includes asynchronous networking support and is able to control real browsers to mitigate detection.<ref>{{cite web|url=https://seotoolskit.co/|title=Free online SEO Tools (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.: NikolaiT/SEO Tools Kit|date=15 January 2019|publisher=|via=GitHub}}</ref>
*se-scraper - Successor of SEO Tools Kit. Scrape search engines concurrently with different proxies.<ref>{{Citation|last=Tschacher|first=Nikolai|title=NikolaiT/se-scraper|date=2020-11-17|url=https://github.com/NikolaiT/se-scraper|access-date=2020-11-19}}</ref>
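
The challenges listed in the Line 45 hunk (proxy rotation, paced delays, browser-like headers, and DOM parsing) can be illustrated with a minimal Python sketch using only the standard library. It is not taken from the article or from any of the listed tools. The proxy addresses, header values, and all function names (`next_delay`, `plan_requests`, `build_opener`, `extract_links`) are hypothetical, and the 100 requests per hour figure is simply the upper end of the range quoted above. No request is actually sent.

```python
import itertools
import random
import urllib.request
from html.parser import HTMLParser

# Hypothetical placeholder values, for illustration only.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept-Language": "en-US,en;q=0.9",
}


def next_delay(requests_per_hour, jitter=0.3):
    """Seconds to wait before the next request, randomized by +/- jitter."""
    base = 3600.0 / requests_per_hour
    return base * random.uniform(1 - jitter, 1 + jitter)


def plan_requests(keywords, proxies, requests_per_hour):
    """Yield (keyword, proxy, delay) tuples, rotating proxies round-robin.

    Assumption: the rate is applied to every request, which is more
    conservative than a per-proxy limit.
    """
    rotation = itertools.cycle(proxies)
    for keyword in keywords:
        yield keyword, next(rotation), next_delay(requests_per_hour)


def build_opener(proxy, headers=BROWSER_HEADERS):
    """Build a urllib opener that routes through `proxy` with the given headers."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = list(headers.items())
    return opener


class LinkExtractor(HTMLParser):
    """Collect (position, url, text) for every absolute link in the HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append((len(self.links) + 1, self._href, text))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return the links found, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A caller would iterate over `plan_requests(...)`, sleep for the returned delay, fetch through `build_opener(proxy)`, and pass the response body to `extract_links`. Real result pages need selectors specific to their markup, which this sketch does not attempt.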

Search engine scraping: Difference between revisions