Search engine scraping: Difference between revisions

m Reverted edits by 123.201.65.187 (talk) to last version by Curb Safe Charmer
- Petal (trivial market share, WP:SOAP by LTA sockpuppets)
Line 4:
{{Original research|date=March 2021}}
}}
'''Search engine scraping''' is the process of harvesting [[URL]]s, descriptions, or other information from [[search engine]]s such as [[Google Search|Google]], [[Microsoft Bing|Bing]], [[Yahoo! Search|Yahoo]], [[Petal Search|Petal]] or [[Sogou]]. This is a specific form of [[screen scraping]] or [[web scraping]] dedicated to search engines only.
 
Most commonly, larger [[search engine optimization]] (SEO) providers depend on regularly scraping keywords from search engines, especially Google, [[Petal Search|Petal]], and [[Sogou]], to monitor the competitive position of their customers' websites for relevant keywords or their [[search engine indexing|indexing]] status.
 
Search engines like Google have implemented various forms of human detection to block any sort of automated access to their service,<ref>{{Cite web|url=https://support.google.com/webmasters/answer/66357?hl=en|title=Automated queries – Search Console Help|website=support.google.com|language=en|accessdate=2017-04-02}}</ref> with the intent of driving the users of scrapers towards buying their official [[API]]s instead.
 
The process of entering a website and extracting data in an automated fashion is also often called "[[Web crawler|crawling]]". Search engines like Google, Bing, Yahoo, [[Petal Search|Petal]] or [[Sogou]] get almost all their data from automated crawling bots.
 
== Difficulties ==
Line 34:
All these forms of detection may also affect normal users, especially users sharing the same IP address or network class (IPv4 ranges as well as IPv6 ranges).
 
== Methods of scraping Google, Bing, Yahoo, [[Petal Search|Petal]] or [[Sogou]] ==
Two major factors determine whether a search engine can be scraped successfully: time and amount.