===Web scraping===
{{main | Web scraping}}
[[Web page]]s are built using text-based mark-up languages ([[HTML]] and [[XHTML]]), and frequently contain a wealth of useful data in text form. However, most web pages are designed for human [[End-user (computer science)|end-users]] and not for ease of automated use. Because of this, toolkits that scrape web content were created. A [[Web scraping | web scraper]] is an [[API]] or tool that extracts data from a website.<ref>{{Cite journal |last1=Thapelo |first1 = Tsaone Swaabow |last2 = Namoshe |first2 = Molaletsa |last3 = Matsebe |first3 = Oduetse | last4 = Motshegwa |first4 = Tshiamo |last5 = Bopape |first5=Mary-Jane Morongwa |date=2021-07-28 |title=SASSCAL WebSAPI: A Web Scraping Application Programming Interface to Support Access to SASSCAL's Weather Data |journal=Data Science Journal |language = en |volume=20 |pages=24 |doi = 10.5334/dsj-2021-024 |s2cid = 237719804 |issn = 1683-1470|doi-access=free }}</ref> Companies such as [[Amazon AWS]] and [[Google]] provide '''web scraping''' tools, services, and public data free of cost to end-users. Newer forms of web scraping involve listening to data feeds from web servers. For example, [[JSON]] is commonly used as a data transport format between the client and the web server.
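As a minimal sketch of the two approaches described above, the following Python example extracts the same values first from HTML mark-up and then from a JSON feed. The sample markup, the <code>price</code> class name, the <code>items</code> key, and the function names are illustrative assumptions, not part of any real website or toolkit; only the standard-library <code>html.parser</code> and <code>json</code> modules are used.

```python
import json
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collect the text of every <span class="price"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a price span.
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def scrape_prices(html_text):
    """Scrape prices from human-oriented HTML mark-up."""
    parser = PriceScraper()
    parser.feed(html_text)
    parser.close()
    return parser.prices


def prices_from_feed(json_text):
    """Read the same values from a machine-oriented JSON feed."""
    return [item["price"] for item in json.loads(json_text)["items"]]


# Illustrative inputs; a real scraper would fetch these over HTTP.
SAMPLE_HTML = (
    '<ul><li>Tea <span class="price">4.50</span></li>'
    '<li>Coffee <span class="price">3.20</span></li></ul>'
)
SAMPLE_JSON = (
    '{"items": [{"name": "Tea", "price": "4.50"},'
    ' {"name": "Coffee", "price": "3.20"}]}'
)

if __name__ == "__main__":
    print(scrape_prices(SAMPLE_HTML))
    print(prices_from_feed(SAMPLE_JSON))
```

The HTML path has to know how the page is laid out and will break if the markup changes, whereas the JSON path reads structured fields directly, which is one reason data feeds are attractive targets for scraping.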
Recently, companies have developed web scraping systems that rely on DOM parsing, [[computer vision]] and [[natural language processing]] techniques to simulate the human processing that occurs when viewing a webpage, in order to automatically extract useful information.<ref>{{cite web|title = Diffbot aims to make it easier for apps to read Web pages the way humans do|url=http://www.technologyreview.com/news/428056/a-startup-hopes-to-help-computers-understand-web-pages/|website=MIT Technology Review | access-date=1 December 2014}}</ref><ref>{{cite magazine | title=This Simple Data-Scraping Tool Could Change How Apps Are Made|url=https://www.wired.com/2014/03/kimono/|magazine=WIRED|access-date=8 May 2015|url-status=dead|archive-url=https://web.archive.org/web/20150511050542/http://www.wired.com/2014/03/kimono|archive-date=11 May 2015}} <!-- ?! syntax error --></ref>
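The DOM-parsing part of such systems can be illustrated with a short Python sketch; it does not attempt the computer vision or natural language processing components. The sample page, the <code>article</code> and <code>sidebar</code> ids, and the <code>extract_article</code> function are hypothetical. The sketch also assumes well-formed XHTML, because the standard-library <code>xml.etree.ElementTree</code> parser rejects the malformed HTML that is common on real websites.

```python
import xml.etree.ElementTree as ET

# Illustrative, well-formed XHTML; real pages usually need a lenient HTML parser.
SAMPLE_XHTML = """<html><body>
<div id="article"><h1>Example title</h1><p>First paragraph.</p><p>Second paragraph.</p></div>
<div id="sidebar"><p>Advertisement</p></div>
</body></html>"""


def extract_article(xhtml_text):
    """Parse the page into a tree and pull out the main article content."""
    root = ET.fromstring(xhtml_text)
    # Locate the article container by its (assumed) id attribute.
    article = root.find(".//div[@id='article']")
    if article is None:
        return None
    return {
        "title": article.findtext("h1"),
        # Only paragraphs directly inside the article; the sidebar is ignored.
        "paragraphs": [p.text for p in article.findall("p")],
    }


if __name__ == "__main__":
    print(extract_article(SAMPLE_XHTML))
```

Working on the parsed tree rather than on raw text lets the extractor select content by its position and attributes in the document structure, so unrelated regions such as the sidebar are skipped.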