Web scraping: Difference between revisions

m Methods to prevent web scraping: very minor copyediting
Line 44:
{{Further|Document Object Model}}
 
By using a browser automation tool such as [[Selenium (software)|Selenium]] or [[Playwright (software)|Playwright]], developers can control a web browser such as [[Google Chrome|Chrome]] or [[Firefox]] to load pages, navigate between them, and retrieve data from websites. This method is especially useful for scraping dynamic sites, because the browser executes client-side scripts and renders each page in full. Once a page has loaded, its [[Document Object Model|DOM]] can be accessed and queried using an expression language such as [[XPath]].
By embedding a full-fledged web browser, such as the [[Internet Explorer]] or [[Mozilla]] browser control, programs can retrieve the dynamic content generated by client-side scripts. These browser controls also parse web pages into a DOM tree, from which programs can retrieve parts of the pages. Languages such as [[XPath]] can be used to query the resulting DOM tree.
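
The browser-driven approach described above can be sketched roughly as follows. This is an illustrative example rather than a reference implementation: the function names <code>scrape_with_browser</code> and <code>extract_titles</code>, the sample markup, and the XPath expression are all hypothetical. The first function assumes the third-party Selenium package and a local Firefox installation, and it is defined but not called here. The second applies the same XPath expression to a small, well-formed static snippet using Python's standard library, which supports only a subset of XPath and cannot parse arbitrary real-world HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath expression: every <h2> element whose class is "title".
TITLE_XPATH = ".//h2[@class='title']"


def scrape_with_browser(url, xpath=TITLE_XPATH):
    """Load `url` in a real browser and return the text of matching elements.

    Assumes the third-party `selenium` package and a local Firefox install.
    The imports are deferred so the rest of this sketch runs without them.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        # The browser fetches the page and runs its client-side scripts.
        driver.get(url)
        # The XPath query is evaluated by the browser against its live DOM.
        return [element.text for element in driver.find_elements(By.XPATH, xpath)]
    finally:
        driver.quit()


def extract_titles(xhtml, xpath=TITLE_XPATH):
    """Apply the same XPath query to well-formed markup, without a browser."""
    root = ET.fromstring(xhtml)
    return [element.text for element in root.findall(xpath)]


# Hypothetical sample document standing in for a rendered page.
SAMPLE = """<html><body>
<div id="results">
<h2 class="title">First item</h2>
<h2 class="title">Second item</h2>
<h2 class="other">Not a result</h2>
</div>
</body></html>"""

titles = extract_titles(SAMPLE)
print(titles)  # ['First item', 'Second item']
```

On pages that load content asynchronously, a browser-based scraper may also need to wait for the relevant elements to appear before querying the DOM.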
 
=== Vertical aggregation ===