No page is out of reach! Using Scrapy and Playwright together, we get the best of both worlds: JavaScript rendering and Scrapy's data scraping capabilities. In this project I will show you how to get started with a basic scraper on a JavaScript-heavy website, using scrapy-playwright. By putting the headless browser in front of Scrapy to make the requests, we are able to render the page, and even wait for certain selectors to be visible, before the page DOM/HTML is returned and parsed with Scrapy.

Doing it this way has many benefits: Scrapy items, item loaders, pipelines, and middleware are all available to us. There are a few drawbacks, however. Any web scraping using a real browser is inherently slower. This is something we can't avoid, as this method requires starting up a browser to access the page. It does, however, give us access to sites that we would previously have had trouble scraping.
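
To show what "pipelines still work" means in practice, here is a small sketch of an item pipeline that tidies up scraped text after the browser-rendered page has been parsed. The class name, the "text" field, and the "myproject.pipelines" module path are hypothetical and only for illustration. A Scrapy pipeline is a plain Python class with a process_item method, so nothing Playwright-specific is needed.

```python
class CleanQuotePipeline:
    """Hypothetical pipeline: trims whitespace and curly quotes from 'text'."""

    def process_item(self, item, spider):
        # Items yielded as dicts arrive here unchanged, however the page was fetched.
        text = item.get("text") or ""
        item["text"] = text.strip().strip("\u201c\u201d")
        return item


# Enabled in settings.py; the module path is an assumed project layout.
ITEM_PIPELINES = {
    "myproject.pipelines.CleanQuotePipeline": 300,
}
```

The pipeline never needs to know that a browser was involved, which is the main reason for keeping Scrapy in charge instead of scripting the browser directly.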

https://github.com/scrapy-plugins/scrapy-playwright

Support Me:

# Patreon: https://www.patreon.com/johnwatsonrooney (NEW)
# Amazon UK: https://amzn.to/2OYuMwo
# Hosting: Digital Ocean: https://m.do.co/c/c7c90f161ff6
# Gear Used: https://jhnwr.com/gear/ (NEW)

-------------------------------------
Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
-------------------------------------