A distributed crawler powered by Headless Chrome
This web crawler uses Headless Chrome (via Puppeteer) to crawl dynamically generated websites as well as typical static HTML pages. It can also run distributed across multiple machines for speed, though that mode requires Redis for shared cache storage.
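To make the shared-cache idea concrete, here is a minimal sketch of how several workers might avoid crawling the same URL twice. It is not this crawler's actual API: `SharedCache`, `InMemoryCache`, `runWorker`, `FetchLinks`, and the tiny `site` graph are all hypothetical names invented for illustration. The in-memory cache stands in for Redis, and the fake fetcher stands in for rendering a page in Headless Chrome.

```typescript
// Hypothetical interface: a cache shared by all workers.
interface SharedCache {
  // Resolves to true only for the first caller to claim a given URL.
  claim(url: string): Promise<boolean>;
}

// In-memory stand-in for Redis (assumption for this sketch). Across machines,
// claim() would need an atomic Redis operation, for example SET with NX.
class InMemoryCache implements SharedCache {
  private readonly seen = new Set<string>();

  async claim(url: string): Promise<boolean> {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    return true;
  }
}

// Hypothetical fetcher: a real crawler would render the page in
// Headless Chrome and return the links it finds.
type FetchLinks = (url: string) => Promise<string[]>;

// One worker: processes its queue in order, skipping URLs any worker has claimed.
async function runWorker(
  seeds: string[],
  cache: SharedCache,
  fetchLinks: FetchLinks,
): Promise<string[]> {
  const visited: string[] = [];
  const queue = [...seeds];
  while (queue.length > 0) {
    const url = queue.shift()!;
    if (!(await cache.claim(url))) continue; // already claimed elsewhere
    visited.push(url);
    queue.push(...(await fetchLinks(url)));
  }
  return visited;
}

// Made-up site graph used only for the demo.
const site: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/b", "/c"],
  "/b": ["/"],
  "/c": [],
};
const fakeFetch: FetchLinks = async (url) => site[url] ?? [];

// Two workers share one cache; each page ends up visited exactly once.
async function demo(): Promise<string[][]> {
  const cache = new InMemoryCache();
  return Promise.all([
    runWorker(["/"], cache, fakeFetch),
    runWorker(["/a"], cache, fakeFetch),
  ]);
}
```

Which worker ends up with which page depends on scheduling, but the union of both result lists covers every reachable page with no duplicates. That property is what the shared cache is for.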