Scrapy

Make building spiders a breeze

Scrapy is an open source Python framework built specifically for web scraping, created by Scrapinghub co-founders Pablo Hoffman and Shane Evans. Out of the box, Scrapy spiders download HTML, parse and process the data, and save it in CSV, JSON or XML format.

Tell us your project requirements so we can get you an accurate quote. Our pricing is based on the number of websites, the number of records, and the complexity of the project.

Terminal

 pip install scrapy
 cat > myspider.py <<EOF
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').get()}

        for next_page in response.css('a.next-posts-link'):
            yield response.follow(next_page, self.parse)
EOF
 scrapy runspider myspider.py
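
The spider above yields plain dictionaries; saving them as CSV, JSON or XML is handled by Scrapy's feed exports. A minimal sketch of a settings fragment, assuming Scrapy 2.1 or later (where the `FEEDS` setting is available); the output file names are hypothetical:

```python
# settings.py fragment (assumes Scrapy 2.1+; file names are illustrative only).
# Each key is an output location, each value selects the export format.
FEEDS = {
    "titles.json": {"format": "json"},
    "titles.csv": {"format": "csv"},
    "titles.xml": {"format": "xml"},
}
```

With a fragment like this in place, each scraped item would be written to all three files in the corresponding format.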

Trusted By Developers

Built by developers, for developers

Used by over 1 million developers

Over 32k GitHub stars

Continuously maintained & updated

Robust Web Scraping Capabilities

Scrapy ships with a wide range of built-in extensions and middlewares for handling cookies and sessions, as well as HTTP features such as compression, authentication, caching, user-agents, robots.txt and crawl depth restriction. It is also easy to extend: you can add custom middlewares or pipelines to your web scraping projects to get the specific functionality you require.
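
As an illustration of extending a project, here is a minimal sketch of a custom item pipeline together with a few of the built-in settings mentioned above. The class name `TitleCleanupPipeline` and the module path `myproject.pipelines` are assumptions made for this example, not part of Scrapy itself:

```python
class TitleCleanupPipeline:
    """Hypothetical item pipeline that normalizes the 'title' field."""

    def process_item(self, item, spider):
        # Scrapy calls process_item() for every item a spider yields.
        title = item.get("title")
        if title is not None:
            # Collapse runs of whitespace and trim both ends.
            item["title"] = " ".join(title.split())
        return item


# settings.py fragment; the module path below is an assumption.
ITEM_PIPELINES = {"myproject.pipelines.TitleCleanupPipeline": 300}
ROBOTSTXT_OBEY = True     # respect robots.txt rules
DEPTH_LIMIT = 2           # restrict crawl depth
HTTPCACHE_ENABLED = True  # cache HTTP responses
```

The number assigned in `ITEM_PIPELINES` sets the order in which pipelines run, with lower values running first.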

Start Scraping The Web In Minutes

Simply deploy your spider to Scrapy Cloud and start extracting the data you need.

Terminal

 pip install shub
 shub login
Insert your Scrapinghub API Key: <API_KEY>

# Deploy the spider to Scrapy Cloud
 shub deploy

# Schedule the spider for execution
 shub schedule blogspider 
Spider blogspider scheduled, watch it running here:
https://app.scrapinghub.com/p/26731/job/1/8

# Retrieve the scraped data
 shub items 26731/1/8
{"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
{"title": "How to Crawl the Web Politely with Scrapy"}
...
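
The items shown above are printed one JSON object per line, so they can be loaded with Python's standard library alone. A minimal sketch, assuming the output of `shub items` has been captured as text; the helper name `parse_items` is hypothetical:

```python
import json


def parse_items(text):
    """Parse JSON-lines text into a list of dicts, skipping other lines."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines and anything that is not a JSON object.
        if not line.startswith("{"):
            continue
        items.append(json.loads(line))
    return items


# Sample input copied from the output shown above.
sample = """{"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
{"title": "How to Crawl the Web Politely with Scrapy"}
"""

titles = [item["title"] for item in parse_items(sample)]
print(titles)
```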

If you’d like to build your first Scrapy spider, check out the Scrapy documentation or the Learn Scrapy tutorials.

Need data you can rely on?

Tell us about your project or start using our scraping tools today.

© 2010 - 2019 Scrapinghub
