For the past three years, I’ve worked on multiple Scrapy projects with lots of spiders. Most of them are scheduled via scrapyd, a service with a JSON API for scheduling spiders. Sometimes these spiders go kaput for various reasons: a change in layout, a change in URL, getting blacklisted, among others. Checking their logs one by one can be time-consuming – so here’s where Rollbar comes in.
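For a concrete picture of what the JSON API looks like: scheduling a run is a single POST to scrapyd’s schedule.json endpoint. Here is a small sketch that only builds the request instead of sending it – the host, port (6800 is scrapyd’s default), and the project/spider names are assumed examples:

```python
# Build the URL and body that scrapyd's schedule.json endpoint expects.
# Nothing is sent; project/spider names here are illustrative.
from urllib.parse import urlencode

def schedule_request(project, spider, host="localhost", port=6800):
    """Return (url, post_body) for scheduling a spider run on scrapyd."""
    url = f"http://{host}:{port}/schedule.json"
    body = urlencode({"project": project, "spider": spider})
    return url, body

url, body = schedule_request("myproject", "myspider")
print(url)   # http://localhost:6800/schedule.json
print(body)  # project=myproject&spider=myspider
```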
Rollbar is an error monitoring service that groups similar errors and gives you insight into which ones occur the most. It can even help you track which commit/version introduced a bug. This way, you discover bugs faster, making them quicker to fix.
Installing Rollbar for Python
Fortunately, there’s pyrollbar. You can install it via pip:
pip install pyrollbar
Integrating pyrollbar into your Scrapy spiders
Then, in your base spider (the spider that the rest of your spiders will extend), hook an instance of RollbarHandler into the loggers of scrapy, twisted, and the spider itself.
import logging

from scrapy import Spider
from rollbar.logger import RollbarHandler


class BaseSpider(Spider):
    name = 'base_spider'  # I will be overwritten anyway

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        handler = RollbarHandler(access_token='YOUR_ROLLBAR_ACCESS_TOKEN',  # placeholder
                                 environment='production',  # placeholder
                                 level=logging.ERROR)
        logging.getLogger('scrapy').addHandler(handler)
        logging.getLogger('twisted').addHandler(handler)
        self.logger.logger.addHandler(handler)
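Why this works: Python’s logging module delivers every record at or above a handler’s level to that handler, and child loggers propagate records up to their parents. A minimal stdlib-only sketch of the same mechanism – the CapturingHandler below is a stand-in for RollbarHandler, collecting records instead of reporting them:

```python
import logging

class CapturingHandler(logging.Handler):
    """Stand-in for RollbarHandler: collects records instead of reporting them."""
    def __init__(self, level=logging.ERROR):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)

handler = CapturingHandler()
logging.getLogger('scrapy').addHandler(handler)

# Child loggers like 'scrapy.spider' propagate to the 'scrapy' logger's
# handlers. ERROR and above are captured; INFO falls below the handler level,
# just like RollbarHandler with level=logging.ERROR.
log = logging.getLogger('scrapy.spider')
log.error('spider went kaput')
log.info('this one is below the handler level')

print(len(handler.records))             # 1
print(handler.records[0].getMessage())  # spider went kaput
```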