For the past three years, I’ve worked on multiple Scrapy projects with lots of spiders. Most of them are scheduled via scrapyd, a service that exposes a JSON API for scheduling spiders. Sometimes these spiders go kaput for various reasons, such as a change in layout, a change in URL, or being blacklisted. Checking their logs one by one can be time-consuming, so here’s where Rollbar comes in.
Rollbar is an error monitoring service that groups similar errors and shows you which ones occur the most. It can even help you track which commit/version introduced a bug. This way, you can discover bugs faster and fix them sooner.
Installing Rollbar for Python
Fortunately, there’s pyrollbar, Rollbar’s Python client. It is published on PyPI as rollbar, so you can install it via pip:
pip install rollbar
Integrating pyrollbar into your Scrapy spider
Then, in your base spider (the spider that the rest of your spiders will extend), hook an instance of RollbarHandler to the loggers of twisted and the spider itself.
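To make the mechanism concrete, here is a minimal standard-library sketch of attaching one handler instance to several named loggers. It does not talk to Rollbar: CollectingHandler is a hypothetical stand-in for RollbarHandler that just stores records in memory, attach_handler is an illustrative helper, and the logger name 'base_spider' assumes the spider's logger is named after the spider.

```python
import logging


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for RollbarHandler: keeps records in a list."""

    def __init__(self, level=logging.ERROR):
        super().__init__(level=level)
        self.records = []

    def emit(self, record):
        # A real error-reporting handler would send the record to a service here.
        self.records.append(record)


def attach_handler(handler, logger_names):
    """Attach the same handler instance to each named logger, once."""
    for name in logger_names:
        logger = logging.getLogger(name)
        if handler not in logger.handlers:
            logger.addHandler(handler)


handler = CollectingHandler()
# 'twisted' and 'base_spider' are assumed logger names for this sketch.
attach_handler(handler, ["twisted", "base_spider"])

logging.getLogger("twisted").error("connection lost")
logging.getLogger("base_spider").warning("below the handler level, so skipped")
logging.getLogger("base_spider").error("layout changed")
```

Because the handler's level is ERROR, only the two error messages end up in handler.records. Sharing one handler across loggers keeps the configuration in a single place, which is the same idea the base spider relies on.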
from scrapy import Spider
from rollbar.logger import RollbarHandler


class BaseSpider(Spider):  # class name is illustrative
    name = 'base_spider'  # I will be overwritten anyway

    def __init__(self, *args, **kwargs):
        handler = RollbarHandler(access_token=<YOUR_TOKEN>,