Scrapy can keep a crawl queue on disk so a long run does not need to restart from the beginning after a planned stop. That is useful when a spider needs to pause for a deployment, a maintenance window, or a temporary target-side block.
The JOBDIR setting stores the pending request queue, the duplicate-request filter, and any persisted spider.state data for one spider run. A command-line override such as -s JOBDIR=jobstate/catalog-1 enables that storage for a single crawl without editing settings.py.
Each job directory belongs to one run only and should live on persistent storage that untrusted users cannot write to. Three caveats apply: resume works only after a clean shutdown; queued requests can fail after the pause if login cookies or session tokens expire in the meantime; and requests that cannot be serialized with pickle will not survive a pause unless the spider is adjusted.
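The mechanics are simple to picture: pending work is serialized to disk on shutdown and read back on the next start. A minimal sketch of that idea in plain Python (an illustration with a made-up file name, not Scrapy's actual on-disk format):

```python
# Sketch only: JOBDIR-style persistence means "serialize pending work on
# shutdown, reload it on the next run". File name and layout are invented.
import pickle
from pathlib import Path

QUEUE_FILE = Path("jobstate-demo") / "requests.queue"  # illustrative path

def save_pending(pending: list[str]) -> None:
    # Persist the pending URLs so a later process can pick up where we stopped.
    QUEUE_FILE.parent.mkdir(parents=True, exist_ok=True)
    QUEUE_FILE.write_bytes(pickle.dumps(pending))

def load_pending() -> list[str]:
    # Resume if saved state exists; otherwise this is a fresh start.
    if QUEUE_FILE.exists():
        return pickle.loads(QUEUE_FILE.read_bytes())
    return []

save_pending(["https://example.com/page/2", "https://example.com/page/3"])
print(load_pending())  # ['https://example.com/page/2', 'https://example.com/page/3']
```

Scrapy does the equivalent for the request queue, the duplicate filter, and spider.state, which is why anything that cannot be pickled drops out of the saved state.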
$ cd /srv/catalog_demo
Run the command from the directory that contains scrapy.cfg so scrapy crawl loads the intended project and the relative JOBDIR path lands in the expected place.
$ scrapy crawl catalog -s JOBDIR=jobstate/catalog-1
2026-04-22 06:58:18 [scrapy.crawler] INFO: Overridden settings:
{'JOBDIR': 'jobstate/catalog-1',
##### snipped #####
}
2026-04-22 06:58:20 [scrapy.core.engine] INFO: Spider opened
Command-line settings have the highest precedence in Scrapy, so -s JOBDIR=… overrides project or spider defaults for this run only. Use an absolute path when the crawl runs under a service manager, scheduler, or container and the working directory may vary.
Do not share one JOBDIR path between different spiders or between concurrent runs of the same spider: the queue and duplicate filter inside it are only valid for a single run at a time.
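One low-effort way to honor that rule is to derive the path from the spider name plus a run identifier. A hypothetical helper (the naming scheme is illustrative, not a Scrapy convention):

```python
def job_dir(base: str, spider_name: str, run_id: str) -> str:
    """Hypothetical helper: build one JOBDIR per spider per run so no two
    crawls ever share queue or duplicate-filter state."""
    return f"{base}/{spider_name}-{run_id}"

print(job_dir("jobstate", "catalog", "1"))  # jobstate/catalog-1
```

To resume a paused run, pass the same identifier again; to start an independent run, pick a new one (for example jobstate/catalog-2).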
^C
2026-04-22 06:58:23 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force
2026-04-22 06:58:23 [scrapy.core.engine] INFO: Closing spider (shutdown)
2026-04-22 06:58:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'shutdown',
'item_scraped_count': 4,
'scheduler/dequeued/disk': 5,
'scheduler/enqueued/disk': 21,
##### snipped #####
}
2026-04-22 06:58:24 [scrapy.core.engine] INFO: Spider closed (shutdown)
Forced termination (a second Ctrl-C, SIGKILL, or a hard host stop) can corrupt the saved queue and leave the next resume incomplete or unusable.
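When the crawl runs in the background and Ctrl-C is not available, the same graceful shutdown can be requested by sending SIGINT to the process exactly once (from a shell, `kill -INT <pid>` does the same). A sketch, with a made-up function name:

```python
import os
import signal

def request_graceful_stop(pid: int) -> None:
    """Ask a running crawl to shut down gracefully (illustrative helper).

    One SIGINT is equivalent to pressing Ctrl-C once: Scrapy finishes
    in-flight requests and persists the queue. A second SIGINT forces an
    immediate, unsafe stop, so send the signal only once.
    """
    os.kill(pid, signal.SIGINT)
```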
$ scrapy crawl catalog -s JOBDIR=jobstate/catalog-1
2026-04-22 06:58:41 [scrapy.core.engine] INFO: Spider opened
2026-04-22 06:58:41 [scrapy.core.scheduler] INFO: Resuming crawl (16 requests scheduled)
2026-04-22 06:58:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'item_scraped_count': 16,
'scheduler/dequeued/disk': 17,
'scheduler/enqueued/disk': 1,
##### snipped #####
}
2026-04-22 06:58:51 [scrapy.core.engine] INFO: Spider closed (finished)
If some pending requests do not reappear after the resume, enable SCHEDULER_DEBUG = True so Scrapy logs each request that could not be serialized into the job directory.
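The most common serialization failure is a callback that cannot be resolved by name, such as a lambda or a nested function: Scrapy persists callbacks by looking them up on the spider, so requests carrying anonymous callables are dropped from the disk queue. Plain pickle exhibits the same limitation (parse_product here is just a module-level stand-in for a spider callback):

```python
import pickle

def parse_product(response):
    """Module-level stand-in for a spider callback; picklable by reference."""
    return response

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda response: response)
    lambda_survives = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_survives = False

# A named, module-level function pickles fine.
named_survives = True
try:
    pickle.dumps(parse_product)
except (pickle.PicklingError, AttributeError, TypeError):
    named_survives = False

print(lambda_survives, named_survives)  # False True
```

Replacing lambda callbacks with spider methods is usually all that is needed to make every request survive a pause.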
Once the resumed crawl ends with finish_reason 'finished', the job directory is no longer needed and can be deleted:
$ rm -rf jobstate/catalog-1