python - How to tell scrapy crawler to STOP following more links dynamically?
Basically, I have a regex rule for following pages, and each page has 50 links. When I hit a link that is too old (based on a pre-defined date-time), I want to tell Scrapy to stop following more pages, but not to stop entirely: it must still scrape the links it has already decided to scrape (i.e., the Request objects already created); it just must not follow any new links. The program should then grind to a stop once it's done scraping those links.

Is there a way I can do this from inside the spider?
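The desired behavior (stop scheduling new listing pages while still draining the requests already queued) can be sketched independently of Scrapy. The page table, dates, and cutoff below are all hypothetical:

```python
from collections import deque
from datetime import date

CUTOFF = date(2020, 1, 1)  # hypothetical "too old" threshold

# Hypothetical site: each listing page has a date and a link to the next page.
pages = {
    "/page/1": {"date": date(2021, 6, 1), "next": "/page/2"},
    "/page/2": {"date": date(2020, 3, 1), "next": "/page/3"},
    "/page/3": {"date": date(2019, 9, 1), "next": "/page/4"},  # older than cutoff
    "/page/4": {"date": date(2019, 1, 1), "next": None},
}

def crawl():
    queue = deque(["/page/1"])
    follow = True          # once False, stop scheduling new pages
    scraped = []
    while queue:           # keep draining whatever is already scheduled
        url = queue.popleft()
        page = pages[url]
        scraped.append(url)
        if page["date"] < CUTOFF:
            follow = False  # too old: stop following, but finish the queue
        if follow and page["next"]:
            queue.append(page["next"])
    return scraped

print(crawl())  # /page/4 is never scheduled
```

Here the `follow` flag plays the role of the "stop following" switch: pages already in the queue are still scraped, but nothing new is added after the cutoff is hit.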
Once you hit a "too old" page, raise a CloseSpider exception. In that case, Scrapy will finish processing the links that are already scheduled and then shut down.