python - How to pass a url value to all subsequent items in the Scrapy crawl?


I am creating a CrawlSpider to scrape a product website. On page 1, it extracts category URLs of the form www.domain.com/color (simplified). From each category page it follows the first link to a product detail page, parses the product detail page, and then crawls to the next one via the next link. Each color category therefore has a unique crawl path.

The difficulty is that the color variable is not on the product detail page. I can extract it on the category page by parsing the link as follows:

    def parse_item(self, response):
        l = XPathItemLoader(item=Greenhouse(), response=response)
        l.default_output_processor = Join()
        l.add_value('color', response.url.split("/")[-1])
        return l.load_item()
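The `add_value('color', ...)` line above just takes the last path segment of the category URL. A minimal sketch of that extraction in isolation (the URL is a made-up example, not one from the site):

```python
# Color is the last path segment of the category URL, e.g. www.domain.com/color.
url = "http://www.domain.com/green"
color = url.split("/")[-1]
print(color)  # green
```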

However, I want to add the color value to the items parsed from the product detail pages of all products crawled starting from a particular color category page. Since the product URLs are crawled by following next links, the reference to the category page is lost after the first link. The Scrapy docs mention that Request.meta can pass data between parsers, but I'm not sure it applies here. Any help is appreciated.

My rules are:

    Rule(SgmlLinkExtractor(restrict_xpaths=('//table[@id="ctl18_ctlfacetlist_dlfacetlist"]/tr[2]/td',)),),
    Rule(SgmlLinkExtractor(restrict_xpaths=('//table[@id="ctl18_dlproductlist"]/tr[1]/td[@class="productlistitem"][1]',)), callback='parse_item', follow=True,),
    Rule(SgmlLinkExtractor(restrict_xpaths=('//a[@id="ctl18_ctl00_lbnext"]',)), callback='parse_item', follow=True,),

You can use the process_request argument of a Rule:

    class MySpider(CrawlSpider):
        ...
        rules = [...
            Rule(SgmlLinkExtractor(), process_request='add_color'),
        ]

        def add_color(self, request):
            meta = dict(color=request.url.split("/")[-1])
            return request.replace(meta=meta)
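With `add_color` attaching `meta` to every request the rule extracts, the item callback can read the color back via `response.meta`. A sketch of that read side, with Scrapy's Response stubbed out so it runs standalone; `FakeResponse` and the `extract_color` helper are illustrative stand-ins, not part of the original answer:

```python
# FakeResponse is a hypothetical stand-in for scrapy's Response,
# used so this sketch runs without Scrapy installed.
class FakeResponse:
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}

def extract_color(response):
    # Prefer the value propagated via request.meta by add_color();
    # fall back to the URL split used on the category page itself.
    return response.meta.get('color', response.url.split("/")[-1])

# A product detail page reached by following next links: the category
# is no longer in the URL, but the color survives in meta.
detail = FakeResponse("http://www.domain.com/products/item123",
                      meta={'color': 'green'})
print(extract_color(detail))  # green
```

In a real spider you would read `response.meta['color']` inside `parse_item` (for example via `l.add_value('color', response.meta.get('color'))`), since Scrapy copies `meta` from the request onto the response it produces.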
