python - How do I fix "TypeError: 'WikipediaItem' object does not support item assignment"?
I'm new to Python and Scrapy. I want to scrape some data from Wikipedia, but things didn't work out. Every time I run scrapy crawl wiki, I get: "TypeError: 'WikipediaItem' object does not support item assignment". How do I fix this so I can scrape details from Wikipedia?
Anyway, here's my code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from wikipedia.items import WikipediaItem

class WikipediaItem(BaseSpider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Main_Page"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//table[@id="mp-upper"]/tr')
        items = []
        for site in sites:
            item = WikipediaItem()
            item['title'] = site.select('.//a[@class="MainPageBG"]/text()').extract()
            item['link'] = site.select('.//a[@class="MainPageBG"]').extract()
            item['details'] = site.select('.//p/text()').extract()
            items.append(item)
        return items
And here's the result I get:
2013-04-18 23:56:54+0800 [scrapy] INFO: Scrapy 0.14.4 started (bot: wikipedia)
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Enabled item pipelines:
2013-04-18 23:56:54+0800 [wiki] INFO: Spider opened
2013-04-18 23:56:54+0800 [wiki] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-04-18 23:56:54+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-04-18 23:56:56+0800 [wiki] DEBUG: Crawled (200) <GET http://en.wikipedia.org/wiki/Main_Page> (referer: None)
2013-04-18 23:56:56+0800 [wiki] ERROR: Spider error processing <GET http://en.wikipedia.org/wiki/Main_Page>
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 368, in callback
        self._startRunCallbacks(result)
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 464, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/jean/wiki/wikipedia/spiders/wikipedia_spider.py", line 17, in parse
        item['title'] = row.select('.//a[@class="MainPageBG"]/text()').extract()
    exceptions.TypeError: 'WikipediaItem' object does not support item assignment
2013-04-18 23:56:56+0800 [wiki] INFO: Closing spider (finished)
2013-04-18 23:56:56+0800 [wiki] INFO: Dumping spider stats:
    {'downloader/request_bytes': 215,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 17762,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2013, 4, 18, 15, 56, 56, 244255),
     'scheduler/memory_enqueued': 1,
     'spider_exceptions/TypeError': 1,
     'start_time': datetime.datetime(2013, 4, 18, 15, 56, 54, 592948)}
2013-04-18 23:56:56+0800 [wiki] INFO: Spider closed (finished)
2013-04-18 23:56:56+0800 [scrapy] INFO: Dumping global stats:
    {'memusage/max': 28065792, 'memusage/startup': 28065792}
And here's items.py:
from scrapy.item import Item, Field

class WikipediaItem(Item):
    title = Field()
    link = Field()
    details = Field()
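The error message itself is ordinary Python behavior: item assignment (obj[key] = value) only works on objects whose class defines __setitem__, which a dict-like Item does and a spider does not. A minimal stdlib-only sketch with hypothetical stand-in classes (no Scrapy required):

```python
class SpiderLike(object):
    """A plain class, like BaseSpider: no __setitem__, so obj[key] = value fails."""

class ItemLike(dict):
    """A dict subclass; conceptually, scrapy's Item supports assignment this way."""

spider = SpiderLike()
try:
    spider['title'] = "Main Page"
except TypeError as e:
    print(e)  # 'SpiderLike' object does not support item assignment

item = ItemLike()
item['title'] = "Main Page"  # works: dict-style assignment is supported
print(item['title'])
```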
You named your spider class the same as the WikipediaItem you imported:

from wikipedia.items import WikipediaItem

class WikipediaItem(BaseSpider):
    # ...

So inside parse, WikipediaItem() creates an instance of your BaseSpider subclass, not the Item subclass defined in wikipedia.items, and a spider does not support item assignment. You probably want to rename the spider class:

class WikipediaSpider(BaseSpider):
    # ...
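The collision can be reproduced without Scrapy at all: in Python, a later class statement rebinds the name, shadowing the earlier import. A short sketch using stand-in classes named like yours:

```python
class WikipediaItem(dict):
    """Stand-in for the Item subclass imported from items.py."""

class WikipediaItem(object):
    """Redefining the same name, as the spider did: the dict version is shadowed."""

item = WikipediaItem()  # instantiates the second class, not the dict subclass
try:
    item['title'] = "Main Page"
except TypeError as e:
    print(e)  # 'WikipediaItem' object does not support item assignment
```

This prints the exact error from your traceback, which is why renaming the spider class fixes it.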