python - What is the fastest way to get scraped data from so many web pages?
I need to scrape 40 random web pages at the same time. These pages vary on each request. I have used RPCs in Python to fetch the URLs and scraped the data using BeautifulSoup. It takes 25 seconds to scrape all the data and display it on the screen.
To increase the speed, I stored the data in the App Engine datastore so that each piece of data is scraped only once and can be accessed from there quickly.
But the problem is that as the size of the data in the datastore increases, it is taking too long to fetch the data from the datastore (longer than the scraping itself).
Should I use memcache or shift to MySQL? Is MySQL faster than the GAE datastore? Or is there any other, better way to fetch the data as quickly as possible?
Based on what I know about your app, it would make sense to use memcache. It will be faster, and it will automatically take care of things like expiring stale cache entries.
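To illustrate the idea, here is a minimal sketch of the cache-aside pattern: look in the cache first, and scrape only on a miss. So that it runs anywhere, it uses a small in-process `TTLCache` class as a stand-in for memcache; `TTLCache`, `get_scraped`, and the `scrape` callback are hypothetical names for this example, not part of any library. On App Engine, `memcache.get(key)` and `memcache.set(key, value, time=seconds)` from `google.appengine.api` would presumably take the place of the stand-in.

```python
import time


class TTLCache:
    """In-process stand-in for a memcache client (hypothetical helper)."""

    def __init__(self, clock=time.time):
        self._store = {}      # key -> (value, expiry timestamp)
        self._clock = clock   # injectable so expiry can be tested

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def get_scraped(url, cache, scrape, ttl=3600):
    """Return cached data for url, calling scrape(url) only on a miss."""
    data = cache.get(url)
    if data is None:
        data = scrape(url)          # the slow path: fetch and parse the page
        cache.set(url, data, ttl)   # keep the result for ttl seconds
    return data
```

With this shape, a page that was scraped recently is served from memory, and the time-to-live decides how stale an entry may get before it is scraped again. Note that a scrape result of `None` is treated as a miss here, so such pages would be scraped on every request.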