How to download HTML content generated with PHP/JavaScript using wget or Perl
I have a URL that I want to download and parse:
http://diana.cslab.ece.ntua.gr/micro-cds/index.php?r=search/results_mature&mir=hsa-mir-3131&kwd=mimat0014996
The problem is that when I download it with Unix wget in the following way:
$ wget [the above url]
it gives me content that differs from what I see in the browser (namely, the list of genes is not there).
What's the right way to do this programmatically?
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://diana.cslab.ece.ntua.gr/micro-cds/index.php?r=search/results_mature&mir=hsa-mir-3131&kwd=mimat0014996";

my $mech = WWW::Mechanize->new();
# Identify as a browser; some sites serve different content to non-browser clients.
$mech->agent_alias('Windows IE 6');
$mech->get($url);

# Now you have access to the HTML code via $mech->content().
my $html = $mech->content();
To process the HTML code, I recommend using HTML::TreeBuilder::XPath (or another HTML parsing module).
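A minimal sketch of that parsing step, assuming HTML::TreeBuilder::XPath is installed from CPAN. The sample HTML, the table id `results`, and the XPath expression are hypothetical stand-ins, since the actual markup of the gene list on the results page is not shown here; adjust the expression to match the real page. In the full script, `$html` would come from `$mech->content()` instead of the inline sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for $mech->content(); the real page's markup may differ.
my $html = <<'HTML';
<html><body>
<table id="results">
  <tr><th>Gene</th><th>Score</th></tr>
  <tr><td>GENE_A</td><td>0.9</td></tr>
  <tr><td>GENE_B</td><td>0.7</td></tr>
</table>
</body></html>
HTML

# Build a DOM tree from the HTML string.
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Hypothetical XPath: the first <td> of each row. The header row uses <th>,
# so it contributes no match and is skipped automatically.
my @genes = $tree->findvalues('//table[@id="results"]//tr/td[1]');

# HTML::Element trees should be deleted explicitly to free memory.
$tree->delete;

print "$_\n" for @genes;
```

Using `//tr` rather than a fixed `table/tr` path keeps the expression working whether or not the parser inserts an implicit `<tbody>`.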