python - Why doesn't downloading text file work correctly?

I am using Python 3.3.1. I have created a function called download_file() that downloads a file and saves it to disk.

#!/usr/bin/python3
# -*- coding: utf8 -*-

import datetime
import os
import urllib.error
import urllib.request


def download_file(*urls, download_location=os.getcwd(), debugging=False):
    """Downloads the files provided as multiple url arguments.

    Provide the url for the files to be downloaded as strings. Separate
    the files to be downloaded by a comma.

    The function downloads the files and saves them in the folder
    provided as the keyword-argument download_location. If
    download_location is not provided, the file is saved in the
    current working directory. The folder download_location is
    created if it doesn't exist. Do not worry about a trailing
    slash at the end of download_location. The code takes care of
    it for you.

    If a download encounters an error, it alerts you and
    provides information about the error code and error reason (if
    received from the server).

    Normal usage:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/download/test')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/download/test/')

    In debug mode, files are not downloaded, neither is there any
    attempt to establish a connection with the server. It just prints
    out the filename and url that would have been attempted to be
    downloaded in normal mode.

    By default, debug mode is inactive. In order to activate it, you
    need to supply the keyword-argument 'debugging=True', like:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      debugging=True)
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/download/test',
                      debugging=True)

    """
    # Append a trailing slash at the end of download_location if not
    # already present
    if download_location[-1] != '/':
        download_location = download_location + '/'

    # Create the folder download_location if not already present
    os.makedirs(download_location, exist_ok=True)

    # Other variables
    time_format = '%Y-%b-%d %H:%M:%S'   # '2000-Jan-01 22:10:00'

    # "Request headers" information for the file to be downloaded
    accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    accept_encoding = 'gzip, deflate'
    accept_language = 'en-US,en;q=0.5'
    connection = 'keep-alive'
    user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) \
                  Gecko/20100101 Firefox/20.0'
    headers = {'Accept': accept,
               'Accept-Encoding': accept_encoding,
               'Accept-Language': accept_language,
               'Connection': connection,
               'User-Agent': user_agent,
               }

    # Loop through all the files to be downloaded
    for url in urls:
        filename = os.path.basename(url)
        if not debugging:
            try:
                request_sent = urllib.request.Request(url, None, headers)
                response_received = urllib.request.urlopen(request_sent)
            except urllib.error.URLError as error_encountered:
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- File not downloaded.')
                if hasattr(error_encountered, 'code'):
                    print(' ' * 22, 'Error code -', error_encountered.code)
                if hasattr(error_encountered, 'reason'):
                    print(' ' * 22, 'Reason -', error_encountered.reason)
            else:
                read_response = response_received.read()
                output_file = download_location + filename
                with open(output_file, 'wb') as downloaded_file:
                    downloaded_file.write(read_response)
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- Downloaded successfully.')
        else:
            print(datetime.datetime.now().strftime(time_format),
                  ': Debugging :', filename, 'would be downloaded from :\n',
                  ' ' * 21, url)

This function works for downloading PDFs, images, and other formats, but it gives trouble with text documents such as HTML files. I suspect the problem has something to do with this line near the end:

with open(output_file, 'wb') as downloaded_file:

So I have tried opening it in wt mode as well. I have also tried working with w mode only. Neither solves the problem.

The other problem might have been the encoding, so I have included the second line as:

# -*- coding: utf8 -*-

But it still doesn't work. What might be the problem, and how can I make it work for both text and binary files?

An example of what doesn't work:

>>> download_file("http://docs.python.org/3/tutorial/index.html")

When I open the downloaded file in gedit, it is displayed as:

[screenshot: in gedit]

Similarly, when it is opened in Firefox:

[screenshot: in Firefox]

The file you are downloading has been sent with gzip encoding. You can see this if you run zcat index.html: the downloaded file then appears correctly. In your code, you might want to add something like:

# requires "import zlib" at the top of the file
if response_received.headers.get('Content-Encoding') == 'gzip':
    read_response = zlib.decompress(read_response, 16 + zlib.MAX_WBITS)
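As a slightly fuller sketch of the same idea, the check could be pulled out into a small helper that also covers deflate, since the request headers in the question allow both. The name decode_body and its parameters are hypothetical and not part of the original function; it assumes the body has already been read into bytes.

```python
import zlib


def decode_body(body, content_encoding):
    """Return body decompressed according to a Content-Encoding value.

    Hypothetical helper: `body` is the bytes from response.read() and
    `content_encoding` is response.headers.get('Content-Encoding').
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    if encoding == 'deflate':
        try:
            # Usual case: a zlib-wrapped deflate stream.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No encoding (or one this sketch does not handle): return bytes as-is.
    return body
```

Inside download_file() this would be called as read_response = decode_body(read_response, response_received.headers.get('Content-Encoding')) before writing the file in 'wb' mode, so binary files such as PDFs pass through unchanged.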

Edit:

Well, I can't say why it works on Windows (and unfortunately I don't have a Windows box to test on), but if you post a dump of the response (i.e. convert the response object to a string), that might give some insight. Presumably the server chose not to send the file with gzip encoding, but given that this code is pretty explicit about the headers, I'm not sure what would be different.

It's worth mentioning that your headers explicitly specified that gzip and deflate are allowed (see accept_encoding). If you remove that header, you shouldn't have to worry about decompressing the response in any case.
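One way such a dump could be produced is sketched below. The helper name dump_response is hypothetical; it relies only on the status, reason, and getheaders() members that the object returned by urllib.request.urlopen() provides for HTTP URLs.

```python
def dump_response(response):
    """Return the status line and headers of a response as one string.

    Hypothetical helper: `response` is expected to behave like the
    http.client.HTTPResponse returned by urllib.request.urlopen().
    """
    lines = ['{} {}'.format(response.status, response.reason)]
    for name, value in response.getheaders():
        lines.append('{}: {}'.format(name, value))
    return '\n'.join(lines)
```

Printing dump_response(response_received) on both systems and comparing the Content-Encoding lines would show whether the server compressed the body in one case and not the other.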
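A minimal sketch of that alternative, reusing the header values from the question, builds the request without any Accept-Encoding entry. The helper name build_request is hypothetical. A server is not strictly obliged to send an uncompressed body in that case, so checking Content-Encoding on the response is still a reasonable safeguard.

```python
import urllib.request


def build_request(url):
    """Build a Request whose headers do not advertise gzip or deflate.

    Hypothetical helper; the header values are copied from the question.
    """
    headers = {
        'Accept': ('text/html,application/xhtml+xml,'
                   'application/xml;q=0.9,*/*;q=0.8'),
        'Accept-Language': 'en-US,en;q=0.5',
        'Connection': 'keep-alive',
        'User-Agent': ('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) '
                       'Gecko/20100101 Firefox/20.0'),
    }
    # There is deliberately no 'Accept-Encoding' key here, so the client
    # never asks the server for compressed content.
    return urllib.request.Request(url, None, headers)
```

In download_file() the result would simply replace request_sent, for example response_received = urllib.request.urlopen(build_request(url)).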
