Size of a file downloaded with Python does not match Content-Length

jinlong2015 2016-06-13 08:43:31

Code:



import urllib2

url = 'http://photogallery.sc.egov.usda.gov/netpub/server.np?original=2435&site=PhotoGallery&catalog=catalog&download'
user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0"
headers = {'User-Agent': user_agent}
req = urllib2.Request(url, headers=headers)

response = urllib2.urlopen(req)
# Size declared by the server (a string, or 0 if the header is missing)
contentLength = dict(response.headers).get('content-length', 0)
print 'content', contentLength

# Size of the body actually received
html = response.read()
lenHtml = len(html)

Result:
contentLength = 12605768
lenHtml = 12595135
Switching to writing the file in blocks gives the same problem:

file_size_dl = 0
block_sz = 8192
fileName = 'D:\\123321.tif'
fd = open(fileName, 'wb')
print 'content', contentLength
while True:
    # Read one block; an empty string means the stream has ended
    buffer = response.read(block_sz)
    if not buffer:
        break
    file_size_dl += len(buffer)
    fd.write(buffer)
fd.close()
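One way to make the mismatch explicit is to count the bytes actually received and compare the total with the declared Content-Length. The following is only a minimal sketch of that check, not a fix. It assumes Python 3's urllib.request rather than the urllib2 used above, and copy_in_blocks, shortfall, and download are hypothetical helper names introduced here for illustration.

```python
import urllib.request


def copy_in_blocks(src, dst, block_size=8192):
    """Copy src to dst block by block; return the number of bytes copied."""
    total = 0
    while True:
        block = src.read(block_size)
        if not block:  # an empty bytes object means end of stream
            break
        dst.write(block)
        total += len(block)
    return total


def shortfall(declared, received):
    """Return how many bytes are missing relative to Content-Length."""
    return int(declared) - received


def download(url, file_name, user_agent="Mozilla/5.0"):
    """Hypothetical helper: save url to file_name, return (declared, received)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as response, open(file_name, "wb") as fd:
        # The header may be absent, in which case get() returns None.
        declared = response.headers.get("Content-Length")
        received = copy_in_blocks(response, fd)
    return declared, received


# Figures from the post: header value vs. bytes actually read.
print(shortfall("12605768", 12595135))  # 10633
```

With the figures reported above, the body comes up 10633 bytes short of the header value. Calling download(url, fileName) and passing its two results to shortfall would show whether the gap is the same on every attempt.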

Any ideas on how to solve this?

1 reply

屎克螂 2016-06-13

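It may be because the file is fairly large; reading it all in one go might cause problems. Try:

import urllib

def cbk(a, b, c):
    # a: number of blocks transferred so far, b: block size in bytes, c: total file size
    per = 100.0 * a * b / c
    print '%.2f%%' % per

url = 'http://photogallery.sc.egov.usda.gov/netpub/server.np?original=2435&site=PhotoGallery&catalog=catalog&download'
urllib.urlretrieve(url, '1.tif', cbk)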
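For reference, a Python 3 version of this suggestion might look like the sketch below. It assumes urllib.request.urlretrieve, which calls the hook with the block count, the block size, and the total size (-1 when the server sends no Content-Length), and which raises ContentTooShortError when fewer bytes arrive than Content-Length declared. percent_done, report, and fetch are hypothetical names introduced here. Note that this would only report a short download, not repair it.

```python
import urllib.error
import urllib.request


def percent_done(block_num, block_size, total_size):
    """Return download progress in percent, or None if the size is unknown."""
    if total_size <= 0:
        return None
    # The last block is usually partial, so cap the value at 100.
    return min(100.0, 100.0 * block_num * block_size / total_size)


def report(block_num, block_size, total_size):
    """Progress hook in the shape urlretrieve expects."""
    per = percent_done(block_num, block_size, total_size)
    if per is not None:
        print("%.2f%%" % per)


def fetch(url, file_name):
    """Hypothetical helper: True if the download completed, False if it came up short."""
    try:
        urllib.request.urlretrieve(url, file_name, report)
    except urllib.error.ContentTooShortError as exc:
        # Raised when fewer bytes arrived than Content-Length declared.
        print("incomplete download:", exc)
        return False
    return True
```

Capping the percentage avoids printing values above 100% on the final block, and the None case avoids a meaningless negative percentage when the total size is unknown.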