Beginner web-scraping error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

浪迹红尘只为伊人 2018-12-14 09:57:33

Source code:
import urllib.request

url1 = 'http://www.mzitu.com'
response1 = urllib.request.urlopen(url1)
html1 = response1.read()
html1 = html1.decode("UTF-8")

print(html1)

Error:
Traceback (most recent call last):
  File "D:/Python Practice/爬取妹子图.py", line 37, in <module>
    html1 = html1.decode("UTF-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

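Strangely, when I change the URL to 'http://www.baidu.com', the program runs without any error. For the argument to decode() I have tried single quotes, double quotes, upper case and lower case; the error appears only with www.mzitu.com, and I can't figure out why. I checked the site's encoding, and it is definitely utf-8.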

I don't know what is causing this. I'd be grateful if someone more experienced could explain. Thanks in advance!
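The byte in the traceback is the clue: every gzip stream starts with the two bytes 0x1f 0x8b, and 0x8b can never be the first byte of a UTF-8 character, so decode() fails at position 1. In other words, the page may well be UTF-8, but the bytes returned by read() were still gzip-compressed. The sketch below reproduces the exact error offline; the sample HTML is made up for illustration and stands in for the real server reply.

```python
import gzip

# Hypothetical stand-in for the server's reply: UTF-8 HTML, gzip-compressed
page = '<html><title>示例</title></html>'.encode('utf-8')
body = gzip.compress(page)

# Every gzip stream begins with the two "magic" bytes 0x1f 0x8b
print(body[:2])  # b'\x1f\x8b'

try:
    body.decode('utf-8')
except UnicodeDecodeError as exc:
    # 0x1f is plain ASCII, but 0x8b is a UTF-8 continuation byte and cannot start a character
    error = exc
    print(error)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

# Decompressing first recovers the real UTF-8 text
print(gzip.decompress(body).decode('utf-8'))
```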


浪迹红尘只为伊人 2018-12-15

Solved. Two ways to fix it:
Method 1: decompress the response with the gzip module
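import gzip
import urllib.request
from io import BytesIO

url = 'http://www.mzitu.com'
response = urllib.request.urlopen(url)
html = response.read()  # raw bytes exactly as the server sent them
print(html)  # a gzip-compressed body starts with b'\x1f\x8b', hence the 0x8b error

# Wrap the bytes in a file-like object so GzipFile can decompress them
buff = BytesIO(html)
f = gzip.GzipFile(fileobj=buff)
res = f.read().decode('utf-8')  # the decompressed bytes are the UTF-8 HTML

print(res)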
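Method 1 decompresses unconditionally, so it would break on a server that returns plain HTML (f.read() raises gzip.BadGzipFile, an OSError subclass, on Python 3.8+). A slightly more defensive sketch checks the Content-Encoding response header first. The helper names decode_body and fetch_text are hypothetical, and only gzip and uncompressed ("identity") bodies are handled; anything else, such as br, is rejected rather than guessed at.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = 'utf-8') -> str:
    """Undo the HTTP Content-Encoding (gzip or none), then decode to text."""
    # A missing header means the body was sent uncompressed ("identity")
    encoding = (content_encoding or 'identity').strip().lower()
    if encoding == 'gzip':
        raw = gzip.decompress(raw)
    elif encoding != 'identity':
        # Hypothetical policy: refuse encodings this sketch does not handle
        raise ValueError(f'unsupported Content-Encoding: {content_encoding}')
    return raw.decode(charset)


def fetch_text(url: str) -> str:
    """Fetch url with urllib and return the body as text."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        content_encoding = response.headers.get('Content-Encoding')
        # Prefer the charset the server declares; fall back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
    return decode_body(raw, content_encoding, charset)


if __name__ == '__main__':
    # Offline check: a gzip-compressed body and a plain body both decode
    page = '<html>你好</html>'
    print(decode_body(gzip.compress(page.encode('utf-8')), 'gzip'))
    print(decode_body(page.encode('utf-8'), None))
```

With network access, fetch_text('http://www.mzitu.com') and fetch_text('http://www.baidu.com') should both work, because each response is decompressed only when its header says so.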

Method 2: use the requests library
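import requests

url = 'http://www.mzitu.com'

# requests asks for compressed responses and decompresses gzip/deflate bodies automatically
r = requests.get(url)

# r.text decodes the already-decompressed bytes using r.encoding
print(r.text)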

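One thing I still don't understand: when I switch to Baidu's URL, the program runs normally without errors and no gzip decompression is needed.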
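A likely explanation, not confirmed in the thread: urllib.request does not ask for compression. Unless you set the header yourself, http.client sends "Accept-Encoding: identity", so a server that honours it (as Baidu apparently did) replies with plain bytes, while mzitu.com seems to have sent gzip anyway. requests, by contrast, advertises gzip and decompresses transparently, which is why Method 2 works. The sketch below starts a throwaway local server (the EchoHandler class is hypothetical) only to show which Accept-Encoding header urllib sends by default.

```python
import http.server
import threading
import urllib.request

received = {}


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that records the client's Accept-Encoding header."""

    def do_GET(self):
        received['Accept-Encoding'] = self.headers.get('Accept-Encoding')
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


# Port 0 lets the OS pick a free port
server = http.server.HTTPServer(('127.0.0.1', 0), EchoHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# Bypass any proxy configured in the environment so the request reaches localhost
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
url = f'http://127.0.0.1:{server.server_address[1]}/'
with opener.open(url) as resp:
    resp.read()

server.shutdown()
server.server_close()
print(received['Accept-Encoding'])  # identity
```

Against a real site, check response.headers.get('Content-Encoding') instead: if it says gzip even though urllib asked for identity, the decompression step from Method 1 is needed.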