Garbled-encoding problem with text returned by Python's requests module
import requests
import lxml
from lxml import etree

url = 'http://www.chinacoalchem.com/news.asp?id=60719'
loginurl = "http://www.chinacoalchem.com/loginchk.asp?action=pw"
postdata = {
    'userid': 'liwan123',
    'password': 'lishichao1',
    'UserLogin': 'True',
    'Submit.x': '25',
    'Submit.y': '14'
}

# Log in first, then reuse the login cookies to fetch the article page.
res1 = requests.post(loginurl, data=postdata)
res2 = requests.get(url, cookies=res1.cookies)

# Tell requests how to decode the response body before reading .text.
res2.encoding = 'gb2312'
res2 = res2.text

# Parse the HTML and print every href attribute on the page.
xmlcontent = etree.HTML(res2)
links = xmlcontent.xpath('//@href')
for link in links:
    print link
The code is above. Running it produces the following output:
Traceback (most recent call last):
  File "D:/python-work/demo.py", line 23, in <module>
    print link
UnicodeEncodeError: 'gbk' codec can't encode character u'\xb2' in position 34: illegal multibyte sequence
images/css.css
Events/2016MeOH.pdf
index.asp
more.asp?lm=政策规划
more.asp?lm=公司动态
more.asp?lm=工程项目
more.asp?lm=技术进展
more.asp?lm=市场行情
more.asp?lm=甲醇
more.asp?lm=%C3%BA%D6%C6%CC%EC%C8%BB%C6%F8
more.asp?lm=煤制油
more.asp?lm=MTO/MTP
more.asp?lm=其他煤化工
yuekan.asp
http://www.shenhuagroup.com.cn
http://www.ykjt.cn/
http://www.ctdmto.com/
http://meeting.qianzhan.com/
http://www.wison.com
http://www.famens.com
Process finished with exit code 1
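Note that the traceback points at `print link`, not at the requests calls: the error is raised when a unicode string is encoded for the console, whose codec is 'gbk' according to the message, and u'\xb2' (superscript two) has no GBK mapping. This can be reproduced without the network. The sketch below is only an illustration; `can_encode` and the sample string are made up for it and are not part of requests or lxml.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def can_encode(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical string containing the character named in the traceback.
sample = u'm\xb2'

print(can_encode(sample, 'gbk'))      # False: same failure as in the traceback
print(can_encode(sample, 'gb18030'))  # True: GB18030 covers all of Unicode
print(can_encode(sample, 'utf-8'))    # True
```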
I don't know how to fix this. Whatever encoding I set, something still goes wrong.
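One possible workaround, assuming the goal is only to display the links on a GBK console: encode each link with an error handler before printing, so that characters the console cannot show are escaped instead of raising. Changing `res2.encoding` probably won't help here, because the exception is raised while printing, not while decoding the response. `console_safe` is a hypothetical helper name, and the `links` list is a stand-in for the result of `xmlcontent.xpath('//@href')`.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def console_safe(text, codec=None):
    """Round-trip text through a codec, escaping characters it cannot encode."""
    # Default to the console codec; fall back to ASCII when it is unknown.
    codec = codec or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(codec, 'backslashreplace').decode(codec)


# Stand-in data: one link from the output above, one hypothetical link
# containing the character that triggered the error.
links = [u'more.asp?lm=甲醇', u'example.asp?x=\xb2']

for link in links:
    # Unencodable characters are printed as escapes such as \xb2.
    print(console_safe(link))
```

With a GBK console, `console_safe(u'example.asp?x=\xb2', 'gbk')` yields the text `example.asp?x=\xb2` with a literal backslash escape, while the Chinese link is printed unchanged. Passing `'replace'` instead of `'backslashreplace'` would print `?` for such characters.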