About request.urlopen(url).read()

Joe428 2018-02-01 09:16:52

import urllib.request

from lxml import etree

url = "http://sh.ganji.com/fang1/"
resp = urllib.request.urlopen(url).read()
mystr = resp.decode("utf-8")  # decode the response bytes to text
print(mystr)

selector = etree.HTML(resp)
titles = selector.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd/a/text()')
prices = selector.xpath('//*[@class="price"]/span[1]/text()')

print("titleslen:", len(titles))
print("priceslen:", len(prices))
print(dict(zip(titles, prices)))

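I posted a thread earlier asking why the content I fetched was different every time and incomplete (http://bbs.csdn.net/topics/392315938).
I have now found that the content returned by request.urlopen(url).read() is itself incomplete. Could someone take a look and tell me whether I wrote something wrong, or whether something is not configured properly?
Other people get the complete records with the same code.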
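One way to confirm that the body really arrives short is to compare it against the Content-Length the server declares and to catch http.client.IncompleteRead. This is only a diagnostic sketch; body_is_complete and fetch_and_check are hypothetical helper names, and a server that uses chunked encoding declares no length, so nothing can be compared in that case.

```python
import http.client
import urllib.request


def body_is_complete(declared_length, body):
    """Return True/False when a Content-Length was declared, else None."""
    if declared_length is None:
        return None  # no declared length (e.g. chunked), nothing to compare
    return int(declared_length) == len(body)


def fetch_and_check(url, timeout=10):
    """Fetch url and return (body, complete) without raising on a short read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        declared = resp.headers.get("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            # The connection closed before the declared length arrived;
            # keep whatever bytes were received.
            return exc.partial, False
    return body, body_is_complete(declared, body)
```

Calling fetch_and_check("http://sh.ganji.com/fang1/") a few times and printing len(body) together with the flag should show whether the truncation happens at the HTTP level or later, during parsing.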


3 replies


虾米馅煎包 2018-02-04


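Quoting reply #2 from weixin_38500276:

Thanks, expert~ Sure enough, it works fine once I switched to requests...

I don't deserve to be called an expert, I'm just a beginner.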

Joe428 2018-02-03


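Quoting reply #1 from BFInWR:

Since you posted this again, I dug into your problem properly. Capturing the traffic showed one thing: even when I set the headers to match the browser's request headers exactly, one entry still differed, 'Connection':''. That is the one (I had been using requests, which defaults to 'Connection':'keep-alive'). When I used request.urlopen, however, the capture showed 'Connection':'close'. So is this the pitfall? = =! After some searching on Baidu and reading the source, I found that urllib defaults to a non-persistent connection in the headers and it cannot be changed. So my advice is to stop using urllib and use requests instead! Finally, I can't fully guarantee that this is the real culprit, but I think it most likely is.

Thanks, expert~ Sure enough, it works fine once I switched to requests...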

import requests
from lxml import etree

r = requests.get(url='http://sh.ganji.com/fang1/')    # a basic GET request

selector = etree.HTML(r.text)
titles = selector.xpath('//*[@class="f-list-item ershoufang-list"]/dl/dd/a/text()')
prices = selector.xpath('//*[@class="price"]/span[1]/text()')
print("titleslen:", len(titles))
print("priceslen:", len(prices))
for i,j in zip(titles,prices):
    print(i, j)

虾米馅煎包 2018-02-01


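Since you posted this again, I dug into your problem properly.
Capturing the traffic showed one thing: even when I set the headers to match the browser's request headers exactly, one entry still differed,
'Connection':''. That is the one (I had been using requests, which defaults to 'Connection':'keep-alive').
When I used request.urlopen, however, the capture showed 'Connection':'close'. So is this the pitfall? = =!
After some searching on Baidu and reading the source, I found that urllib defaults to a non-persistent connection in the headers and it cannot be changed.

So my advice is to stop using urllib and use requests instead!
Finally, I can't fully guarantee that this is the real culprit, but I think it most likely is.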
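The header observation above can be reproduced without the real site or a packet capture. The sketch below starts a throwaway local server that echoes back the Connection request header it receives, then asks urllib for 'keep-alive' explicitly. EchoConnectionHandler and connection_header_sent_by_urllib are hypothetical names made up for this illustration, and the result reflects CPython's standard urllib, not necessarily every implementation.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoConnectionHandler(BaseHTTPRequestHandler):
    """Replies with the value of the Connection request header it received."""

    def do_GET(self):
        body = (self.headers.get("Connection") or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def connection_header_sent_by_urllib(requested_value="keep-alive"):
    """Return the Connection header the local server actually saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoConnectionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        req = urllib.request.Request(url, headers={"Connection": requested_value})
        # An empty ProxyHandler bypasses any proxy set in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(req, timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(connection_header_sent_by_urllib("keep-alive"))
```

With CPython's urllib the server should report close even though keep-alive was requested, which matches what the capture showed. Whether that header is what truncates the ganji.com response is still unconfirmed, as noted above.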