Forum crawler: headers and cookies are set, but I still can't fetch the content

量化分析 2015-04-21 11:01:37
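The URL is: http://forum.xitek.com/thread-1444940-1-1-1.html
It's a thread on Xitek (色影无忌).

In a browser the page loads without logging in. But when I imitate the headers and cookies in Python and fetch the response with urllib2, it never works. Can anyone help?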

import cookielib
import urllib2


class getSockPost():

    def __init__(self):
        self.url = "http://forum.xitek.com/thread-1444940-1-1-1.html"
        self.user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'
        self.headers = {'User-Agent': self.user_agent, 'Host': 'forum.xitek.com'}

    def getContent(self):
        # Opener that stores cookies from responses and sends them back on later requests
        cookie = cookielib.CookieJar()
        cookie_support = urllib2.HTTPCookieProcessor(cookie)
        opener = urllib2.build_opener(cookie_support)
        urllib2.install_opener(opener)  # make urllib2.urlopen use this opener

        print self.url  # was "print home": neither home nor self.home is defined
        Req = urllib2.Request(self.url, headers=self.headers)
        print "1"
        Resp = urllib2.urlopen(Req)
        print "2"
        content = Resp.read().decode('gb2312', 'ignore')
        print "done"
        print content


if __name__ == "__main__":
    print "work"
    obj = getSockPost()
    obj.getContent()
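For readers on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the same setup (one opener that keeps cookies, plus a custom User-Agent) might look roughly like the sketch below. The helper names (build_opener_with_cookies, fetch_text, demo) are hypothetical, the GBK fallback charset is an assumption about the forum, and the demo talks to a throwaway local server instead of forum.xitek.com, so it only shows that the cookie is stored and sent back.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


def build_opener_with_cookies(user_agent):
    """Opener that keeps cookies between requests and sends a fixed User-Agent."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # addheaders replaces the default "Python-urllib/3.x" User-Agent on every request
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_text(opener, url, timeout=10.0):
    """Fetch url and decode it with the charset the server declares.

    Falls back to GBK (an assumption for a Chinese forum) when no charset is sent.
    """
    with opener.open(url, timeout=timeout) as resp:
        raw = resp.read()
        charset = resp.headers.get_content_charset() or "gbk"
    return raw.decode(charset, errors="replace")


class _EchoHandler(http.server.BaseHTTPRequestHandler):
    """Local stand-in server: /set issues a cookie; every path echoes UA and Cookie."""

    def do_GET(self):
        body = "UA={};Cookie={}".format(
            self.headers.get("User-Agent"), self.headers.get("Cookie")
        ).encode("utf-8")
        self.send_response(200)
        if self.path == "/set":
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def demo(user_agent="Mozilla/5.0 (test)"):
    """Hit /set, then /echo, and return what the server saw on the second request."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:{}".format(server.server_port)
    try:
        opener, _jar = build_opener_with_cookies(user_agent)
        fetch_text(opener, base + "/set")          # server sets sid=abc123
        return fetch_text(opener, base + "/echo")  # jar sends sid=abc123 back
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Passing the opener around explicitly, rather than calling install_opener, makes it clear which requests share the cookie jar; reading the charset from Content-Type avoids hard-coding gb2312.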


When I run it (getSockPost().getContent()), all I get is a very long wait, followed by:

Traceback (most recent call last):
File "C:/Kingsoft/Python/coding/getStockPostXitek.py", line 33, in <module>
obj.getContent()
File "C:/Kingsoft/Python/coding/getStockPostXitek.py", line 23, in getContent
Resp=urllib2.urlopen(Req)
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 404, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 422, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Python27\lib\urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "C:\Python27\lib\httplib.py", line 1067, in getresponse
response.begin()
File "C:\Python27\lib\httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "C:\Python27\lib\httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "C:\Python27\lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
socket.error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Process finished with exit code 1
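In the traceback above, the call is stuck in _read_status waiting for the server's status line until Windows gives up with error 10060. urlopen's timeout parameter does not fix whatever made the server stop answering, but it turns an open-ended hang into a quick, catchable error. A minimal Python 3 sketch, assuming a hypothetical helper name fetch_or_none and using a local socket that accepts connections but never replies as a stand-in for an unresponsive server:

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_or_none(url, timeout=5.0):
    """Return the response body, or None on a URL error or timeout.

    timeout= bounds each blocking socket operation (connect, send, each read)
    instead of relying on the global default, which is usually "no timeout".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout) as err:
        # URLError wraps errors raised while connecting or sending the request;
        # a stalled wait for the response is raised as socket.timeout itself.
        print("request failed:", err)
        return None


def demo_silent_server(timeout=0.5):
    """Call fetch_or_none against a local socket that accepts TCP but never replies."""
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)  # the OS completes the handshake; nothing ever answers
    url = "http://127.0.0.1:{}/".format(silent.getsockname()[1])
    start = time.monotonic()
    try:
        body = fetch_or_none(url, timeout=timeout)
    finally:
        silent.close()
    return body, time.monotonic() - start


if __name__ == "__main__":
    body, elapsed = demo_silent_server()
    print(body, round(elapsed, 1))
```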
量化分析 2015-04-21
Solved it myself, closing the thread. Thanks!
量化分析 2015-04-21
After switching to cookie=cookielib.LWPCookieJar(), I do get a response back, but reading it now fails with:

obj.getContent()
File "C:/Kingsoft/Python/coding/getStockPostXitek.py", line 27, in getContent
content=Resp.read()
File "C:\Python27\lib\socket.py", line 351, in read
data = self._sock.recv(rbufsize)
File "C:\Python27\lib\httplib.py", line 543, in read
return self._read_chunked(amt)
File "C:\Python27\lib\httplib.py", line 603, in _read_chunked
value.append(self._safe_read(amt))
File "C:\Python27\lib\httplib.py", line 660, in _safe_read
raise IncompleteRead(''.join(s), amt)
httplib.IncompleteRead: IncompleteRead(315 bytes read, 7877 more expected)

It looks like the buffer filled up... How can I fix this?
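IncompleteRead is raised when the connection ends before all the body bytes the server announced have arrived, so it points to a truncated transfer rather than a full local buffer. A common workaround is to catch the exception and keep whatever did arrive, accepting that the page may be cut off. The Python 3 sketch below uses hypothetical helper names and a local server that deliberately truncates a chunked response. In recent Python 3 versions e.partial holds the chunks that arrived complete before the failing one, so check it rather than assuming it is the whole page.

```python
import http.client
import socket
import threading
import urllib.request


def read_tolerant(resp):
    """Read a response body, keeping what arrived if the transfer is cut short.

    Returns (body, complete): complete is False when IncompleteRead was raised.
    """
    try:
        return resp.read(), True
    except http.client.IncompleteRead as e:
        return e.partial, False


def _truncating_server(listener):
    """Send one full chunk, then announce a 10-byte chunk and close after 3 bytes."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # consume the request headers first
            data = conn.recv(1024)
            if not data:
                return
            request += data
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Transfer-Encoding: chunked\r\n"
            b"Connection: close\r\n\r\n"
            b"5\r\nhello\r\n"  # a complete 5-byte chunk
            b"a\r\nwor"        # declares 10 bytes, sends only 3
        )


def demo_truncated():
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=_truncating_server, args=(listener,), daemon=True)
    worker.start()
    url = "http://127.0.0.1:{}/".format(listener.getsockname()[1])
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return read_tolerant(resp)
    finally:
        worker.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(demo_truncated())
```

Keeping the partial body only hides the symptom; retrying the request, or comparing against what the browser receives, is still needed to get the full page.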
