Seeking expert help: a Python web-scraping question

heaven_peien 2016-07-11 04:37:47
Here is my code:
import requests  # time and BeautifulSoup were imported but never used

# Form fields: no trailing spaces in the key or the value
data = {'remoteAddress': '202.4.234.122', 'key': ''}

# Headers for the first GET of the tool page (requests sets Host,
# Accept-Encoding and Connection itself; the Session collects fresh
# cookies, so the browser's old Cookie header is left out)
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'X-Forwarded-For': '8.8.8.8',
}

# Headers copied from the XMLHttpRequest the page sends to domains.php
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0',
    'Accept': 'text/javascript, text/html, application/xml, text/xml, */*',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'X-Prototype-Version': '1.6.0',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Referer': 'http://www.yougetsignal.com/tools/web-sites-on-web-server/',
    'Origin': 'http://www.yougetsignal.com',
    'X-Forwarded-For': '8.8.8.8',
}

s = requests.Session()
# Visit the tool page first so the session picks up its cookies
s.get('http://www.yougetsignal.com/tools/web-sites-on-web-server/', headers=header)
c = s.post('http://domains.yougetsignal.com/domains.php', data=data, headers=headers)
print(c.text)  # the reply is JSON; c.json() would decode it
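A stray space in a form field name is easy to miss and silently breaks the request. The sketch below uses urllib.parse.urlencode, which, as far as I know, is also what requests relies on when form-encoding a dict, to show what actually goes over the wire with and without the extra spaces.

```python
from urllib.parse import urlencode

# Form data with stray spaces in the key and the value
bad = {'remoteAddress ': '202.4.234.122 ', 'key': ''}
# Form data using the field name the page sends
good = {'remoteAddress': '202.4.234.122', 'key': ''}

# urlencode turns each space into '+', so the server receives a field
# literally named "remoteAddress " and never sees "remoteAddress"
print(urlencode(bad))   # remoteAddress+=202.4.234.122+&key=
print(urlencode(good))  # remoteAddress=202.4.234.122&key=
```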
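What I did in Firefox:
Original page: http://www.yougetsignal.com/tools/web-sites-on-web-server/
After entering an IP, the page shows the list of domains hosted on the same IP. But right-click > View Page Source does not show the list; only when I select the domain list in Firefox and use Inspect Element can I see these nodes, which are missing from the source.
In httpanalysis, the response comes from http://domains.yougetsignal.com/domains.php
How can I get this returned data with Python? Any advice would be much appreciated, and I'll be sure to return the favor.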
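Because the list is filled in by JavaScript after the page loads, it never shows up in View Source; the data arrives from the POST to domains.php. Below is a minimal standard-library sketch of replaying that request, assuming the endpoint accepts the form fields and headers seen in the capture above (remoteAddress, key, X-Requested-With, Referer) and replies with JSON, as the replies in this thread report. The helpers build_request and fetch_domains are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = 'http://domains.yougetsignal.com/domains.php'
TOOL_PAGE = 'http://www.yougetsignal.com/tools/web-sites-on-web-server/'


def build_request(ip):
    """Build a POST that mimics the page's XMLHttpRequest (hypothetical helper)."""
    body = urllib.parse.urlencode({'remoteAddress': ip, 'key': ''}).encode('utf-8')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': TOOL_PAGE,
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    }
    # Passing data= makes urllib send a POST instead of a GET
    return urllib.request.Request(ENDPOINT, data=body, headers=headers)


def fetch_domains(ip, timeout=10):
    """Send the request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(build_request(ip), timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

Usage: `fetch_domains('202.4.234.122')` returns the decoded JSON as a dict. Splitting request construction from sending makes it easy to inspect the exact body and headers before touching the network.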
4 replies
bwlab 2016-07-18
Damn, it actually got rate-limited: {"status":"Fail", "message":"Daily reverse IP check limit reached for 49.77.142.238. Please <a href='/about'>contact</a> me to remove this limit. Be sure to let me know how many queries you need per day."}
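Even when the daily limit is hit, the endpoint still answers with JSON, just with "status":"Fail". A minimal sketch for surfacing that instead of printing raw text, assuming "status" and "message" are the fields to check (the only ones visible in the reply above); parse_reply and LookupFailed are hypothetical names.

```python
import json
import re


class LookupFailed(Exception):
    """Hypothetical error raised when the service reports status "Fail"."""


def parse_reply(text):
    """Decode the JSON reply and raise if the service refused the lookup."""
    reply = json.loads(text)
    if reply.get('status') == 'Fail':
        # The message can contain HTML (e.g. a contact link); strip the tags
        message = re.sub(r'<[^>]+>', '', reply.get('message', ''))
        raise LookupFailed(message)
    return reply
```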
bwlab 2016-07-18

import http.cookiejar
import urllib.parse
import urllib.request

class getSignal():
    def __init__(self):
        # Keep cookies across requests, like a browser session
        self.cookieJar = http.cookiejar.LWPCookieJar()
        opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(self.cookieJar))
        urllib.request.install_opener(opener)

    def get(self, ip):
        # Endpoint the page's AJAX call posts to; it answers with JSON
        url = 'http://domains.yougetsignal.com/domains.php'
        data = {
            'remoteAddress': ip,
            'key': '',
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
            'Referer': 'http://www.yougetsignal.com/tools/web-sites-on-web-server/'
        }
        post_data = urllib.parse.urlencode(data).encode('utf-8')
        request = urllib.request.Request(url, data=post_data, headers=headers)
        # Use a separate name for the response and close it automatically
        with urllib.request.urlopen(request) as response:
            pageHtml = response.read().decode('utf-8')
        print(pageHtml)

if __name__ == '__main__':
    s = getSignal()
    s.get('122.114.91.173')
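install_opener replaces the process-wide default opener, so every later urlopen call in the program shares this cookie jar. If that is not wanted, one option is to keep the opener on the instance and call its open method directly. A sketch under that assumption; the class name SignalClient is hypothetical, and decoding with json assumes the reply format reported in this thread.

```python
import http.cookiejar
import json
import urllib.parse
import urllib.request


class SignalClient:
    """Hypothetical variant of getSignal that keeps its opener private."""

    URL = 'http://domains.yougetsignal.com/domains.php'

    def __init__(self):
        self.cookie_jar = http.cookiejar.LWPCookieJar()
        # Local opener: cookies live on this instance, global urlopen is untouched
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookie_jar))

    def get(self, ip, timeout=10):
        data = urllib.parse.urlencode({'remoteAddress': ip, 'key': ''}).encode('utf-8')
        request = urllib.request.Request(self.URL, data=data, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
            'Referer': 'http://www.yougetsignal.com/tools/web-sites-on-web-server/',
        })
        # Send through this instance's opener rather than the global one
        with self.opener.open(request, timeout=timeout) as response:
            return json.loads(response.read().decode('utf-8'))
```

Usage: `SignalClient().get('122.114.91.173')`. Two clients created this way do not share cookies, which keeps the behaviour predictable when the script grows.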
bwlab 2016-07-18
Thanks for sharing. Found a great tool, much better than the one on Chinaz (站长之家).
coby002 2016-07-16
POST to http://domains.yougetsignal.com/domains.php with FormData = {'remoteAddress': '8.8.8.8', 'key': '', '_': ''}; the response is JSON.
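Written as a Python dict, the three fields listed above encode to the body below, which is handy for comparing against what the browser's network panel shows. The empty '_' field is taken from that reply; whether the server actually requires it is not confirmed in this thread.

```python
from urllib.parse import urlencode

# Field names and values as listed in the reply above
form = {'remoteAddress': '8.8.8.8', 'key': '', '_': ''}

# Request body for Content-Type: application/x-www-form-urlencoded
body = urlencode(form)
print(body)  # remoteAddress=8.8.8.8&key=&_=
```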