After logging in to Taobao with WebBrowser, how can I fetch data directly with HttpWebRequest?

rimland 2010-07-27 12:31:14
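I first log in to Taobao with a WebBrowser control, then pass its cookies to an HttpWebRequest that requests a Taobao page which requires login. In principle I should not have to simulate the login again with HttpWebRequest, yet the response is still Taobao's login page.
In my tests the same approach works for logging in to csdn, and even for reading mail in a 163 mailbox. Why does it fail for Taobao?
Does anyone know?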
java123object 2012-08-28
[Quote=Quoting reply #22:]

Using webBrowser1.Document.Cookie together with InternetGetCookieEx (to get the HttpOnly values) does the job.
[/Quote]

Could you share the code?
rimland 2011-08-22
Using webBrowser1.Document.Cookie together with InternetGetCookieEx (to get the HttpOnly values) does the job.
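The reply does not show how the two sources are combined. One possible way, sketched below, is to merge the two cookie strings by name before putting the result in the Cookie header. The class name CookieMerge and the method Merge are hypothetical, and the two inputs are assumed to be plain "name=value; name=value" strings, one from InternetGetCookieEx and one from webBrowser1.Document.Cookie.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the thread: merges two cookie strings by name.
public static class CookieMerge
{
    // Entries from 'preferred' win when both strings contain the same cookie name.
    public static string Merge(string preferred, string fallback)
    {
        var order = new List<string>();                 // cookie names in first-seen order
        var pairs = new Dictionary<string, string>();   // cookie name -> full "name=value" text

        foreach (string source in new[] { preferred, fallback })
        {
            if (string.IsNullOrEmpty(source)) continue;
            foreach (string part in source.Split(';'))
            {
                string pair = part.Trim();
                if (pair.Length == 0) continue;
                int eq = pair.IndexOf('=');
                // A cookie without '=' (such as "ck1" in the capture) is keyed by its whole text.
                string name = eq < 0 ? pair : pair.Substring(0, eq);
                if (pairs.ContainsKey(name)) continue;   // keep the first value seen
                pairs[name] = pair;
                order.Add(name);
            }
        }

        var result = new List<string>();
        foreach (string name in order) result.Add(pairs[name]);
        return string.Join("; ", result.ToArray());
    }
}
```

A call might look like request.Headers[HttpRequestHeader.Cookie] = CookieMerge.Merge(cookiesFromWinInet, webBrowser1.Document.Cookie); where cookiesFromWinInet is whatever string the InternetGetCookieEx call produced.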
zhouyusuo 2011-04-28
How did you solve it?
zhegaozhouji 2011-03-06
This problem is really troublesome.
nhua890515 2011-01-10
How did you solve it? What value does InternetGetCookieEx return? When I try this approach I always get empty data.
rimland 2010-10-17
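The problem was solved long ago. The HttpOnly cookies were not being retrieved. Calling InternetGetCookieEx retrieves the HttpOnly cookies under IE7 and IE8; under IE6 they still cannot be retrieved.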
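The thread never shows the call itself. A minimal sketch of how InternetGetCookieEx might be invoked from C# through P/Invoke follows. The wrapper class WinInetCookies, the method GetCookieHeader, and the 8192-character starting buffer are assumptions made for illustration; the function name, wininet.dll, and the INTERNET_COOKIE_HTTPONLY flag value come from the WinINet headers. It runs on Windows only.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper, not from the thread. Windows only.
public static class WinInetCookies
{
    // Flag value from wininet.h: include cookies marked HttpOnly in the result.
    private const int INTERNET_COOKIE_HTTPONLY = 0x00002000;

    [DllImport("wininet.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool InternetGetCookieEx(
        string url, string cookieName, StringBuilder cookieData,
        ref int size, int flags, IntPtr reserved);

    // Returns the cookies WinINet holds for 'url' as "name=value; name=value",
    // or an empty string when nothing could be read.
    public static string GetCookieHeader(string url)
    {
        int size = 8192;                              // assumed starting buffer size
        var buffer = new StringBuilder(size);
        if (InternetGetCookieEx(url, null, buffer, ref size,
                                INTERNET_COOKIE_HTTPONLY, IntPtr.Zero))
        {
            return buffer.ToString();
        }

        // The first call failed; if a size was reported, retry once with a buffer that large.
        if (size <= 0) return string.Empty;
        buffer = new StringBuilder(size);
        return InternetGetCookieEx(url, null, buffer, ref size,
                                   INTERNET_COOKIE_HTTPONLY, IntPtr.Zero)
            ? buffer.ToString()
            : string.Empty;
    }
}
```

A call such as WinInetCookies.GetCookieHeader("http://mai.taobao.com/home/seller_home.htm") would be made after the WebBrowser login has completed, and the returned string could then be assigned to request.Headers[HttpRequestHeader.Cookie].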
(poster name missing)
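[Quote=Quoting reply #8 by whoami333:]
In my opinion the WebBrowser control is not very good to use. It would be better to control the browser directly.
[/Quote]
That makes no sense, does it?!

The OP's bug is in the HttpWebRequest code, not in WebBrowser. If he knew how to set element values on the page and execute JavaScript directly from his own program that embeds the WebBrowser, he would not need HttpWebRequest at all. Does the OP's problem show anything wrong with WebBrowser?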
vip__888 2010-09-04
Look at the packet capture and check whether all of the cookie information was captured.
(poster name missing)
I would like to know what sets the "lzstat_uv" cookie. I never set this cookie variable on my own site, yet I find it there: $_COOKIE['lzstat_uv'];
Applebananap 2010-09-02
Is there a solution? I have run into the same problem.
rimland 2010-07-27
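[Quote=Quoting reply #4 by xiaoqiang321:]
The WebBrowser and HttpWebRequest objects do not share cookies. You need to write code
that assigns the WebBrowser's cookies to the HttpWebRequest,
and then the request carries the logged-in state.
[/Quote]
I am passing them along. My code is as follows: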
// Request a page that requires a logged-in session.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://mai.taobao.com/home/seller_home.htm");
request.Method = "GET";
request.Headers.Clear();
// Copy the cookies that the WebBrowser document exposes into the request.
request.Headers[HttpRequestHeader.Cookie] = webBrowser1.Document.Cookie;
request.ContentType = "application/x-www-form-urlencoded";
request.KeepAlive = true;
request.AllowAutoRedirect = true;
// The using statements dispose the response, stream, and reader when done.
using (HttpWebResponse httpResponse = (HttpWebResponse)request.GetResponse())
using (System.IO.Stream dataStream = httpResponse.GetResponseStream())
using (System.IO.StreamReader sr = new System.IO.StreamReader(dataStream, Encoding.GetEncoding("gb2312")))
{
    // Decode the response body as gb2312.
    string responseData = sr.ReadToEnd();
}
rimland 2010-07-27
I got the positions wrong in the post above. It should be this:
Headers of the request made in WebBrowser:
GET /home/seller_home.htm HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-shockwave-flash, */*
Referer: http://member1.taobao.com/member/login.jhtml?ssl=false&redirectURL=http%3A%2F%2Fmai.taobao.com%2Fhome%2Fseller_home.htm
Accept-Language: zh-cn
Cookie: t=5ee67fa31f25a0ddd6054f630d23c34d; cna=kGfUBM2ryxkCAboaqLTFb8+f; ck1=; tg=0; _cc_=URm48syIZQ%3D%3D; nt=U%2BGCWk%2F78BYmkwgESBq%2Fw1N0sXuYgiiBZNvf%2BUaJzRfYdQE%3D; tracknick=partysover; ssllogin=; lzstat_uv=2753697379382927443|1642079@1862319; x=e%3D1%26p%3Dtdog%26s%3D0%26c%3D1; cookie2=a41d5af987dc6d609dd8a855994e6cab; _tb_token_=3433841575e85; uc1=lltime=1280202932&cookie14=UoMz1BV2XyWrPQ%3D%3D&existShop=true&cookie16=W5iHLLyFPlMGbLDwA%2BdvAGZqLg%3D%3D&sg=r11&_yb_=true&cookie21=VFC%2FuZ9ajCWYhIooqbMmIw%3D%3D&cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&_msg_v=false&_rt_=1149573635&_msg_=0&_ypid_=1216482505894; v=0; _lang=zh_CN:GBK; _sv_=0; _nk_=partysover; _l_g_=Ug%3D%3D; _wwmsg_=0%2C0; lastgetwwmsg=MTI4MDIwNTYzNA%3D%3D; cookie1=UR2MdRCF%2FX2WLrgQpTz7VDlqLzRSp8IKepCXJGOPEK8%3D; cookie17=Vy0QmFlvLpI%3D
UA-CPU: x86
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; CIBA; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Pragma: no-cache
Host: mai.taobao.com

HTTP/1.1 200 OK
Date: Tue, 27 Jul 2010 04:40:34 GMT
Server: Apache
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Set-Cookie: tlut=UoMz1BV2XyWrPQ%3D%3D; Domain=.taobao.com; Path=/
Set-Cookie: uc1=lltime=1280202932&cookie14=UoMz1BV2XyWrPQ%3D%3D&existShop=true&cookie16=UtASsssmPlP%2Ff1IHDsDaPRu%2BPw%3D%3D&sg=r11&_yb_=true&cookie21=UIHiLt3xSi%2BtvZI3pouIbg%3D%3D&cookie15=VT5L2FSpMGV7TQ%3D%3D&_msg_v=false&_rt_=1149573635&_msg_=0&_ypid_=1216482505894; Domain=.taobao.com; Path=/
Content-Language: zh-CN
Vary: Accept-Encoding
Connection: close
Content-Type: text/html;charset=GBK

-----------------------------------------------
Headers of the request made with HttpWebRequest:
GET /home/seller_home.htm HTTP/1.1
Cookie: t=5ee67fa31f25a0ddd6054f630d23c34d; cna=kGfUBM2ryxkCAboaqLTFb8+f; ck1; tg=0; _cc_=URm48syIZQ%3D%3D; nt=U%2BGCWk%2F78BYmkwgESBq%2Fw1N0sXuYgiiBZNvf%2BUaJzRfYdQE%3D; tracknick=partysover; ssllogin; lzstat_uv=2753697379382927443|1642079@1862319; x=e%3D1%26p%3Dtdog%26s%3D0%26c%3D1; uc1=lltime=1280202932&cookie14=UoMz1BV2XyWrPw%3D%3D&existShop=true&cookie16=VT5L2FSpNgq6fDudInPRgavC%2BQ%3D%3D&sg=r11&_yb_=true&cookie21=V32FPkk%2FhoSsMDW4rod%2BNA%3D%3D&cookie15=VFC%2FuZ9ayeYq2g%3D%3D&_msg_v=false&_rt_=1149573635&_msg_=0&_ypid_=1216482505894; v=0; _lang=zh_CN:GBK; _sv_=0; _nk_=partysover; _l_g_=Ug%3D%3D; _wwmsg_=0%2C0; lastgetwwmsg=MTI4MDIwNTYzNA%3D%3D; tlut=UoMz1BV2XyWrPQ%3D%3D; lzstat_ss=818292203_0_1280234582_1642079; JSESSIONID=683AC8090B8C32DA9514AB8A65F825AF
Content-Type: application/x-www-form-urlencoded
Host: mai.taobao.com
Connection: Keep-Alive

HTTP/1.1 302 Moved Temporarily
Date: Tue, 27 Jul 2010 04:41:52 GMT
Server: Apache
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Set-Cookie: uc1=cookie14=UoMz1BV2XySt9g%3D%3D; Domain=.taobao.com; Path=/
Set-Cookie: _sv_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: tlut=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _l_g_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _wwmsg_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _lang=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _nk_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: v=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: lastgetwwmsg=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Location: https://login.taobao.com/member/login.jhtml?redirectURL=http%3A%2F%2Fmai.taobao.com%2Fhome%2Fseller_home.htm
Content-Language: zh-CN
Vary: Accept-Encoding
Content-Length: 0
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=GB2312
--------------------------------------------------
They really are very different. Could you help me see what to change?
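Comparing the two Cookie headers by eye is error-prone, so a small helper can list the cookie names that the browser sent but the HttpWebRequest did not. The sketch below is only an illustration: the class CookieDiff and its methods are hypothetical, and the inputs are assumed to be the raw Cookie header values copied from a capture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the thread: compares two Cookie header values by name.
public static class CookieDiff
{
    // Extracts the cookie names from a "name=value; name=value" header value.
    private static List<string> Names(string header)
    {
        var names = new List<string>();
        if (string.IsNullOrEmpty(header)) return names;
        foreach (string part in header.Split(';'))
        {
            string pair = part.Trim();
            if (pair.Length == 0) continue;
            int eq = pair.IndexOf('=');
            names.Add(eq < 0 ? pair : pair.Substring(0, eq));
        }
        return names;
    }

    // Returns the names present in browserHeader but absent from requestHeader,
    // in the order they appear in browserHeader.
    public static List<string> MissingNames(string browserHeader, string requestHeader)
    {
        var present = new HashSet<string>(Names(requestHeader));
        var missing = new List<string>();
        foreach (string name in Names(browserHeader))
        {
            if (!present.Contains(name) && !missing.Contains(name)) missing.Add(name);
        }
        return missing;
    }
}
```

Applied to the two Cookie headers in the capture above, the names that appear only in the WebBrowser request are cookie2, _tb_token_, cookie1, and cookie17.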
飞天赤狐 2010-07-27
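The WebBrowser and HttpWebRequest objects do not share cookies. You need to write code
that assigns the WebBrowser's cookies to the HttpWebRequest,
and then the request carries the logged-in state.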
rimland 2010-07-27
Headers of the request made in WebBrowser:
GET /home/seller_home.htm HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-shockwave-flash, */*
Referer: http://member1.taobao.com/member/login.jhtml?ssl=false&redirectURL=http%3A%2F%2Fmai.taobao.com%2Fhome%2Fseller_home.htm
Accept-Language: zh-cn
Cookie: t=5ee67fa31f25a0ddd6054f630d23c34d; cna=kGfUBM2ryxkCAboaqLTFb8+f; ck1=; tg=0; _cc_=URm48syIZQ%3D%3D; nt=U%2BGCWk%2F78BYmkwgESBq%2Fw1N0sXuYgiiBZNvf%2BUaJzRfYdQE%3D; tracknick=partysover; ssllogin=; lzstat_uv=2753697379382927443|1642079@1862319; x=e%3D1%26p%3Dtdog%26s%3D0%26c%3D1; cookie2=a41d5af987dc6d609dd8a855994e6cab; _tb_token_=3433841575e85; uc1=lltime=1280202932&cookie14=UoMz1BV2XyWrPQ%3D%3D&existShop=true&cookie16=W5iHLLyFPlMGbLDwA%2BdvAGZqLg%3D%3D&sg=r11&_yb_=true&cookie21=VFC%2FuZ9ajCWYhIooqbMmIw%3D%3D&cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&_msg_v=false&_rt_=1149573635&_msg_=0&_ypid_=1216482505894; v=0; _lang=zh_CN:GBK; _sv_=0; _nk_=partysover; _l_g_=Ug%3D%3D; _wwmsg_=0%2C0; lastgetwwmsg=MTI4MDIwNTYzNA%3D%3D; cookie1=UR2MdRCF%2FX2WLrgQpTz7VDlqLzRSp8IKepCXJGOPEK8%3D; cookie17=Vy0QmFlvLpI%3D
UA-CPU: x86
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; CIBA; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Pragma: no-cache
Host: mai.taobao.com

HTTP/1.1 200 OK
Date: Tue, 27 Jul 2010 04:40:34 GMT
Server: Apache
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Set-Cookie: tlut=UoMz1BV2XyWrPQ%3D%3D; Domain=.taobao.com; Path=/
Set-Cookie: uc1=lltime=1280202932&cookie14=UoMz1BV2XyWrPQ%3D%3D&existShop=true&cookie16=UtASsssmPlP%2Ff1IHDsDaPRu%2BPw%3D%3D&sg=r11&_yb_=true&cookie21=UIHiLt3xSi%2BtvZI3pouIbg%3D%3D&cookie15=VT5L2FSpMGV7TQ%3D%3D&_msg_v=false&_rt_=1149573635&_msg_=0&_ypid_=1216482505894; Domain=.taobao.com; Path=/
Content-Language: zh-CN
Vary: Accept-Encoding
Connection: close
Content-Type: text/html;charset=GBK


GET /home/seller_home.htm HTTP/1.1
Cookie: t=5ee67fa31f25a0ddd6054f630d23c34d; cna=kGfUBM2ryxkCAboaqLTFb8+f; ck1; tg=0; _cc_=URm48syIZQ%3D%3D; nt=U%2BGCWk%2F78BYmkwgESBq%2Fw1N0sXuYgiiBZNvf%2BUaJzRfYdQE%3D; tracknick=partysover; ssllogin; lzstat_uv=2753697379382927443|1642079@1862319; x=e%3D1%26p%3Dtdog%26s%3D0%26c%3D1; uc1=lltime=1280202932&cookie14=UoMz1BV2XyWrPw%3D%3D&existShop=true&cookie16=VT5L2FSpNgq6fDudInPRgavC%2BQ%3D%3D&sg=r11&_yb_=true&cookie21=V32FPkk%2FhoSsMDW4rod%2BNA%3D%3D&cookie15=VFC%2FuZ9ayeYq2g%3D%3D&_msg_v=false&_rt_=1149573635&_msg_=0&_ypid_=1216482505894; v=0; _lang=zh_CN:GBK; _sv_=0; _nk_=partysover; _l_g_=Ug%3D%3D; _wwmsg_=0%2C0; lastgetwwmsg=MTI4MDIwNTYzNA%3D%3D; tlut=UoMz1BV2XyWrPQ%3D%3D; lzstat_ss=818292203_0_1280234582_1642079; JSESSIONID=683AC8090B8C32DA9514AB8A65F825AF
Content-Type: application/x-www-form-urlencoded
Host: mai.taobao.com
Connection: Keep-Alive
-----------------------------------------------
Headers of the request made with HttpWebRequest:
HTTP/1.1 302 Moved Temporarily
Date: Tue, 27 Jul 2010 04:41:52 GMT
Server: Apache
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Set-Cookie: uc1=cookie14=UoMz1BV2XySt9g%3D%3D; Domain=.taobao.com; Path=/
Set-Cookie: _sv_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: tlut=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _l_g_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _wwmsg_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _lang=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: _nk_=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: v=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Set-Cookie: lastgetwwmsg=; Domain=.taobao.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Location: https://login.taobao.com/member/login.jhtml?redirectURL=http%3A%2F%2Fmai.taobao.com%2Fhome%2Fseller_home.htm
Content-Language: zh-CN
Vary: Accept-Encoding
Content-Length: 0
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=GB2312
--------------------------------------------------
They really are very different. Could you help me see what to change?
rimland 2010-07-27
[Quote=Quoting reply #1 by xyunsh:]
Capture the packets and see what differs between the two.
[/Quote]
OK, I'll take a look. Thanks in advance.
xyunsh 2010-07-27
Capture the packets and see what differs between the two.
rimland 2010-07-27
I found the cause: four of the cookies are HttpOnly, and those cannot be obtained through webBrowser1.Document.Cookie.
cols[7]
{cookie2=4ed1d8b372d33575c9c8f5827214659e}
Comment: ""
CommentUri: null
Discard: false
Domain: ".taobao.com"
Expired: false
Expires: {0001-1-1 0:00:00}
HttpOnly: true
Name: "cookie2"
Path: "/"
Port: ""
Secure: false
TimeStamp: {2010-7-27 15:41:22}
Value: "4ed1d8b372d33575c9c8f5827214659e"
Version: 0
cols[8]
{_tb_token_=49b53de17de3}
Comment: ""
CommentUri: null
Discard: false
Domain: ".taobao.com"
Expired: false
Expires: {0001-1-1 0:00:00}
HttpOnly: true
Name: "_tb_token_"
Path: "/"
Port: ""
Secure: false
TimeStamp: {2010-7-27 15:41:07}
Value: "49b53de17de3"
Version: 0
cols[16]
{cookie1=UR2MdRCF%2FX2WLrgQpTz7VDlqLzRSp8IKepCXJGOPEK8%3D}
Comment: ""
CommentUri: null
Discard: false
Domain: ".taobao.com"
Expired: false
Expires: {0001-1-1 0:00:00}
HttpOnly: true
Name: "cookie1"
Path: "/"
Port: ""
Secure: false
TimeStamp: {2010-7-27 15:41:22}
Value: "UR2MdRCF%2FX2WLrgQpTz7VDlqLzRSp8IKepCXJGOPEK8%3D"
Version: 0
cols[17]
{cookie17=Vy0QmFlvLpI%3D}
Comment: ""
CommentUri: null
Discard: false
Domain: ".taobao.com"
Expired: false
Expires: {0001-1-1 0:00:00}
HttpOnly: true
Name: "cookie17"
Path: "/"
Port: ""
Secure: false
TimeStamp: {2010-7-27 15:41:22}
Value: "Vy0QmFlvLpI%3D"
Version: 0
How can I get these cookies?
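The property names in the dump above match System.Net.Cookie, so cols is presumably a collection of such objects, although the post does not say where it came from. Under that assumption, a short sketch of picking out the HttpOnly entries from a CookieCollection could look like this. The class HttpOnlyFilter and the method HttpOnlyNames are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not from the thread: lists cookies flagged HttpOnly.
public static class HttpOnlyFilter
{
    // Returns the names of all cookies in the collection whose HttpOnly flag is set.
    public static List<string> HttpOnlyNames(CookieCollection cookies)
    {
        var names = new List<string>();
        foreach (Cookie cookie in cookies)
        {
            if (cookie.HttpOnly) names.Add(cookie.Name);
        }
        return names;
    }
}
```

This only identifies which cookies are flagged. It does not read them out of the WebBrowser, which is the part that webBrowser1.Document.Cookie cannot do.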
rimland 2010-07-27
Don't let this thread sink. Experts, please answer!
rimland 2010-07-27
None of these solve the problem. Could anyone who can solve it please reply?
whoami333 2010-07-27
In my opinion the WebBrowser control is not very good to use. It would be better to control the browser directly.