HOWTO: Saving a web page with Python 3

wyjam 2012-03-02 10:31:32

Method 1
import http.client
conn = http.client.HTTPConnection("okooo.com")
conn.request("GET", "/shuangseqiu/ssqzs")
r1 = conn.getresponse()
print(r1.status, r1.reason)
This returns 301 Moved Permanently (a redirect). Ordinary static pages can be saved this way, but dynamic pages are a problem.
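A 301 only means the server wants the request repeated at the address in its Location header. Unlike urllib.request, http.client does not follow redirects on its own. Below is a minimal sketch of following them by hand; the names resolve_redirect and fetch and the hop limit are my own assumptions, not something from the posts.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def resolve_redirect(current_url, status, location):
    """Return the URL to request next, or None if no redirect is needed."""
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the current URL.
        return urljoin(current_url, location)
    return None

def fetch(url, max_hops=5, timeout=40):
    """Fetch url, following at most max_hops redirects (hypothetical helper)."""
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=timeout)
        try:
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn.request("GET", target)
            resp = conn.getresponse()
            body = resp.read()
            next_url = resolve_redirect(url, resp.status,
                                        resp.getheader("Location"))
        finally:
            conn.close()
        if next_url is None:
            return resp.status, body
        url = next_url
    raise RuntimeError("too many redirects")
```

Calling fetch("http://okooo.com/shuangseqiu/ssqzs") would then request whatever address the server redirects to. This still returns only the raw source the server sends, so it does not help with content that the page builds in the browser.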

Method 2
What I want is the equivalent of opening IE and choosing File > Save As to save the page as HTML, but done from Python.
I tried driving IE through win32com:

import win32com.client
ie = win32com.client.Dispatch("InternetExplorer.Application")
ie.navigate("http://okooo.com/shuangseqiu/ssqzs")
print(ie.Document)
This prints [object]
ie.Document.SaveAs("C:\\1.txt")
This raises: AttributeError: (unknown).SaveAs

If anyone knows how to do this, please help. Thanks.
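One possible direction, offered as a sketch and not a confirmed fix: instead of calling SaveAs on the document, wait until IE reports that the page has finished loading, then read the rendered DOM through Document.documentElement.outerHTML and write it to a file yourself. The helper name save_rendered_html, the timeout values and the output path are assumptions for illustration. The code needs pywin32 and Windows to run against a real IE instance.

```python
import time

READYSTATE_COMPLETE = 4  # value IE reports once the page has finished loading

def save_rendered_html(ie, out_path, timeout=30.0, poll=0.2):
    """Wait for ie to finish loading, then write the rendered DOM to out_path."""
    deadline = time.monotonic() + timeout
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading")
        time.sleep(poll)
    # outerHTML reflects the DOM after scripts have run, not the raw source.
    html = ie.Document.documentElement.outerHTML
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html

def main():
    import win32com.client  # pywin32, Windows only
    ie = win32com.client.Dispatch("InternetExplorer.Application")
    try:
        ie.Navigate("http://www.okooo.com/shuangseqiu/ssqzs/")
        save_rendered_html(ie, "C:\\ssqzs.html")  # hypothetical output path
    finally:
        ie.Quit()
```

The file is written as UTF-8, so a charset declared inside the saved markup may no longer match. Images and stylesheets are not downloaded; only the markup is saved.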

6 replies (newest first)

wyjam 2012-03-22

Is there really no solution to this problem?

wyjam 2012-03-12

What I want is exactly the content you get when you save the page from the browser. Is there any way to get it?

yby4769250 2012-03-04

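[Quote=Quoting wyjam's reply in post #3:]
Thanks for the reply above.
Yes, but the data returned by http.client is the same as what urllib.request returns, namely what you see when you open the page and view its source. That differs a lot from the HTML you get when you save the page from the browser.
[/Quote]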

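Of course. What you get this way is only the page's static source, not a page in something like the MHT format that we save from the browser. For MHT, the browser first downloads everything on the page (text, images, and so on) and then saves it all as one MHT file. http.client only downloads the markup. If you save that as HTML you will see only the text; images and the like are missing, because they have to be downloaded separately.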
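To make the "images have to be downloaded separately" step concrete, here is a rough sketch that lists the extra files a page refers to, using only the standard library. The class name ResourceCollector, the helper find_resources and the choice of tags are my assumptions. A real "save complete page" would also have to download each URL and rewrite the links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in a page."""
    # tag -> attribute that holds the resource URL
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        attrs = dict(attrs)
        # Only stylesheet links are page resources; skip icons, feeds, etc.
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower():
            return
        value = attrs.get(wanted)
        if value:
            # Resolve relative references against the page address.
            self.resources.append(urljoin(self.base_url, value))

def find_resources(html_text, base_url):
    """Return the resource URLs referenced by html_text, in document order."""
    parser = ResourceCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.resources
```

Each URL returned by find_resources(page, "http://www.okooo.com/shuangseqiu/ssqzs/") could then be fetched and written next to the saved HTML file. Content that scripts insert after loading is still not covered by this approach.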

wyjam 2012-03-04

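Thanks for the reply above.
Yes, but the data returned by http.client is the same as what urllib.request returns, namely what you see when you open the page and view its source. That differs a lot from the HTML you get when you save the page from the browser.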

wyjam 2012-03-02

import urllib.request
indexurl="http://www.okooo.com/shuangseqiu/ssqzs"
page = urllib.request.urlopen(indexurl,timeout=40).read()
page = page.decode('cp936')
print(page)
For a static page the result is correct; for a dynamic page it is not.

The result differs from what the page displays in the browser.
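A side note on the decode step: hard-coding cp936 works only while the server really sends GBK. Below is a small sketch that takes the charset from the Content-Type header when one is present and falls back to cp936 otherwise. The function name decode_page and the fallback choice are assumptions on my part.

```python
from email.message import Message

def decode_page(raw, content_type=None, fallback="cp936"):
    """Decode raw bytes using the charset in a Content-Type value, else fallback."""
    charset = None
    if content_type:
        # Let the standard library parse "text/html; charset=..." for us.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
    try:
        return raw.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: use the fallback and replace bad bytes.
        return raw.decode(fallback, errors="replace")
```

With urllib.request this could be called as decode_page(resp.read(), resp.headers.get("Content-Type")), where resp is the object returned by urlopen. It only fixes the text encoding and does not change which content the server returns.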

yby4769250 2012-03-02

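Watching the request with the HttpWatch tool shows that your address is wrong: the path in the GET request is the problem. Change the address to the one below and the 301 redirect goes away: http://www.okooo.com/shuangseqiu/ssqzs/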

import http.client
conn = http.client.HTTPConnection("www.okooo.com")
conn.request("GET", "/shuangseqiu/ssqzs/")
r1 = conn.getresponse()
print(r1.status, r1.reason)
data = r1.read()
print(data)
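The snippet above only prints the bytes. To actually save them, one option is to write them out unchanged in binary mode, which avoids any decoding issues. A minimal sketch follows; fetch_bytes, save_bytes and the file name are hypothetical.

```python
import http.client

def fetch_bytes(host, path, timeout=40):
    """Return (status, body) for a plain HTTP GET of path on host."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()

def save_bytes(data, filename):
    """Write data to filename unchanged and return the number of bytes written."""
    with open(filename, "wb") as f:
        return f.write(data)
```

For example: status, data = fetch_bytes("www.okooo.com", "/shuangseqiu/ssqzs/") followed by save_bytes(data, "ssqzs.html") when status is 200.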