Simple question: getting a web page's HTML

liter156 2010-01-19 10:32:38

Here's what I have so far:

using System.IO;
using System.Net;
using System.Text;

// url and code (the page's charset name, e.g. "GB2312") are supplied by the caller
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.Timeout = 1000 * 20; // added a timeout here (it only covers GetResponse(), not reading the body)

string content;
using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
using (Stream stream = webResponse.GetResponseStream())
using (StreamReader streamReader = new StreamReader(stream, Encoding.GetEncoding(code)))
{
    content = streamReader.ReadToEnd();
}

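Does anyone have another approach? The result must be returned as a string; it can't just be downloaded to a local file.

The main problem with the code above is that when the network is slow, the HTML comes back empty and no error is raised. Extremely frustrating... Adding a Timeout didn't help either.

I'll close the thread as soon as this is solved.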
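One likely gap in the code above: `HttpWebRequest.Timeout` only bounds `GetResponse()` (and `GetRequestStream()`); reads from the response stream are governed by the separate `ReadWriteTimeout` property, which defaults to 300,000 ms (5 minutes). The sketch below sets both. It makes a stalled read fail with an exception instead of waiting, but it is not guaranteed to explain an empty result. The names `PageFetcher`, `CreateRequest`, and `Fetch` are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: builds a request where both the wait for the
    // response headers and each read of the body are time-limited.
    public static HttpWebRequest CreateRequest(string url, int timeoutMs)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = timeoutMs;          // applies to GetResponse()
        request.ReadWriteTimeout = timeoutMs; // applies to Read() on the response stream
        return request;
    }

    // Downloads the page and decodes it with the given encoding.
    // A read that stalls longer than timeoutMs throws instead of hanging.
    public static string Fetch(string url, Encoding encoding, int timeoutMs)
    {
        HttpWebRequest request = CreateRequest(url, timeoutMs);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be along the lines of `PageFetcher.Fetch(url, Encoding.GetEncoding(code), 1000 * 20)`.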


以专业开发人员为伍 2010-01-19


For example, you could enter

~/default.aspx?id=123456&name=abc

and

~/app_data/default_123456_abc.html
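The two example inputs suggest a naming rule for the cache file: the page name without its extension, followed by each query-string value, joined with `_`, under `~/app_data/`. The helper below is a hypothetical sketch of that rule (`CachePaths.CacheFileName` is an invented name); it does not URL-decode values or strip characters that are illegal in file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CachePaths
{
    // Hypothetical rule inferred from the example:
    // "~/default.aspx?id=123456&name=abc" -> "~/app_data/default_123456_abc.html"
    public static string CacheFileName(string pageUrl)
    {
        string path = pageUrl;
        string query = "";
        int q = pageUrl.IndexOf('?');
        if (q >= 0)
        {
            path = pageUrl.Substring(0, q);
            query = pageUrl.Substring(q + 1);
        }

        // Start with the page name without its extension, e.g. "default"
        var parts = new List<string> { Path.GetFileNameWithoutExtension(path) };

        // Append each query-string value in order (keys are dropped)
        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            parts.Add(eq >= 0 ? pair.Substring(eq + 1) : pair);
        }

        return "~/app_data/" + string.Join("_", parts) + ".html";
    }
}
```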

以专业开发人员为伍 2010-01-19


To cache pages of your own site as HTML: I posted a demo before, here it is again:

<%@ Page Language="C#" %>
<%@ Import Namespace="System.IO" %>

<script runat="server">
    protected void Button1_Click(object sender, EventArgs e)
    {
        StringWriter wr = new StringWriter();
        // Run another page of this site and capture its output. An optional third argument
        // (preserveForm) controls whether the current QueryString and Form collections are passed along.
        Server.Execute(this.TextBox1.Text, wr);
        this.Label1.Text = Server.HtmlEncode(wr.ToString());
        // Save the captured HTML as a static cache file
        File.WriteAllText(Server.MapPath(this.TextBox2.Text), wr.ToString());
    }
</script>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>Demo: getting the source of another page on this site takes only two lines of code</title>
</head>
<body>
    <form id="form1" runat="server" defaultbutton="Button1">
    <div>
        Enter a page name on this site: <asp:TextBox ID="TextBox1" runat="server"></asp:TextBox>
        <br />
        Enter the target file name: <asp:TextBox ID="TextBox2" runat="server"></asp:TextBox>
        <br />
        <asp:Button ID="Button1" runat="server" Text="Button" OnClick="Button1_Click" /><hr />
        <asp:Label ID="Label1" runat="server" Text="Label"></asp:Label>
    </div>
    </form>
</body>
</html>

wuyq11 2010-01-19


Try the WebBrowser control. Or:
// Option 1: WebClient (url is the page address)
System.Net.WebClient wc = new System.Net.WebClient();
string content = wc.DownloadString(url);

// Option 2: HttpWebRequest with a browser-like User-Agent
System.Net.HttpWebRequest request = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)";
string html;
using (System.Net.WebResponse response = request.GetResponse())
using (System.IO.Stream resStream = response.GetResponseStream())
using (System.IO.StreamReader sr = new System.IO.StreamReader(resStream, encoding)) // encoding: the page's System.Text.Encoding
{
    html = sr.ReadToEnd();
}
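On .NET Framework 4.5 and later, `HttpClient` is the usual replacement for both snippets above. This is a swapped-in technique, not what the reply used, and `HttpClientFetcher.FetchStringAsync` is a hypothetical name. With the default `GetAsync` behavior the body is buffered, so `HttpClient.Timeout` covers the whole download, and `ReadAsStringAsync` decodes using the charset from the `Content-Type` header when one is present. `CannedHandler` is only a test double so the code can run without a network.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class HttpClientFetcher
{
    // Throws HttpRequestException on a non-success status code,
    // and TaskCanceledException if the whole download exceeds client.Timeout.
    public static async Task<string> FetchStringAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}

// Test double: answers every request with a fixed UTF-8 HTML body.
public sealed class CannedHandler : HttpMessageHandler
{
    private readonly string _html;

    public CannedHandler(string html) { _html = html; }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_html, Encoding.UTF8, "text/html")
        };
        return Task.FromResult(response);
    }
}
```

In real use you would create one long-lived `HttpClient`, e.g. `new HttpClient { Timeout = TimeSpan.FromSeconds(20) }`, and pass it in.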

mngzilin 2010-01-19


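[Quote=Quoting original poster liter156:]
Does anyone have another approach? The result must be returned as a string; it can't just be downloaded to a local file.
The main problem is that when the network is slow, the HTML comes back empty and no error is raised. Extremely frustrating... Adding a Timeout didn't help either.
I'll close the thread as soon as this is solved.
[/Quote]
If the network is slow, nobody can save you.

Set webRequest.KeepAlive = true; and remove webRequest.Timeout = 1000 * 20; (KeepAlive already defaults to true on HttpWebRequest.)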

liter156 2010-01-19


Can anyone explain the cause?
Logically, a failed download should raise an error, but nothing is reported at all.

liter156 2010-01-19


Tried all of them, none worked...
Frustrating.

ck11926375 2010-01-19




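using System.Net;
using System.Text;
using System.Text.RegularExpressions;

        private string GetHtml(string url, string charSet)
        {
            using (WebClient myWebClient = new WebClient())
            {
                myWebClient.Credentials = CredentialCache.DefaultCredentials;
                byte[] myDataBuffer = myWebClient.DownloadData(url);

                // Decode once with the default encoding just to locate the <meta ... charset=...> declaration
                string strWebData = Encoding.Default.GetString(myDataBuffer);
                // Matches both <meta charset="utf-8"> and content="text/html; charset=gb2312"
                Match charSetMatch = Regex.Match(strWebData, @"<meta[^>]*?charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);
                string webCharSet = charSetMatch.Groups[1].Value;

                // Prefer the caller's charset; fall back to the one declared in the page
                if (string.IsNullOrEmpty(charSet))
                    charSet = webCharSet;

                // Re-decode only if that charset differs from the one already used
                if (!string.IsNullOrEmpty(charSet) && !Encoding.GetEncoding(charSet).Equals(Encoding.Default))
                    strWebData = Encoding.GetEncoding(charSet).GetString(myDataBuffer);

                return strWebData;
            }
        }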
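A variant of the same idea, sketched under assumptions: check the charset in the HTTP `Content-Type` header first (parsed with `System.Net.Mime.ContentType`), fall back to the `<meta>` declaration, then to a caller-supplied default, and skip names that `Encoding.GetEncoding` doesn't recognize instead of throwing. `CharsetSniffer.Choose` is a hypothetical name; with `HttpWebResponse` the header value would come from `response.ContentType`.

```csharp
using System;
using System.Net.Mime;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Finds charset=... inside a <meta> tag, with or without quotes.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);

    // Order: HTTP Content-Type charset, then <meta> charset, then fallback.
    public static Encoding Choose(string contentTypeHeader, byte[] body, Encoding fallback)
    {
        string name = null;

        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            try { name = new ContentType(contentTypeHeader).CharSet; }
            catch (FormatException) { /* malformed header: ignore it */ }
        }

        if (string.IsNullOrEmpty(name))
        {
            // The meta declaration itself is ASCII, so an ASCII view is enough to find it.
            Match m = MetaCharset.Match(Encoding.ASCII.GetString(body));
            if (m.Success) name = m.Groups[1].Value;
        }

        if (!string.IsNullOrEmpty(name))
        {
            try { return Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown charset name: use fallback */ }
        }

        return fallback;
    }
}
```

On .NET Core and .NET 5+, names such as `gb2312` only resolve after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has been called.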

zhulong1111 2010-01-19


mark

段传涛 2010-01-19


I use mshtml and the DOM. Bumping this for you.

cwblaze 2010-01-19


wiki14 2010-01-19




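using System;
using System.IO;
using System.Net;
using System.Text;
using System.Windows.Forms;

        /// <summary>
        /// Gets the HTML of a web page.
        /// </summary>
        /// <param name="url">URL (e.g. http://www.baidu.com/)</param>
        /// <param name="Prog">Progress bar</param>
        /// <returns>The page content decoded as GB2312</returns>
        public static string GetUrlData(string url, ProgressBar Prog)
        {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
            using (Stream input = res.GetResponseStream())
            {
                long totalBytes = res.ContentLength; // -1 if the server sends no Content-Length
                if (totalBytes > 0 && totalBytes <= int.MaxValue)
                    Prog.Maximum = (int)totalBytes;

                byte[] by = new byte[10245];
                int bytesRead;
                StringBuilder content = new StringBuilder();
                Encoding encoding = Encoding.GetEncoding("GB2312");
                // A Decoder keeps partial multi-byte characters between reads,
                // so a character split across two buffers is not garbled
                Decoder decoder = encoding.GetDecoder();
                char[] chars = new char[encoding.GetMaxCharCount(by.Length)];

                do
                {
                    // Read returns 0 only at end of stream; flush the decoder on that last call
                    bytesRead = input.Read(by, 0, by.Length);
                    int charCount = decoder.GetChars(by, 0, bytesRead, chars, 0, bytesRead == 0);
                    content.Append(chars, 0, charCount);

                    if (Prog.Value + bytesRead <= Prog.Maximum)
                    {
                        Prog.Value += bytesRead;
                        Application.DoEvents();
                    }
                    else
                    {
                        Prog.Value = Prog.Maximum;
                    }
                }
                while (bytesRead != 0);

                Prog.Value = 0;
                return content.ToString();
            }
        }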
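Why the `Decoder` matters: calling `GetString` on each 10245-byte chunk separately turns any multi-byte character that straddles two reads into garbage. The standalone sketch below contrasts the two approaches; it uses UTF-8 so it runs without extra code-page registration, and `ChunkDecoding` with its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ChunkDecoding
{
    // Naive: decode every chunk on its own. A character split across chunks is lost.
    public static string DecodeNaive(IEnumerable<byte[]> chunks, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
            sb.Append(encoding.GetString(chunk));
        return sb.ToString();
    }

    // Stateful: the Decoder holds back the bytes of an incomplete character
    // and combines them with the start of the next chunk.
    public static string DecodeWithDecoder(IEnumerable<byte[]> chunks, Encoding encoding)
    {
        Decoder decoder = encoding.GetDecoder();
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, n);
        }

        // Final call with flush = true emits anything still buffered.
        byte[] empty = new byte[0];
        char[] tail = new char[decoder.GetCharCount(empty, 0, 0, true)];
        int t = decoder.GetChars(empty, 0, 0, tail, 0, true);
        sb.Append(tail, 0, t);
        return sb.ToString();
    }
}
```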

lyboyc 2010-01-19


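With input.Read(by, 0, (int)by.Length);
and streamReader.ReadToEnd();
it seems that on a slow network, when no data can be read, the code assumes it has reached the end.
You could get the length from ContentLength, then read character by character, keep going when nothing is read, until you've read that many,
but you need to make sure it doesn't keep looping forever if the connection drops.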
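A note on the reasoning above: `Stream.Read` blocks until at least one byte is available and returns 0 only when the stream has ended (for example, the server closed the connection), so 0 does not mean "no data yet". What the suggestion can still do is detect a truncated body. The sketch below reads to the end and throws if the byte count doesn't match an expected length such as `HttpWebResponse.ContentLength`; the waiting itself is better bounded by `ReadWriteTimeout` than by a retry loop. `BodyReader.ReadAll` is a hypothetical name, and the check only makes sense when the body is not transparently decompressed.

```csharp
using System;
using System.IO;

public static class BodyReader
{
    // Reads the whole stream. If expectedLength >= 0 (e.g. ContentLength),
    // throws when the stream ends before/after that many bytes instead of
    // silently returning a truncated or empty body. Pass -1 if unknown.
    public static byte[] ReadAll(Stream input, long expectedLength)
    {
        var buffer = new byte[8192];
        using (var output = new MemoryStream())
        {
            int read;
            // Read blocks until data arrives; 0 means end of stream.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }

            if (expectedLength >= 0 && output.Length != expectedLength)
            {
                throw new IOException(string.Format(
                    "Expected {0} bytes but received {1}.", expectedLength, output.Length));
            }
            return output.ToArray();
        }
    }
}
```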
