Regex to match the first <ul>..</ul> tag

fuxqde 2011-01-20 02:42:30

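From the HTML below, I need to extract everything between <ul class="c_l14s_01" id="sh_news_gn"> and </ul>, including the tags themselves. How should I write the regex? Any help from the experts would be greatly appreciated.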

HTML:


<div class="blk_07" id="blk_gnxw_01" style="height:305px;overflow:hidden;padding-top:2px;">

<ul class="c_l14s_01" id="sh_news_gn">

<li><a href=" http://news.sina.com.cn/c/2011-01-20/132821846800.shtml" target="_blank">北京基层公务员首次向外地生源外地高校放开</a></li> </ul>


<ul class="c_l14s_01">

...

16 replies

rockyc 2011-08-03

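This is C#, so why not use this?
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// Returns every piece of text found between `start` and `end` in `html`.
// `start` and `end` are inserted into the pattern unescaped, so they act as regex fragments.
public static List<string> GetHtmls(string start, string end, string html)
{
    List<string> list = new List<string>();
    try
    {
        // Lazily capture the text between start and end in the named group "g".
        string pattern = string.Format("{0}(?<g>(.|[\r\n])+?){1}", start, end);
        MatchCollection mc = Regex.Matches(html, pattern); // all matches of the pattern
        foreach (Match match in mc)
        {
            list.Add(match.Groups["g"].Value);
        }
    }
    catch { } // errors are swallowed; whatever was collected so far is returned
    return list;
}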

fuxqde 2011-01-20


Sincere thanks to q107770540 for the help. All my problems are solved. Closing the thread.

阿水哥 2011-01-20


This is really hard to follow!

TimZhuFaith 2011-01-20


++[Quote=Quoting reply #10 by q107770540:]
C# code

void Main()
{
string html=@"<div class=""blkContainerSblk"">
<h1 id=""artibodyTitle"" pid=""1"" tid=""1""

did=""21836924"" fid=""1666"">人社部官员:事业单位人员不必因担心改革早退休</h1>
<div class=""……
[/Quote]

q107770540 2011-01-20


If you don't want HTML tags such as <p> in the content, change the last line to:
Console.WriteLine("Content: "+Regex.Replace(m.Groups["news"].Value,"<[^>]*>",""));
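A minimal sketch of that tag-stripping step on its own, assuming a hypothetical `HtmlText.StripTags` helper (the class and method names are illustrative, not from the thread). It only removes anything shaped like `<...>`; it does not decode entities such as `&nbsp;` and will be confused by a literal `>` inside an attribute value.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original reply.
public static class HtmlText
{
    // Deletes every substring of the form <...>. Text between tags is left untouched.
    public static string StripTags(string fragment)
    {
        return Regex.Replace(fragment, "<[^>]*>", "");
    }
}
```

For example, `HtmlText.StripTags("<p>one</p><p>two</p>")` returns `"onetwo"`: the tags disappear and nothing is inserted in their place, so adjacent paragraphs run together unless you replace with a space or newline instead of the empty string.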

q107770540 2011-01-20


I won't be replying in this thread any further. If you have more questions, please open a new thread.

q107770540 2011-01-20

// Requires: using System; using System.Text.RegularExpressions;
void Main()
{
   string html=@"<div class=""blkContainerSblk"">
<h1 id=""artibodyTitle"" pid=""1"" tid=""1""

did=""21836924"" fid=""1666"">人社部官员:事业单位人员不必因担心改革早退休</h1>
<div class=""artInfo""><span id=""art_source""><a

href=""http://www.sina.com.cn"">http://www.sina.com.cn</a></span>  <s

pan id=""pub_date"">2011年01月19日03:15</span>  <span

id=""media_name""><a href=""http://epaper.jinghua.cn/html/2011-

01/19/content_624637.htm"" target=""_blank"">京华时报</a> <a

href=""http://epaper.jinghua.cn/html/2011-01/19/content_624637.htm""

target=""_blank""></a> </span>


</div>

<!-- 正文内容 begin -->
<!-- google_ad_section_start -->

<div class=""blkContainerSblkCon""

id=""artibody"">

<!-- publish_helper name='原始正文'

p_id='1' t_id='1' d_id='21836924' f_id='3' -->
<p>　　昨天，人社部中国人事科学研究院院长吴江接受本报记者专访时表示，此前启动

的5个事业单位改革全国试点，目前仍在试点阶段，并未停滞。</p>

<p>　　他也表示，事业单位改革在制度设计上会有合理安排，不论是北京还是其他地方

，事业单位工作人员没必要因此提前退休。</p>



<!-- publish_helper_end -->

</div>
</div>";
	// One pattern with four named groups: title, time, from (source) and news (body).
	foreach(Match m in  Regex.Matches(html,@"(?is)<h1\s*id=""artibodyTitle""[^>]*>(?<title>[^<]+).*?id=""pub_date"">(?<time>[^<]+)<.*?id=""media_name"">.*?>(?<from>[^<]+).*?name='原始正文'.*?>(?<news>.*?)<!-- publish_helper_end -->"))
	{
	  Console.WriteLine("Title:   "+m.Groups["title"].Value);
	  Console.WriteLine("Time:    "+m.Groups["time"].Value);
	  Console.WriteLine("Source:  "+m.Groups["from"].Value);
	  Console.WriteLine("Content: "+m.Groups["news"].Value);

	}
}

/*
Title:   人社部官员:事业单位人员不必因担心改革早退休
Time:    2011年01月19日03:15
Source:  京华时报
Content: 
<p>　　昨天，人社部中国人事科学研究院院长吴江接受本报记者专访时表示，此前启动

的5个事业单位改革全国试点，目前仍在试点阶段，并未停滞。</p>

<p>　　他也表示，事业单位改革在制度设计上会有合理安排，不论是北京还是其他地方

，事业单位工作人员没必要因此提前退休。</p>



*/

TimZhuFaith 2011-01-20


(?<=\>)\s*[^\<]+(?=\<)

fuxqde 2011-01-20


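[Quote=Quoting reply #7 by q107770540:]

To get the title:
(?is)(?<=<h1\s*id="artibodyTitle"[^>]*>)[^<]+
[/Quote]
Then how should I write the ones for the news source (京华时报) and the content (<p>　　昨天，人社部中国人事科学研究院院长吴江接.....)?
Could you give me those as well? Thanks.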

q107770540 2011-01-20


To get the title:
(?is)(?<=<h1\s*id="artibodyTitle"[^>]*>)[^<]+
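As a rough sketch of how this pattern might be used from C#: the lookbehind `(?<=...)` only asserts that the opening `<h1 ...>` tag precedes the match, so the match value itself is the title text and no capture group is needed. This relies on .NET allowing variable-length lookbehind. The `TitleExtractor` class and `GetTitle` method are hypothetical names chosen for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration; only the pattern comes from the reply.
public static class TitleExtractor
{
    // In a verbatim string (@"..."), each double quote is written as "".
    private const string Pattern =
        @"(?is)(?<=<h1\s*id=""artibodyTitle""[^>]*>)[^<]+";

    // Returns the text right after the opening <h1 id="artibodyTitle" ...> tag,
    // or an empty string when no such tag is found.
    public static string GetTitle(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Note that the pattern expects `id="artibodyTitle"` to be the first attribute of the `<h1>`; if the page ever reorders its attributes, the lookbehind would need `[^>]*` before `id` as well.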

fuxqde 2011-01-20


[Quote=Quoting reply #1 by sq_zhuyi:]

string ulHtml = new Regex(@"(?is)<ul class=""c_l14s_01"" id=""sh_news_gn"">(.+)</ul>").Match(html).Groups[1].Value;
[/Quote]
I tried your method and it works too.
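One thing worth checking with this pattern: `(.+)` is greedy, so when the page contains more than one `<ul>` (as the original HTML does), the capture runs to the last `</ul>` in the input rather than the first. The sketch below compares the greedy form with a lazy `(.+?)` variant; the `UlExtractor` class and its method names are hypothetical and exist only to show the difference.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison helper; not code from the thread.
public static class UlExtractor
{
    // The opening tag contains no regex metacharacters, so it is used as-is.
    private const string Open = @"<ul class=""c_l14s_01"" id=""sh_news_gn"">";

    // Greedy: (.+) extends to the LAST </ul> in the input.
    public static string Greedy(string html)
    {
        return Regex.Match(html, "(?is)" + Open + "(.+)</ul>").Groups[1].Value;
    }

    // Lazy: (.+?) stops at the FIRST </ul> after the opening tag.
    public static string Lazy(string html)
    {
        return Regex.Match(html, "(?is)" + Open + "(.+?)</ul>").Groups[1].Value;
    }
}
```

Given two adjacent lists, `Lazy` returns only the `<li>` items of the first one, while `Greedy` also swallows the first `</ul>`, the second `<ul ...>` tag and its items. On a page with a single `<ul>`, both return the same text.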

fuxqde 2011-01-20


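[Quote=Quoting reply #4 by q107770540:]

Correction:
foreach(Match m in Regex.Matches(html,@"(?is)(?<=<ul class=""c_l14s_01"" id=""sh_news_gn"">).+?(?=</ul>)"))
{
Console.WriteLine(m.Value);
}
[/Quote]
Thank you very much. Your method works well and is very powerful.

I have one more question, though: from the HTML below, how do I extract the title (人社部官员:事业单位人员不必因担心改革早退休), the time (2011年01月19日03:15), the news source (京华时报), and the content (<p>　　昨天，人社部中国人事科学研究院院长吴江接.....)? I'm new to regular expressions and know next to nothing about them, so any guidance would be appreciated.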

<div class="blkContainerSblk">
<h1 id="artibodyTitle" pid="1" tid="1"

did="21836924" fid="1666">人社部官员:事业单位人员不必因担心改革早退休</h1>
<div class="artInfo"><span id="art_source"><a

href="http://www.sina.com.cn">http://www.sina.com.cn</a></span> <s

pan id="pub_date">2011年01月19日03:15</span> <span

id="media_name"><a href="http://epaper.jinghua.cn/html/2011-

01/19/content_624637.htm" target="_blank">京华时报</a> <a

href="http://epaper.jinghua.cn/html/2011-01/19/content_624637.htm"

target="_blank"></a> </span>

</div>




<div class="blkContainerSblkCon"

id="artibody">


<p>　　昨天，人社部中国人事科学研究院院长吴江接受本报记者专访时表示，此前启动

的5个事业单位改革全国试点，目前仍在试点阶段，并未停滞。</p>

<p>　　他也表示，事业单位改革在制度设计上会有合理安排，不论是北京还是其他地方

，事业单位工作人员没必要因此提前退休。</p>



</div>
</div>

q107770540 2011-01-20


Correction:
foreach(Match m in Regex.Matches(html,@"(?is)(?<=<ul class=""c_l14s_01"" id=""sh_news_gn"">).+?(?=</ul>)"))
{
Console.WriteLine(m.Value);
}

q107770540 2011-01-20


If there are multiple matches, use:
foreach(Match m in Regex.Matches(html,@"(?is)(?<=<ul class=""c_l14s_01"" id=""sh_news_gn"">).+?(?=</ul>)").Value)
{
Console.WriteLine(m.Value);
}

q107770540 2011-01-20