This regular expression is hard to write; please help, everyone. Moderators, please take a look too!

oldsky 2003-04-28 10:32:43
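I need to process files that Word has saved as HTML, and I want to slim these .htm files down. I know there are plenty of dedicated tools for this, but I have to build the slimming step into my own program.
1. Remove all comment tags and their content, that is, <!--[…]-->. For example, for < !- -[if gte mso 9]>…<![endif]-->, remove the tags and everything between them.
2. Replace content such as <![if !vml]><img width=81 height=93
src="./text.files/image006.gif" v:shapes="_x0000_i1048"><![endif]> with <img width=81 height=93 src="./text.files/image006.gif">
3. ...
Let me start with just these two!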
oldsky 2003-04-28
This is the content I want after the replacement:
<html>
<head>
<meta name=Generator content="Microsoft Office HTML Filter 2.0">
<meta http-equiv=Content-Type content="text/html; charset=GB2312">
<meta name=Originator content="Microsoft Word 9">
<title>智利:</title>
<style>
<!--

p.MsoNormal, li.MsoNormal, div.MsoNormal
{
margin:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-justify:inter-ideograph;
font-size:10.5pt;
font-family:"Times New Roman";}
-->
</style>
</head>

<body lang=ZH-CN style='text-justify-trim:punctuation'>

<div class=Section1 style='layout-grid:15.6pt'>

<p class=MsoNormal><span style='font-family:宋体;'>智利:</span></p>

<p class=MsoNormal><span lang=EN-US>A</span><span style='font-family:宋体;'>.</span><span
lang=EN-US><img width=84 height=69
src="./text.files/image002.jpg">           
B</span><span style='font-family:宋体;'>。</span><span lang=EN-US><img width=239 height=127
src="./text.files/image004.gif"></span></p>

<p class=MsoNormal><span lang=EN-US>C</span><span style='font-family:宋体;'>.</span><span
lang=EN-US><sub><img width=81 height=93
src="./text.files/image006.gif"></sub>            
D</span><span style='font-family:宋体;'>。</span><span lang=EN-US><img width=125 height=125
src="./text.files/image008.jpg"></span></p>
</div>
</body>
</html>
oldsky 2003-04-28
Suppose this is the original file:
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=GB2312">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./text.files/filelist.xml">
<link rel=Edit-Time-Data href="./text.files/editdata.mso">
<link rel=OLE-Object-Data href="./text.files/oledata.mso">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<title>智利:</title>
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>Administrator</o:Author>
<o:LastAuthor>Administrator</o:LastAuthor>
<o:Revision>1</o:Revision>
<o:TotalTime>4</o:TotalTime>
<o:Created>2003-04-28T00:55:00Z</o:Created>
<o:LastSaved>2003-04-28T00:59:00Z</o:LastSaved>
<o:Pages>1</o:Pages>
<o:Company>iin</o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:Version>9.2812</o:Version>
</o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:WordDocument>
<w:PunctuationKerning/>
<w:DrawingGridVerticalSpacing>7.8 磅</w:DrawingGridVerticalSpacing>
<w:DisplayHorizontalDrawingGridEvery>0</w:DisplayHorizontalDrawingGridEvery>
<w:DisplayVerticalDrawingGridEvery>2</w:DisplayVerticalDrawingGridEvery>
<w:Compatibility>
<w:SpaceForUL/>
<w:BalanceSingleByteDoubleByteWidth/>
<w:DoNotLeaveBackslashAlone/>
<w:ULTrailSpace/>
<w:DoNotExpandShiftReturn/>
<w:AdjustLineHeightInTable/>
<w:UseFELayout/>
</w:Compatibility>
</w:WordDocument>
</xml><![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:宋体;
panose-1:2 1 6 0 3 1 1 1 1 1;
mso-font-alt:SimSun;
mso-font-charset:134;
mso-generic-font-family:auto;
mso-font-pitch:variable;
mso-font-signature:3 135135232 16 0 262145 0;}
@font-face
{font-family:"\@宋体";
panose-1:2 1 6 0 3 1 1 1 1 1;
mso-font-charset:134;
mso-generic-font-family:auto;
mso-font-pitch:variable;
mso-font-signature:3 135135232 16 0 262145 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:"";
margin:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-justify:inter-ideograph;
mso-pagination:none;
font-size:10.5pt;
mso-bidi-font-size:12.0pt;
font-family:"Times New Roman";
mso-fareast-font-family:宋体;
mso-font-kerning:1.0pt;}
/* Page Definitions */
@page
{mso-page-border-surround-header:no;
mso-page-border-surround-footer:no;}
@page Section1
{size:595.3pt 841.9pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;
mso-header-margin:42.55pt;
mso-footer-margin:49.6pt;
mso-paper-source:0;
layout-grid:15.6pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>

<body lang=ZH-CN style='tab-interval:21.0pt;text-justify-trim:punctuation'>

<div class=Section1 style='layout-grid:15.6pt'>

<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>智利:</span></p>

<p class=MsoNormal><span lang=EN-US>A</span><span style='font-family:宋体;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>.</span><span
lang=EN-US><!--[if gte vml 1]><v:shapetype id="_x0000_t75" coordsize="21600,21600"
o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f"
stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype><v:shape id="_x0000_i1025" type="#_x0000_t75" style='width:63pt;
height:51.75pt'>
<v:imagedata src="./text.files/image001.jpg" o:title="0107"/>
</v:shape><![endif]--><![if !vml]><img width=84 height=69
src="./text.files/image002.jpg" v:shapes="_x0000_i1025"><![endif]><span
style="mso-spacerun:
yes">           
</span>B</span><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>。</span><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1036" type="#_x0000_t75" style='width:179.25pt;height:95.25pt'
o:ole="">
<v:imagedata src="./text.files/image003.wmz" o:title=""/>
</v:shape><![endif]--><![if !vml]><img width=239 height=127
src="./text.files/image004.gif" v:shapes="_x0000_i1036"><![endif]><!--[if gte mso 9]><xml>
<o:OLEObject Type="Embed" ProgID="Excel.Chart.8" ShapeID="_x0000_i1036"
DrawAspect="Content" ObjectID="_1113025523">
</o:OLEObject>
</xml><![endif]--></span></p>

<p class=MsoNormal><span lang=EN-US>C</span><span style='font-family:宋体;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>.</span><span
lang=EN-US><sub><!--[if gte vml 1]><v:shape id="_x0000_i1048" type="#_x0000_t75"
style='width:60.75pt;height:69.75pt' o:ole="">
<v:imagedata src="./text.files/image005.wmz" o:title=""/>
</v:shape><![endif]--><![if !vml]><img width=81 height=93
src="./text.files/image006.gif" v:shapes="_x0000_i1048"><![endif]></sub><!--[if gte mso 9]><xml>
<o:OLEObject Type="Embed" ProgID="Equation.3" ShapeID="_x0000_i1048"
DrawAspect="Content" ObjectID="_1113025524">
</o:OLEObject>
</xml><![endif]--><span style="mso-spacerun:
yes">            
</span>D</span><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>。</span><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1049" type="#_x0000_t75" style='width:93.75pt;height:93.75pt'
o:ole="">
<v:imagedata src="./text.files/image007.png" o:title=""/>
</v:shape><![endif]--><![if !vml]><img width=125 height=125
src="./text.files/image008.jpg" v:shapes="_x0000_i1049"><![endif]><!--[if gte mso 9]><xml>
<o:OLEObject Type="Embed" ProgID="Paint.Picture" ShapeID="_x0000_i1049"
DrawAspect="Content" ObjectID="_1113025525">
</o:OLEObject>
</xml><![endif]--></span></p>

</div>

</body>

</html>
oldsky 2003-04-28
to walkingpoison(walkingpoison) :
Is there a way to replace every complete <!--[if gte mso 9]>…<![endif]--> block like this?
walkingpoison 2003-04-28
Heh, a human brain is no computer after all. /<![^<]+>/ matches parts like <!--[if gte mso 9]>, but it cannot match a complete <!--[if gte mso 9]>…<![endif]--> block.
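The difference described here can be checked with a small JavaScript sketch. The sample string is a shortened, made-up stand-in for the Word markup in the thread, and the variable names are only for illustration.

```javascript
// Shortened sample modelled on Word's output (hypothetical content).
const sample =
  '<title>t</title><!--[if gte mso 9]><xml>\n<o:Pages>1</o:Pages>\n</xml><![endif]--><p>text</p>';

// The pattern from the thread: "<!" followed by non-"<" characters, then ">".
const markerOnly = /<![^<]+>/ig;

// It finds the opening and closing markers as two separate matches...
const matches = sample.match(markerOnly);
console.log(matches);

// ...so replacing them with "" leaves the <xml> block between them in place.
const stripped = sample.replace(markerOnly, '');
console.log(stripped);
```

Because `[^<]` stops at the next `<`, the pattern can never span the tags inside the conditional block, which is why only the two markers are removed.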
walkingpoison 2003-04-28
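So what you want is VBScript; in that case you can't use JavaScript syntax, of course.
re.Pattern = "/<![^<]+>/ig"
should be changed to
re.Pattern = "<![^<]+>"

But let me warn you that /<![^<]+>/ has a matching problem. It may match everything from the start of your first comment all the way to the end of the HTML.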
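JavaScript has the same distinction between a regex literal and a pattern string, so the mistake can be illustrated there as an analogy. This is only a sketch: the sample string is made up, and the `RegExp` constructor stands in for VBScript's `re.Pattern`, with the flags argument playing the role of `re.Global` and `re.IgnoreCase`.

```javascript
// Made-up sample with one conditional comment.
const html = 'a<!--[if !mso]>b<![endif]-->c';

// Pattern text with the literal's slashes and flags left in, as in the VBScript attempt.
// The engine now looks for a literal "/" before "<!" and a literal "/ig" after ">".
const wrong = new RegExp('/<![^<]+>/ig');

// Pattern text only; the flags are passed separately.
const right = new RegExp('<![^<]+>', 'ig');

const wrongResult = html.replace(wrong, '');
const rightResult = html.replace(right, '');
console.log(wrongResult); // unchanged, nothing matched
console.log(rightResult); // markers removed
```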
oldsky 2003-04-28
Everyone, something is not right. This is what I wrote in VBScript:
temp = (the HTML content I read in)
Dim re
Set re = New RegExp
re.Pattern = "/<![^<]+>/ig"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
response.write temp
temp = re.Replace(temp,"")
response.write temp
fokker 2003-04-28
1. Here is one I wrote a while ago:
/style=(['"])[^'"<>]*\1|<span[^<>]*>|<\/span>|<o:p>|<\/o:p>|<\?xml[^<>]*>/ig
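As a rough illustration of what this pattern removes, here is a sketch that applies it to a made-up one-line fragment in the style of Word's output. Note that it strips every span tag, which goes further than the target output posted in this thread, and that its first alternative does not match a style value that itself contains the other kind of quote.

```javascript
// The pattern from the post above, copied as-is.
const wordCruft = /style=(['"])[^'"<>]*\1|<span[^<>]*>|<\/span>|<o:p>|<\/o:p>|<\?xml[^<>]*>/ig;

// Hypothetical fragment; not taken from the sample file.
const fragment =
  '<p class=MsoNormal style="margin:0cm"><span lang=EN-US>A</span><o:p></o:p></p>';

// Each alternative is removed wherever it occurs; the text between tags is kept.
const cleaned = fragment.replace(wordCruft, '');
console.log(cleaned);
```

The removed style attribute leaves its leading space behind, so the result contains `<p class=MsoNormal >`.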

hyee 2003-04-28
text=text.replace(/<\![^>]*>/g,"")
oldsky 2003-04-28
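Oh, I didn't notice when I copied it over just now. It should be:
1. Remove all comment tags and their content, that is, <!--[…]-->. For example, for <!--[if gte mso 9]>…<![endif]-->, remove the tags and everything between them.
2. Replace content such as <![if !vml]><img width=81 height=93
src="./text.files/image006.gif" v:shapes="_x0000_i1048"><![endif]> with <img width=81 height=93 src="./text.files/image006.gif">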
walkingpoison 2003-04-28
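Also, /v:.*\"/ig is a scary pattern: as long as a double quote appears later on, greedy matching will swallow everything in between. Be careful with patterns like this.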
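A short sketch of the hazard, using a made-up tag that has another quoted attribute after `v:shapes`. The narrower pattern is an assumption added for comparison and does not come from the thread.

```javascript
// Hypothetical tag with a quoted attribute following v:shapes.
const tag = '<img src="a.gif" v:shapes="_x0000_i1025" alt="chart">';

// Greedy pattern from the thread: runs from "v:" to the LAST double quote on the line.
const greedy = tag.replace(/v:.*\"/ig, '');

// Narrower alternative: a single v:* attribute with a double-quoted value.
const narrow = tag.replace(/\s+v:\w+="[^"]*"/ig, '');

console.log(greedy); // the alt attribute is gone too
console.log(narrow); // only v:shapes is removed
```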
walkingpoison 2003-04-28
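1. /(<!--.*?-->)/ig
Supported in IE 5.5 and later.

2. Judging by your requirement, you don't need a regular expression at all; a direct replacement will do.

Also: the poster above misunderstood. The OP's comment is the whole < !- -[if gte mso 9]>…<![endif]-->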
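Putting the two suggestions together, a minimal JavaScript sketch might look like the following. It makes several assumptions that are not in the thread: `[\s\S]*?` is used instead of `.*?` because `.` does not match line breaks in JavaScript, the comment pattern is narrowed to `<!--[if …]>…<![endif]-->` so that the CSS wrapped in `<!-- -->` inside `<style>` is left alone, and the function name `slimWordHtml` is hypothetical.

```javascript
// Shortened sample modelled on the original file in this thread.
const wordHtml = [
  '<span lang=EN-US><!--[if gte vml 1]><v:shape id="_x0000_i1048">',
  '<v:imagedata src="./text.files/image005.wmz" o:title=""/>',
  '</v:shape><![endif]--><![if !vml]><img width=81 height=93',
  'src="./text.files/image006.gif" v:shapes="_x0000_i1048"><![endif]></span>',
].join('\n');

// Hypothetical helper: step 1 removes conditional comments, step 2 unwraps the <img>.
function slimWordHtml(html) {
  return html
    // Step 1: drop "<!--[if ...]>" through the nearest "<![endif]-->", across line breaks.
    .replace(/<!--\[if[\s\S]*?<!\[endif\]-->/g, '')
    // Step 2a: the remaining markers are fixed strings, so plain replacement is enough.
    .split('<![if !vml]>').join('')
    .split('<![endif]>').join('')
    // Step 2b: drop the v:shapes attribute left on the <img> tag.
    .replace(/\s+v:shapes="[^"]*"/g, '');
}

const slimmed = slimWordHtml(wordHtml);
console.log(slimmed);
```

This only covers the two requirements stated in the question; the `mso-*` style properties and the extra `<meta>`, `<link>`, and `xmlns` attributes in the sample would need further steps.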
xuzuning 2003-04-28
1. /<![^<]+>/ig
A tag like this is not a comment: < !- -[if gte mso 9]>
2. /v:.*\"/ig
Run 1 first, then 2; don't try to combine them into one.
