PHP string truncation problem: advice wanted

punny123 2012-11-06 01:08:02

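I have a string longer than 300 characters and only want the first 150. The string contains markup such as <a href="www.baidu.com">Baidu link</a>. If position 150 happens to fall inside that markup, for example on the www, the truncated string ends in a broken tag. Is there a way to avoid this?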
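
To make the failure concrete, here is a minimal sketch of the simplest possible guard: cut first, then back off if the cut landed inside a tag. The helper names utf8_prefix and cut_outside_tags are hypothetical (they are not from this thread), and the sketch assumes valid UTF-8 input with no ">" inside attribute values.

```php
<?php
// Hypothetical helper: the first $n UTF-8 characters of $s.
function utf8_prefix($s, $n) {
    // /u makes "." match whole code points; /s lets it match newlines too.
    $n = max(0, (int)$n);
    return preg_match('/^.{0,' . $n . '}/us', $s, $m) ? $m[0] : '';
}

// Hypothetical helper: cut to $n characters, then drop a trailing partial tag.
function cut_outside_tags($s, $n) {
    $cut   = utf8_prefix($s, $n);
    $open  = strrpos($cut, '<');
    $close = strrpos($cut, '>');
    // A "<" with no ">" after it means the cut fell inside tag markup.
    if ($open !== false && ($close === false || $close < $open)) {
        $cut = substr($cut, 0, $open);
    }
    return $cut;
}

// Character 15 of this string is inside href="www...", so the whole
// partial tag is dropped.
echo cut_outside_tags('看看<a href="www.baidu.com">百度链接</a>吧', 15), "\n";
```

This only covers the case described in the question. A cut that lands in the link text still leaves the <a> element unclosed.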

5 replies

punny123 2012-11-08

Quoting reply #3 by yiwusuo:

The code filter function from 162100 includes handling for the various HTML tags; worth a look: //filter content function filter2($text) { $text = trim($text); $text = stripslashes($text); $text = str_replace('[','&#91;', $text); $text = ……

Very impressive!

goosman 2012-11-07

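<?php
/**
 * author: goosman
 * blog: http://blog.csdn.net/lgg201
 * mail: lgg860911@yahoo.com.cn
 */

$str	= '0123456789<a>012</a>0123456789';

/**
 * Like substr($s, $o, $l), but if the right edge of the cut falls inside an
 * element, the cut is extended to the end of that element's closing tag.
 * Offsets and lengths are byte counts, as with substr() and strlen().
 */
function substr_remain_tag($s, $o, $l) {
	# A nowdoc keeps every backslash literal, so the pattern reads as PCRE sees it.
	$is_match	= preg_match_all(<<<'REGEX'
~
# Matches one XML-style element. Inside quoted attribute values a backslash may escape itself or the quote.
<(\w+)												# opening tag name
	(?:												# attribute list
		\s+											# leading whitespace
		\w+											# attribute name
		\s*											# optional whitespace after the name
		=											# equals sign between name and value
		\s*											# optional whitespace before the value
		(?:											# quoted attribute value
			"										# double-quoted case
			(?:
				\\\\								# two backslashes (an escaped backslash)
				|
				\\"									# backslash plus quote (an escaped quote)
				|
				[^"\\]+								# any other characters
			)*
			"
			|
			'										# single-quoted case
			(?:
				\\\\								# two backslashes (an escaped backslash)
				|
				\\'									# backslash plus quote (an escaped quote)
				|
				[^'\\]+								# any other characters
			)*
			'
		)
	)*
>
.*?													# element content
</\1>												# closing tag with the same name as the opening tag
~xs
REGEX
, $s, $matches, PREG_OFFSET_CAPTURE, $o);
	if ( $is_match ) {
		foreach ( $matches[0] as $match ) {
			$o0	= $match[1];
			# the element starts at or beyond the right edge of the cut: stop
			if ( $o0 >= $o + $l ) break;
			$l0	= strlen($match[0]);
			# the element ends inside the cut: keep looking
			if ( $o0 + $l0 < $o + $l ) continue;

			# the element straddles the right edge: extend the cut to its end
			$l	= $o0 + $l0 - $o;
			break;
		}
	}
	return substr($s, $o, $l);
}

echo $str . chr(10);
echo substr_remain_tag($str, 0, 20) . chr(10);
# a cut at 15 lands inside <a>...</a>, so it is extended to 20 bytes
echo substr_remain_tag($str, 0, 15) . chr(10);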
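
The function above keeps a straddled element whole, so the result can be longer than requested, and it measures bytes rather than characters. A different approach, sketched here under stated assumptions, is to walk the string token by token, count only the visible text, and close whatever tags are still open at the cut. truncate_html is a hypothetical name, not code from this thread. It assumes valid UTF-8, well-nested markup, no ">" inside attribute values, and a short hard-coded list of void tags. Entities such as &amp; are counted character by character and could be split.

```php
<?php
// Hypothetical sketch: keep at most $limit visible characters of $html,
// leave tags intact, and close any element that is still open at the cut.
function truncate_html($html, $limit) {
    // Split into tags and text runs; the capture group keeps the tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $void = array('br', 'hr', 'img', 'input', 'meta', 'link'); // assumed list
    $open = array();              // stack of open tag names
    $out  = '';
    $left = max(0, (int)$limit);  // visible characters still allowed

    foreach ($tokens as $tok) {
        if (preg_match('/^<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>$/', $tok, $m)) {
            $name = strtolower($m[2]);
            if ($m[1] === '/') {
                // Closing tag: pop it if it matches the innermost open tag.
                if (end($open) === $name) {
                    array_pop($open);
                }
            } elseif ($m[3] !== '/' && !in_array($name, $void, true)) {
                $open[] = $name;
            }
            $out .= $tok;
            continue;
        }
        if ($tok[0] === '<' && substr($tok, -1) === '>') {
            // Comments, doctypes and the like: pass through uncounted.
            $out .= $tok;
            continue;
        }
        // Text run: take as many whole UTF-8 characters as the budget allows.
        if (!preg_match('/^.{0,' . $left . '}/us', $tok, $m)) {
            break; // invalid UTF-8
        }
        $out  .= $m[0];
        $left -= preg_match_all('/./us', $m[0], $unused);
        if ($left <= 0 || $m[0] !== $tok) {
            break;
        }
    }
    // Close anything still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncate_html('0123456789<a>012</a>0123456789', 12), "\n";
```

With this approach the limit applies to what the reader sees, so a long href does not eat into the budget.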

yiwusuo 2012-11-07

The code filter function from 162100 includes handling for the various HTML tags; worth a look:

// Filter content.
// The forum page had decoded the HTML entities in this listing (&#91;, &lt;, ...) and
// swallowed the two <p> patterns; they are restored here from context.
function filter2($text) {
    $text = trim($text);
    $text = stripslashes($text);
    $text = str_replace('[', '&#91;', $text);
    $text = str_replace(']', '&#93;', $text);
    $text = str_replace('|', '&#124;', $text);
    $text = str_replace('\\', '&#92;', $text);
    $text = preg_replace('/[\r\n]+/', '', $text);
    $text = preg_replace('/\s+/', ' ', $text);
    $text = preg_replace('/<\?.*\?>/sU', '', $text);
    $text = preg_replace('/<\!--.*-->/sU', '', $text);
    $text = preg_replace('/<\!DOCTYPE[^>]*>/i', '', $text);
    $text = preg_replace('/<li>/i', ' ·', $text);
    $text = preg_replace('/<li [^>]*>/i', ' ·', $text);
    $text = preg_replace('/<(dt|dd)>/i', ' ', $text);
    $text = preg_replace('/<(dt|dd) [^>]*>/i', ' ', $text);
    // Handle tags that can be used on their own
    $text = preg_replace('/<\/(li|dt|dd)>/i', ' ', $text);
    $text = preg_replace('/<div>/i', '', $text);
    $text = preg_replace('/<div ([^>]*)>/i', '', $text);
    $text = preg_replace('/<\/div\s*>/i', '', $text);
    $text = preg_replace('/<p>/i', '', $text);
    $text = preg_replace('/<p ([^>]*)>/i', '', $text);
    $text = preg_replace('/<\/p\s*>/i', '', $text);
    $text = preg_replace('/<([^>]+)align\s*=[\s\"\']*(center|left|right)[\s\"\']*([^>]*)>/i', '<${1} style="text-align:${2}" ${3}>', $text);
    $text = preg_replace('/<\/?(html|head|meta|link|base|body|title|style|script|noscript|form|iframe|frame|frameset|noframes|\?xml)[^>]*>/i', '', $text);
    // Handle video players
    $text = preg_replace('/<(object|param)([^>]*) (id|name)([^>]*)>/i', '<${1}${2} video${3}${4}>', $text); // for video, let the id or name attribute through
    $text = preg_replace('/<\/embed>/i', '', $text); // required to work around a Firefox bug
    $text = preg_replace('/<embed [^>]*>/i', '${0}</embed>', $text); // required to work around a Firefox bug
    while (preg_match('/(<[^>]+)(title|alt|lang|id|name|class|on\w+)\s*=\s*((\"[^\">]+\")|(\'[^\'>]+\')|[^\s>]+)([^>]*>)/i', $text, $mat)) {
        $text = str_replace($mat[0], $mat[1] . " \n" . $mat[6], $text);
        unset($mat);
    }
    while (preg_match('/(<[^>]+)(window\.|javascript:|js:|about:|file:|document\.|vbs:|cookie)([^>]*)/i', $text, $mat)) {
        $text = str_replace($mat[0], $mat[1] . ' ' . $mat[3], $text);
        unset($mat);
    }
    //$text = preg_replace('/<(hr|br|nobr|img|\/img|input|area|isindex|param)([^>]*)>/i', '[\1\2]', $text);
    $text = preg_replace('/<(\/?[a-z]+[^<>]*)>/i', '[$1]', $text);
    // The pattern and the replacement (two line breaks) are kept as they appear in the post.
    $text = preg_replace('/(\[br\]\s*) {10,}/i', "\n\n", $text);
    /*
    while (preg_match('/\[([a-z]+)[^\]]*\][^\[\]]*\[\/\1\]/i', $text, $mat)) {
        $text = str_replace($mat[0], str_replace('>', ']', str_replace('<', '[', $mat[0])), $text);
        unset($mat);
    }
    */
    while (preg_match('/(\[[^\]]*=\s*)(\"|\')([^=\2\]]+)\2([^\]]*\])/i', $text, $mat)) {
        $text = str_replace($mat[0], $mat[1] . '|' . $mat[3] . '|' . $mat[4], $text);
        unset($mat);
    }
    while (preg_match('/(\[[^\"\'\]]*)(\"|\')([^\]]*\])/i', $text, $mat)) {
        $text = str_replace($mat[0], $mat[1] . $mat[3], $text);
        unset($mat);
    }
    $text = str_replace('<', '&lt;', $text);
    $text = str_replace('>', '&gt;', $text);
    $text = str_replace('\\"', '&quot;', $text);
    $text = str_replace('"', '&quot;', $text);
    $text = str_replace('\'', '&#39;', $text);
    $text = str_replace('[', '<', $text);
    $text = str_replace(']', '>', $text);
    $text = str_replace('|', '"', $text);
    $text = preg_replace('/<(object|param)([^>]*) video(id|name)([^>]*)>/i', '<${1}${2} ${3}${4}>', $text); // restore the id or name attributes on video tags
    return $text;
}
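
If the preview does not need to keep any markup at all, a much smaller route than a full tag filter is to strip the tags first and truncate what is left. The sketch below is only an illustration of that idea: plain_excerpt and its $suffix parameter are hypothetical names, and it assumes valid UTF-8 input.

```php
<?php
// Hypothetical helper: a tag-free preview of $html, at most $limit characters.
function plain_excerpt($html, $limit, $suffix = '...') {
    // Drop the markup, then turn entities such as &amp; back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse the whitespace runs left behind by removed tags.
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    // Take at most $limit whole UTF-8 characters.
    if (!preg_match('/^.{0,' . max(0, (int)$limit) . '}/us', $text, $m)) {
        return '';
    }
    $cut = $m[0];
    // Re-escape for output; add the suffix only when text was dropped.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8') . ($cut === $text ? '' : $suffix);
}

echo plain_excerpt('看看<a href="www.baidu.com">百度链接</a>吧', 4), "\n";
```

Because the tags are gone before the cut is made, no position can land inside one. The cost is that links and formatting are lost in the preview.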