Question: how to handle this with regular expressions

boyle0630 2008-05-15 11:54:54

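I am reading the following data out of a database column:
&#33457;&#37117;HOLON(&#20250;)&#37117;
An item such as &#33457; stands for one Chinese character; the number between &# and ; is the character's internal code (区位码) in decimal.
I need to convert this unreadable data into normal text. The string above, for example, stands for 花都HOLON(会)都. How should I do this? (If you paste such a string into the Google search box and run the search, it is displayed correctly; Baidu does not do this.)
50 points for now; I will add more if that is not enough.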
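
Editorial note: 33457 is 0x82B1, which is the Unicode code point of 花, so the data looks like HTML decimal numeric character references rather than GB2312 row-cell codes. Under that assumption, a stock HTML decoder may already do the whole job. The sketch below assumes .NET 4.0 or later, where System.Net.WebUtility.HtmlDecode exists (on older frameworks System.Web.HttpUtility.HtmlDecode plays the same role); the class and method names are made up for illustration.

```csharp
using System;
using System.Net;

static class HtmlDecodeSketch
{
    // Hypothetical helper: decodes HTML character references such as &#33457;.
    // Note that it also decodes named entities like &amp;, which may or may
    // not be wanted for this data.
    public static string Decode(string raw)
    {
        return WebUtility.HtmlDecode(raw);
    }

    static void Main()
    {
        string raw = "&#33457;&#37117;HOLON(&#20250;)&#37117;";
        Console.WriteLine(Decode(raw));
    }
}
```

If only the decimal &#...; form should be touched and everything else left alone, a targeted regex replacement is the safer choice.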

18 replies

boyle0630 2008-05-15

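OP here. Using the method from the first reply I can already extract the internal codes; the real problem is replacing each one, in place, with the matching Chinese character.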

avrilxu 2008-05-15

http://www.lokcore.com/avrilxu/article.asp?id=8
Match with a regular expression like this to validate it.

boyle0630 2008-05-15

Extracting them works fine; the real problem is replacing each one, in place, with the matching Chinese character.

qq562342 2008-05-15

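                // Capture the decimal digits between "&#" and ";"
                Regex r = new Regex(@"\&#(?<number>\d+);");
                MatchCollection mmm = r.Matches("&#33457;&#37117;HOLON(&#20250;)&#37117;");
                foreach (Match m in mmm)
                {
                    Console.Write(m.Groups["number"]);
                }

This extracts every number from &#33457;&#37117;HOLON(&#20250;)&#37117;

For example, 花 is 33457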

boyle0630 2008-05-15

Code for converting an internal code to Chinese characters (it requires hexadecimal input):
private string CodingToCharacter1(string coding)
{
    string characters = "";
    if (coding.Length % 4 != 0) // the code is hexadecimal; its length must be a multiple of 4
    {
        throw new System.Exception("Invalid code format");
    }
    for (int i = 0; i < coding.Length; i += 4) // every four hex digits form one character
    {
        byte[] bytes = new byte[2];
        string lowCode = coding.Substring(i, 2); // low byte, parsed as hex (the input must list the low byte first)
        bytes[0] = System.Convert.ToByte(lowCode, 16);
        string highCode = coding.Substring(i + 2, 2); // high byte, parsed as hex
        bytes[1] = System.Convert.ToByte(highCode, 16);
        string character = System.Text.Encoding.Unicode.GetString(bytes);
        characters += character;
    }
    return characters;
}

boyle0630 2008-05-15

Thanks, closing the thread.

priwilliam 2008-05-15

Saving a spot to follow along.

HimeTale 2008-05-15

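// textBox3.Text is only there for testing; enter different expressions to check whether they work

Capture with a group, then use a regex delegate (a MatchEvaluator).
Nothing processes strings faster than regular expressions.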
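
Editorial note: the "capture, then delegate" idea can be sketched as below. This is an illustration, not the poster's code: the class name EntityDecoder is hypothetical, and it assumes the numbers are Unicode code points (33457 is 0x82B1, i.e. 花), so char.ConvertFromUtf32 can replace the hex-string and byte-array round trip.

```csharp
using System;
using System.Text.RegularExpressions;

static class EntityDecoder
{
    // Matches decimal references such as &#33457; and captures the digits.
    static readonly Regex Pattern = new Regex(@"&#(?<code>\d+);");

    public static string Decode(string input)
    {
        // The delegate runs once per match; its return value replaces the match.
        return Pattern.Replace(input, delegate(Match m)
        {
            int codePoint;
            // Leave the text untouched if the number is not a valid code point
            // (unparsable, above U+10FFFF, or inside the surrogate range).
            if (!int.TryParse(m.Groups["code"].Value, out codePoint)
                || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            {
                return m.Value;
            }
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

Calling EntityDecoder.Decode on the string from the question yields 花都HOLON(会)都 in a single pass, with each reference replaced at its own position.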

boyle0630 2008-05-15

What is textBox3.Text?

boyle0630 2008-05-15

I can't quite follow it. Can anyone explain the code in reply #11?

boyle0630 2008-05-15

Sorry, I replied without reading carefully. Let me look at it first; it seems to have some real substance.

boyle0630 2008-05-15

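This only removes one loop. The codes are still processed one at a time, so it makes no difference whether the loop is there. I would like something more optimized.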

HimeTale 2008-05-15

Sorry, the one above was posted by mistake.

            strTest = Regex.Replace(strTest, @"&#(?<code>\d+);", delegate(Match m)
            {
                // Decimal code -> hex string, e.g. "33457" -> "82b1"
                string coding = Convert.ToString(Convert.ToInt32(m.Groups["code"].Value), 16);
                if (coding.Length != 4)
                    throw new System.Exception("Invalid code format");
                byte[] bytes = new byte[2];
                // Encoding.Unicode is little-endian, so the low byte
                // (the last two hex digits) has to come first.
                string lowCode = coding.Substring(2, 2);
                bytes[0] = System.Convert.ToByte(lowCode, 16);
                string highCode = coding.Substring(0, 2);
                bytes[1] = System.Convert.ToByte(highCode, 16);
                return System.Text.Encoding.Unicode.GetString(bytes);
            });

HimeTale 2008-05-15

            strTest = Regex.Replace(strTest, textBox3.Text, delegate(Match m)
            {
                // Decimal code -> hex string, e.g. "33457" -> "82b1"
                string coding = Convert.ToString(Convert.ToInt32(m.Groups["code"].Value), 16);

                string characters = "";
                if (coding.Length % 4 != 0)
                {
                    throw new System.Exception("Invalid code format");
                }
                for (int i = 0; i < coding.Length; i += 4)
                {
                    byte[] bytes = new byte[2];
                    // Encoding.Unicode is little-endian, so the low byte
                    // (the last two hex digits of each group) comes first.
                    string lowCode = coding.Substring(i + 2, 2);
                    bytes[0] = System.Convert.ToByte(lowCode, 16);
                    string highCode = coding.Substring(i, 2);
                    bytes[1] = System.Convert.ToByte(highCode, 16);
                    string character = System.Text.Encoding.Unicode.GetString(bytes);
                    characters += character;
                }
                return characters;
            });

The points are enough; just close the thread.

boyle0630 2008-05-15

It is not very efficient, though. Can anyone help optimize it?

boyle0630 2008-05-15

Heh, solved it myself. Exhausting. Here is the code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace Regex1
{
    class Program
    {
        static void Main()
        {
            string str = "&#33457;&#37117;HOLON(&#20250;)&#37117;";
            try
            {
                str = CodingToCharacter(str);
                Console.WriteLine("Result=" + str);
            }
            catch { Console.WriteLine("error"); }
            Console.ReadKey();
        }

        static string CodingToCharacter(string strParam)
        {
            Regex r = new Regex(@"\&#(?<number>\d+);");
            MatchCollection mmm = r.Matches(strParam);
            for (int i = 0; i < mmm.Count; i++)
            {
                string code = mmm[i].Groups["number"].ToString();
                // Convert the integer to a hex string, left-padded to 4 digits
                // so that codes below 0x1000 are accepted as well.
                code = CodingToCharacter1(Convert.ToString(Convert.ToInt64(code), 16).PadLeft(4, '0'));
                // Replaces every occurrence of this reference in the string.
                strParam = strParam.Replace(mmm[i].ToString(), code);
            }
            return strParam;
        }

        static string CodingToCharacter1(string coding)
        {
            string characters = "";
            if (coding.Length % 4 != 0) // the code is hexadecimal; its length must be a multiple of 4
            {
                throw new System.Exception("Invalid code format");
            }
            for (int i = 0; i < coding.Length; i += 4) // every four hex digits form one character
            {
                byte[] bytes = new byte[2];
                string lowCode = coding.Substring(i + 2, 2); // low byte, parsed as hex
                bytes[0] = System.Convert.ToByte(lowCode, 16);
                string highCode = coding.Substring(i, 2); // high byte, parsed as hex
                bytes[1] = System.Convert.ToByte(highCode, 16);
                string character = System.Text.Encoding.Unicode.GetString(bytes);
                characters += character;
            }
            return characters;
        }
    }
}
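
Editorial note: if efficiency is the concern, one option is to skip the regex and the repeated string.Replace calls and scan the input once with a StringBuilder. The sketch below is only one possible approach under the assumption that the numbers are Unicode code points; the class name EntityScanner is hypothetical, and whether it is actually faster for a given workload would need measuring.

```csharp
using System;
using System.Text;

static class EntityScanner
{
    // Single pass over the input; every &#NNN; with a valid code point
    // is replaced by the character it denotes, everything else is copied.
    public static string Decode(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            if (input[i] == '&' && i + 1 < input.Length && input[i + 1] == '#')
            {
                int j = i + 2;
                int value = 0;
                // Stop accumulating once the value exceeds U+10FFFF,
                // so long digit runs cannot overflow the int.
                while (j < input.Length && input[j] >= '0' && input[j] <= '9'
                       && value <= 0x10FFFF)
                {
                    value = value * 10 + (input[j] - '0');
                    j++;
                }
                bool hasDigits = j > i + 2;
                bool terminated = j < input.Length && input[j] == ';';
                bool valid = value <= 0x10FFFF && (value < 0xD800 || value > 0xDFFF);
                if (hasDigits && terminated && valid)
                {
                    sb.Append(char.ConvertFromUtf32(value));
                    i = j + 1; // continue after the ';'
                    continue;
                }
            }
            // Not a well-formed reference: copy the character unchanged.
            sb.Append(input[i]);
            i++;
        }
        return sb.ToString();
    }
}
```

Unlike the version above, malformed or out-of-range references are left in the text as they are instead of raising an exception, which is a design choice rather than a requirement from the thread.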

boyle0630 2008-05-15