Implementing URLEncode in C++: Chinese characters are encoded incorrectly, advice from experts appreciated

tiancanyue 2009-10-08 04:42:37

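As the title says: I want to URL-encode a char* string as UTF-8. The problem is this:
the two characters "中文", for example, sometimes come out as %E4%B8%AD%E6%96%87 and sometimes as %D6%D0%CE%C4. Why are there two different forms?
Here is the code: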
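#include <cctype>
#include <cstring>
#include <iostream>
#include <string>
using namespace std;

typedef unsigned char BYTE;  // same definition as in <windows.h>

// Map a value 0-15 to its hex digit: 10-15 -> 'A'-'F' (55 = 'A' - 10), 0-9 -> '0'-'9' (48 = '0')
inline BYTE toHex(const BYTE &x)
{
    return x > 9 ? x + 55 : x + 48;
}

// Percent-encode sIn byte by byte (a space becomes '+', as in form encoding).
// The output depends only on the bytes in sIn: GBK input gives %D6%D0..., UTF-8 input gives %E4%B8...
string urlEncoding(const string &sIn)
{
    cout << "size: " << sIn.size() << endl;
    string sOut;
    for (string::size_type ix = 0; ix < sIn.size(); ix++)
    {
        BYTE c = (BYTE)sIn[ix];
        BYTE buf[4];
        memset(buf, 0, 4);
        if (isalnum(c))
        {
            buf[0] = c;
        }
        else if (isspace(c))
        {
            buf[0] = '+';
        }
        else
        {
            buf[0] = '%';
            buf[1] = toHex(c >> 4);
            buf[2] = toHex(c % 16);
        }
        sOut += (char *)buf;
    }
    return sOut;
}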
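The two results in the question come from the same byte-level encoding applied to different bytes: %E4%B8%AD%E6%96%87 is the UTF-8 encoding of 中文 (U+4E2D, U+6587), while %D6%D0%CE%C4 is its GBK/GB2312 encoding, which is typically what a char* literal holds when the source is compiled under the Simplified Chinese Windows ANSI code page (936). An encoder that only sees bytes cannot tell which one it was given. A minimal portable sketch, assuming the input bytes are written as explicit escapes so the result does not depend on how the source file is saved; percentEncode is a hypothetical name, and the space-to-'+' rule is left out for brevity:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: percent-encode every byte that is not an ASCII letter
// or digit. It only sees bytes; it has no idea which charset produced them.
std::string percentEncode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c)) {   // default "C" locale: only A-Z, a-z, 0-9
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];   // high nibble
            out += hex[c & 0x0F]; // low nibble
        }
    }
    return out;
}
```

For example, percentEncode("\xE4\xB8\xAD\xE6\x96\x87") (the UTF-8 bytes) returns %E4%B8%AD%E6%96%87, and percentEncode("\xD6\xD0\xCE\xC4") (the GBK bytes) returns %D6%D0%CE%C4. To get the UTF-8 form, the string has to be converted to UTF-8 before it is encoded.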

tiancanyue 2009-10-09

Solved. The sprintf call looks roughly like this: sprintf(tempbuff,"%%%X%X",((BYTE)tt.at(i)) >>4,((BYTE)tt.at(i)) %16);

Thanks a lot, everyone, many thanks!
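Why the "%%%X%X" fix works: after the cast to BYTE, >>4 and %16 each yield a value from 0 to 15, which %X prints as exactly one hex digit, so the pair always forms two digits. The cast is what matters: on compilers where char is signed (the MSVC default), a char holding 0xE4 is negative, and shifting or formatting it produces FFFFFF... digits. A small sketch checking that the poster's two-field format agrees with the more common single "%%%02X" field; escapeNibbles and escapeByte are hypothetical names:

```cpp
#include <cstdio>
#include <string>

// Poster's form: two one-digit fields, high nibble then low nibble.
std::string escapeNibbles(unsigned char b)
{
    char tmp[4];  // '%', two hex digits, terminating NUL
    std::snprintf(tmp, sizeof tmp, "%%%X%X", b >> 4, b % 16);
    return tmp;
}

// Common form: one zero-padded two-digit field for the whole byte.
std::string escapeByte(unsigned char b)
{
    char tmp[4];
    std::snprintf(tmp, sizeof tmp, "%%%02X", b);
    return tmp;
}
```

Both return "%E4" for 0xE4 and "%0A" for 0x0A; the zero padding in %02X is what keeps single-digit bytes at two characters.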

whg01 2009-10-09

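Process the string returned by ConvertGBKToUtf8 with the code below:
const char *strUTF8 = (LPCTSTR)strGBK;  // strGBK after ConvertGBKToUtf8 (non-Unicode build)
size_t len = strlen(strUTF8);
char *urlEncode = (char *)malloc(len * 3 + 1);
char *pTmp = urlEncode;
for (size_t i = 0; i < len; i++)
{
    // Cast to unsigned char, otherwise bytes >= 0x80 print as ffffffe4.
    // %02X gives upper-case hex, matching %E4%B8%AD...
    sprintf(pTmp, "%%%02X", (unsigned char)strUTF8[i]);
    pTmp += 3;
}
pTmp[0] = 0x00;
and you get "%E4%B8%AD%E6%96%87".

A char array holding UTF-8 bytes needs the right code page set when it is printed, otherwise the output is garbled.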

tiancanyue 2009-10-09

I found a thread with the same problem at http://topic.csdn.net/u/20091008/21/511790bd-65a6-4d2d-b249-e906010a46d4.html, where the correct code is given as:

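// Converts strGBK in place from the ANSI code page (CP_ACP) to UTF-8.
// Assumes a non-Unicode (MBCS) MFC build, where (LPCTSTR) yields const char*.
void ConvertGBKToUtf8( CString& strGBK )
{
    // ANSI code page -> UTF-16
    int len = MultiByteToWideChar(CP_ACP, 0, (LPCTSTR)strGBK, -1, NULL, 0);
    WCHAR *wszUtf8 = new WCHAR[len + 1];
    memset(wszUtf8, 0, (len + 1) * sizeof(WCHAR));
    MultiByteToWideChar(CP_ACP, 0, (LPCTSTR)strGBK, -1, wszUtf8, len);

    // UTF-16 -> UTF-8
    len = WideCharToMultiByte(CP_UTF8, 0, wszUtf8, -1, NULL, 0, NULL, NULL);
    char *szUtf8 = new char[len + 1];
    memset(szUtf8, 0, len + 1);
    WideCharToMultiByte(CP_UTF8, 0, wszUtf8, -1, szUtf8, len, NULL, NULL);

    strGBK = szUtf8;  // strGBK now holds UTF-8 bytes despite its name
    delete[] szUtf8;
    delete[] wszUtf8;
}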

This seems to be a conversion from GBK, though?

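Finally, I want to print the converted string. How do I do that? I have tried several ways, and the output is either garbled or wrong. I'd appreciate an answer from anyone.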
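On the question above: CP_ACP is the system ANSI code page, which on Simplified Chinese Windows is 936 (GBK), so ConvertGBKToUtf8 does convert GBK to UTF-16 and then to UTF-8; afterwards strGBK holds UTF-8 bytes despite its name. Printed to a console that still uses code page 936, those bytes look garbled, whereas the percent-encoded result is plain ASCII and prints correctly anywhere. To inspect the intermediate bytes without depending on the console code page, one option is to print them in hex. A minimal sketch; hexBytes is a hypothetical helper:

```cpp
#include <cstdio>
#include <string>

// Render each byte of s as two upper-case hex digits separated by spaces,
// e.g. to check that a converted string really holds UTF-8 bytes.
std::string hexBytes(const std::string& s)
{
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        char tmp[4];
        std::snprintf(tmp, sizeof tmp, "%02X", static_cast<unsigned char>(s[i]));
        if (i != 0) out += ' ';
        out += tmp;
    }
    return out;
}
```

For 中文, hexBytes returns "E4 B8 AD E6 96 87" if the string is UTF-8 and "D6 D0 CE C4" if it is still GBK, which tells you at a glance whether the conversion happened.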

tiancanyue 2009-10-09

I'm about ready to give up. Dammit, it just won't work.

What I need:
char *s="中文";

converted to s="%E4%B8%AD%E6%96%87";

and then be able to strcat it onto the end of another char[].

Thanks for all the help, still looking for a solution.

tiancanyue 2009-10-08

Sorry, I really can't get it to work. Here is my code:
void main()
{
CChineseCodeLib ChineseCodelib;
string pOut;
char *pText="中文";
ChineseCodelib.GB2312ToUTF_8(pOut,pText,10);//int pLen)
//cout<<pOut<<endl;
char strUTF8[20]="";
pOut.copy(strUTF8,pOut.capacity());
char* urlEncode =(char*) malloc(strlen(strUTF8)*3+1);
char* pTmp = urlEncode;
//while (strUTF8)
for(int i=0;i<pOut.capacity();i++)
{
sprintf(pTmp, "%02x", *strUTF8);
pTmp+=3;
}
cout<<pTmp<<endl;
}
The final output is: fffe4
Here is the implementation of ChineseCodelib.GB2312ToUTF_8(pOut,pText,10):
void CChineseCodeLib::GB2312ToUTF_8(string& pOut,char *pText, int pLen)
{
char buf[4];
char* rst = new char[pLen + (pLen >> 2) + 2];

memset(buf,0,4);
memset(rst,0,pLen + (pLen >> 2) + 2);

int i = 0;
int j = 0;
while(i < pLen)
{
// ASCII (English) bytes can be copied directly
if( *(pText + i) >= 0)
{
rst[j++] = pText[i++];
}
else
{
WCHAR pbuffer;
Gb2312ToUnicode(&pbuffer,pText+i);

UnicodeToUTF_8(buf,&pbuffer);

unsigned short int tmp = 0;
tmp = rst[j] = buf[0];
tmp = rst[j+1] = buf[1];
tmp = rst[j+2] = buf[2];

j += 3;
i += 2;
}
}
rst[j] = '\0';

// return the result
pOut = rst;
delete []rst;

return;
}

Frustrating. Thanks for everyone's help.
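A likely explanation of the fffe4 output, reading the main() loop above and assuming char is signed (the MSVC default): *strUTF8 always reads the first byte, 0xE4, which as a negative char is printed by %02x as ffffffe4 (eight characters); pTmp advances by only 3 per iteration, so each write overlaps the previous one and runs past the malloc'd buffer; and cout<<pTmp prints from the end pointer, which lands three characters into the last ffffffe4, leaving fffe4. The loop also uses capacity(), which can exceed the number of bytes actually stored, and the format string has no '%'. A corrected sketch of that loop, assuming pOut holds the UTF-8 bytes produced by GB2312ToUTF_8; encodeBytes is a hypothetical name:

```cpp
#include <cstdio>
#include <string>

// Hypothetical replacement for the loop in main(): walk the bytes actually
// stored in pOut (size(), not capacity()), index each byte instead of
// re-reading the first one, and return the whole result instead of an end pointer.
std::string encodeBytes(const std::string& pOut)
{
    std::string result;
    result.reserve(pOut.size() * 3);
    for (std::string::size_type i = 0; i < pOut.size(); ++i) {
        char tmp[4];
        // '%' plus two upper-case hex digits; the unsigned char cast avoids "ffffffe4"
        std::snprintf(tmp, sizeof tmp, "%%%02X", static_cast<unsigned char>(pOut[i]));
        result += tmp;
    }
    return result;
}
```

Like the original loop, this encodes every byte, ASCII included; cout << encodeBytes(pOut) then prints %E4%B8%AD%E6%96%87 for 中文, and encodeBytes(pOut).c_str() can be passed to strcat.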

whg01 2009-10-08

In VC, first convert the string to Unicode with MultiByteToWideChar, then convert the Unicode to UTF-8 with WideCharToMultiByte.
http://blog.csdn.net/LJLOVELZ/archive/2008/02/03/2080462.aspx
http://www.vckbase.com/document/viewdoc/?id=1444
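The second step, WideCharToMultiByte(CP_UTF8, ...), is a fixed transformation that can also be written portably. For 中 (U+4E2D = 0100 1110 0010 1101), the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx gives 11100100 10111000 10101101 = E4 B8 AD. A minimal sketch, assuming UTF-16 input in a std::u16string; utf16ToUtf8 is a hypothetical name, not a Windows API, and replacing unpaired surrogates with U+FFFD is this sketch's own choice:

```cpp
#include <string>

// Hypothetical portable stand-in for WideCharToMultiByte(CP_UTF8, ...):
// converts UTF-16 code units to UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::u16string::size_type i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // high + low surrogate -> code point above U+FFFF
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {            // 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx (all CJK in the BMP)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

utf16ToUtf8(u"\u4E2D\u6587") yields the six bytes E4 B8 AD E6 96 87, which any of the percent-encoding loops in this thread then turns into %E4%B8%AD%E6%96%87. On Windows the first step (GBK to UTF-16) still needs MultiByteToWideChar or an equivalent table.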

tiancanyue 2009-10-08

Still can't get it to work. Frustrating...

tiancanyue 2009-10-08

[Quote=Quoting reply #11 by whg01:]
You first need to convert the string to UTF-8, then process it with the code above.
[/Quote]
Thanks, I'll give it a try.

whg01 2009-10-08

You first need to convert the string to UTF-8, then process it with the code above.

whg01 2009-10-08

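Assuming the string converted to UTF-8 is char *strUTF8,
the code is roughly:
char *urlEncode = (char *)malloc(strlen(strUTF8) * 3 + 1);
char *pTmp = urlEncode;
for (const char *p = strUTF8; *p != '\0'; p++)  // advance through the bytes
{
    sprintf(pTmp, "%%%02X", (unsigned char)*p);  // '%' plus two upper-case hex digits
    pTmp += 3;
}
*pTmp = '\0';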

tiancanyue 2009-10-08