Converting a char* that contains Chinese characters to Unicode

allthesame 2011-03-08 02:15:46

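In a Unicode project, how do I convert a char* that contains Chinese characters to Unicode and then display it in an edit box?
I tried MultiByteToWideChar: it works when there are no Chinese characters, but with Chinese characters the output is garbled.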

17 replies

赵4老师 2011-03-08

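[Quote=Quoting reply #16 from allthesame:]
Quoting reply #15 from year2002:
Quoting reply #14 from allthesame:

Quoting reply #13 from year2002:
This means the data coming from your database isn't UTF-8. Change the first parameter to CP_ACP, CP_UTF7, etc. and try them one by one; it's no trouble, and one of them should work.

Sure enough, with CP_ACP the text isn't garbled, but the database really is set to UTF-8.

But once the data is fetched, what the char* points to is not……
That sounds plausible, but after I set the database encoding to GBK, inserted a new record (with Chinese characters) and ran the program again... the Chinese characters in the new record were garbled! Can the driver only convert automatically from UTF-8?
[/Quote]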
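It may be that "the driver can only convert automatically from UTF-8", or some registry entry or configuration setting may be making the driver convert from UTF-8.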

allthesame 2011-03-08

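[Quote=Quoting reply #15 from year2002:]
Quoting reply #14 from allthesame:

Quoting reply #13 from year2002:
This means the data coming from your database isn't UTF-8. Change the first parameter to CP_ACP, CP_UTF7, etc. and try them one by one; it's no trouble, and one of them should work.

Sure enough, with CP_ACP the text isn't garbled, but the database really is set to UTF-8.

But once the data is fetched, what the char* points to is no longer UTF-8; the driver that accesses the database converts it……
[/Quote]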

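That sounds plausible, but after I set the database encoding to GBK, inserted a new record (with Chinese characters) and ran the program again... the Chinese characters in the new record were garbled! Can the driver only convert automatically from UTF-8?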

实践是最好的学习 2011-03-08

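[Quote=Quoting reply #14 from allthesame:]

Quoting reply #13 from year2002:
This means the data coming from your database isn't UTF-8. Change the first parameter to CP_ACP, CP_UTF7, etc. and try them one by one; it's no trouble, and one of them should work.

Sure enough, with CP_ACP the text isn't garbled, but the database really is set to UTF-8.
[/Quote]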

But once the data is fetched, what the char* points to is no longer UTF-8; the driver that accesses the database has converted it.

allthesame 2011-03-08

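[Quote=Quoting reply #13 from year2002:]
This means the data coming from your database isn't UTF-8. Change the first parameter to CP_ACP, CP_UTF7, etc. and try them one by one; it's no trouble, and one of them should work.
[/Quote]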

Sure enough, with CP_ACP the text isn't garbled, but the database really is set to UTF-8.

实践是最好的学习 2011-03-08

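This means the data coming from your database isn't UTF-8. Change the first parameter to CP_ACP, CP_UTF7, etc. and try them one by one; it's no trouble, and one of them should work.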
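Instead of trying code pages one by one, it can help to dump the raw bytes the driver handed back and compare them with known encodings. Below is a minimal portable C++ sketch; the helper `to_hex` is hypothetical, not a Windows or database API. As reference points, 你 is `e4 bd a0` in UTF-8 and `c4 e3` in GBK (code page 936, the usual ANSI code page on Simplified Chinese Windows).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: render every byte of a narrow string as two lowercase
// hex digits separated by spaces, e.g. "\xe4\xbd\xa0" -> "e4 bd a0".
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : bytes) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0f];  // low nibble
    }
    // Two digits per byte plus one separator between neighbouring bytes.
    assert(bytes.empty() || out.size() == bytes.size() * 3 - 1);
    return out;
}
```

For example, `to_hex(std::string(p))` on the char* returned by a query: if the Chinese characters show up as two-byte pairs like `c4 e3` rather than three-byte groups like `e4 bd a0`, the data is most likely GBK, and CP_ACP (assuming the machine's ANSI code page is 936) is the code page to pass instead of CP_UTF8.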

allthesame 2011-03-08

Hello, but there's still a problem converting the Chinese characters.

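[Quote=Quoting reply #10 from zhao4zhong1:]
Since the Chinese text in the database is UTF-8, use MultiByteToWideChar to convert the strings read from the database (which contain Chinese characters) from UTF-8 to Unicode.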

MultiByteToWideChar
The MultiByteToWideChar function maps a character string to a wide-character (Unicode) string. The character……
[/Quote]

allthesame 2011-03-08

Could someone more experienced please help...

赵4老师 2011-03-08

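Since the Chinese text in the database is UTF-8, use MultiByteToWideChar to convert the strings read from the database (which contain Chinese characters) from UTF-8 to Unicode.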

MultiByteToWideChar
The MultiByteToWideChar function maps a character string to a wide-character (Unicode) string. The character string mapped by this function is not necessarily from a multibyte character set.

int MultiByteToWideChar(
UINT CodePage, // code page
DWORD dwFlags, // character-type options
LPCSTR lpMultiByteStr, // address of string to map
int cchMultiByte, // number of bytes in string
LPWSTR lpWideCharStr, // address of wide-character buffer
int cchWideChar // size of buffer
);

Parameters
CodePage
Specifies the code page to be used to perform the conversion. This parameter can be given the value of any code page that is installed or available in the system. You can also specify one of the following values:
Value           Meaning
CP_ACP          ANSI code page
CP_MACCP        Macintosh code page
CP_OEMCP        OEM code page
CP_SYMBOL       Symbol code page (42)
CP_THREAD_ACP   The current thread's ANSI code page
CP_UTF7         Translate using UTF-7
CP_UTF8         Translate using UTF-8
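To experiment with what `MultiByteToWideChar(CP_UTF8, ...)` does without a Windows compiler, here is a portable C++ sketch that follows the same two-call pattern: call once with a null output buffer to get the required length, then again to convert. The name `utf8_to_utf16` and its conventions are assumptions for illustration, not the Windows API: it returns 0 on malformed input (roughly what the real function does when given MB_ERR_INVALID_CHARS), and it does not support the real API's `-1` length for null-terminated input.

```cpp
#include <cassert>

// Hypothetical portable stand-in for MultiByteToWideChar(CP_UTF8, ...).
//   dst == nullptr -> return the number of UTF-16 units needed (size query);
//   otherwise      -> write at most dstLen units and return how many were written.
// Returns 0 for malformed UTF-8 or a too-small buffer. srcLen is an explicit byte
// count, and (as with the real API in that case) no terminating 0 is written.
int utf8_to_utf16(const char* src, int srcLen, char16_t* dst, int dstLen) {
    assert(src != nullptr && srcLen >= 0);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(src);
    int i = 0, out = 0;
    while (i < srcLen) {
        const unsigned char b = s[i];
        char32_t cp;
        int extra;
        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return 0;                                // invalid lead byte
        if (srcLen - i <= extra) return 0;            // sequence cut off
        for (int k = 1; k <= extra; ++k) {
            const unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return 0;         // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)))
            return 0;                                 // overlong, surrogate or out of range
        i += extra + 1;
        const int units = cp >= 0x10000 ? 2 : 1;      // beyond the BMP -> surrogate pair
        if (dst != nullptr) {
            if (out + units > dstLen) return 0;       // output buffer too small
            if (units == 1) {
                dst[out] = static_cast<char16_t>(cp);
            } else {
                const char32_t v = cp - 0x10000;
                dst[out] = static_cast<char16_t>(0xD800 + (v >> 10));
                dst[out + 1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
            }
        }
        out += units;
    }
    return out;
}
```

Fed the GBK bytes of 你我他, the decoder fails at the very first character, which is one way to see why CP_UTF8 turns GBK data into garbage.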

allthesame 2011-03-08

UTF-8 to Unicode

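void CConvertDlg::OnBnClickedButtonU8ToUnicode()
{
    // UTF-8 to Unicode (UTF-16)
    // Pasting Chinese text directly into the source can garble it, and the compiler
    // sometimes reports errors, so the UTF-8 bytes are written as hex escapes
    const char* szU8 = "abcd1234\xe4\xbd\xa0\xe6\x88\x91\xe4\xbb\x96";
    const int u8Len = static_cast<int>(strlen(szU8));
    // First call: ask for the required buffer size, in wchar_t units
    int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, u8Len, NULL, 0);
    if (wcsLen <= 0)
        return;  // invalid input or another failure
    // Leave room for '\0': with an explicit input length, MultiByteToWideChar
    // neither counts nor writes a terminator
    wchar_t* wszString = new wchar_t[wcsLen + 1];
    // Second call: the actual conversion
    ::MultiByteToWideChar(CP_UTF8, 0, szU8, u8Len, wszString, wcsLen);
    // Append the terminating '\0'
    wszString[wcsLen] = L'\0';
    // Unicode (W) version of the MessageBox API
    ::MessageBoxW(GetSafeHwnd(), wszString, wszString, MB_OK);
    delete[] wszString;  // free the buffer allocated above

    // Writing the text to a file works the same way as in the ANSI-to-Unicode case
}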

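I just found this online. In my project, char* szU8 = "abcd1234\xe4\xbd\xa0\xe6\x88\x91\xe4\xbb\x96" also displays without garbling, but if I write the corresponding characters directly, "abcd1234你我他", the Chinese characters come out garbled.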
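A likely explanation, not verified against this poster's setup: for a source file without a BOM, MSVC assumes the system ANSI code page, and by default it also encodes narrow string literals in that code page. On Simplified Chinese Windows that is GBK (code page 936), so the literal "abcd1234你我他" would hold GBK bytes (c4 e3 ce d2 cb fb for the three characters), which are not valid UTF-8; decoding them with CP_UTF8 produces garbage, whereas the hand-written \x escapes are UTF-8 by construction. The same kind of mismatch would also fit the later observation that CP_ACP worked for the database strings. The `/utf-8` compiler option (Visual Studio 2015 Update 2 and later) switches both defaults to UTF-8. In a Unicode project, another way around the problem is a wide literal written with universal character names, which does not depend on the source file encoding at all. A small portable C++ sketch; `kText` and `wide_literal_is_correct` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>

// Universal character names spell out the code point itself, so this literal
// means the same thing however the source file is saved.
// U+4F60, U+6211, U+4ED6 are the three Chinese characters from the example.
const wchar_t kText[] = L"abcd1234\u4F60\u6211\u4ED6";

// In the MFC project it could be passed straight to a W API, e.g.
//   ::MessageBoxW(GetSafeHwnd(), kText, kText, MB_OK);
// with no MultiByteToWideChar call at all.

// Hypothetical self-check: does the literal hold exactly the expected code points?
bool wide_literal_is_correct() {
    const wchar_t expected[] = {L'a', L'b', L'c', L'd', L'1', L'2', L'3', L'4',
                                0x4F60, 0x6211, 0x4ED6, 0};
    static_assert(sizeof(kText) == sizeof(expected), "11 characters plus the terminator");
    for (std::size_t i = 0; i < sizeof(expected) / sizeof(expected[0]); ++i)
        if (kText[i] != expected[i]) return false;
    return true;
}
```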

qq120848369 2011-03-08

The project is Unicode, so everything is Unicode; the problem must be in the database. Following along to learn.

allthesame 2011-03-08

Sorry, I quoted the wrong post just now.
[Quote=Quoting reply #1 from zhao4zhong1:]
L"汉字"?
First find out which encoding your Chinese text actually uses: UTF-8, Unicode, or GBK.
[/Quote]
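One way to act on that advice is a byte-level heuristic. The portable C++ sketch below is an assumption-laden illustration, not a Windows API: `Guess` and `guess_encoding` are hypothetical names, and short strings can fool any such heuristic. It checks for a UTF-16 ("Unicode" in Windows terms) byte-order mark or embedded zero bytes, then pure ASCII, then strict UTF-8 validity, and otherwise assumes a legacy ANSI code page such as GBK. Note that a char* read back through C string functions stops at the first zero byte, so the UTF-16 check mainly matters for buffers that carry an explicit length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type; Legacy stands for a non-Unicode ANSI code page such as GBK.
enum class Guess { Ascii, Utf8, Utf16, Legacy };

// Strict UTF-8 check: correct continuation bytes, no overlong forms,
// no UTF-16 surrogates, nothing above U+10FFFF.
static bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = s[i];
        if (b < 0x80) { ++i; continue; }  // plain ASCII byte
        int extra;
        unsigned long cp;
        if (b >= 0xC2 && b <= 0xDF)      { extra = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { extra = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { extra = 3; cp = b & 0x07; }
        else return false;  // not a valid lead byte
        if (s.size() - i <= static_cast<std::size_t>(extra)) return false;  // truncated
        for (int k = 1; k <= extra; ++k) {
            const unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return false;  // expected a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)))
            return false;
        i += extra + 1;
    }
    return true;
}

// Hypothetical heuristic: guess how a raw byte buffer is encoded.
Guess guess_encoding(const std::string& bytes) {
    if (bytes.size() >= 2) {
        const unsigned char b0 = bytes[0], b1 = bytes[1];
        if ((b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF))
            return Guess::Utf16;  // UTF-16 byte-order mark
    }
    // UTF-16 text containing any ASCII characters has zero bytes; GBK and UTF-8 text does not.
    if (bytes.find('\0') != std::string::npos) return Guess::Utf16;
    bool ascii = true;
    for (unsigned char c : bytes)
        if (c >= 0x80) { ascii = false; break; }
    if (ascii) return Guess::Ascii;  // same bytes in UTF-8, GBK and most ANSI code pages
    if (is_valid_utf8(bytes)) return Guess::Utf8;
    return Guess::Legacy;  // e.g. GBK: convert with CP_ACP (or 936), not CP_UTF8
}
```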

allthesame 2011-03-08

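Do I really need to determine the encoding? It's read from a UTF-8 database, so it should be UTF-8, right?

[Quote=Quoting the original post by allthesame:]
In a Unicode project, how do I convert a char* that contains Chinese characters to Unicode and then display it in an edit box?
I tried MultiByteToWideChar: it works when there are no Chinese characters, but with Chinese characters the output is garbled.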
[/Quote]

allthesame 2011-03-08