Possibly a hard problem: how do you tell whether a string is UTF-8 encoded?

bb2003 2005-07-13 04:53:09
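Note that I mean UTF-8, not Unicode.
The string may contain several scripts, e.g. English + Western European languages + Chinese, and so on.
Is there a ready-made API for this check?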
9 replies
crystal_heart 2005-07-29
Each generation really is worse than the last.
crystal_heart 2005-07-29
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

A composite approach to language/encoding detection


Shanjian Li (shanjian@netscape.com)
Katsuhiko Momoi (momoi@netscape.com)
Netscape Communications Corp.

[Note: This paper was originally presented at the 19th International Unicode Conference (San Jose). Since then the implementation has gone through a period of real world usage and we made many improvements along the way. A major change is that we now use positive sequences to detect single byte charsets, c.f. Sections 4.7 and 4.7.1. This paper was written when the universal charset detection code was not part of the Mozilla main source. (See Section 8). Since then, the code was checked into the tree. For more updated implementation, see our open source code at Mozilla Source Tree. - The authors. 2002-11-25.]

crystal_heart 2005-07-29
Just kidding.
bb2003 2005-07-28
up
bb2003 2005-07-15
up
Kudeet 2005-07-13
http://community.csdn.net/Expert/FAQ/FAQ_Index.asp?id=191432

GZ
masterz 2005-07-13
Use MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...);
If the call does not fail, treat the string as UTF-8 encoded.
morning550 2005-07-13
int iswctype(wint_t c, wctype_t category);
The function returns nonzero if c is any character in the category category. The value of category must have been returned by an earlier successful call to wctype.

category is returned by the function wctype(const char *property)! property holds the UTF-8 character set!

sky 2005-07-13
There probably isn't a ready-made API, but writing one yourself shouldn't be that hard.

It's only four bytes at most, after all.
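
A hand-rolled check along the lines sky suggests might look like the sketch below. Nobody in the thread posted this code: the name `is_valid_utf8` and the self-check function are hypothetical, and the byte ranges are taken from the well-formed sequence rules in RFC 3629 (lead bytes 0xC2..0xF4, one to three continuation bytes, no overlong forms, no surrogates). It is a portable alternative to the MultiByteToWideChar call mentioned above, not a replacement for a full charset detector.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the thread): returns true if the `len` bytes
// at `data` are well-formed UTF-8 as defined by RFC 3629.
bool is_valid_utf8(const char* data, std::size_t len) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    std::size_t i = 0;
    while (i < len) {
        const unsigned char lead = p[i];
        if (lead <= 0x7F) { ++i; continue; }   // ASCII, one byte

        std::size_t extra = 0;                 // continuation bytes to expect
        unsigned char lo = 0x80, hi = 0xBF;    // allowed range of the 2nd byte
        if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
            if (lead == 0xE0) lo = 0xA0;       // reject overlong 3-byte forms
            if (lead == 0xED) hi = 0x9F;       // reject surrogates U+D800..U+DFFF
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
            if (lead == 0xF0) lo = 0x90;       // reject overlong 4-byte forms
            if (lead == 0xF4) hi = 0x8F;       // reject values above U+10FFFF
        } else {
            return false;                      // 0x80..0xC1, 0xF5..0xFF never lead
        }

        if (len - i <= extra) return false;    // sequence is cut off
        if (p[i + 1] < lo || p[i + 1] > hi) return false;
        for (std::size_t k = 2; k <= extra; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;  // must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}

// Small self-check showing the intended behaviour.
void utf8_self_check() {
    assert(is_valid_utf8("hello", 5));
    assert(is_valid_utf8("\xE4\xB8\xAD", 3));   // U+4E2D in UTF-8
    assert(!is_valid_utf8("\xD6\xD0", 2));      // the same character in GBK
    assert(!is_valid_utf8("\xC0\x80", 2));      // overlong encoding of NUL
}
```

Passing this check only means the bytes are well-formed UTF-8. Pure ASCII text passes too, and short strings in other encodings can pass by coincidence, so for mixed-language input it is best treated as a strong hint rather than proof.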
