Possibly a hard question: how can you tell whether a string is UTF-8 encoded?

bb2003 2005-07-13 04:53:09
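Note that I mean UTF-8, not Unicode.
The string may contain several scripts at once, for example English + Western European languages + Chinese, and so on.
Is there a ready-made API for this check?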
crystal_heart 2005-07-29
Each generation really is worse than the last.
crystal_heart 2005-07-29
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

A composite approach to language/encoding detection


Shanjian Li (shanjian@netscape.com)
Katsuhiko Momoi (momoi@netscape.com)
Netscape Communications Corp.

[Note: This paper was originally presented at the 19th International Unicode Conference (San Jose). Since then the implementation has gone through a period of real world usage and we made many improvements along the way. A major change is that we now use positive sequences to detect single byte charsets, c.f. Sections 4.7 and 4.7.1. This paper was written when the universal charset detection code was not part of the Mozilla main source. (See Section 8). Since then, the code was checked into the tree. For more updated implementation, see our open source code at Mozilla Source Tree. - The authors. 2002-11-25.]

crystal_heart 2005-07-29
Just kidding.
bb2003 2005-07-28
up
bb2003 2005-07-15
up
Kudeet 2005-07-13
http://community.csdn.net/Expert/FAQ/FAQ_Index.asp?id=191432

Following this thread.
masterz 2005-07-13
Use MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...);
If the call does not fail, treat the string as UTF-8 encoded.
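A minimal sketch of the check masterz suggests, for Windows only. The helper name `IsValidUtf8Win32` is hypothetical. The call passes a null output buffer with size 0, so it only computes the required length and converts nothing. Two caveats: the function reports failure for zero-length input, so the empty string is handled separately; and, as far as I know, older Windows versions may reject `MB_ERR_INVALID_CHARS` together with `CP_UTF8`, in which case this check would always fail there.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: returns true when `bytes` is accepted as UTF-8
// by MultiByteToWideChar.
bool IsValidUtf8Win32(const std::string& bytes) {
    // MultiByteToWideChar fails when the input length is 0,
    // so treat the empty string as trivially valid.
    if (bytes.empty()) return true;

    // With a null output buffer and size 0 the function only computes the
    // required length. MB_ERR_INVALID_CHARS makes it return 0 on an invalid
    // sequence instead of silently dropping or replacing it.
    int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                     bytes.data(),
                                     static_cast<int>(bytes.size()),
                                     nullptr, 0);
    return needed != 0;
}
#endif  // _WIN32
```

When the call returns 0, `GetLastError()` can be inspected to tell an invalid sequence (`ERROR_NO_UNICODE_TRANSLATION`) apart from other failures such as unsupported flags.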
morning550 2005-07-13
int iswctype(wint_t c, wctype_t category);
The function returns nonzero if c is any character in the category category. The value of category must have been returned by an earlier successful call to wctype.

category is the value returned by wctype(const char *property); property holds the UTF-8 character set!
sky 2005-07-13
There is probably no ready-made API, but writing one yourself isn't that hard.

A UTF-8 sequence is at most four bytes, after all.
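One way to hand-roll such a check, as a portable sketch rather than a definitive implementation. The function name `is_valid_utf8` is an assumption for illustration. The byte ranges follow the well-formed UTF-8 sequences defined in RFC 3629 and the Unicode Standard: sequences of one to four bytes, with overlong forms, UTF-16 surrogates, and code points above U+10FFFF rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when `s` consists only of well-formed
// UTF-8 sequences (1 to 4 bytes each).
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char lead = p[i];
        if (lead <= 0x7F) {          // single-byte (ASCII) character
            ++i;
            continue;
        }

        std::size_t extra = 0;       // continuation bytes expected
        unsigned char lo = 0x80;     // allowed range of the FIRST continuation byte
        unsigned char hi = 0xBF;

        if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
            if (lead == 0xE0) lo = 0xA0;   // reject overlong 3-byte forms
            if (lead == 0xED) hi = 0x9F;   // reject surrogates U+D800..U+DFFF
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
            if (lead == 0xF0) lo = 0x90;   // reject overlong 4-byte forms
            if (lead == 0xF4) hi = 0x8F;   // reject code points above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }

        if (n - i - 1 < extra) return false;             // truncated sequence
        if (p[i + 1] < lo || p[i + 1] > hi) return false;
        for (std::size_t k = 2; k <= extra; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false; // must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

Passing this check only shows that the bytes are well-formed UTF-8, not that the author intended UTF-8: pure ASCII text always passes, and a short string in another encoding can pass by coincidence. For mixed or unknown input, a statistical detector along the lines of the Mozilla paper linked in this thread is the heavier alternative.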
Forum: VC/MFC Network Programming