Problem distinguishing English from Chinese

kivien 2014-05-08 11:57:17
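The code below is meant to pick out only the English words, but it picks up Chinese as well (although it does filter out some of the Chinese)...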


CString &originaltext;//passed in by reference

if(""==originaltext)
	return;
setlocale( LC_ALL, "" );//works around the program breaking on _ASSERTE((unsigned)(c + 1) <= 256);
int len=originaltext.GetLength(),i=0;
for(i=0;i<len;i++){
	if(isalpha(originaltext[i])==0)//current character is not a letter, i.e. not a-z or A-Z
	{
		if((originaltext[i]=='-'||originaltext[i]=='\'')) //e.g. it's, that's, co-worker
		{
			if((i-1>0&&isalpha(originaltext[i-1]))&&(i+1<len&&isalpha(originaltext[i+1])))
				end++;
		}
		else{
			if(end!=-1) //end of word
			{
				word=originaltext.Mid(begin,end-begin+1);//word
				word.MakeLower();
				Insert_Pair=mapText.insert(pair<CString,stu>(word,stuTmp));//mapText stores the words
				if(Insert_Pair.second==false){
					miter=mapText.find(word);
					if(miter!=mapText.end()){
						miter->second.count++;
					}
				}
				end=-1;
			}
		}
	}else{ //current character is a letter
		if(end==-1){ //first letter of a word
			begin=i;
			end=i;
		}else //not the first letter
			end++;
	}
}
kivien 2014-05-08
Off to eat. I hope an expert has replied by the time I get back.
赵4老师 2014-05-08
Use iswalpha or _ismbcalpha.
kivien 2014-05-08
So the same person can't post three replies in a row?
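kivien 2014-05-08
	for(i=0;i<len;i++){
		if(originaltext[i]<255&&originaltext[i]>0){//assuming an MBCS build, each element is a signed char, so only ASCII 1-127 passes this test: handle one byte
			//handle English
		}else{//a negative value is the first byte of a Chinese (double-byte) character: handle two bytes
			//handle Chinese and other non-English text
			i++;
		}
	}
Posting the code that solved it.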
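The same idea can be sketched in standard C++ without MFC. This is only an illustration under stated assumptions, not the poster's final code: the input is assumed to be a GBK-encoded byte string in which any byte of 0x81 or above starts a two-byte character, and the names `isAsciiLetter` and `countEnglishWords` are hypothetical. Skipping the trail byte matters because GBK trail bytes can fall in 0x40-0x7E, which overlaps the ASCII letters.

```cpp
#include <cstddef>
#include <map>
#include <string>

// True only for the 52 ASCII letters. Plain range comparisons stay
// well-defined even when char is signed and the byte is negative.
static bool isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Counts lower-cased English words in a byte string.
// Assumption: the text is GBK, so a byte >= 0x81 is a lead byte
// and the byte after it is a trail byte that must be skipped.
std::map<std::string, int> countEnglishWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::string word;
    const std::size_t len = text.size();
    // Run one step past the end so the last word is flushed too.
    for (std::size_t i = 0; i <= len; ++i) {
        const bool atEnd = (i == len);
        if (!atEnd && isAsciiLetter(text[i])) {
            word += static_cast<char>(text[i] | 0x20);  // ASCII lower-case
            continue;
        }
        // Keep '-' and '\'' only between two letters (it's, co-worker).
        if (!atEnd && (text[i] == '-' || text[i] == '\'') && !word.empty() &&
            i + 1 < len && isAsciiLetter(text[i + 1])) {
            word += text[i];
            continue;
        }
        if (!word.empty()) {  // end of a word
            ++counts[word];
            word.clear();
        }
        if (!atEnd && static_cast<unsigned char>(text[i]) >= 0x81) {
            ++i;  // skip the trail byte of a double-byte character
        }
    }
    return counts;
}
```

Calling `countEnglishWords` on a mixed string returns a map from each English word to its count, with the Chinese bytes ignored. A Unicode (`wchar_t`) build would need a different test, since each Chinese character is then a single element.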
kivien 2014-05-08
I kept waiting for someone else to reply, because closing a thread without a satisfactory answer doesn't refund the points, hehe.
Quoting zhao4zhong1's reply (#4), which appears in full in the next post.
赵4老师 2014-05-08
//GBK Chinese-character code ranges (excluding the punctuation, Latin letters, special symbols, etc. in A1xx~A9xx)
//first byte ,second byte
//81-A0 ,40-7E 80-FE
//AA-AF ,40-7E 80-A0
//B0-D6 ,40-7E 80-FE
//D7 ,40-7E 80-F9
//D8-F7 ,40-7E 80-FE
//F8-FE ,40-7E 80-A0

isalpha, iswalpha

int isalpha( int c );
int iswalpha( wint_t c );

Each of these routines returns true if c is a particular representation of an alphabetic character.

Routine     Required Header           Compatibility
isalpha     <ctype.h>                 ANSI, Win 95, Win NT
iswalpha    <ctype.h> or <wchar.h>    ANSI, Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries
LIBC.LIB      Single thread static library, retail version
LIBCMT.LIB    Multithread static library, retail version
MSVCRT.LIB    Import library for MSVCRT.DLL, retail version

Return Value
isalpha returns a non-zero value if c is within the ranges A - Z or a - z. iswalpha returns a non-zero value only for wide characters for which iswupper or iswlower is true, that is, for any wide character that is one of an implementation-defined set for which none of iswcntrl, iswdigit, iswpunct, or iswspace is true. Each of these routines returns 0 if c does not satisfy the test condition.
The result of the test condition for the isalpha function depends on the LC_CTYPE category setting of the current locale; see setlocale for more information. For iswalpha, the result of the test condition is independent of locale.

Parameter
c    Integer to test

Generic-Text Routine Mappings
TCHAR.H Routine    _UNICODE & _MBCS Not Defined    _MBCS Defined    _UNICODE Defined
_istalpha          isalpha                         _ismbcalpha      iswalpha

Character Classification Routines | Locale Routines | is, isw Function Overview
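The GBK ranges listed in this post can be turned into a small range check. The sketch below is only an illustration: the function name `isGbkHanzi` is hypothetical, and the ranges are copied from the table in the post rather than checked against the GBK standard.

```cpp
// Returns true when the byte pair (lead, trail) lies inside the GBK
// Chinese-character ranges from the table in the post. The A1xx~A9xx
// rows (punctuation, Latin letters, symbols) are left out on purpose.
bool isGbkHanzi(unsigned char lead, unsigned char trail) {
    // Every row allows trail bytes 40-7E; only the upper bound of the
    // second trail range (the one starting at 80) differs per row.
    unsigned char upper;
    if (lead >= 0x81 && lead <= 0xA0)      upper = 0xFE;
    else if (lead >= 0xAA && lead <= 0xAF) upper = 0xA0;
    else if (lead >= 0xB0 && lead <= 0xD6) upper = 0xFE;
    else if (lead == 0xD7)                 upper = 0xF9;
    else if (lead >= 0xD8 && lead <= 0xF7) upper = 0xFE;
    else if (lead >= 0xF8 && lead <= 0xFE) upper = 0xA0;
    else return false;  // ASCII, A1-A9 symbol rows, or an invalid lead byte

    return (trail >= 0x40 && trail <= 0x7E) ||
           (trail >= 0x80 && trail <= upper);
}
```

The bytes are taken as `unsigned char` so that the comparisons work whether or not plain `char` is signed; a caller holding a `char` buffer would cast each byte before passing it in.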
kivien 2014-05-08
Solved.