Problem distinguishing English from Chinese

kivien 2014-05-08 11:57:17
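The code below is meant to pick out only the English words, but it picks up Chinese as well (although it does filter out some of the Chinese)...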


CString &originaltext; // passed in by reference
// begin, end, word, stuTmp, Insert_Pair, miter and mapText are declared elsewhere (not shown in the post)

if(""==originaltext)
	return;
setlocale( LC_ALL, "" ); // works around the failed assertion _ASSERTE((unsigned)(c + 1) <= 256) that aborted the program
int len=originaltext.GetLength(),i=0;
for(i=0;i<len;i++){
	if(isalpha(originaltext[i])==0) // current character is not a letter, i.e. not a-z or A-Z
	{
		if((originaltext[i]=='-'||originaltext[i]=='\'')) // cases such as it's, that's, co-worker
		{
			if((i-1>0&&isalpha(originaltext[i-1]))&&(i+1<len&&isalpha(originaltext[i+1])))
				end++;
		}
		else{
			if(end!=-1) // end of word
			{
				word=originaltext.Mid(begin,end-begin+1); // the word
				word.MakeLower();
				Insert_Pair=mapText.insert(pair<CString,stu>(word,stuTmp)); // mapText stores the words
				if(Insert_Pair.second==false){ // word already present: bump its count
					miter=mapText.find(word);
					if(miter!=mapText.end()){
						miter->second.count++;
					}
				}
				end=-1;
			}
		}
	}else{ // current character is a letter
		if(end==-1){ // first letter of a word
			begin=i;
			end=i;
		}else // not the first letter
			end++;
	}
}
kivien 2014-05-08
Off to eat. Hoping an expert will have replied by the time I get back.
赵4老师 2014-05-08
Use iswalpha or _ismbcalpha.
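For comparison with the suggestion above, here is a minimal sketch of a locale-independent test that accepts only the ASCII letters A-Z and a-z. The name isAsciiLetter is hypothetical and does not come from the thread. It is a hand-written range check, not iswalpha or _ismbcalpha, and it assumes the text is held as narrow (byte) characters. Converting to unsigned char first avoids handing a negative value to a classification routine, which is the kind of input that trips the assertion mentioned in the question.

```cpp
// Hypothetical helper (not from the thread): true only for ASCII letters.
// Works the same way regardless of the current locale.
inline bool isAsciiLetter(char ch) {
    // Bytes >= 0x80 are negative when char is signed; normalize them first.
    const unsigned char c = static_cast<unsigned char>(ch);
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```

With this check, both bytes of a double-byte Chinese character whose trail byte is 0x80 or above are rejected. A trail byte in the 0x40-0x7E range can still look like a letter, so the caller must skip trail bytes explicitly.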
kivien 2014-05-08
So the same person can't post three replies in a row?
kivien 2014-05-08
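

	for(i=0;i<len;i++){
		if(originaltext[i]<255&&originaltext[i]>0){ // the extended ASCII range is 0-255; if the character is in it, handle one byte
			// handle English
		}else{ // <0 or >255 means a Chinese character; handle two bytes
			// handle Chinese and other such scripts
			i++;
		}
	}
Posting the code that solved it.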
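The loop above only shows the byte-skipping skeleton. As a rough sketch of how it might combine with the word counting from the question, the version below uses std::string and std::map instead of CString and mapText so that it is self-contained. The names countEnglishWords and isAsciiLetter are hypothetical, and the input is assumed to be GBK-encoded bytes. Skipping the trail byte matters because a GBK trail byte can fall in 0x40-0x7E, which overlaps the ASCII letters.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: true only for the ASCII letters A-Z and a-z.
static bool isAsciiLetter(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Counts lowercased English words in a byte string assumed to be GBK-encoded.
std::map<std::string, int> countEnglishWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::string word;  // word currently being collected
    const std::size_t len = text.size();
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        bool partOfWord = isAsciiLetter(c);
        // '-' and '\'' belong to a word only between two letters (it's, co-worker).
        if (!partOfWord && (c == '-' || c == '\'') && !word.empty() &&
            i + 1 < len && isAsciiLetter(static_cast<unsigned char>(text[i + 1]))) {
            partOfWord = true;
        }
        if (partOfWord) {
            word += static_cast<char>((c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c);
        } else {
            if (!word.empty()) {  // a word just ended
                ++counts[word];
                word.clear();
            }
            // GBK lead byte: skip the trail byte so it is never read as a letter.
            if (c >= 0x81 && c <= 0xFE && i + 1 < len) ++i;
        }
    }
    if (!word.empty()) ++counts[word];  // flush a word that runs to the end
    return counts;
}
```

Calling countEnglishWords on a mixed string returns one entry per distinct lowercased word with its occurrence count, and the Chinese bytes contribute nothing.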
kivien 2014-05-08
I kept waiting for one more person to reply, because closing a thread without a satisfactory answer doesn't refund the points, heh.
Quoting reply #4 by zhao4zhong1:
// GBK code ranges for Chinese characters (excluding the punctuation, Latin letters, special symbols, etc. in A1xx-A9xx) [...] isalpha, iswalpha: int isalpha( int c ); int iswalpha( wint_t c ); [...]
赵4老师 2014-05-08
// GBK code ranges for Chinese characters (excluding the punctuation, Latin letters, special symbols, etc. in A1xx-A9xx)
// lead byte , trail byte
// 81-A0     , 40-7E 80-FE
// AA-AF     , 40-7E 80-A0
// B0-D6     , 40-7E 80-FE
// D7        , 40-7E 80-F9
// D8-F7     , 40-7E 80-FE
// F8-FE     , 40-7E 80-A0

isalpha, iswalpha

int isalpha( int c );
int iswalpha( wint_t c );

Each of these routines returns true if c is a particular representation of an alphabetic character.

Routine    Required Header           Compatibility
isalpha    <ctype.h>                 ANSI, Win 95, Win NT
iswalpha   <ctype.h> or <wchar.h>    ANSI, Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries
LIBC.LIB     Single thread static library, retail version
LIBCMT.LIB   Multithread static library, retail version
MSVCRT.LIB   Import library for MSVCRT.DLL, retail version

Return Value
isalpha returns a non-zero value if c is within the ranges A – Z or a – z. iswalpha returns a non-zero value only for wide characters for which iswupper or iswlower is true, that is, for any wide character that is one of an implementation-defined set for which none of iswcntrl, iswdigit, iswpunct, or iswspace is true. Each of these routines returns 0 if c does not satisfy the test condition.
The result of the test condition for the isalpha function depends on the LC_CTYPE category setting of the current locale; see setlocale for more information. For iswalpha, the result of the test condition is independent of locale.

Parameter
c   Integer to test

Generic-Text Routine Mappings
TCHAR.H Routine   _UNICODE & _MBCS Not Defined   _MBCS Defined   _UNICODE Defined
_istalpha         isalpha                        _ismbcalpha     iswalpha

See also: Character Classification Routines | Locale Routines | is, isw Function Overview
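The GBK range table in this reply can be turned into a direct check on a byte pair. The sketch below is one possible reading of that table. The function name isGbkHanzi is hypothetical, and the function only encodes the ranges listed in the reply, without consulting any system code-page API.

```cpp
// Hypothetical helper: true if (lead, trail) falls in the GBK Chinese-character
// ranges listed in the reply (the A1xx-A9xx symbol area is excluded).
bool isGbkHanzi(unsigned char lead, unsigned char trail) {
    unsigned char trailMax;  // upper bound of the second trail-byte range
    if ((lead >= 0x81 && lead <= 0xA0) ||
        (lead >= 0xB0 && lead <= 0xD6) ||
        (lead >= 0xD8 && lead <= 0xF7)) {
        trailMax = 0xFE;
    } else if ((lead >= 0xAA && lead <= 0xAF) ||
               (lead >= 0xF8 && lead <= 0xFE)) {
        trailMax = 0xA0;
    } else if (lead == 0xD7) {
        trailMax = 0xF9;
    } else {
        return false;  // not a lead byte from the table
    }
    // Every row allows 40-7E; the second range starts at 80 and ends at trailMax.
    return (trail >= 0x40 && trail <= 0x7E) ||
           (trail >= 0x80 && trail <= trailMax);
}
```

A check like this is stricter than testing only whether a byte is negative or above 0x7F, since it also rejects the double-byte punctuation and full-width letters in the A1xx-A9xx area.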
kivien 2014-05-08
Solved.
