C++ Unicode: reading Chinese characters

jiangwenbo737373 2013-05-30 03:16:16
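How do I read Chinese characters from a txt file in a VS2008 Unicode build, check whether each character is the one I am looking for, and then skip it, extract it, and so on?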
12 replies
朝耕暮耘 2013-05-31
Learned something.
jiangwenbo737373 2013-05-30
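Quoting reply #9 by zhao4zhong1:
wcout.imbue(locale("chs"));   // console output uses the Simplified Chinese locale
wifstream wfin1=...;
wfin1.imbue(locale("chs"));   // decode the file's bytes with the same locale
wchar_t c;
if (wfin1.get(c)) {           // true only if a character was actually read
 if (c!=L'正') {
  wcout<<c<<endl;
 }
}
Thanks.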
gootyking 2013-05-30
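The OP is clearly pining for a 汉子 (a man)... (The question was posted with 汉字, "Chinese character", mistyped as 汉子.)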
赵4老师 2013-05-30
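wcout.imbue(locale("chs"));   // console output uses the Simplified Chinese locale
wifstream wfin1=...;
wfin1.imbue(locale("chs"));   // decode the file's bytes with the same locale
wchar_t c;
if (wfin1.get(c)) {           // true only if a character was actually read
 if (c!=L'正') {
  wcout<<c<<endl;
 }
}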
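The snippet above handles only the first character. A fuller sketch that loops over the whole input might look like the following. The helper name copy_except is hypothetical, and it takes any std::wistream so that it can be tried without a file; with a real wifstream the stream would still need the imbue call shown in the reply, and locale("chs") is specific to the Microsoft runtime.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy every wide character from `in` to `out`,
// skipping any character equal to `skip`.
void copy_except(std::wistream& in, std::wostream& out, wchar_t skip) {
    wchar_t c;
    // get() returns the stream, which tests false once a read fails,
    // so a stale character is never processed at end of input.
    while (in.get(c)) {
        if (c != skip) {
            out.put(c);
        }
    }
}
```

For example, calling copy_except(in, out, L'\u6b63') drops every 正 from the input; the \u6b63 escape is the same character as L'正' but does not depend on the encoding of the source file.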
赵4老师 2013-05-30
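For reference only; may not be correct:
wcout.imbue(locale("chs"));   // console output uses the Simplified Chinese locale
wifstream wfin1=...;
wfin1.imbue(locale("chs"));   // decode the file's bytes with the same locale
wchar_t c;
if (wfin1.get(c)) {           // true only if a character was actually read
 if (c!=L'正') {
  wcout<<c<<endl;
 }
}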
jiangwenbo737373 2013-05-30
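Quoting reply #6 by zhao4zhong1:
Why not use fgetwc?
while(!fin1.eof()) {
 TCHAR c;
 fin1.get(c);
 if (c != '正') {
  printf("%c",c);
 }
}
I want to check whether a character is not the Chinese character 正. Is it OK to write it like this? It has no effect at all; everything still gets printed. The very first character of the first sentence in the txt file is 正.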
赵4老师 2013-05-30
Why not use fgetwc?
jiangwenbo737373 2013-05-30
Quoting reply #3 by zhao4zhong1:
Looking things up in MSDN is one of the skills every Windows programmer must master. [MSDN reference page for fopen, _wfopen, quoted in full in that reply]
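Then when doing the check with fgetc, how do I tell that something is a Chinese character? fgetc reads one byte at a time, doesn't it? All I want is to read out the latency values from ping output that was saved to a txt file.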
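One way to approach the byte-by-byte question, as a sketch only: it assumes the txt file is GBK-encoded (the usual ANSI code page on Simplified Chinese Windows), where a byte in the range 0x81 to 0xFE starts a two-byte character and anything below 0x80 is plain ASCII. The function names are hypothetical, an in-memory byte string stands in for repeated fgetc calls, and trail bytes are not validated.

```cpp
#include <cstddef>
#include <string>

// True if `b` can be the first byte of a two-byte GBK character.
bool is_gbk_lead_byte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Hypothetical helper: return `bytes` with every occurrence of the two-byte
// GBK character (lead, trail) removed. Characters are stepped over as whole
// units so a trail byte is never mistaken for the start of a character.
std::string remove_gbk_char(const std::string& bytes,
                            unsigned char lead, unsigned char trail) {
    std::string result;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (is_gbk_lead_byte(b) && i + 1 < bytes.size()) {
            unsigned char t = static_cast<unsigned char>(bytes[i + 1]);
            if (!(b == lead && t == trail)) {
                result.append(bytes, i, 2);  // keep other two-byte characters
            }
            i += 2;
        } else {
            result.push_back(bytes[i]);      // single-byte character
            ++i;
        }
    }
    return result;
}
```

In GBK the character 正 is the byte pair 0xD5 0xFD, so remove_gbk_char(text, 0xD5, 0xFD) would drop it while leaving ASCII text such as the digits of a ping latency untouched.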
geekjack 2013-05-30
Use a lasso pole.
------------------
Wide-character comparison: http://blog.csdn.net/xiunai78/article/details/4096532
赵4老师 2013-05-30
Looking things up in MSDN is one of the skills every Windows programmer must master.
ms-help://MS.VSCC.v90/MS.MSDNQTR.v90.chs/dv_vccrt/html/e868993f-738c-4920-b5e4-d8f2f41f933d.htm

fopen, _wfopen

Open a file. More secure versions of these functions are available; see fopen_s, _wfopen_s.

FILE *fopen( const char *filename, const char *mode );
FILE *_wfopen( const wchar_t *filename, const wchar_t *mode );

Parameters

filename
  Filename.
mode
  Type of access permitted.

Return Value

Each of these functions returns a pointer to the open file. A null pointer value indicates an error. If filename or mode is NULL or an empty string, these functions trigger the invalid parameter handler, as described in Parameter Validation. If execution is allowed to continue, these functions return NULL and set errno to EINVAL.

See _doserrno, errno, _sys_errlist, and _sys_nerr for more information on these, and other, error codes.

Remarks

More secure versions of these functions exist, see fopen_s, _wfopen_s.

The fopen function opens the file specified by filename. _wfopen is a wide-character version of fopen; the arguments to _wfopen are wide-character strings. _wfopen and fopen behave identically otherwise. Simply using _wfopen has no effect on the coded character set used in the file stream.

fopen will accept paths that are valid on the file system at the point of execution; UNC paths and paths involving mapped network drives are accepted by fopen as long as the system executing the code has access to the share or mapped network drive at the time of execution. Special care must be taken when constructing paths for fopen to avoid making assumptions about available drives, paths or network shares in the execution environment.

Always check the return value to see if the pointer is NULL before performing any further operations on the file. If an error occurs, the global variable errno is set and may be used to get specific error information. For further information, see errno.

In Visual C++ 2005, fopen supports Unicode file streams. A flag specifying the desired encoding may be passed to fopen when opening a new file or overwriting an existing file, like this:

fopen("newfile.txt", "rw, ccs=<encoding>");

Allowed values of the encoding include UNICODE, UTF-8, and UTF-16LE. If the file is already in existence and is opened for reading or appending, the Byte Order Mark (BOM) is used to determine the correct encoding. It is not necessary to specify the encoding with a flag. In fact, the flag will be ignored if it conflicts with the type of the file as indicated by the BOM. The flag is only used when no BOM is present or if the file is a new file. The following table summarizes the modes used for various flags given to fopen and Byte Order Marks used in the file.

Flag       No BOM (or new file)   BOM: UTF-8   BOM: UTF-16
UNICODE    ANSI                   UTF-8        UTF-16LE
UTF-8      UTF-8                  UTF-8        UTF-16LE
UTF-16LE   UTF-16LE               UTF-8        UTF-16LE

If mode is "a, ccs=<encoding>", fopen will first try to open the file with both read and write access. If it succeeds, it will read the BOM to determine the encoding for this file; however, if it fails, it will use the default encoding for the file. In either case, fopen will then re-open the file with write-only access. (This applies to mode a only, not a+.)

TCHAR.H routine   _UNICODE & _MBCS not defined   _MBCS defined   _UNICODE defined
_tfopen           fopen                          fopen           _wfopen

The character string mode specifies the type of access requested for the file, as follows:

"r"
  Opens for reading. If the file does not exist or cannot be found, the fopen call fails.
"w"
  Opens an empty file for writing. If the given file exists, its contents are destroyed.
"a"
  Opens for writing at the end of the file (appending) without removing the EOF marker before writing new data to the file; creates the file first if it doesn't exist.
"r+"
  Opens for both reading and writing. (The file must exist.)
"w+"
  Opens an empty file for both reading and writing. If the given file exists, its contents are destroyed.
"a+"
  Opens for reading and appending; the appending operation includes the removal of the EOF marker before new data is written to the file and the EOF marker is restored after writing is complete; creates the file first if it doesn't exist.

When a file is opened with the "a" or "a+" access type, all write operations occur at the end of the file. The file pointer can be repositioned using fseek or rewind, but is always moved back to the end of the file before any write operation is carried out. Thus, existing data cannot be overwritten.

The "a" mode does not remove the EOF marker before appending to the file. After appending has occurred, the MS-DOS TYPE command only shows data up to the original EOF marker and not any data appended to the file. The "a+" mode does remove the EOF marker before appending to the file. After appending, the MS-DOS TYPE command shows all data in the file. The "a+" mode is required for appending to a stream file that is terminated with the CTRL+Z EOF marker.

When the "r+", "w+", or "a+" access type is specified, both reading and writing are allowed (the file is said to be open for "update"). However, when you switch between reading and writing, there must be an intervening fflush, fsetpos, fseek, or rewind operation. The current position can be specified for the fsetpos or fseek operation, if desired.

In addition to the above values, the following characters can be included in mode to specify the translation mode for newline characters:

t
  Open in text (translated) mode. In this mode, CTRL+Z is interpreted as an end-of-file character on input. In files opened for reading/writing with "a+", fopen checks for a CTRL+Z at the end of the file and removes it, if possible. This is done because using fseek and ftell to move within a file that ends with a CTRL+Z may cause fseek to behave improperly near the end of the file.
  Also, in text mode, carriage return-linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return-linefeed combinations on output. When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).
b
  Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.

If t or b is not given in mode, the default translation mode is defined by the global variable _fmode. If t or b is prefixed to the argument, the function fails and returns NULL.

For more information about using text and binary modes in Unicode and multibyte stream-I/O, see Text and Binary Mode File I/O and Unicode Stream I/O in Text and Binary Modes.

c
  Enable the commit flag for the associated filename so that the contents of the file buffer are written directly to disk if either fflush or _flushall is called.
n
  Reset the commit flag for the associated filename to "no-commit." This is the default. It also overrides the global commit flag if you link your program with COMMODE.OBJ. The global commit flag default is "no-commit" unless you explicitly link your program with COMMODE.OBJ (see Link Options).
N
  Specifies that the file is not inherited by child processes.
S
  Specifies that caching is optimized for, but not restricted to, sequential access from disk.
R
  Specifies that caching is optimized for, but not restricted to, random access from disk.
T
  Specifies a file as temporary. If possible, it is not flushed to disk.
D
  Specifies a file as temporary. It is deleted when the last file pointer is closed.
ccs=ENCODING
  Specifies the coded character set to use (UTF-8, UTF-16LE, or UNICODE) for this file. Leave unspecified if you want ANSI encoding. This option is available in Visual C++ 2005 and later.

Valid characters for the mode string used in fopen and _fdopen correspond to oflag arguments used in _open and _sopen, as follows.

Characters in mode string   Equivalent oflag value for _open/_sopen
a                           _O_WRONLY | _O_APPEND (usually _O_WRONLY | _O_CREAT | _O_APPEND)
a+                          _O_RDWR | _O_APPEND (usually _O_RDWR | _O_APPEND | _O_CREAT)
r                           _O_RDONLY
r+                          _O_RDWR
w                           _O_WRONLY (usually _O_WRONLY | _O_CREAT | _O_TRUNC)
w+                          _O_RDWR (usually _O_RDWR | _O_CREAT | _O_TRUNC)
b                           _O_BINARY
t                           _O_TEXT
c                           None
n                           None
S                           _O_SEQUENTIAL
R                           _O_RANDOM
T                           _O_SHORTLIVED
D                           _O_TEMPORARY
ccs=UNICODE                 _O_WTEXT
ccs=UTF-8                   _O_UTF8
ccs=UTF-16LE                _O_UTF16

If you are using rb mode, won't need to port your code, and expect to read a lot of the file and/or don't care about network performance, memory mapped Win32 files might also be an option.

Requirements

Function   Required header
fopen      <stdio.h>
_wfopen    <stdio.h> or <wchar.h>

For additional compatibility information, see Compatibility in the Introduction.

The c, n, t, S, R, T, and D mode options are Microsoft extensions for fopen and _fdopen and should not be used where ANSI portability is desired.
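The BOM check mentioned in the documentation can be illustrated with a small sketch. This is not how the CRT implements it; the enum and the function name are hypothetical, and the code only inspects the first bytes of a buffer that has already been read in binary mode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for the sketch below.
enum BomKind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

// Classify the byte order mark, if any, at the start of `data`.
// UTF-32 BOMs are deliberately not distinguished here.
BomKind detect_bom(const std::string& data) {
    const std::size_t n = data.size();
    const unsigned char b0 = n > 0 ? static_cast<unsigned char>(data[0]) : 0;
    const unsigned char b1 = n > 1 ? static_cast<unsigned char>(data[1]) : 0;
    const unsigned char b2 = n > 2 ? static_cast<unsigned char>(data[2]) : 0;
    if (n >= 3 && b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return BOM_UTF8;
    if (n >= 2 && b0 == 0xFF && b1 == 0xFE) return BOM_UTF16LE;
    if (n >= 2 && b0 == 0xFE && b1 == 0xFF) return BOM_UTF16BE;
    return BOM_NONE;
}
```

A file for which detect_bom returns BOM_NONE falls in the "No BOM" column of the table, which is where the ccs flag (or its absence) decides the encoding.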
赵4老师 2013-05-30
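I recommend using WinHex to look at the raw bytes on disk, in a file, or in memory.
To a computer there is no garbled text, only binary bytes; text is garbled only to a human brain.
The character 啊: GBK: 0xB0 0xA1, Unicode (UTF-16LE): 0x4A 0x55, UTF-8: 0xE5 0x95 0x8A
Watch out for the BOM.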
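The UTF-8 and UTF-16LE byte values listed for 啊 (code point U+554A) can be reproduced with a short sketch. The function names are hypothetical, and both functions assume a code point in the Basic Multilingual Plane that is not a surrogate; the GBK bytes come from a lookup table and are not computed here.

```cpp
#include <string>

// Encode a BMP code point (U+0000..U+FFFF, not a surrogate) as UTF-8 bytes.
std::string utf8_bytes(unsigned int cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode the same code point as UTF-16LE: low byte first, then high byte.
std::string utf16le_bytes(unsigned int cp) {
    std::string out;
    out += static_cast<char>(cp & 0xFF);
    out += static_cast<char>((cp >> 8) & 0xFF);
    return out;
}
```

Calling utf8_bytes(0x554A) and utf16le_bytes(0x554A) gives the byte sequences that a hex viewer such as WinHex would show for a file containing only that character, after any BOM.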
healer_kx 2013-05-30
Mighty and majestic, right?
