Reading and displaying a UTF-8 encoded text file in VS2010

光辉岁月Ivy 2015-01-06 04:09:00

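The text file is UTF-8 encoded. I need to read it and show its contents in a dialog box so they are easy to inspect. So far I have read the file's contents into memory and they are correct there, but they show up garbled in the UI...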

8 replies

光辉岁月Ivy 2015-01-08

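@luciferisnotsatan, thanks for your answer. I read the file with MFC's CFile, and I don't know whether it lets me specify the encoding to read with. I'll try fopen in a moment. @zhao4zhong1, the file header format should be interpreted correctly, since I need it to determine the file's encoding. Also, I always read the file with CFile in binary mode, one character at a time, but I noticed the file contains some special characters, and those can't be read this way.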
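
One likely reason the "special" characters seem to get lost when reading one byte at a time: in UTF-8, any character outside ASCII occupies two to four bytes, so a single byte taken on its own is not a character. The sketch below is a minimal, portable illustration of grouping bytes by their lead byte. The helper names Utf8SequenceLength and CountCodePoints are hypothetical and do not come from this thread, and only structural checks are performed.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` cannot start a sequence (continuation or invalid byte).
std::size_t Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx
    return 0;
}

// Count the characters (code points) in a UTF-8 byte string.
// Returns std::string::npos if a lead byte is invalid, a sequence is
// truncated, or a continuation byte does not have the form 10xxxxxx.
std::size_t CountCodePoints(const std::string& bytes)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t len =
            Utf8SequenceLength(static_cast<unsigned char>(bytes[i]));
        if (len == 0 || i + len > bytes.size()) return std::string::npos;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::string::npos;
        }
        i += len;   // advance by a whole character, not by one byte
        ++count;
    }
    return count;
}
```

With this grouping, the five bytes of "ab" followed by one three-byte Chinese character count as three characters, which is why a byte-at-a-time loop has to buffer whole sequences before converting them.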

encoderlee 2015-01-08

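What I mean is: read the file's byte stream into a byte array, then convert it with MultiByteToWideChar. You don't need to think about the encoding while reading; just read every byte into the byte array and convert afterwards. If it is a TXT file that starts with the UTF-8 BOM (the three bytes EF BB BF), skip those first three bytes after opening the file; they are not part of the content and only mark it as UTF-8. Read from the fourth byte onward into a char array, then call MultiByteToWideChar to convert it.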
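
The read-then-skip step described above can be sketched in portable C++ as follows. This is only an illustration under assumptions: it uses std::ifstream rather than CFile, and the names ReadAllBytes, Utf8ContentOffset and Utf8Content are hypothetical. It checks for the BOM before skipping, since a UTF-8 file is not required to have one.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read an entire file as raw bytes; no encoding is interpreted here.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> ReadAllBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<unsigned char>((std::istreambuf_iterator<char>(in)),
                                      std::istreambuf_iterator<char>());
}

// Offset of the first content byte: 3 if the data starts with the
// UTF-8 BOM (EF BB BF), otherwise 0.
std::size_t Utf8ContentOffset(const std::vector<unsigned char>& data)
{
    const bool hasBom = data.size() >= 3 &&
                        data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    return hasBom ? 3 : 0;
}

// The content bytes without the BOM, as a std::string whose c_str()
// is NUL-terminated and can be handed to a UTF-8 to UTF-16 converter.
std::string Utf8Content(const std::vector<unsigned char>& data)
{
    const std::ptrdiff_t offset =
        static_cast<std::ptrdiff_t>(Utf8ContentOffset(data));
    return std::string(data.begin() + offset, data.end());
}
```

On Windows, the string returned by Utf8Content could then be passed (via c_str()) to a MultiByteToWideChar-based helper such as the UTF8ToUTF16 function posted in this thread.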

赵4老师 2015-01-07

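To a computer there is no garbled text, only binary bytes; garbled text exists only for the human reader. For example, the character 啊 is stored as:
GBK: 0xB0 0xA1
UTF-16 LE: 0x4A 0x55
UTF-16 BE: 0x55 0x4A
UTF-8: 0xE5 0x95 0x8A
I recommend WinHex for viewing the raw bytes on a disk, in a file, or in memory.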
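
The UTF-8 and UTF-16 byte values listed above follow mechanically from the code point of 啊, which is U+554A. A minimal sketch of that arithmetic is given below; the names Bytes, EncodeUtf8 and EncodeUtf16Unit are hypothetical, the code does not reject surrogate values, and GBK is left out because it needs a lookup table rather than a formula.

```cpp
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Encode one code point (up to U+10FFFF) as UTF-8 bytes.
Bytes EncodeUtf8(std::uint32_t cp)
{
    Bytes out;
    if (cp < 0x80) {                                   // 1 byte
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp < 0x800) {                           // 2 bytes
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {                         // 3 bytes
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else {                                           // 4 bytes
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Serialize one 16-bit UTF-16 code unit in the requested byte order.
Bytes EncodeUtf16Unit(std::uint16_t unit, bool bigEndian)
{
    const unsigned char hi = static_cast<unsigned char>(unit >> 8);
    const unsigned char lo = static_cast<unsigned char>(unit & 0xFF);
    Bytes out;
    out.push_back(bigEndian ? hi : lo);
    out.push_back(bigEndian ? lo : hi);
    return out;
}
```

Calling EncodeUtf8(0x554A) yields E5 95 8A, and EncodeUtf16Unit(0x554A, false) yields 4A 55, matching the values in the post.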

luciferisnotsatan 2015-01-07

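Quoting thtianhui123's reply (#4):

@zhao4zhong1, @CharlesSimonyi First, thank you both for your answers. But what you both describe is the conversion done after the UTF-8 file has already been read correctly as UTF-8, right? My problem is that even reading the UTF-8 file goes wrong... I also followed http://www.oschina.net/code/snippet_222150_20567, where the code is explained in some detail, but when I run it the text I read still contains garbled characters, just far fewer than before.

When you call fopen, did you use ccs to specify UTF-8 as the encoding? For example: fopen("newfile.txt", "rt, ccs=UTF-8");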

光辉岁月Ivy 2015-01-07

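@zhao4zhong1, @CharlesSimonyi First, thank you both for your answers. But what you both describe is the conversion done after the UTF-8 file has already been read correctly as UTF-8, right? My problem is that even reading the UTF-8 file goes wrong... I also followed http://www.oschina.net/code/snippet_222150_20567, where the code is explained in some detail, but when I run it the text I read still contains garbled characters, just far fewer than before.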

encoderlee 2015-01-06

If your project is a Unicode project, you can convert the text to a Unicode (UTF-16) string before displaying it:


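// Convert a NUL-terminated UTF-8 string to UTF-16 for display in a Unicode build.
CStringW UTF8ToUTF16(LPCSTR szUTF8)
{
	// First call: get the required length in wchar_t units, including the terminating NUL.
	int nWszLen = MultiByteToWideChar(CP_UTF8, 0, szUTF8, -1, NULL, 0);
	if (nWszLen <= 0)
		return CStringW();	// conversion failed; call GetLastError() for details

	CStringW strUTF16;
	// Second call: convert directly into the CString's buffer.
	MultiByteToWideChar(CP_UTF8, 0, szUTF8, -1, strUTF16.GetBuffer(nWszLen), nWszLen);
	strUTF16.ReleaseBuffer();	// recomputes the length from the terminating NUL

	return strUTF16;
}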
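
To make it clearer what a UTF-8 to UTF-16 conversion does with the bytes, here is a portable sketch of the same idea that does not depend on the Win32 API. It is not how MultiByteToWideChar is implemented; the function name Utf8ToUtf16Portable is hypothetical, and it assumes a C++11 compiler for std::u16string. Malformed input makes it return false instead of substituting replacement characters.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units. Returns false on malformed
// input: bad lead byte, truncated sequence, bad continuation byte,
// overlong form, surrogate code point, or a value above U+10FFFF.
bool Utf8ToUtf16Portable(const std::string& in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;       // code point being assembled
        char32_t minCp = 0;    // smallest value allowed for this length
        std::size_t len = 0;   // total bytes in this sequence
        if (b0 < 0x80)                { cp = b0;        len = 1; minCp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; minCp = 0x10000; }
        else return false;

        if (i + len > in.size()) return false;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);   // append 6 payload bits
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Characters beyond the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return true;
}
```

In an MFC Unicode build the Win32 call remains the simpler choice; a hand-written decoder like this is mainly useful for checking, byte by byte, where a file stops being valid UTF-8.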

赵4老师 2015-01-06

For further reference:

ms-help://MS.VSCC.v90/MS.MSDNQTR.v90.chs/dv_vccrt/html/e868993f-738c-4920-b5e4-d8f2f41f933d.htm

Run-Time Library Reference
fopen, _wfopen

Open a file. More secure versions of these functions are available; see fopen_s, _wfopen_s.

FILE *fopen( const char *filename, const char *mode );
FILE *_wfopen( const wchar_t *filename, const wchar_t *mode );

Parameters
filename
    Filename.
mode
    Type of access permitted.

Return Value
Each of these functions returns a pointer to the open file. A null pointer value indicates an error. If filename or mode is NULL or an empty string, these functions trigger the invalid parameter handler, as described in Parameter Validation. If execution is allowed to continue, these functions return NULL and set errno to EINVAL. See _doserrno, errno, _sys_errlist, and _sys_nerr for more information on these, and other, error codes.

Remarks
More secure versions of these functions exist, see fopen_s, _wfopen_s.

The fopen function opens the file specified by filename. _wfopen is a wide-character version of fopen; the arguments to _wfopen are wide-character strings. _wfopen and fopen behave identically otherwise. Simply using _wfopen has no effect on the coded character set used in the file stream.

fopen will accept paths that are valid on the file system at the point of execution; UNC paths and paths involving mapped network drives are accepted by fopen as long as the system executing the code has access to the share or mapped network drive at the time of execution. Special care must be taken when constructing paths for fopen to avoid making assumptions about available drives, paths or network shares in the execution environment.

Always check the return value to see if the pointer is NULL before performing any further operations on the file. If an error occurs, the global variable errno is set and may be used to get specific error information. For further information, see errno.

In Visual C++ 2005, fopen supports Unicode file streams. A flag specifying the desired encoding may be passed to fopen when opening a new file or overwriting an existing file, like this:

fopen("newfile.txt", "rw, ccs=<encoding>");

Allowed values of the encoding include UNICODE, UTF-8, and UTF16-LE. If the file is already in existence and is opened for reading or appending, the Byte Order Mark (BOM) is used to determine the correct encoding. It is not necessary to specify the encoding with a flag. In fact, the flag will be ignored if it conflicts with the type of the file as indicated by the BOM. The flag is only used when no BOM is present or if the file is a new file. The following table summarizes the modes used for various flags given to fopen and Byte Order Marks used in the file.

Flag        No BOM (or new file)    BOM: UTF-8    BOM: UTF-16
UNICODE     ANSI                    UTF-8         UTF-16LE
UTF-8       UTF-8                   UTF-8         UTF-16LE
UTF-16LE    UTF-16LE                UTF-8         UTF-16LE

If mode is "a, ccs=<encoding>", fopen will first try to open the file with both read and write access. If it succeeds, it will read the BOM to determine the encoding for this file; however, if it fails, it will use the default encoding for the file. In either case, fopen will then re-open the file with write-only access. (This applies to mode a only, not a+.) ……
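
The rule in the quoted table (a BOM, when present, overrides the ccs flag) can be restated as a small lookup function. This is only a restatement of the table as quoted above, not the CRT's actual implementation, and every name in it (CcsFlag, FileBom, StreamEncoding, EffectiveEncoding) is hypothetical.

```cpp
// The ccs= value passed in the fopen mode string.
enum CcsFlag { CcsUnicode, CcsUtf8, CcsUtf16LE };

// What the first bytes of an existing file indicate.
enum FileBom { NoBom, Utf8Bom, Utf16Bom };

// The encoding the stream ends up using, per the quoted table.
enum StreamEncoding { EncAnsi, EncUtf8, EncUtf16LE };

StreamEncoding EffectiveEncoding(CcsFlag flag, FileBom bom)
{
    // A BOM always wins over the flag.
    if (bom == Utf8Bom)  return EncUtf8;
    if (bom == Utf16Bom) return EncUtf16LE;

    // No BOM (or a new file): the flag decides.
    switch (flag) {
        case CcsUtf8:    return EncUtf8;
        case CcsUtf16LE: return EncUtf16LE;
        case CcsUnicode:
        default:         return EncAnsi;   // per the table's first row
    }
}
```

Read this way, a UTF-8 file saved without a BOM is only treated as UTF-8 when the flag says so, which is one situation in which garbled text could appear.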

赵4老师 2015-01-06

For reference only, even though it is VB6:

Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
' Common code pages:
Const cpUTF8 = 65001
Const cpGB2312 = 936
Const cpGB18030 = 54936
Const cpUTF7 = 65000
Function MultiByteToUTF16(UTF8() As Byte, CodePage As Long) As String
    Dim bufSize As Long
    bufSize = MultiByteToWideChar(CodePage, 0&, UTF8(0), UBound(UTF8) + 1, 0, 0)
    MultiByteToUTF16 = Space(bufSize)
    MultiByteToWideChar CodePage, 0&, UTF8(0), UBound(UTF8) + 1, StrPtr(MultiByteToUTF16), bufSize
End Function

Function UTF16ToMultiByte(UTF16 As String, CodePage As Long) As Byte()
    Dim bufSize As Long
    Dim arr() As Byte
    bufSize = WideCharToMultiByte(CodePage, 0&, StrPtr(UTF16), Len(UTF16), 0, 0, 0, 0)
    ReDim arr(bufSize - 1)
    WideCharToMultiByte CodePage, 0&, StrPtr(UTF16), Len(UTF16), arr(0), bufSize, 0, 0
    UTF16ToMultiByte = arr
End Function

Private Sub Command1_Click()
    MsgBox MultiByteToUTF16(UTF16ToMultiByte("ab中,c", cpUTF8), cpUTF8)
End Sub