How do I read the UTF-16-encoded content of a .txt file?

Huai-yu 2016-11-15 09:01:37

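I'm fairly new to this. Some fields in a .txt file are UTF-16 encoded, so opening the file directly shows garbled text. I'd like to read those fields in a program and display them correctly. What encoding should I convert them to? GBK? From what I found on Baidu, the main function involved is WideCharToMultiByte. Any suggestions are welcome.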
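Which target encoding to pick depends on where the text will be shown. On Simplified Chinese Windows the console and the ANSI APIs use code page 936 (GBK), and WideCharToMultiByte with code page 936 performs that conversion. UTF-8 is the more portable choice. As a platform-independent illustration (a hand-written converter, not WideCharToMultiByte), here is a minimal sketch that turns UTF-16LE bytes without a BOM into UTF-8; `utf16le_to_utf8` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n bytes of UTF-16LE (no BOM) into a
   NUL-terminated UTF-8 string in out (capacity cap bytes).
   Returns the UTF-8 length without the NUL, or -1 if out is too small.
   Lone surrogates become U+FFFD; a trailing odd byte is ignored. */
long utf16le_to_utf8(const unsigned char *in, size_t n,
                     unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i + 1 < n) {
        /* Assemble one 16-bit code unit, low byte first. */
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n) {
            uint32_t lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                /* High + low surrogate -> code point U+10000..U+10FFFF. */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;            /* high surrogate without a low one */
            }
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                /* unpaired surrogate */
        }
        /* Encode the code point as 1-4 UTF-8 bytes. */
        unsigned char u[4];
        size_t len;
        if (cp < 0x80) {
            u[0] = (unsigned char)cp; len = 1;
        } else if (cp < 0x800) {
            u[0] = (unsigned char)(0xC0 | (cp >> 6));
            u[1] = (unsigned char)(0x80 | (cp & 0x3F)); len = 2;
        } else if (cp < 0x10000) {
            u[0] = (unsigned char)(0xE0 | (cp >> 12));
            u[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            u[2] = (unsigned char)(0x80 | (cp & 0x3F)); len = 3;
        } else {
            u[0] = (unsigned char)(0xF0 | (cp >> 18));
            u[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            u[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            u[3] = (unsigned char)(0x80 | (cp & 0x3F)); len = 4;
        }
        if (o + len + 1 > cap) return -1;   /* keep room for the NUL */
        for (size_t k = 0; k < len; k++) out[o++] = u[k];
    }
    if (o + 1 > cap) return -1;
    out[o] = '\0';
    return (long)o;
}
```

For the bytes 27 59 DE 8F 2F 6E 00 30 (大连港 followed by one full-width space) it produces the 12 UTF-8 bytes E5 A4 A7 E8 BF 9E E6 B8 AF E3 80 80.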

11 replies

Huai-yu 2016-11-15

Quoting reply #4 by ID870177103:

(BOM constants for UTF-8, UTF-16 and UTF-32) Windows defaults to UTF-16LE.

What do you mean by "Windows defaults to UTF-16"?

ID870177103 2016-11-15

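// STRU8/STRU16/STRU32: the poster's own 8/16/32-bit code-unit types (definitions not shown)
static const STRU8  M_BOM[] = {STRU8 (0XEF), STRU8 (0XBB), STRU8 (0XBF)};  // UTF-8 BOM: EF BB BF
static const STRU16 M_BOM[] = {STRU16 (0XFEFF)};                           // UTF-16 BOM: U+FEFF
static const STRU32 M_BOM[] = {STRU32 (0X0000FEFF)};                       // UTF-32 BOM: U+FEFF

Windows defaults to UTF-16LE (its wide-character APIs and Notepad's "Unicode" option use little-endian UTF-16).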
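To unpack "Windows defaults to UTF-16LE": the same 16-bit value can be stored high byte first (big-endian) or low byte first (little-endian), and Windows' wide-character text uses the latter, so the BOM U+FEFF appears in the file as FF FE. A minimal sketch with a hypothetical helper `put_u16`:

```c
#include <stdint.h>

/* Hypothetical helper: serialize one UTF-16 code unit in the given byte order. */
void put_u16(uint16_t unit, int little_endian, unsigned char out[2])
{
    if (little_endian) {            /* UTF-16LE: low byte first (Windows "Unicode") */
        out[0] = (unsigned char)(unit & 0xFF);
        out[1] = (unsigned char)(unit >> 8);
    } else {                        /* UTF-16BE: high byte first */
        out[0] = (unsigned char)(unit >> 8);
        out[1] = (unsigned char)(unit & 0xFF);
    }
}
```

With `little_endian` set, 0xFEFF is written as FF FE and 0x554A (啊) as 4A 55; without it, FE FF and 55 4A.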

ID870177103 2016-11-15

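/*
 * UTF-8 byte patterns (highest code point for each length):
 * 1 byte  (0x7F)       0xxxxxxx
 * 2 bytes (0x7FF)      110xxxxx 10xxxxxx
 * 3 bytes (0xFFFF)     1110xxxx 10xxxxxx 10xxxxxx
 * 4 bytes (0x1FFFFF)   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
 * 5 bytes (0x3FFFFFF)  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
 * 6 bytes (0x7FFFFFFF) 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
 * (RFC 3629 limits UTF-8 to 4 bytes and U+10FFFF; the 5- and 6-byte forms are obsolete.)
 *
 * UTF-16 surrogates:
 * high (D800-DBFF) 110110xx xxxxxxxx
 * low  (DC00-DFFF) 110111xx xxxxxxxx
 *
 * Corresponding UTF-32 range (0x10000-0x10FFFF):
 * 00000001 xxxxxxxx xxxxxxxx - 00010000 xxxxxxxx xxxxxxxx
 */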
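The surrogate rows above translate directly into arithmetic: a code point from U+10000 to U+10FFFF has 0x10000 subtracted, leaving 20 bits that are split 10/10 across the high and low surrogates. A minimal sketch; `to_surrogates` and `from_surrogates` are hypothetical names, and the caller is assumed to pass only values in the valid ranges:

```c
#include <stdint.h>

/* Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair:
   the top 10 bits of (cp - 0x10000) go into D800-DBFF, the bottom 10 into DC00-DFFF. */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000;              /* 20-bit value */
    *hi = (uint16_t)(0xD800 + (v >> 10));
    *lo = (uint16_t)(0xDC00 + (v & 0x3FF));
}

/* Inverse: rebuild the code point from a valid high/low surrogate pair. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000 + (((uint32_t)hi - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
}
```

For example, U+1F600 becomes the pair D83D DE00, and U+10FFFF becomes DBFF DFFF.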

Huai-yu 2016-11-15

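The most useful thing I've found so far is this blog post: https://my.oschina.net/zhangzhihao/blog/70462. I don't fully understand the conversion principle in it yet, so I plan to copy the code over first and see whether it works.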

Huai-yu 2016-11-15

PS: I'm using VS2013 Update 5, which doesn't support the C++11 Unicode character conversion features. Reference: https://msdn.microsoft.com/zh-cn/library/hh567368.aspx

赵4老师 2016-11-15

Quoting reply #10 by w1373199:

"Thank you very much, I think I know how to handle it now." (in reply to the code in reply #9)

The greatest difference and the longest distance in this world both lie between "saying" and "doing".

Huai-yu 2016-11-15

Quoting reply #9 by zhao4zhong1: (the code in reply #9)

Thank you very much, I think I know how to handle it now.

赵4老师 2016-11-15

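#include <stdio.h>
#include <locale.h>
#include <wchar.h>
/* "大连港" followed by three U+3000 (full-width space) characters in UTF-16LE,
   then the single-byte '|' (0x7C) field separator. */
unsigned char b[14] = {0x27,0x59,0xde,0x8f,0x2f,0x6e,0x00,0x30,0x00,0x30,0x00,0x30,0x7c,0x44};
int n,i;
wchar_t *p;
int main() {
    /* Find the '|' separator; i<13 keeps b[i+1] inside the array for the terminator. */
    for (i=0;i<13;i++) if ('|'==b[i]) break;
    if (i>=13) {
        printf("format error!\n");
        return 1;
    }
    n=i;
    b[n]=0;b[n+1]=0;          /* overwrite the separator with a 16-bit L'\0' */
    /* Reinterpreting the bytes as wchar_t only works where wchar_t is a
       16-bit little-endian unit, i.e. Windows/MSVC. */
    p=(wchar_t *)&b[0];
    setlocale(LC_ALL,"chs");  /* MSVC locale name for Chinese (Simplified), code page 936 */
    printf("[%ls]\n",p);      /* %ls prints a wide string; MSVC's printf treats %S the same way */
    return 0;
}
// Output: [大连港　　　]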
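The `(wchar_t *)` cast in this code only works where `wchar_t` is a 16-bit little-endian unit, as on Windows with MSVC; with GCC on Linux `wchar_t` is 32 bits, so the same bytes would be misread. A portable alternative is to assemble each code unit from its two bytes. The sketch below assumes, similar to the original, that a 0x7C byte found at an even offset from the field start marks the end of the field (so a UTF-16 unit whose low byte happens to be 0x7C would be misread as a separator); `read_field_u16le` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a UTF-16LE field that ends at a single-byte '|'.
   Stores up to max code units in units[] and returns how many were stored,
   or -1 if no separator is found within len bytes or units[] is too small. */
long read_field_u16le(const unsigned char *b, size_t len,
                      uint16_t *units, size_t max)
{
    size_t i, n = 0;
    for (i = 0; i < len; i += 2) {
        if (b[i] == '|')
            return (long)n;                     /* end of field */
        if (i + 1 >= len || n >= max)
            break;                              /* truncated unit or output full */
        units[n++] = (uint16_t)(b[i] | (b[i + 1] << 8));   /* low byte first */
    }
    return -1;
}
```

On the 14 sample bytes above it returns 6 units: 0x5927, 0x8FDE, 0x6E2F and three 0x3000, independent of the platform's `wchar_t` size.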

Huai-yu 2016-11-15

Quoting reply #6 by zhao4zhong1: (the remark about garbled text and the HexDump code in reply #6)

Right, you're correct that garbled text only exists for humans, and the raw bytes themselves aren't a problem for me. As shown in the image, the field between the two 7C bytes (the pipe character |) is UTF-16 encoded, little-endian: 5927 (大), 8FDE (连), 6E2F (港), and the 3000 values after them are full-width spaces (U+3000). What I'm stuck on is finding a suitable way to read these fields and decode them. I've had zero exposure to this before, hence the confusion.
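A first step toward reading such fields is to load the file as raw bytes. A minimal sketch, assuming the file fits in memory; `read_all` is a hypothetical helper name. Binary mode matters here: in text mode the MSVC runtime translates CR LF (0x0D 0x0A) to LF and treats 0x1A as end of file on input, and those byte values can occur inside UTF-16 code units.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc'd buffer, untranslated.
   Returns NULL on failure; on success *len holds the byte count and the
   caller must free() the buffer. */
unsigned char *read_all(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");                 /* binary: no CR LF / 0x1A handling */
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }
    *len = got;
    return buf;
}
```

The returned bytes can then be split on the '|' separators and each UTF-16LE field decoded separately.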

赵4老师 2016-11-15

In Visual C++ 2005, fopen supports Unicode file streams. A flag specifying the desired encoding may be passed to fopen when opening a new file or overwriting an existing file, like this: fopen("newfile.txt", "w, ccs=<encoding>"); Allowed values of the encoding include UNICODE, UTF-8, and UTF-16LE. If the file already exists and is opened for reading or appending, the Byte Order Mark (BOM) is used to determine the correct encoding. It is not necessary to specify the encoding with a flag. In fact, the flag will be ignored if it conflicts with the type of the file as indicated by the BOM. The flag is only used when no BOM is present or if the file is a new file. The following table summarizes the modes used for various flags given to fopen and Byte Order Marks used in the file.

Encodings Used Based on Flag and BOM

| Flag     | No BOM (or new file) | BOM: UTF-8 | BOM: UTF-16 |
|----------|----------------------|------------|-------------|
| UNICODE  | ANSI                 | UTF-8      | UTF-16LE    |
| UTF-8    | UTF-8                | UTF-8      | UTF-16LE    |
| UTF-16LE | UTF-16LE             | UTF-8      | UTF-16LE    |
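Where fopen's `ccs=` mode isn't available (for example with other compilers or C runtimes), the BOM check described above can be done by hand on the first bytes of the file. A minimal sketch; `sniff_bom` and `enum enc` are hypothetical names, and UTF-32 BOMs are not handled (FF FE 00 00 would be reported as UTF-16LE). A file without a BOM comes back as `ENC_UNKNOWN`, and the caller has to know its layout in advance.

```c
#include <stddef.h>

enum enc { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE };

/* Hypothetical helper: guess the encoding from a leading BOM.
   *skip receives the number of BOM bytes to discard before the text. */
enum enc sniff_bom(const unsigned char *b, size_t len, size_t *skip)
{
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
        *skip = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE) {   /* U+FEFF stored low byte first */
        *skip = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF) {   /* U+FEFF stored high byte first */
        *skip = 2;
        return ENC_UTF16BE;
    }
    *skip = 0;
    return ENC_UNKNOWN;
}
```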

赵4老师 2016-11-15

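To a computer there is no such thing as garbled text, only binary bytes; garbling exists only in the human brain. For example, 啊 is GBK: 0xB0 0xA1, UTF-16LE: 0x4A 0x55, UTF-16BE: 0x55 0x4A, UTF-8: 0xE5 0x95 0x8A. For reference only: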

#include <stdio.h>

/* Print len bytes of buf as a hex dump, 16 bytes per line: the offset
   (starting at addr), the bytes in hex, then the printable ASCII characters
   (anything else, including space, is shown as '.').
   Text is appended at binstr+pos, because sprintf must not read from the
   buffer it is writing to. */
void HexDump(char *buf,int len,int addr) {
    int i,j,k;
    int pos=0;                   /* current length of the line in binstr */
    char binstr[80];

    for (i=0;i<len;i++) {
        if (0==(i%16)) {
            pos=sprintf(binstr,"%08x -",(unsigned)(i+addr));
        }
        pos+=sprintf(binstr+pos," %02x",(unsigned char)buf[i]);
        if (15==(i%16)) {
            pos+=sprintf(binstr+pos,"  ");
            for (j=i-15;j<=i;j++) {
                binstr[pos++]=('!'<=buf[j]&&buf[j]<='~')?buf[j]:'.';
            }
            binstr[pos]='\0';
            printf("%s\n",binstr);
        }
    }
    if (0!=(i%16)) {             /* last, partial line: pad the hex column */
        k=16-(i%16);
        for (j=0;j<k;j++) {
            pos+=sprintf(binstr+pos,"   ");
        }
        pos+=sprintf(binstr+pos,"  ");
        k=16-k;
        for (j=i-k;j<i;j++) {
            binstr[pos++]=('!'<=buf[j]&&buf[j]<='~')?buf[j]:'.';
        }
        binstr[pos]='\0';
        printf("%s\n",binstr);
    }
}