How to read the contents of a Unicode file on Linux

hzy694358 2011-08-11 05:02:00

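On Linux:
FILE *stream = NULL;
stream = fopen("/var/www/vhosts/hzy13207.cn/httpdocs/hzy1.txt","rb");
if (stream == NULL)
{
    return -1;
}
wchar_t szBuffer[1025] = {0};
if(fgetws(szBuffer,1024,stream) == NULL)
    printf("err:%d",errno);
------------------------------------------------
Every call to fgetws returns NULL, with err: 84.
If I use fgets instead, every ASCII character is followed by a 0 byte, so the string is cut off automatically.
So how should a Unicode file actually be read on Linux?

hzy1.txt was created on Windows and is UTF-16 without a BOM.
(And how should a Unicode-encoded file be created on Linux?)

Also, sizeof(wchar_t) gives 4 for me. Are wide characters on Linux really 4 bytes??
I'm confused.
This is my second post about this, please help...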
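
On the side question of creating a UTF-16 file on Linux, one option is to encode each code point as UTF-16LE bytes by hand and write them with fwrite. The following is a minimal sketch under that assumption; the names encode_utf16le and write_utf16le are hypothetical helpers, not part of any library, and no BOM is written, to match the file described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-16LE bytes.
 * Returns the number of bytes written to out (2 or 4), or 0 if cp is not
 * a valid scalar value (a surrogate, or above U+10FFFF). */
size_t encode_utf16le(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(cp & 0xFF);   /* low byte first */
        out[1] = (unsigned char)(cp >> 8);
        return 2;
    }
    cp -= 0x10000;                             /* 20 bits remain */
    uint32_t hi = 0xD800 + (cp >> 10);         /* high surrogate */
    uint32_t lo = 0xDC00 + (cp & 0x3FF);       /* low surrogate */
    out[0] = (unsigned char)(hi & 0xFF);
    out[1] = (unsigned char)(hi >> 8);
    out[2] = (unsigned char)(lo & 0xFF);
    out[3] = (unsigned char)(lo >> 8);
    return 4;
}

/* Write n code points to fp as UTF-16LE without a BOM.
 * Returns 0 on success, -1 on an invalid code point or a write error. */
int write_utf16le(FILE *fp, const uint32_t *cps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char buf[4];
        size_t len = encode_utf16le(cps[i], buf);
        if (len == 0 || fwrite(buf, 1, len, fp) != len)
            return -1;
    }
    return 0;
}
```

The stream passed to write_utf16le should be opened in binary mode ("wb"). Because the bytes are assembled explicitly, the output does not depend on the size of wchar_t or on the host byte order.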

12 replies


ljhhh0123 2011-08-12

Open the file and force each read to take 2 bytes at a time. I only have a cygwin environment:

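#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

/* One UTF-16 code unit (2 bytes) */
typedef unsigned short wchar;

int main(void)
{
    size_t n, i;
    FILE *stream = NULL;
    wchar szBuffer[1024];

    /* Use the environment's locale for wide-character output */
    setlocale(LC_ALL, "");
    stream = fopen("test.txt", "rb");
    if (stream == NULL)
    {
        return -1;
    }

    /* Read up to 1024 two-byte units per call; fread returns the number of units read */
    while ((n = fread(szBuffer, 2, 1024, stream)) > 0)
    {
        for (i = 0; i < n; i++)
        {
            /* Assumes a little-endian file on a little-endian host; surrogate pairs are not handled */
            putwchar(szBuffer[i]);
        }
    }

    fclose(stream);
    return 0;
}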
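
A note on porting this to Linux: fgetws decodes the byte stream using the locale's multibyte encoding (typically UTF-8), not UTF-16, which is consistent with the error the original poster saw (on x86 Linux, errno 84 is EILSEQ). Since wchar_t is 4 bytes there, each 2-byte unit also has to be widened, and surrogate pairs combined. The sketch below does that by hand; decode_utf16le is a hypothetical helper name, and replacing unpaired surrogates with U+FFFD is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-16LE bytes into Unicode code points.
 * Bytes are combined explicitly, so the result does not depend on the
 * host byte order. Unpaired surrogates become U+FFFD; a trailing odd
 * byte is ignored.
 * Returns the number of code points stored in out (at most outcap). */
size_t decode_utf16le(const unsigned char *in, size_t nbytes,
                      uint32_t *out, size_t outcap)
{
    size_t i = 0, n = 0;
    while (i + 1 < nbytes && n < outcap) {
        uint32_t u = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF) {          /* high surrogate */
            if (i + 1 < nbytes) {
                uint32_t lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                    i += 2;
                } else {
                    u = 0xFFFD;                    /* no low surrogate follows */
                }
            } else {
                u = 0xFFFD;                        /* input ends mid-pair */
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            u = 0xFFFD;                            /* stray low surrogate */
        }
        out[n++] = u;
    }
    return n;
}
```

The resulting code points can be stored in a 4-byte wchar_t on Linux. When the file is read in fixed-size chunks, a surrogate pair can straddle two chunks, so it is simpler to decode the whole file at once or to carry the last unit over to the next chunk.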

luciferisnotsatan 2011-08-12

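[Quote=Quoting reply #8 by hzy694358:]
fgets stops at \n.
Functions like strlen stop at \0.
----------------------------
Right.
I have already read the data into a char array,
but a char array like this is unusable: everything after a 0 byte is ignored.
How should I handle this?
[/Quote]
If I remember correctly, the other thread was about converting the encoding.
For binary data, read it directly with fread, which returns how much was read, so there is no need to call strlen to get the length. Then pass that char array straight to iconv.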

size_t fread(
void *buffer,
size_t size,
size_t count,
FILE *stream
);

Parameters
buffer
Storage location for data.

size
Item size in bytes.

count
Maximum number of items to be read.

stream
Pointer to FILE structure.

Return Value
fread returns the number of full items actually read, which may be less than count if an error occurs or if the end of the file is encountered before reaching count. Use the feof or ferror function to distinguish a read error from an end-of-file condition. If size or count is 0, fread returns 0 and the buffer contents are unchanged. If stream or buffer is a null pointer, fread invokes the invalid parameter handler, as described in Parameter Validation. If execution is allowed to continue, this function sets errno to EINVAL and returns 0.

See _doserrno, errno, _sys_errlist, and _sys_nerr for more information on these, and other, error codes.

Remarks
The fread function reads up to count items of size bytes from the input stream and stores them in buffer. The file pointer associated with stream (if there is one) is increased by the number of bytes actually read. If the given stream is opened in text mode, carriage return–linefeed pairs are replaced with single linefeed characters. The replacement has no effect on the file pointer or the return value. The file-pointer position is indeterminate if an error occurs. The value of a partially read item cannot be determined.

This function locks out other threads. If you need a non-locking version, use _fread_nolock.
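
The fread advice above can be sketched as follows: read the file in binary mode and keep the byte count that fread reports, so that 0x00 bytes inside UTF-16 text are never treated as terminators. read_all is a hypothetical helper name, and the starting buffer size of 4096 is arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything from fp into a malloc'd buffer.
 * The byte count is stored in *len; the caller frees the buffer.
 * Returns NULL on a read or allocation error. */
unsigned char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* size is 1, so the return value is a byte count */
        size_t n = fread(buf + used, 1, cap - used, fp);
        used += n;
        if (n == 0)
            break;                        /* end of file or error */
        if (used == cap) {                /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                     /* tell a read error from EOF */
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

The returned buffer and length can then be handed to a converter such as iconv. Note that the fread description quoted above comes from the Microsoft C runtime documentation; the parts about the invalid parameter handler and _fread_nolock do not apply on Linux.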

hzy694358 2011-08-12

Looking for an answer ...

hzy694358 2011-08-12

My code:

FILE *stream = NULL;
stream = fopen("/var/www/vhosts/hzy13207.cn/httpdocs/hzy1.txt","r+b");
if (stream == NULL)
{
    printf("open file failed");
    return -1;
}

string str;
char szBuffer[1025] = {0};
char szTemp[3073] = {0};
size_t inBytes = 1024;
size_t outBytes = 3072;
while(fgets(szBuffer,1024,stream) != NULL)
{
    printf("%s\n",szBuffer);
    iconv_t conv;
    conv = iconv_open("UTF-8", "UTF-16");  // WCHAR_T means Unicode
    // the iconv call segfaults; what is the cause?
    if(iconv(conv, (char **)&szBuffer, &inBytes, (char **) &szTemp, &outBytes)==-1)
    {
        switch(errno)
        {
        case  E2BIG:
            printf("E2BiG\n");
            break;
        case  EILSEQ:
            printf("EILSEQ\n");
            break;
        case  EINVAL:
            printf("EINVAL\n");
            break;
        }
    }
    iconv_close(conv);
    //string strTemp(szTemp);
    //str += strTemp;
    printf("unicode\n");
    memset(szBuffer,0,1025);
    memset(szTemp,0,3073);
}

//printf("con:%s\n",str.c_str());
fclose(stream);

iconv(conv, (char **)&szBuffer, &inBytes, (char **) &szTemp, &outBytes)
This call segfaults. What is the cause??
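
A likely cause of the segfault: &szBuffer has type char (*)[1025], so casting it to char ** makes iconv treat the first bytes of the array's contents as a pointer and dereference them. iconv also advances the pointers it is given, so it needs real pointer variables. The sketch below shows the usual calling pattern; utf16le_to_utf8 is a hypothetical wrapper name, and "UTF-16LE" is an assumption based on the file having been created on Windows without a BOM.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of UTF-16LE in `in` to UTF-8 in `out`.
 * Returns the number of UTF-8 bytes written, or (size_t)-1 on failure. */
size_t utf16le_to_utf8(const char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv advances these pointers and decrements the counters, so it
     * must be given pointer variables, not the address of an array. */
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

Two other points in the posted code are worth checking: fgets splits the data at 0x0A bytes, which can cut a 2-byte unit in half, and inBytes is never reset to the number of bytes actually read. Naming the byte order explicitly also avoids depending on how the iconv implementation treats "UTF-16" input that has no BOM.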

hzy694358 2011-08-11

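fgets stops at \n.
Functions like strlen stop at \0.
----------------------------
Right.
I have already read the data into a char array,
but a char array like this is unusable: everything after a 0 byte is ignored.
How should I handle this?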

luciferisnotsatan 2011-08-11

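[Quote=Quoting reply #6 by hzy694358:]

Quoting reply #4 by luciferisnotsatan:

fgets stops at '\n', not at \0.
For a char[], read with fread.

This confuses me even more.
An ASCII character saved as Unicode takes two bytes on Windows.
For example, if the file content is abcd,
what gets read is 97 00 98 00 99 00 100 00 (in decimal).
A char string is cut off at 0, so how does the newline \n come into it?
...
[/Quote]
fgets stops at \n.
Functions like strlen stop at \0.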

hzy694358 2011-08-11

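[Quote=Quoting reply #4 by luciferisnotsatan:]

fgets stops at '\n', not at \0.
For a char[], read with fread.
[/Quote]
This confuses me even more.
An ASCII character saved as Unicode takes two bytes on Windows.
For example, if the file content is abcd,
what gets read is 97 00 98 00 99 00 100 00 (in decimal).
A char string is cut off at 0, so how does the newline \n come into it?
In other words, if the szBuffer obtained with fgets needs further processing, only the a gets processed and the rest cannot be handled.
For example, how would I read it and then print it out?
fread has the same problem.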
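
One way to inspect or print such a buffer is to drive every loop by the byte count instead of relying on a terminating 0. The sketch below formats the bytes as hex; hex_dump is a hypothetical helper name, and it assumes the length was obtained from the read call (for example, the return value of fread).

```c
#include <stddef.h>
#include <stdio.h>

/* Format len bytes of buf as space-separated hex pairs into out.
 * The loop is driven by len, so 0x00 bytes are data like any other.
 * Returns the number of characters written (excluding the terminator);
 * the output is truncated if out is too small. */
size_t hex_dump(const unsigned char *buf, size_t len, char *out, size_t outcap)
{
    size_t pos = 0;
    if (outcap == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* each byte needs up to 3 characters plus the terminator */
        if (pos + 4 > outcap)
            break;
        pos += (size_t)snprintf(out + pos, outcap - pos,
                                i ? " %02X" : "%02X", buf[i]);
    }
    return pos;
}
```

For the abcd example above, strlen on the raw bytes stops after the first byte, while a length-driven loop sees all eight. Printing the text itself still requires converting it to the terminal's encoding first, for instance to UTF-8.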

luciferisnotsatan 2011-08-11