Urgent: how can I tell whether a text file is encoded as UTF-8, GB2312, or ANSI?

sunbf 2011-01-27 04:47:30

All I have is the path of the text file. How can I find out its encoding? Thanks, everyone! Waiting online for replies.

8 replies

sunbf 2011-02-03

Unfortunately the code in #4 is written in C# and I'm working in VB, so I can't use it :( I'm a beginner. Could someone translate it?

CloneCenter 2011-01-28

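#4 is correct. If a UTF-8 file starts with a byte order mark (BOM), .NET identifies the encoding correctly on its own. Without a BOM you have to write your own method to determine the file's encoding; the code in #4 should work for that.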

sunbf 2011-01-28

#2 and #3 return the same result no matter what kind of file I open :( They don't work.
I don't know how to use the C# code in #4.

纠结的程序猿 2011-01-28

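Some UTF-8 files have a BOM (byte order mark: a few bytes at the very start of the file that identify its encoding; for UTF-8 it is the three bytes EF BB BF) and some do not.
When a UTF-8 file has no BOM, Notepad cannot strictly tell an ANSI file from a UTF-8 file.
In that case Notepad can only guess from the file's content. If the bytes of an ANSI file happen to form valid UTF-8 sequences, Notepad treats the file as UTF-8 and displays garbage.
A common example is the character "没". Save a file containing just this one character as ANSI and reopen it, and most popular editors today display "û" instead, because its two GB2312 bytes (C3 BB) are also the valid UTF-8 encoding of "û". The only editor I know of that displays the character correctly is PilotEdit, which has strong encoding detection.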
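The ambiguity in the "没" example can be reproduced in a few lines. This is only a minimal sketch: the module and function names are made up for illustration, and the GB2312 lookup by name assumes the .NET Framework (newer .NET versions would need a code-page provider registered first).

```vb
Imports System
Imports System.Text

Module AmbiguityDemo
    ' The two bytes an ANSI (GB2312) file containing only "没" consists of.
    Public ReadOnly MeiBytes As Byte() = New Byte() {&HC3, &HBB}

    ' Decodes data with a strict UTF-8 decoder.
    ' Returns Nothing when the bytes are not well-formed UTF-8.
    Public Function TryDecodeUtf8(data As Byte()) As String
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            Return strictUtf8.GetString(data)
        Catch ex As DecoderFallbackException
            Return Nothing
        End Try
    End Function

    Sub Main()
        ' C3 BB is a well-formed two-byte UTF-8 sequence for U+00FB,
        ' so a strict UTF-8 check accepts these bytes without complaint.
        Console.WriteLine(TryDecodeUtf8(MeiBytes))

        ' Assumption: the "gb2312" code page is available by name.
        Console.WriteLine(Encoding.GetEncoding("gb2312").GetString(MeiBytes))
    End Sub
End Module
```

Both decodings succeed, which is why no detector can be certain for such a short file; any guess has to rely on more content or on a default chosen by the user.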

wuyq11 2011-01-27

using System;
using System.IO;
using System.Text;

namespace ICSharpCode.TextEditor.Util
{
    /// <summary>
    /// Class that can open text files with auto-detection of the encoding.
    /// </summary>
    public static class FileReader
    {
        public static bool IsUnicode(Encoding encoding)
        {
            int codepage = encoding.CodePage;
            // return true if codepage is any UTF codepage
            // (65001 = UTF-8, 65000 = UTF-7, 1200 = UTF-16 LE, 1201 = UTF-16 BE)
            return codepage == 65001 || codepage == 65000 || codepage == 1200 || codepage == 1201;
        }

        public static string ReadFileContent(string fileName, ref Encoding encoding, Encoding defaultEncoding)
        {
            using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read)) {
                using (StreamReader reader = OpenStream(fs, encoding, defaultEncoding)) {
                    // on return, encoding holds the encoding that was actually used
                    encoding = reader.CurrentEncoding;
                    return reader.ReadToEnd();
                }
            }
        }

        public static StreamReader OpenStream(FileStream fs, Encoding suggestedEncoding, Encoding defaultEncoding)
        {
            if (fs.Length > 3) {
                // the autodetection of StreamReader is not capable of detecting the difference
                // between ISO-8859-1 and UTF-8 without BOM.
                int firstByte = fs.ReadByte();
                int secondByte = fs.ReadByte();
                switch ((firstByte << 8) | secondByte) {
                    case 0x0000: // either UTF-32 Big Endian or a binary file; use StreamReader
                    case 0xfffe: // Unicode BOM (UTF-16 LE or UTF-32 LE)
                    case 0xfeff: // UTF-16 BE BOM
                    case 0xefbb: // start of UTF-8 BOM
                        // StreamReader autodetection works
                        fs.Position = 0;
                        return new StreamReader(fs);
                    default:
                        return AutoDetect(fs, (byte)firstByte, (byte)secondByte, defaultEncoding);
                }
            } else {
                if (suggestedEncoding != null) {
                    return new StreamReader(fs, suggestedEncoding);
                } else {
                    return new StreamReader(fs);
                }
            }
        }

        static StreamReader AutoDetect(FileStream fs, byte firstByte, byte secondByte, Encoding defaultEncoding)
        {
            int max = (int)Math.Min(fs.Length, 500000); // look at max. 500 KB
            const int ASCII = 0;
            const int Error = 1;
            const int UTF8 = 2;
            const int UTF8Sequence = 3;
            int state = ASCII;
            int sequenceLength = 0;
            byte b;
            for (int i = 0; i < max; i++) {
                if (i == 0) {
                    b = firstByte;
                } else if (i == 1) {
                    b = secondByte;
                } else {
                    b = (byte)fs.ReadByte();
                }
                if (b < 0x80) {
                    // normal ASCII character
                    if (state == UTF8Sequence) {
                        state = Error;
                        break;
                    }
                } else if (b < 0xc0) {
                    // 10xxxxxx : continues UTF8 byte sequence
                    if (state == UTF8Sequence) {
                        --sequenceLength;
                        if (sequenceLength < 0) {
                            state = Error;
                            break;
                        } else if (sequenceLength == 0) {
                            state = UTF8;
                        }
                    } else {
                        state = Error;
                        break;
                    }
                } else if (b >= 0xc2 && b < 0xf5) {
                    // beginning of byte sequence
                    if (state == UTF8 || state == ASCII) {
                        state = UTF8Sequence;
                        if (b < 0xe0) {
                            sequenceLength = 1; // one more byte following
                        } else if (b < 0xf0) {
                            sequenceLength = 2; // two more bytes following
                        } else {
                            sequenceLength = 3; // three more bytes following
                        }
                    } else {
                        state = Error;
                        break;
                    }
                } else {
                    // 0xc0, 0xc1, 0xf5 to 0xff are invalid in UTF-8 (see RFC 3629)
                    state = Error;
                    break;
                }
            }
            fs.Position = 0;
            switch (state) {
                case ASCII:
                case Error:
                    // when the file seems to be ASCII or non-UTF8,
                    // we read it using the user-specified encoding so it is saved again
                    // using that encoding.
                    if (IsUnicode(defaultEncoding)) {
                        // the file is not Unicode, so don't read it using Unicode even if the
                        // user has chosen Unicode as the default encoding.

                        // If we don't do this, SD (SharpDevelop) will end up always adding a
                        // Byte Order Mark to ASCII files.
                        defaultEncoding = Encoding.Default; // use system encoding instead
                    }
                    return new StreamReader(fs, defaultEncoding);
                default:
                    return new StreamReader(fs);
            }
        }
    }
}
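For VB users, the core idea of the C# code above (accept the file as UTF-8 only if its bytes are well-formed UTF-8, otherwise fall back to the system ANSI code page) can be sketched much more compactly by letting a strict UTF8Encoding do the validation. This is not a line-by-line translation: the module and function names are hypothetical, it reads the whole file instead of the first 500 KB, and it assumes the .NET Framework, where Encoding.Default is the system ANSI code page.

```vb
Imports System
Imports System.IO
Imports System.Text

Module Utf8Guess
    ' Returns True when every byte sequence in data is well-formed UTF-8.
    Public Function IsValidUtf8(data As Byte()) As Boolean
        ' throwOnInvalidBytes:=True makes the decoder throw instead of
        ' silently replacing malformed input with U+FFFD.
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            strictUtf8.GetCharCount(data)
            Return True
        Catch ex As DecoderFallbackException
            Return False
        End Try
    End Function

    ' Returns True when data contains at least one byte outside the ASCII range.
    Public Function HasNonAscii(data As Byte()) As Boolean
        For Each b As Byte In data
            If b >= &H80 Then Return True
        Next
        Return False
    End Function

    ' Hypothetical helper: guesses the encoding of a file that has no BOM.
    Public Function GuessEncodingWithoutBom(fileName As String) As Encoding
        Dim data As Byte() = File.ReadAllBytes(fileName)
        If HasNonAscii(data) AndAlso IsValidUtf8(data) Then
            Return New UTF8Encoding(False)
        End If
        ' Pure ASCII or not valid UTF-8: fall back to the system ANSI code page
        ' (GB2312/GBK on a Simplified Chinese Windows installation).
        Return Encoding.Default
    End Function
End Module
```

A call such as Utf8Guess.GuessEncodingWithoutBom("d:\1111.txt") returns an Encoding that can be passed to a StreamReader. Like the C# version it is still a guess: a short ANSI file whose bytes happen to be valid UTF-8 will be reported as UTF-8.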

wuyq11 2011-01-27

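When no encoding is specified, StreamReader in .NET defaults to UTF-8 and switches to another Unicode encoding only if the file starts with a BOM. CurrentEncoding reflects the detected encoding only after the first read, so read (or peek) before querying it:
StreamReader sr = new StreamReader(@"F:\temp\1.txt");
sr.Peek(); // the first read triggers BOM detection
Encoding enc = sr.CurrentEncoding;
Leading bytes    Charset/encoding
EF BB BF         UTF-8
FE FF            UTF-16/UCS-2, big endian
FF FE            UTF-16/UCS-2, little endian
FF FE 00 00      UTF-32/UCS-4, little endian
00 00 FE FF      UTF-32/UCS-4, big endian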
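A BOM lookup of this kind can also be done by hand on the first few bytes of the file. The following is a minimal sketch in VB; DetectBom and StartsWith are illustrative names, not framework APIs, and the byte signatures used are the standard Unicode ones (FF FE for little endian, FE FF for big endian).

```vb
Imports System
Imports System.Text

Module BomSniffer
    ' Returns the encoding indicated by a leading BOM, or Nothing if there is none.
    Public Function DetectBom(head As Byte()) As Encoding
        ' The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
        ' because FF FE 00 00 also starts with the UTF-16 little-endian mark FF FE.
        If StartsWith(head, New Byte() {&HFF, &HFE, &H0, &H0}) Then Return New UTF32Encoding(False, True)
        If StartsWith(head, New Byte() {&H0, &H0, &HFE, &HFF}) Then Return New UTF32Encoding(True, True)
        If StartsWith(head, New Byte() {&HEF, &HBB, &HBF}) Then Return New UTF8Encoding(True)
        If StartsWith(head, New Byte() {&HFF, &HFE}) Then Return New UnicodeEncoding(False, True)
        If StartsWith(head, New Byte() {&HFE, &HFF}) Then Return New UnicodeEncoding(True, True)
        Return Nothing
    End Function

    ' True when data begins with exactly the bytes in prefix.
    Private Function StartsWith(data As Byte(), prefix As Byte()) As Boolean
        If data.Length < prefix.Length Then Return False
        For i As Integer = 0 To prefix.Length - 1
            If data(i) <> prefix(i) Then Return False
        Next
        Return True
    End Function
End Module
```

Passing the first four bytes of the file is enough. A result of Nothing only means there is no BOM; the file may still be UTF-8 without a BOM, or ANSI. A UTF-16 little-endian file whose first character is U+0000 would be misread as UTF-32, which is a known limitation of BOM sniffing.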

xingyuebuyu 2011-01-27

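        Dim txtEncode As System.Text.Encoding
        ' Using blocks close and dispose the stream and reader even if an exception is thrown.
        Using fs As New System.IO.FileStream("d:\1111.txt", System.IO.FileMode.Open, System.IO.FileAccess.Read)
            Using sr As New System.IO.StreamReader(fs)
                ' CurrentEncoding is only updated after the first read, so peek before querying it.
                ' Without this call the result is always UTF-8, whatever the file contains.
                sr.Peek()
                ' Only a BOM is detected here; a file without a BOM is still reported as UTF-8.
                txtEncode = sr.CurrentEncoding
            End Using
        End Using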