Urgent help! How can I tell whether a text file's encoding is UTF-8, GB2312, or ANSI?

sunbf 2011-01-27 04:47:30
All I have is the path of the text file. How can I determine its encoding? Thanks, everyone! Waiting online for replies.
sunbf 2011-02-03
Unfortunately the code in post #4 is written in C#, and I use VB, so I can't use it :( I'm a beginner. Could someone translate it?
CloneCenter 2011-01-28
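Post #4 is correct. If a UTF-8 file starts with a BOM (the leading signature bytes), .NET can identify the encoding by itself. If there is no BOM, you have to write your own method to work out the file's encoding, and the code in post #4 should do that job.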
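For the no-BOM case, one way to write that "own method" without hand-rolling a state machine is to decode the bytes with a strict UTF8Encoding, which throws DecoderFallbackException on the first invalid sequence. This is a swapped-in technique, not the code from post #4; the class and method names (Utf8Check, LooksLikeUtf8) are hypothetical, and the sketch assumes the whole file fits in memory.

```csharp
using System.Text;

// Hypothetical helper: checks whether a byte array is well-formed UTF-8.
public static class Utf8Check
{
    // UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    public static bool LooksLikeUtf8(byte[] bytes)
    {
        try
        {
            // Throws DecoderFallbackException on the first invalid byte sequence.
            StrictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Typical use would be `Utf8Check.LooksLikeUtf8(File.ReadAllBytes(path))`, falling back to GB2312/GBK (code page 936) when it returns false. A true result only means "could be UTF-8": pure ASCII passes, and so do short ANSI texts whose bytes happen to be valid UTF-8.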
sunbf 2011-01-28
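The code in posts #2 and #3 gives the same result no matter what kind of file I open :( It doesn't work.
And I don't know how to use the C# code in post #4.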
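That symptom is expected for the snippets in posts #2 and #3: StreamReader only examines the BOM on its first read, so querying CurrentEncoding right after construction returns the encoding it was created with (UTF-8 by default). The sketch below writes a hypothetical temporary UTF-16 LE file and reports the code page before and after a Peek() call; the class and method names are made up for illustration.

```csharp
using System.IO;
using System.Text;

public static class CurrentEncodingDemo
{
    // Writes a small UTF-16 LE file to a temp path, then reports the code page
    // StreamReader.CurrentEncoding returns before and after the first read.
    public static (int Before, int After) ProbeUtf16LeFile()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Encoding.Unicode is UTF-16 LE; File.WriteAllText writes its BOM (FF FE) first.
            File.WriteAllText(path, "abc", Encoding.Unicode);
            using (var reader = new StreamReader(path)) // UTF-8 default, BOM detection on
            {
                int before = reader.CurrentEncoding.CodePage; // nothing read yet: still the default
                reader.Peek();                                // first read: the BOM is examined here
                int after = reader.CurrentEncoding.CodePage;  // now reflects the BOM
                return (before, after);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Here Before is 65001 (UTF-8) and After is 1200 (UTF-16 LE). Even after Peek(), a file without a BOM still reports UTF-8, so this approach only identifies BOM-marked Unicode files, not GB2312/ANSI.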
纠结的程序猿 2011-01-28
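Some UTF-8 files have a BOM and some do not. (The BOM, or byte order mark, is a signature of two to four bytes at the start of a file that identifies its encoding; for UTF-8 it is EF BB BF.)
When a UTF-8 file has no BOM, Notepad cannot reliably tell ANSI files and UTF-8 files apart.
In that case Notepad can only guess from the file's contents. If the bytes of an ANSI file happen to form valid UTF-8 sequences, Notepad mistakes the file for UTF-8 and displays garbled text.
A common example is "没": type that single Chinese character into a file, save it as ANSI (GB2312/GBK, bytes C3 BB), and reopen it, and most popular editors today display the garbage "û", because C3 BB is also the valid UTF-8 encoding of "û". The only editor I know of that displays this character correctly is PilotEdit, which has strong encoding detection.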
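The "没" example can be checked by encoding the character with the GB2312/GBK code page and decoding the resulting bytes as UTF-8. This sketch assumes .NET Core 3.0+ or .NET 5+, where legacy code pages such as 936 only become available after registering CodePagesEncodingProvider (on .NET Framework, Encoding.GetEncoding(936) works without it); MeiDemo and EncodeMeiAsGbk are hypothetical names.

```csharp
using System.Text;

public static class MeiDemo
{
    public static (byte[] GbkBytes, string AsUtf8) EncodeMeiAsGbk()
    {
        // Needed on .NET Core/.NET 5+ to make legacy code pages such as 936 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        // Code page 936 (GB2312/GBK) is the "ANSI" code page on Simplified Chinese Windows.
        Encoding gbk = Encoding.GetEncoding(936);

        byte[] bytes = gbk.GetBytes("没");               // what an ANSI save writes to disk
        string misread = Encoding.UTF8.GetString(bytes); // what an editor guessing UTF-8 shows
        return (bytes, misread);
    }
}
```

Both interpretations are valid for the same two bytes, which is why no detector can be certain about such short files.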
wuyq11 2011-01-27
using System;
using System.IO;
using System.Text;

namespace ICSharpCode.TextEditor.Util
{
    /// <summary>
    /// Class that can open text files with auto-detection of the encoding.
    /// </summary>
    public static class FileReader
    {
        public static bool IsUnicode(Encoding encoding)
        {
            int codepage = encoding.CodePage;
            // return true if codepage is any UTF codepage
            return codepage == 65001 || codepage == 65000 || codepage == 1200 || codepage == 1201;
        }

        public static string ReadFileContent(string fileName, ref Encoding encoding, Encoding defaultEncoding)
        {
            using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read)) {
                using (StreamReader reader = OpenStream(fs, encoding, defaultEncoding)) {
                    // force the reader to auto-detect the encoding: CurrentEncoding
                    // only reflects the BOM after the first read
                    reader.Peek();
                    encoding = reader.CurrentEncoding;
                    return reader.ReadToEnd();
                }
            }
        }

        public static StreamReader OpenStream(FileStream fs, Encoding suggestedEncoding, Encoding defaultEncoding)
        {
            if (fs.Length > 3) {
                // the autodetection of StreamReader is not capable of detecting the difference
                // between ISO-8859-1 and UTF-8 without BOM.
                int firstByte = fs.ReadByte();
                int secondByte = fs.ReadByte();
                switch ((firstByte << 8) | secondByte) {
                    case 0x0000: // either UTF-32 Big Endian or a binary file; use StreamReader
                    case 0xfffe: // Unicode BOM (UTF-16 LE or UTF-32 LE)
                    case 0xfeff: // UTF-16 BE BOM
                    case 0xefbb: // start of UTF-8 BOM
                        // StreamReader autodetection works
                        fs.Position = 0;
                        return new StreamReader(fs);
                    default:
                        return AutoDetect(fs, (byte)firstByte, (byte)secondByte, defaultEncoding);
                }
            } else {
                if (suggestedEncoding != null) {
                    return new StreamReader(fs, suggestedEncoding);
                } else {
                    return new StreamReader(fs);
                }
            }
        }

        static StreamReader AutoDetect(FileStream fs, byte firstByte, byte secondByte, Encoding defaultEncoding)
        {
            int max = (int)Math.Min(fs.Length, 500000); // look at max. 500 KB
            const int ASCII = 0;
            const int Error = 1;
            const int UTF8 = 2;
            const int UTF8Sequence = 3;
            int state = ASCII;
            int sequenceLength = 0;
            byte b;
            for (int i = 0; i < max; i++) {
                if (i == 0) {
                    b = firstByte;
                } else if (i == 1) {
                    b = secondByte;
                } else {
                    b = (byte)fs.ReadByte();
                }
                if (b < 0x80) {
                    // normal ASCII character
                    if (state == UTF8Sequence) {
                        state = Error;
                        break;
                    }
                } else if (b < 0xc0) {
                    // 10xxxxxx : continues UTF8 byte sequence
                    if (state == UTF8Sequence) {
                        --sequenceLength;
                        if (sequenceLength < 0) {
                            state = Error;
                            break;
                        } else if (sequenceLength == 0) {
                            state = UTF8;
                        }
                    } else {
                        state = Error;
                        break;
                    }
                } else if (b >= 0xc2 && b < 0xf5) {
                    // beginning of byte sequence
                    if (state == UTF8 || state == ASCII) {
                        state = UTF8Sequence;
                        if (b < 0xe0) {
                            sequenceLength = 1; // one more byte following
                        } else if (b < 0xf0) {
                            sequenceLength = 2; // two more bytes following
                        } else {
                            sequenceLength = 3; // three more bytes following
                        }
                    } else {
                        state = Error;
                        break;
                    }
                } else {
                    // 0xc0, 0xc1, 0xf5 to 0xff are invalid in UTF-8 (see RFC 3629)
                    state = Error;
                    break;
                }
            }
            fs.Position = 0;
            switch (state) {
                case ASCII:
                case Error:
                    // when the file seems to be ASCII or non-UTF8,
                    // we read it using the user-specified encoding so it is saved again
                    // using that encoding.
                    if (IsUnicode(defaultEncoding)) {
                        // the file is not Unicode, so don't read it using Unicode even if the
                        // user has chosen Unicode as the default encoding.

                        // If we don't do this, SD (SharpDevelop) will end up always adding
                        // a Byte Order Mark to ASCII files.
                        // Note: on .NET Framework, Encoding.Default is the system ANSI code page
                        // (e.g. GBK/936 on Simplified Chinese Windows); on .NET Core/.NET 5+ it is always UTF-8.
                        defaultEncoding = Encoding.Default; // use system encoding instead
                    }
                    return new StreamReader(fs, defaultEncoding);
                default:
                    return new StreamReader(fs);
            }
        }
    }
}

wuyq11 2011-01-27
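.NET's default encoding for StreamReader is UTF-8.
StreamReader sr = new StreamReader(@"F:\temp\1.txt");
sr.Peek(); // the BOM is only examined on the first read
Encoding enc = sr.CurrentEncoding;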
Leading bytes   Charset/encoding
EF BB BF        UTF-8
FE FF           UTF-16/UCS-2, big-endian
FF FE           UTF-16/UCS-2, little-endian
FF FE 00 00     UTF-32/UCS-4, little-endian
00 00 FE FF     UTF-32/UCS-4, big-endian
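A BOM table like this translates directly into a check on the first few bytes of the file. Note that FE FF marks UTF-16 big-endian and FF FE marks UTF-16 little-endian. In this sketch (BomSniffer and DetectBom are hypothetical names, not a .NET API), the four-byte UTF-32 LE mark is tested before the two-byte UTF-16 LE mark because both start with FF FE; null means no BOM was found.

```csharp
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a byte order mark, or null if there is none.
    public static Encoding DetectBom(byte[] b)
    {
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;                      // UTF-8
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;                     // UTF-32 LE (must precede the UTF-16 LE test)
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true);      // UTF-32 BE
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;                   // UTF-16 LE
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;          // UTF-16 BE
        return null;                                   // no BOM
    }
}
```

A null result leaves the hard case open: the file could be UTF-8 without a BOM, GB2312/ANSI, or something else, and has to be guessed from its contents. Also, a UTF-16 LE file whose first character is U+0000 looks like UTF-32 LE, an ambiguity inherent in the BOM scheme.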
xingyuebuyu 2011-01-27
Dim txtEncode As System.Text.Encoding
Using sr As New System.IO.StreamReader("d:\1111.txt")
    sr.Peek() ' CurrentEncoding is only updated after the first read, when the BOM is examined
    txtEncode = sr.CurrentEncoding ' without a BOM this is still UTF-8, the StreamReader default
End Using
sunbf 2011-01-27
Oh, and I only know VB.NET :(
