How to determine the character encoding of a txt file in Java

HopeMan1124 2013-03-27 03:24:34

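I'm building a text reader, and to display a locally imported txt file I need to know its character encoding. Nothing I found online works and I'm still searching. Could someone point me to a proven method? Thanks, I'm waiting online.
My code (does not work):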
private static void judgeTextCode(String strFilePath) {
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(strFilePath);
        // Only the first two bytes are inspected, so only a BOM can be recognized.
        int a = fis.read();
        int b = fis.read();
        if (a == 0xFF && b == 0xFE) {
            System.out.println("----------Unicode------"); // FF FE is the UTF-16LE BOM
        } else if (a == 0xFE && b == 0xFF) {
            System.out.println("----------UTF-16BE------");
        } else if (a == 0xEF && b == 0xBB) {
            System.out.println("----------UTF-8------"); // the full UTF-8 BOM is EF BB BF
        } else {
            System.out.println("----------GBK------"); // every file without a BOM ends up here
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (fis != null) {
                fis.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

11 replies

orisun 2013-06-05

Yes, I've run into the same problem in my current project. Did you solve it?

Quoting reply #8 by windows771053651:

I used an open-source detector method built on CodepageDetectorProxy with ParsingDetector, ASCIIDetector and UnicodeDetector (the same getCharset code as in HopeMan1124's 2013-03-27 post below), but it always fails with: java.lang.IllegalArgumentException: More than the given length had to be read and the given stream could not be reset. Undetermined state for this detection. at info.monitorenter.cpdetector.io.CodepageDetectorProxy.detectCodepage(CodepageDetectorProxy.java:198) Has anyone used this detector? Any pointers would be appreciated.

sdfhejian520 2013-03-29

byte[] head = new byte[3];
inputStream.read(head);
String code = "gbk"; // default when no BOM is found
if (head[0] == -1 && head[1] == -2) { // FF FE
    code = "UTF-16";
}
if (head[0] == -2 && head[1] == -1) { // FE FF
    code = "Unicode";
}
if (head[0] == -17 && head[1] == -69 && head[2] == -65) { // EF BB BF
    code = "UTF-8";
}
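
The signed comparisons in the reply above (-1, -2, -17 and so on) are the byte values FF, FE, EF read as Java's signed `byte`. A minimal sketch of the same BOM check with unsigned values follows. The class and method names (`BomSniffer`, `fromBom`) are hypothetical, and returning `null` for "no BOM" is a design choice of this sketch, not something from the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: maps a leading byte-order mark (BOM) to a charset. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the charset announced by a BOM, or null when the data has none. */
    static Charset fromBom(byte[] head) {
        // Mask with 0xFF so each byte compares as an unsigned value (0..255).
        int b0 = head.length > 0 ? head[0] & 0xFF : -1;
        int b1 = head.length > 1 ? head[1] & 0xFF : -1;
        int b2 = head.length > 2 ? head[2] & 0xFF : -1;

        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return StandardCharsets.UTF_8;      // EF BB BF
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return StandardCharsets.UTF_16LE;   // FF FE
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return StandardCharsets.UTF_16BE;   // FE FF
        }
        return null; // no BOM: the encoding has to be guessed some other way
    }
}
```

Returning `null` instead of a default such as "gbk" keeps "no BOM" distinct from "GBK", so the caller can decide how to handle files that carry no marker.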

HopeMan1124 2013-03-29

I found a method online that can detect the charset of all kinds of files (html, txt, ascii): http://blog.sina.com.cn/s/blog_80e4822f0101dd0s.html

HopeMan1124 2013-03-27

I used an open-source detector method:
public static void getCharset(String path) {
    CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
    detector.add(new ParsingDetector(false));
    detector.add(ASCIIDetector.getInstance());
    detector.add(UnicodeDetector.getInstance());
    java.nio.charset.Charset charset = null;
    File f = new File(path);
    try {
        charset = detector.detectCodepage(new BufferedInputStream(new FileInputStream(f)), 100);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    if (charset != null) {
        System.out.println(f.getName() + " encoding: " + charset.name());
    } else {
        System.out.println(f.getName() + " unknown");
    }
}
But it always fails with:
java.lang.IllegalArgumentException: More than the given length had to be read and the given stream could not be reset. Undetermined state for this detection.
    at info.monitorenter.cpdetector.io.CodepageDetectorProxy.detectCodepage(CodepageDetectorProxy.java:198)
Has anyone used this detector? Any pointers would be appreciated.
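
The wording of the exception ("more than the given length had to be read and the given stream could not be reset") suggests that the detector marks the stream with the length passed in (100 here) and later tries to reset it. That is an assumption based on the message alone, not on the cpdetector source. The JDK-only sketch below shows the underlying `BufferedInputStream` behaviour: once far more bytes than the mark limit have been read, `reset()` throws. `MarkResetDemo` and `canResetAfter` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo of mark/reset limits on a BufferedInputStream. */
final class MarkResetDemo {
    private MarkResetDemo() {}

    /** Marks the stream, reads toRead bytes, then reports whether reset() still works. */
    static boolean canResetAfter(int markLimit, int toRead) {
        byte[] data = new byte[64 * 1024]; // stand-in for a file's content
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            in.mark(markLimit);
            byte[] sink = new byte[toRead];
            int total = 0;
            while (total < toRead) {
                int n = in.read(sink, total, toRead - total);
                if (n < 0) {
                    break; // end of stream
                }
                total += n;
            }
            in.reset(); // throws IOException once the mark has been invalidated
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

If the detector behaves this way, passing a larger length (for instance the file size) instead of 100 is one thing to try.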

HopeMan1124 2013-03-27

But it still doesn't work. I remember there being some kind of detector online.

花谢尊前不敢香 2013-03-27

else { code = "GBK"; } Change GBK to ANSI.
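
A note on that label: "ANSI" is what Windows Notepad calls the system code page, which is code page 936 (GBK) on Simplified Chinese Windows. To my knowledge it is not a charset name that Java registers, so it cannot be handed to `Charset.forName` as-is. The sketch below maps the label to a caller-supplied fallback; `CharsetLabels` and `resolve` are hypothetical names.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: turns a detector's label into a Java Charset. */
final class CharsetLabels {
    private CharsetLabels() {}

    /**
     * Resolves a label such as "UTF-8" or "GBK". The Notepad-style label
     * "ANSI" and any name this JVM does not support map to ansiFallback.
     */
    static Charset resolve(String label, Charset ansiFallback) {
        if ("ANSI".equalsIgnoreCase(label)) {
            return ansiFallback;
        }
        return Charset.isSupported(label) ? Charset.forName(label) : ansiFallback;
    }
}
```

On a Simplified Chinese system the fallback would typically be `Charset.forName("GBK")`.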

HopeMan1124 2013-03-27

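But I have a txt file that is actually UTF-8, and the method above reports GBK, so the text comes out garbled. iReader opens the same file without any problem.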
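
A UTF-8 file saved without a BOM would explain this, because a BOM-only check sends every such file to the GBK branch. One common workaround is to test whether the bytes decode as strict UTF-8 and fall back to GBK only when they do not. The sketch below assumes that the only BOM-less candidates are UTF-8 and GBK, which is a simplification; `Utf8Check`, `isValidUtf8` and `guessBomless` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses the encoding of data that carries no BOM. */
final class Utf8Check {
    private Utf8Check() {}

    /** Returns true when the bytes form well-formed UTF-8 from start to end. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Assumption of this sketch: anything that is not valid UTF-8 is treated as GBK. */
    static String guessBomless(byte[] data) {
        return isValidUtf8(data) ? "UTF-8" : "GBK";
    }
}
```

The check treats a multi-byte sequence cut off at the end of the array as malformed, so it should be given the whole file rather than an arbitrary prefix. Pure ASCII text passes as UTF-8, which is harmless because ASCII decodes identically under both encodings.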

花谢尊前不敢香 2013-03-27

A txt file only comes in four encodings.

HopeMan1124 2013-03-27

The code above doesn't work reliably: some txt files are detected correctly and others are not.

u010055153 2013-03-27

65465461313113131313

花谢尊前不敢香 2013-03-27

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static void main(String[] args) {
    try {
        String charset = getCharset(new File("c:\\2.txt"));
        System.out.println(charset);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

// Converts bytes to an uppercase hex string, two digits per byte.
public static String toHex(byte[] byteArray) {
    StringBuilder buf = new StringBuilder();
    for (byte value : byteArray) {
        int i = value & 0xFF; // treat the byte as unsigned
        if (i < 16) {
            buf.append("0");
        }
        buf.append(Integer.toHexString(i));
    }
    return buf.toString().toUpperCase();
}

private static String getCharset(File fileName) throws IOException {
    byte[] b = new byte[10];
    // try-with-resources closes the stream when the read is done
    try (BufferedInputStream bin = new BufferedInputStream(new FileInputStream(fileName))) {
        bin.read(b, 0, b.length);
    }
    String first = toHex(b);
    // Here you can see the leading bytes of each encoding; a GBK file has no extra bytes at the front.
    String code;
    if (first.startsWith("EFBBBF")) {
        code = "UTF-8";
    } else if (first.startsWith("FEFF")) {
        code = "UTF-16BE"; // FE FF: big-endian BOM
    } else if (first.startsWith("FFFE")) {
        code = "Unicode"; // FF FE: little-endian BOM
    } else {
        code = "GBK"; // no BOM: assumed to be GBK
    }
    return code;
}
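
Once a charset has been chosen, a reader app still has to decode the file, and Java's UTF-8, UTF-16LE and UTF-16BE decoders leave a leading BOM in the text as the character U+FEFF. The sketch below decodes a byte array and drops that character; `TextDecoder` and `decode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes file bytes and removes a leading BOM character. */
final class TextDecoder {
    private TextDecoder() {}

    static String decode(byte[] data, Charset charset) {
        String text = new String(data, charset);
        // A BOM that survives decoding shows up as U+FEFF at index 0.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decode(sample, StandardCharsets.UTF_8));
    }
}
```

For a real file the bytes could come from `java.nio.file.Files.readAllBytes`, with the charset taken from whichever detection step is in use.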
