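How do I convert Chinese characters that appear in web page source as Unicode escapes starting with /u back into Chinese? Points offered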

shuaialang 2012-03-21 12:30:16

Especially text that mixes Chinese with Western (ASCII) characters, like this:

/uxxxxasdasd/uassss

Points offered.

5 replies

shuaialang 2012-06-18

Are you working with JSON too? mail: gebin@vip.qq.com

fxz1c 2012-05-27

How did you solve it? Could you give me some pointers? Thanks.

erhan 2012-03-21

Oh, this shouldn't be hard, should it?

shuaialang 2012-03-21

Damn, I solved it right after posting. Handing out the points.

shuaialang 2012-03-21

\u
I typed the slash the wrong way round.
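For later readers: the thread never shows the fix itself. A minimal sketch in Python (not the Delphi code the poster would have used), assuming the page source contains JSON-style `\uXXXX` escapes mixed with plain ASCII. The name `decode_u_escapes` is made up for illustration.

```python
import re

# One JSON-style escape: a backslash, 'u', then exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def decode_u_escapes(text):
    """Replace every \\uXXXX escape in text with the character it names.

    Anything that is not an escape (plain ASCII, text that is already
    Chinese) is copied through unchanged.
    """
    def _to_char(match):
        return chr(int(match.group(1), 16))

    decoded = _U_ESCAPE.sub(_to_char, text)
    # Characters outside the BMP arrive as two escapes (a surrogate pair).
    # A round trip through UTF-16 joins each pair into one character.
    return decoded.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')


sample = r'\u4e2d\u6587asdasd\u6c49\u5b57'
print(decode_u_escapes(sample))  # 中文asdasd汉字
```

If the input is a complete JSON document rather than a fragment, a JSON parser does this decoding on its own, so the regular expression is only needed for loose fragments like the one in the question.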

Most of this source code comes from the internet; I only did part of the integration.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import com.vince.*; // third-party package that provides BytesEncodingDetect

public class Charchange {

    public static void main(String[] args) throws IOException {
        String inputfile, outputfile, code;
        inputfile = "D:\\迅雷\\work\\fen\\2-temp-test.txt"; // file to convert
        outputfile = "D:\\迅雷\\work\\1.txt";               // output file
        code = "utf-8";
        System.out.println("Conversion started");
        convert(inputfile, outputfile, code);
        System.out.println("Conversion finished");
    }

    /**
     * Writes a local file out in the requested encoding.
     * @param inputfile path of the input file
     * @param outfile path of the output file
     * @param code encoding of the output file
     * @throws IOException
     */
    public static void convert(String inputfile, String outfile, String code) throws IOException {
        StringBuffer sb = new StringBuffer();
        // Detect the encoding of the input file
        String ch = getCharset(inputfile);
        InputStreamReader isr = null;
        OutputStreamWriter osw = null;
        // Decode according to the detected encoding
        if (ch.equals("UTF8")) {
            isr = new InputStreamReader(new FileInputStream(inputfile), "UTF-8");
        } else if (ch.equals("Unicode")) {
            isr = new InputStreamReader(new FileInputStream(inputfile), "Unicode");
        } else {
            isr = new InputStreamReader(new FileInputStream(inputfile), "GB2312");
        }
        // Read the whole file into the StringBuffer
        BufferedReader br = new BufferedReader(isr);
        String line = null;
        while ((line = br.readLine()) != null) {
            sb.append(line).append("\n");
        }
        br.close();
        isr.close();
        // Choose the output encoding; anything unrecognised falls back to UTF-8
        if ("GB2312".equals(code)) {
            osw = new OutputStreamWriter(new FileOutputStream(outfile), "GB2312");
        } else if ("Unicode".equals(code)) {
            osw = new OutputStreamWriter(new FileOutputStream(outfile), "Unicode");
        } else {
            osw = new OutputStreamWriter(new FileOutputStream(outfile), "UTF-8");
        }
        BufferedWriter bw = new BufferedWriter(osw);
        bw.write(deal(sb.toString()));
        bw.close();
        osw.close();
    }

    /**
     * Detects a file's encoding from its path.
     * @param str path of the file
     * @return the Java name of the detected encoding
     * @throws IOException
     */
    private static String getCharset(String str) throws IOException {
        BytesEncodingDetect s = new BytesEncodingDetect();
        String code = BytesEncodingDetect.javaname[s.detectEncoding(new File(str))];
        return code;
    }

    // Converts decimal character references such as "&#20013;" into characters
    public static String Change(String temp) {
        String myString = temp.replace("&#", "");
        String[] split = myString.split(";");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < split.length; i++) {
            sb.append((char) Integer.parseInt(split[i]));
        }
        return sb.toString();
    }

    // Walks the string, converting references and copying everything else through
    public static String deal(String sb1) {
        StringBuilder car = new StringBuilder(); // the "cart" that collects the output
        while (sb1.length() != 0) {
            int markStar = sb1.indexOf("&");
            if (markStar == 0) {
                // Treat "&#digits;" as a reference. The original assumed every
                // reference is exactly 8 characters long; look for the ';' instead.
                int end = sb1.indexOf(";");
                String digits = end > 2 ? sb1.substring(2, end) : "";
                if (sb1.startsWith("&#") && digits.matches("\\d{1,7}")) {
                    car.append(Change(sb1.substring(0, end + 1)));
                    sb1 = sb1.substring(end + 1);
                } else {
                    car.append('&'); // a literal ampersand, not a reference
                    sb1 = sb1.substring(1);
                }
            } else if (markStar == -1) {
                car.append(sb1); // no more references; copy the rest
                sb1 = "";
            } else {
                car.append(sb1, 0, markStar); // copy the text before the next '&'
                sb1 = sb1.substring(markStar);
            }
        }
        return car.toString();
    }
}
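The Java listing above deals with a different escape form from the one in the question: decimal character references such as `&#20013;`. For comparison, a minimal Python sketch of the same idea. The helper name `decode_decimal_refs` is hypothetical; `html.unescape` is the standard-library function that also covers hexadecimal and named references.

```python
import html
import re

# A decimal numeric character reference: "&#", one or more digits, ";".
_DECIMAL_REF = re.compile(r'&#(\d+);')


def decode_decimal_refs(text):
    """Replace only decimal references (&#NNNN;) and leave all other text alone.

    Assumes every number is a valid code point; chr() raises ValueError otherwise.
    """
    return _DECIMAL_REF.sub(lambda m: chr(int(m.group(1))), text)


sample = 'abc&#20013;&#25991;def'
print(decode_decimal_refs(sample))  # abc中文def

# html.unescape handles decimal, hexadecimal, and named references in one call.
print(html.unescape(sample + ' &amp; &#x6c49;'))  # abc中文def & 汉
```

The regular-expression version mirrors what the Java code does; `html.unescape` is usually the simpler choice when the input is real HTML.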

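Preface. In Python 2, source files are read as ASCII by default, and ASCII contains no Chinese characters, so Chinese text in the file cannot be recognised. To write Chinese characters in a .py file, you normally add # -*- coding: utf-8 -*- or #coding=utf-8 at the top. This declares the encoding in which the file's text, Chinese characters included, is stored (gbk, gb2312, and others also work). Some notes on testing. Note 1: if the code contains Chinese, declare the encoding at the top of the file; if you write the code in an editor, also check the encoding the IDE saves files in (usually under Settings). Note 2: Python 3.x source files use UTF-8 by default and can parse Chinese, so the declaration can be omitted, but for consistency and to avoid unexpected...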

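str strings: s = '中文' # s is a str object holding a Chinese string. In Python 2 it is stored as bytes. How those bytes are encoded: if the line is typed and run in the Python interpreter, s is in the interpreter's encoding; if the line is written into a source file, saved, and then executed, the interpreter initialises s in the encoding declared for the file (for example the utf-8 on the first line of the .py file). unicode object strings: Unicode is a character encoding standard whose concrete encodings include UTF-8 and UTF-16 (GBK is a separate encoding that maps to and from Unicode rather than an encoding of it), which is why Chinese strings and unicode are so closely related. Internally, Python 2 on narrow builds stores a unicode object using two bytes per code unit (a unicode object is not limited to...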

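Python [`paiθən] means "python" (the snake). The Python language is owned by the Python Software Foundation (PSF), a non-profit organisation dedicated to keeping the language open, open source, and developing. Python 3.0 was designed without regard for backward compatibility. (1) Basic syntax. Encoding: by default, Python 3 source files are encoded in UTF-8 and all strings are unicode strings. To specify a different encoding for a source file: # -*- coding: cp-1252 -*-. Identifiers: the first character must be a letter or an underscore _. The rest of an identifier consists of letters, digits, underscores, and Chinese characters. Identifiers are case-sensitive. In Python 3, non-ASCII identifiers are also allowed.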
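The points about Python 3 strings and identifiers can be checked with a short sketch. The variable names are chosen only for illustration, and the byte counts assume the two characters 中文.

```python
# -*- coding: utf-8 -*-
# The declaration above is optional in Python 3, where UTF-8 is already the default.

汉字 = '中文'  # a non-ASCII identifier bound to a Unicode string

code_points = len(汉字)                  # counts characters, not bytes
utf8_length = len(汉字.encode('utf-8'))  # bytes needed when encoded as UTF-8
gbk_length = len(汉字.encode('gbk'))     # bytes needed when encoded as GBK

print(code_points, utf8_length, gbk_length)  # 2 6 4
print('汉字'.isidentifier())                 # True
```

The same two characters take a different number of bytes in each encoding, which is why the encoding has to be named whenever text is written to or read from a file.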

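Chinese encoding conversion in C. Contents: 1. How GBK, UNICODE, and UTF8 relate; 2. Converting between UNICODE and UTF8; 3. Converting between UNICODE and GBK; 4. Encoding lookup table; 5. Complete code. 1. How GBK, UNICODE, and UTF8 relate. GBK: the full name is 汉字内码扩展规范, in English Chinese Internal Code Specification. GBK uses a two-byte representation with an overall code range of 8140-FEFE: the lead byte is in 81-FE and the trail byte in 40-FE, excluding the xx7F line. That gives 23940 code points in total, containing 218...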

Delphi
