Friday 1 May 2009

Python encoding conversion


**Common encoding conversions fall into the following cases:**


===== Converting unicode to other encodings (GBK, GB2312, etc.) =====


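Example: a is a unicode string that should be converted to gb2312. Use a.encode('gb2312').

<code python>
# -*- coding=gb2312 -*-
# The declaration above must match the encoding the source file is saved in.
a = u"中文"
a_gb2312 = a.encode('gb2312')  # unicode -> gb2312 byte string
print a_gb2312
</code>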
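What happens when the target encoding cannot represent a character is worth seeing once. The following is a minimal sketch, not part of the original example: the Hangul character U+D55C is just a sample picked because it lies outside GB2312, and the variable names are made up for illustration. The string literals use escapes so the snippet does not depend on the file's own encoding, and it should run on both Python 2 and Python 3.

```python
# u"\u4e2d\u6587" is the same two-character string used in the example above.
text = u"\u4e2d\u6587"

gb2312_bytes = text.encode('gb2312')
gbk_bytes = text.encode('gbk')

# GBK extends GB2312, so characters present in both encode to the same bytes.
assert gb2312_bytes == gbk_bytes

# A character outside GB2312 (here a Hangul syllable) fails in strict mode.
try:
    u"\ud55c".encode('gb2312')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The 'replace' error handler substitutes '?' instead of raising.
replaced = u"\ud55c".encode('gb2312', 'replace')
```

Whether silently replacing characters is acceptable depends on the application; strict mode is the default for a reason.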


===== Converting other encodings (utf-8, GBK) to unicode =====


Example: a is a gb2312-encoded byte string that should be converted to unicode. Use unicode(a, 'gb2312') or a.decode('gb2312').

<code python>
# -*- coding=gb2312 -*-
a = u"中文"
a_gb2312 = a.encode('gb2312')
print a_gb2312

a_unicode = a_gb2312.decode('gb2312')  # gb2312 byte string -> unicode
assert(a_unicode == a)
a_utf_8 = a_unicode.encode('utf-8')
print a_utf_8
</code>
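Decoding only works when the codec matches the bytes. The sketch below is an illustration with made-up variable names, written so that it should run on both Python 2 and Python 3. It decodes the same GB2312 bytes once with the right codec and once with the wrong one.

```python
# The GB2312 bytes for the two-character string u"\u4e2d\u6587".
gb2312_bytes = b'\xd6\xd0\xce\xc4'

# Correct codec: the original unicode string comes back.
decoded = gb2312_bytes.decode('gb2312')
assert decoded == u"\u4e2d\u6587"

# Wrong codec: these bytes are not valid UTF-8, so strict decoding raises.
try:
    gb2312_bytes.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# With 'replace', undecodable bytes become U+FFFD instead of raising.
lossy = gb2312_bytes.decode('utf-8', 'replace')
```

A wrong codec does not always raise. Some byte sequences happen to be valid in several encodings and simply decode to the wrong characters.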


===== Converting between non-unicode encodings =====


Encoding 1 (GBK, GB2312) to encoding 2 (utf-8, utf-16, ISO-8859-1)


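Decode to unicode first, then encode to encoding 2. The target encoding must be able to represent the text: Chinese characters, for instance, cannot be encoded as ISO-8859-1.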


For example, gb2312 to utf-8:

<code python>
# -*- coding=gb2312 -*-
a = u"中文"
a_gb2312 = a.encode('gb2312')
print a_gb2312

# Step 1: gb2312 byte string -> unicode
a_unicode = a_gb2312.decode('gb2312')
assert(a_unicode == a)
# Step 2: unicode -> utf-8 byte string
a_utf_8 = a_unicode.encode('utf-8')
print a_utf_8
</code>
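The two steps can be wrapped in a small helper. This is only a sketch: transcode is a hypothetical name, not a standard library function, and the snippet is written so that it should run on both Python 2 and Python 3.

```python
def transcode(data, src_encoding, dst_encoding):
    """Re-encode a byte string by going through unicode (hypothetical helper)."""
    return data.decode(src_encoding).encode(dst_encoding)


# The GB2312 bytes for the two-character string u"\u4e2d\u6587".
gb2312_bytes = b'\xd6\xd0\xce\xc4'

utf8_bytes = transcode(gb2312_bytes, 'gb2312', 'utf-8')

# Converting back yields the original bytes again.
round_trip = transcode(utf8_bytes, 'utf-8', 'gb2312')
```

The helper raises UnicodeDecodeError if data is not valid in src_encoding, and UnicodeEncodeError if dst_encoding cannot represent the decoded text.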


===== Determining a string's encoding =====


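isinstance(s, str) checks whether s is an ordinary (byte) string \\ isinstance(s, unicode) checks whether s is unicode \\ If a string is already unicode, converting it to unicode again sometimes fails, but not always: unicode(s, 'gb2312') raises TypeError, while s.decode('gb2312') first encodes s implicitly as ASCII, so it succeeds only for pure-ASCII text \\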
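One way to avoid the double conversion is to check the type before decoding. The sketch below rests on some assumptions: to_unicode and text_type are hypothetical names, the utf-8 default is an arbitrary choice, and type(u"") is used in place of the name unicode so that the same code should run on both Python 2 and Python 3.

```python
# unicode on Python 2, str on Python 3.
text_type = type(u"")


def to_unicode(s, encoding='utf-8'):
    """Return s as unicode, decoding only when s is a byte string (hypothetical helper)."""
    if isinstance(s, text_type):
        return s  # already unicode: return it untouched
    return s.decode(encoding)


from_bytes = to_unicode(b'\xd6\xd0\xce\xc4', 'gb2312')
already_unicode = to_unicode(u"\u4e2d\u6587", 'gb2312')
```

Both calls return the same unicode string. The second one skips decoding entirely.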





