
Get a list of all the encodings Python can encode to

  •  53
  • Amandasaurus  ·  15 years ago

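    I am writing a script that will try encoding bytes into many different encodings in Python 2.6. Is there some way to get a list of available encodings that I can iterate over?

    The reason I'm trying to do this is that a user has some text that is not encoded correctly. There are funny characters in it. I know which Unicode character is messing it up. I want to be able to give them an answer like "Your text editor is interpreting that string as X encoding, not Y encoding". I thought I would encode that character using one encoding, then decode it again using another encoding, and see whether we get the same character sequence.

    That is, something like this: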

    # encodinglist() is the hypothetical function this question is asking for
    for encoding1, encoding2 in itertools.permutations(encodinglist(), 2):
      try:
        unicode_string = my_unicode_character.encode(encoding1).decode(encoding2)
      except Exception:
        pass
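
    The round-trip idea described above could be sketched roughly as follows. This is a minimal sketch for Python 3 (the question targets Python 2.6, where `str` and `unicode` behave differently). The function name `find_mojibake_pairs` and the short candidate list are hypothetical, chosen only for illustration.

```python
import itertools


def find_mojibake_pairs(intended, observed, encodings):
    """Return (enc1, enc2) pairs where encoding `intended` with enc1 and
    decoding the bytes with enc2 yields `observed`."""
    matches = []
    for enc1, enc2 in itertools.permutations(encodings, 2):
        try:
            if intended.encode(enc1).decode(enc2) == observed:
                matches.append((enc1, enc2))
        except (UnicodeError, LookupError):
            # This pair cannot represent the text, or the codec is unknown.
            continue
    return matches


# Hypothetical candidate list; a real run would use a fuller list of codecs.
candidates = ["utf_8", "cp1252", "latin_1", "ascii"]
pairs = find_mojibake_pairs("\u00e9", "\u00c3\u00a9", candidates)
print(pairs)
```

    Here the intended character is "é" and the garbled text is "Ã©"; the pairs found suggest UTF-8 bytes that were read as cp1252 or latin_1. Several pairs can match the same symptom, so the result narrows the possibilities rather than proving one.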
    
    8 answers
        1
  •  26
  •   John Machin Santi    15 years ago

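    Unfortunately, encodings.aliases.aliases.keys() is NOT an appropriate answer.

    aliases (as one would expect) contains several cases where different keys are mapped to the same value, e.g. 1252 and windows_1252 are both mapped to cp1252. You could save time by using set(aliases.values()) instead of aliases.keys().

    But there is a worse problem: aliases does not contain codecs that have no aliases (such as cp856, cp874, cp875, cp737, and koi8_u).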

    >>> from encodings.aliases import aliases
    >>> def find(q):
    ...     return [(k,v) for k, v in aliases.items() if q in k or q in v]
    ...
    >>> find('1252') # multiple aliases
    [('1252', 'cp1252'), ('windows_1252', 'cp1252')]
    >>> find('856') # no codepage 856 in aliases
    []
    >>> find('koi8') # no koi8_u in aliases
    [('cskoi8r', 'koi8_r')]
    >>> 'x'.decode('cp856') # but cp856 is a valid codec
    u'x'
    >>> 'x'.decode('koi8_u') # but koi8_u is a valid codec
    u'x'
    >>>
    

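    It is also worth noting that, however you obtain a full list of codecs, it may be a good idea to ignore the codecs that are not about encoding or decoding character sets but perform some other transformation, e.g. zlib, quopri, and base64.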
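
    One way to filter out such transform codecs is sketched below. It assumes Python 3.4 or later, where `str.encode` raises `LookupError` for codecs that are not text encodings. The helper name `is_text_encoding` is hypothetical, and the probe string "a" is an arbitrary choice.

```python
def is_text_encoding(name):
    """Return True if `name` is a codec that str.encode() accepts as a
    text encoding and that can encode the probe character 'a'."""
    try:
        "a".encode(name)
    except (LookupError, UnicodeError):
        # LookupError: unknown codec, or a bytes-to-bytes / str-to-str
        # transform such as base64_codec or zlib_codec.
        # UnicodeError: the codec cannot encode even 'a' (e.g. 'undefined').
        return False
    return True


names = ["utf_8", "cp1252", "base64_codec", "zlib_codec", "hex_codec"]
text_only = [n for n in names if is_text_encoding(n)]
print(text_only)
```

    This is a behavioural probe, not an official classification, so it is worth checking the result against the documented list of standard encodings.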

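    Which brings us to the question of WHY you want to "try encoding bytes into many different encodings". If we know that, we may be able to steer you in the right direction.

    For a start, that's ambiguous. One DEcodes bytes into Unicode, and one ENcodes Unicode into bytes. Which do you want to do?

    What are you really trying to achieve? Are you trying to determine which codec to use to decode some incoming bytes, and planning to attempt this with all possible codecs? [Note: latin1 will decode anything.] Are you trying to determine the language of some Unicode text by attempting to encode it with all possible codecs? [Note: utf8 will encode anything.]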
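
    The two bracketed notes can be checked with a short Python 3 sketch. The sample strings are arbitrary illustrations, and the variable names are not from the original answer.

```python
# Every possible byte value decodes under latin_1, so a successful decode
# says nothing about whether latin_1 was the right guess.
all_bytes = bytes(range(256))
as_latin1 = all_bytes.decode("latin_1")
round_trip_ok = as_latin1.encode("latin_1") == all_bytes

# utf_8 can encode text from any script (lone surrogates aside), so a
# successful encode does not identify the language of the text.
samples = ["hello", "привет", "こんにちは", "مرحبا"]
utf8_ok = all(s.encode("utf_8").decode("utf_8") == s for s in samples)

# A narrower codec does discriminate: cp1252 cannot encode Cyrillic.
try:
    "привет".encode("cp1252")
    cp1252_rejects_cyrillic = False
except UnicodeEncodeError:
    cp1252_rejects_cyrillic = True

print(round_trip_ok, utf8_ok, cp1252_rejects_cyrillic)
```

    In other words, "the call did not raise" is weak evidence for the permissive codecs; only the stricter ones can rule anything out.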

        2
  •  47
  •   Mark Amery Harley Holcombe    6 years ago

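    Other answers here seem to indicate that constructing this list programmatically is difficult and fraught with traps. However, doing so is probably unnecessary, since the documentation contains a complete list of the standard encodings Python supports, and has done so since Python 2.3.

    You can find these lists (for each stable version of the language released so far) in the "Standard Encodings" section of each version's library documentation.

    Below are the lists for each documented version of Python. Note that if you want backwards compatibility rather than just support for one particular version of Python, you can copy the list from the latest Python version and check whether each encoding exists in the Python running your program before trying to use it.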
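
    The existence check mentioned above might look like the following sketch. It relies on `codecs.lookup`, which raises `LookupError` for an unknown codec; the helper name `available_encodings` and the sample names are hypothetical.

```python
import codecs


def available_encodings(candidates):
    """Return the names from `candidates` that the running Python can look up."""
    found = []
    for name in candidates:
        try:
            codecs.lookup(name)
        except LookupError:
            # Not known to this interpreter (or not available on this platform).
            continue
        found.append(name)
    return found


wanted = ["utf_8", "cp1252", "no_such_codec_xyz", "koi8_u"]
print(available_encodings(wanted))
```

    Passing one of the full lists below as `candidates` should leave only the encodings that the running interpreter actually provides.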

    Python 2.3 (59 encodings)

    ['ascii',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp869',
     'cp874',
     'cp875',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8']

    Python 2.4 (85 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8']

    Python 2.5 (86 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 2.6 (90 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 2.7 (93 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_11',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.0 (89 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.1 (90 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.2 (92 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.3 (93 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'cp65001',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.4 (96 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp273',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1125',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'cp65001',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_11',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_u',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.5 (98 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp273',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1125',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'cp65001',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_11',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_t',
     'koi8_u',
     'kz1048',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.6 (98 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp273',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1125',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'cp65001',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_11',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_t',
     'koi8_u',
     'kz1048',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

    Python 3.7 (98 encodings)

    ['ascii',
     'big5',
     'big5hkscs',
     'cp037',
     'cp273',
     'cp424',
     'cp437',
     'cp500',
     'cp720',
     'cp737',
     'cp775',
     'cp850',
     'cp852',
     'cp855',
     'cp856',
     'cp857',
     'cp858',
     'cp860',
     'cp861',
     'cp862',
     'cp863',
     'cp864',
     'cp865',
     'cp866',
     'cp869',
     'cp874',
     'cp875',
     'cp932',
     'cp949',
     'cp950',
     'cp1006',
     'cp1026',
     'cp1125',
     'cp1140',
     'cp1250',
     'cp1251',
     'cp1252',
     'cp1253',
     'cp1254',
     'cp1255',
     'cp1256',
     'cp1257',
     'cp1258',
     'cp65001',
     'euc_jp',
     'euc_jis_2004',
     'euc_jisx0213',
     'euc_kr',
     'gb2312',
     'gbk',
     'gb18030',
     'hz',
     'iso2022_jp',
     'iso2022_jp_1',
     'iso2022_jp_2',
     'iso2022_jp_2004',
     'iso2022_jp_3',
     'iso2022_jp_ext',
     'iso2022_kr',
     'latin_1',
     'iso8859_2',
     'iso8859_3',
     'iso8859_4',
     'iso8859_5',
     'iso8859_6',
     'iso8859_7',
     'iso8859_8',
     'iso8859_9',
     'iso8859_10',
     'iso8859_11',
     'iso8859_13',
     'iso8859_14',
     'iso8859_15',
     'iso8859_16',
     'johab',
     'koi8_r',
     'koi8_t',
     'koi8_u',
     'kz1048',
     'mac_cyrillic',
     'mac_greek',
     'mac_iceland',
     'mac_latin2',
     'mac_roman',
     'mac_turkish',
     'ptcp154',
     'shift_jis',
     'shift_jis_2004',
     'shift_jisx0213',
     'utf_32',
     'utf_32_be',
     'utf_32_le',
     'utf_16',
     'utf_16_be',
     'utf_16_le',
     'utf_7',
     'utf_8',
     'utf_8_sig']

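    In case they are relevant to anyone's use case, note that the docs also list some Python-specific encodings, many of which seem to be primarily for use by Python's internals or are otherwise odd in some way, like the 'undefined' encoding, which always raises an exception if you try to use it. You probably want to ignore these completely if, like the question-asker here, you are trying to figure out what encoding was used for some text you have come across in the real world. As of Python 3.6, the list is as follows: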

    ["idna",
     "mbcs",
     "oem",
     "palmos",
     "punycode",
     "raw_unicode_escape",
     "rot_13",
     "undefined",
     "unicode_escape",
     "unicode_internal",
     "base64_codec",
     "bz2_codec",
     "hex_codec",
     "quopri_codec",
     "string_escape",
     "uu_codec",
     "zlib_codec"]
    

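    Finally, in case you would like to update my tables above for a newer version of Python, here is the (crude, not very robust) script I used to generate them: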

    import requests
    import lxml.html
    import pprint
    
    for version, url in [
        ('2.3', 'https://docs.python.org/2.3/lib/node130.html'),
        ('2.4', 'https://docs.python.org/2.4/lib/standard-encodings.html'),
        ('2.5', 'https://docs.python.org/2.5/lib/standard-encodings.html'),
        ('2.6', 'https://docs.python.org/2.6/library/codecs.html#standard-encodings'),
        ('2.7', 'https://docs.python.org/2.7/library/codecs.html#standard-encodings'),
        ('3.0', 'https://docs.python.org/3.0/library/codecs.html#standard-encodings'),
        ('3.1', 'https://docs.python.org/3.1/library/codecs.html#standard-encodings'),
        ('3.2', 'https://docs.python.org/3.2/library/codecs.html#standard-encodings'),
        ('3.3', 'https://docs.python.org/3.3/library/codecs.html#standard-encodings'),
        ('3.4', 'https://docs.python.org/3.4/library/codecs.html#standard-encodings'),
        ('3.5', 'https://docs.python.org/3.5/library/codecs.html#standard-encodings'),
        ('3.6', 'https://docs.python.org/3.6/library/codecs.html#standard-encodings'),
    ]:
        html = requests.get(url).text
        doc = lxml.html.fromstring(html)
        standard_encodings_table = doc.xpath(
            '//table[preceding::h2[.//text()[contains(., "Standard Encodings")]]][//th/text()="Codec"]'
        )[0]
        codecs = standard_encodings_table.xpath('.//td[1]/text()')
        print("## Python %s (%i encodings)" % (version, len(codecs)))
        print('<pre><code>' + pprint.pformat(codecs) + '</code></pre>')
    
        3
  •  17
  •   Prof. Falken    11 years ago

    Maybe you should try using the Universal Encoding Detector (chardet) library instead of implementing it yourself.

    >>> import chardet
    >>> s = '\xe2\x98\x83' # ☃
    >>> chardet.detect(s)
    {'confidence': 0.505, 'encoding': 'utf-8'}
    
        4
  •  13
  •   Community rohancragg    7 years ago

    You could use a technique to list all the modules in the encodings package.

    import pkgutil
    import encodings
    
    false_positives = set(["aliases"])
    
    found = set(name for imp, name, ispkg in pkgutil.iter_modules(encodings.__path__) if not ispkg)
    found.difference_update(false_positives)
    print found
    
        5
  •  4
  •   Anurag Uniyal    12 years ago

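    I doubt there is such a method or function in the codecs module, but if you look at encodings/__init__.py, the search function searches through the encodings modules folder, so you can do the same, e.g.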

    >>> import os, encodings
    >>> os.listdir(os.path.dirname(encodings.__file__))
    ['cp500.pyc', 'utf_16_le.py', 'gb18030.py', 'mbcs.pyc', 'undefined.pyc', 'idna.pyc', 'punycode.pyc', 'cp850.py', 'big5hkscs.pyc', 'mac_arabic.py', '__init__.pyc', 'string_escape.py', 'hz.py', 'cp037.py', 'cp737.py', 'iso8859_5.pyc', 'iso8859_13.pyc', 'cp861.pyc', 'cp862.py', 'iso8859_9.pyc', 'cp949.py', 'base64_codec.pyc', 'koi8_r.py', 'iso8859_2.py', 'ptcp154.pyc', 'uu_codec.pyc', 'mac_croatian.pyc', 'charmap.pyc', 'iso8859_15.pyc', 'euc_jp.py', 'cp1250.py', 'iso8859_10.pyc', 'koi8_r.pyc', 'unicode_escape.pyc', 'cp863.pyc', 'iso8859_4.pyc', 'cp852.py', 'unicode_internal.py', 'big5hkscs.py', 'cp1257.pyc', 'cp1254.py', 'shift_jisx0213.py', 'shift_jis.pyc', 'cp869.pyc', 'hp_roman8.py', 'iso8859_4.py', 'cp775.py', 'cp1251.py', 'mac_cyrillic.pyc', 'mac_greek.pyc', 'mac_roman.pyc', 'iso8859_11.pyc', 'iso8859_6.py', 'utf_8_sig.py', 'iso8859_3.py', 'iso2022_jp_1.py', 'ascii.py', 'cp1026.pyc', 'cp1250.pyc', 'cp950.py', 'raw_unicode_escape.py', 'euc_jis_2004.pyc', 'cp775.pyc', 'euc_kr.py',
     'mac_greek.py', 'big5.pyc', 'shift_jis_2004.pyc', 'gbk.pyc', 'cp1254.pyc', 'cp1255.pyc', 'cp855.pyc', 'string_escape.pyc', 'cp949.pyc', 'cp1258.pyc', 'iso8859_3.pyc', 'mac_iceland.pyc', 'cp1251.pyc', 'cp860.py', 'cp856.py', 'cp874.py', 'iso2022_kr.py', 'cp856.pyc', 'rot_13.py', 'palmos.py', 'iso2022_jp_2.pyc', 'mac_farsi.py', 'koi8_u.pyc', 'cp1256.py', 'iso8859_10.py', 'tis_620.py', 'iso8859_14.pyc', 'cp1253.py', 'cp1258.py', 'cp437.py', 'cp862.pyc', 'mac_turkish.py', 'undefined.py', 'euc_kr.pyc', 'gb18030.pyc', 'aliases.pyc', 'iso8859_9.py', 'uu_codec.py', 'gbk.py', 'quopri_codec.pyc', 'iso8859_7.py', 'mac_iceland.py', 'iso8859_2.pyc', 'euc_jis_2004.py', 'iso2022_jp_3.pyc', 'cp874.pyc', '__init__.py', 'mac_roman.py', 'iso8859_16.py', 'cp866.py', 'unicode_internal.pyc', 'mac_turkish.pyc', 'johab.pyc', 'cp037.pyc', 'punycode.py', 'cp1253.pyc', 'euc_jisx0213.pyc', 'iso2022_jp_2004.pyc', 'iso2022_kr.pyc', 'zlib_codec.pyc', 'cp932.py', 'cp1255.py', 'iso2022_jp_1.pyc', 'cp857.pyc', 'cp424.pyc',
     'iso2022_jp_2.py', 'iso2022_jp.pyc', 'mbcs.py', 'utf_8.py', 'palmos.pyc', 'cp1252.pyc', 'aliases.py', 'quopri_codec.py', 'latin_1.pyc', 'iso2022_jp.py', 'zlib_codec.py', 'cp1026.py', 'cp860.pyc', 'cp1252.py', 'hex_codec.pyc', 'iso8859_1.pyc', 'cp850.pyc', 'cp861.py', 'iso8859_15.py', 'cp865.pyc', 'hp_roman8.pyc', 'iso8859_7.pyc', 'mac_latin2.py', 'iso8859_11.py', 'mac_centeuro.pyc', 'iso8859_6.pyc', 'ascii.pyc', 'mac_centeuro.py', 'iso2022_jp_3.py', 'bz2_codec.py', 'mac_arabic.pyc', 'euc_jisx0213.py', 'tis_620.pyc', 'shift_jis_2004.py', 'utf_8.pyc', 'cp855.py', 'mac_romanian.pyc', 'iso8859_8.py', 'cp869.py', 'ptcp154.py', 'utf_16_be.py', 'iso2022_jp_ext.pyc', 'bz2_codec.pyc', 'base64_codec.py', 'latin_1.py', 'charmap.py', 'hz.pyc', 'cp950.pyc', 'cp875.pyc', 'cp1006.pyc', 'utf_16.py', 'shift_jisx0213.pyc', 'cp424.py', 'cp932.pyc', 'iso8859_5.py', 'mac_romanian.py', 'utf_8_sig.pyc', 'iso8859_1.py', 'cp875.py', 'cp437.pyc', 'cp865.py', 'utf_7.py', 'utf_16_be.pyc', 'rot_13.pyc',
     'euc_jp.pyc', 'raw_unicode_escape.pyc', 'iso8859_8.pyc', 'utf_16.pyc', 'iso8859_14.py', 'iso8859_16.pyc', 'cp852.pyc', 'cp737.pyc', 'mac_croatian.py', 'mac_latin2.pyc', 'iso2022_jp_ext.py', 'cp1140.py', 'mac_cyrillic.py', 'cp1257.py', 'cp500.py', 'cp1140.pyc', 'shift_jis.py', 'unicode_escape.py', 'cp864.py', 'cp864.pyc', 'cp857.py', 'hex_codec.py', 'mac_farsi.pyc', 'idna.py', 'johab.py', 'utf_7.pyc', 'cp863.py', 'iso8859_13.py', 'koi8_u.py', 'gb2312.pyc', 'cp1256.pyc', 'cp866.pyc', 'iso2022_jp_2004.py', 'utf_16_le.pyc', 'gb2312.py', 'cp1006.py', 'big5.py']
    

    But because anybody can register a codec, this will not be an exhaustive list.
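
    To illustrate why a directory listing cannot be exhaustive, here is a minimal Python 3 sketch that registers a codec at runtime with `codecs.register`. The codec name "demo_codec" and the function `demo_search` are made up for this example; the codec simply reuses the UTF-8 encode and decode functions.

```python
import codecs
import encodings
import pkgutil


def demo_search(name):
    """Hypothetical search function: answers only to the made-up name."""
    if name != "demo_codec":
        return None
    utf8 = codecs.lookup("utf_8")
    # Reuse UTF-8's functions under a new name, purely for illustration.
    return codecs.CodecInfo(
        name="demo_codec",
        encode=utf8.encode,
        decode=utf8.decode,
    )


codecs.register(demo_search)

# Module names found on disk in the stdlib encodings package.
on_disk = {name for _, name, _ in pkgutil.iter_modules(encodings.__path__)}

usable = "abc".encode("demo_codec") == b"abc"
listed = "demo_codec" in on_disk
print(usable, listed)
```

    The codec works for encoding and decoding, yet no file for it exists in the encodings folder, so any approach based on listing that folder will miss it.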

        6
  •  2
  •   Luciano Ramalho    10 years ago

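    The Python source code has a script at Tools/unicode/listcodecs.py which lists all codecs.

    Among the listed codecs, however, there are some that are not Unicode-to-bytes converters, such as base64_codec, quopri_codec, and bz2_codec, as @John Machin pointed out.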

        7
  •  0
  •   tzot    15 years ago

    Maybe you can do this:

    from encodings.aliases import aliases
    print aliases.keys()
        8
  •  0
  •   anthony sottile    7 years ago

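    Here is a programmatic way to list all the encodings defined in the stdlib encodings package. Note that this will not list user-defined encodings. It combines some of the tricks from the other answers, but actually produces a working list using each codec's canonical name.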

    import encodings
    import pkgutil
    import pprint
    
    
    all_encodings = set()
    
    for _, modname, _ in pkgutil.iter_modules(
            encodings.__path__, encodings.__name__ + '.',
    ):
        try:
            mod = __import__(modname, fromlist=[str('__trash')])
        except (ImportError, LookupError):
            # A few encodings are platform specific: mbcs, cp65001
            # print('skip {}'.format(modname))
            continue
    
        try:
            all_encodings.add(mod.getregentry().name)
        except AttributeError as e:
            # the `aliases` module doesn't actually provide a codec
            # print('skip {}'.format(modname))
            if 'regentry' not in str(e):
                raise
    
    pprint.pprint(sorted(all_encodings))