代码之家 › 专栏 › 技术社区 › saffsd

如何在python doctests中包含unicode字符串?

doctest unicode python

18

saffsd · 技术社区 · 15 年前

我正在研究一些必须操作Unicode字符串的代码。我正在为它写教义,但遇到了麻烦。下面是说明问题的最小示例:

# -*- coding: utf-8 -*-
def mylen(word):
  """
  >>> mylen(u"Ã¡Ã©ÃÃ³Ãº")
  5
  """
  return len(word)

print mylen(u"Ã¡Ã©ÃÃ³Ãº")

首先,我们运行代码以查看 print mylen(u"Ã¡Ã©ÃÃ³Ãº") .

$ python mylen.py
5

接下来,我们对它运行doctest来查看问题。

$ python -m
5
**********************************************************************
File "mylen.py", line 4, in mylen.mylen
Failed example:
    mylen(u"Ã¡Ã©ÃÃ³Ãº")
Expected:
    5
Got:
    10
**********************************************************************
1 items had failures:
   1 of   1 in mylen.mylen
***Test Failed*** 1 failures.

那我怎么测试呢 mylen(u"Ã¡Ã©ÃÃ³Ãº") 评估为5?

5 回复 | 直到 15 年前

1

18

u0b34a0f6ae 15 年前

如果需要Unicode字符串,则必须使用Unicode DocStrings!介意 u 你说什么?

# -*- coding: utf-8 -*-
def mylen(word):
  u"""        <----- SEE 'u' HERE
  >>> mylen(u"Ã¡Ã©ÃÃ³Ãº")
  5
  """
  return len(word)

print mylen(u"Ã¡Ã©ÃÃ³Ãº")

只要测试通过,这就可以工作。对于python 2.x,您还需要另一个黑客来使详细的doctest模式工作,或者在测试失败时获得正确的跟踪:

if __name__ == "__main__":
    import sys
    reload(sys)
    sys.setdefaultencoding("UTF-8")
    import doctest
    doctest.testmod()

注意!仅在调试时使用setDefaultEncoding。我接受它用于doctest,但在您的生产代码中没有任何地方。

2

6

dmitry_romanov Akram BEN GHANEM 13 年前

python 2.6.6不太了解unicode输出,但可以使用以下方法修复:

已经描述过黑客 sys.setdefaultencoding("UTF-8")
Unicode DocString(上面也提到过,非常感谢)
和 print 声明。

在我的例子中,这个docstring告诉我们测试被破坏了:

def beatiful_units(*units):
    u'''Returns nice string like 'erg/(cmÂ² sec)'.

    >>> beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    u'erg/(cmÂ² sec)'
    '''

带“错误”信息

Failed example:
    beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
Expected:
    u'erg/(cmÂ² sec)'
Got:
    u'erg/(cm\xb2 sec)'

使用 打印 我们可以解决这个问题:

def beatiful_units(*units):
    u'''Returns nice string like 'erg/(cmÂ² sec)'.

    >>> print beatiful_units(('erg', 1), ('cm', -2), ('sec', -1))
    erg/(cmÂ² sec)
    '''

3

2

Ned Deily 15 年前

在Python中,这似乎是一个已知且尚未解决的问题。见公开问题 here 和 here .

毫不奇怪,它可以在python 3中修改为正常工作,因为所有字符串都是Unicode格式的:

def mylen(word):
  """
  >>> mylen("Ã¡Ã©ÃÃ³Ãº")
  5
  """
  return len(word)

print(mylen("Ã¡Ã©ÃÃ³Ãº"))

4

1

Andrew Dalke 15 年前

我的解决方案是转义unicode字符,比如u'\xe1\xe9\xed\xf3\xfa'。虽然不容易阅读,但我的测试只有几个非ASCII字符,因此在这些情况下,我把描述放在一边作为注释,就像“n with tilde”。

5

1

Pieter Ennes 10 年前

如前所述,您需要确保docstrings是unicode。

如果您可以切换到python 3,那么它将在那里自动工作,如 二者都 源编码已经是UTF-8,默认字符串类型是Unicode。

要在python 2中实现相同的目标,需要 coding: utf-8 旁边可以在所有docstring前面加上前缀 u ,或者简单地加上

from __future__ import unicode_literals