html5lib
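Beware that its latest release (0.11) is a bit old. Using the Python portion, I have run into recursion problems, as described in Issue 70 and Issue 59, but I could not find a recent stable Mercurial revision.
The latest tip is not usable either. This is what I get from python setup.py install: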
byte-compiling build/bdist.linux-x86_64/egg/html5lib/treewalkers/_base.py to _base.pyc
File "build/bdist.linux-x86_64/egg/html5lib/treewalkers/_base.py", line 40
"data": []}
^
SyntaxError: invalid syntax
And at runtime I get the following error:
soup = parser.parse(page.read())
File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 165, in parse
File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 144, in _parse
File "build/bdist.linux-x86_64/egg/html5lib/html5parser.py", line 454, in processDoctype
TypeError: insertDoctype() takes exactly 4 arguments (2 given)
I am using it on Python 2.5.2, with lxml and BeautifulSoup.
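
The TypeError above is the kind of failure that shows up when a caller and the method it calls disagree about the signature. "4 arguments (2 given)" suggests a tree builder whose insertDoctype expects three values besides self, called by a parser that passes a single object. The sketch below reproduces that mismatch with hypothetical stand-in classes. OldStyleTreeBuilder and NewStyleParser are invented for illustration and are not html5lib's real code, and the actual cause in the tip revision is an assumption. The wording of the message also differs between Python versions.

```python
# Hypothetical stand-ins that only mimic the shape of the failure;
# these are NOT html5lib's real classes.

class OldStyleTreeBuilder(object):
    # Assumed older-style hook: the doctype arrives as three separate values.
    def insertDoctype(self, name, publicId, systemId):
        return (name, publicId, systemId)


class NewStyleParser(object):
    def __init__(self, tree):
        self.tree = tree

    def processDoctype(self, token):
        # Assumed newer-style call: the whole token is passed as one argument,
        # which does not match the three-value signature above.
        return self.tree.insertDoctype(token)


def demo():
    """Return the TypeError message raised by the mismatched call, or None."""
    parser = NewStyleParser(OldStyleTreeBuilder())
    token = {"name": "html", "publicId": None, "systemId": None}
    try:
        parser.processDoctype(token)
    except TypeError as exc:
        return str(exc)
    return None


message = demo()
print(message)
```

If that is what is happening here, the parser and the tree builder in that checkout are out of sync with each other, and installing a revision in which both were updated together would be the thing to try.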