Obligatory warning:
Don't use regular expressions to parse (X)HTML. You are better off using a parser such as
BeautifulSoup
.
For example:
>>> from bs4 import BeautifulSoup
>>> html = """<html><table border = 1><tr><td>JDICOM</td><td>Thu Sep 16 10:13:34 CDT 2010</td></tr></html>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> for each in soup.find_all('td'):
...     print(each.contents[0])
...
JDICOM
Thu Sep 16 10:13:34 CDT 2010
>>>
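For comparison, a regular expression with a non-greedy group does work on this particular, tidy input:
>>> import re
>>> pattern = re.compile(r'<td>(.*?)</td>', re.I | re.S)
>>> for each in pattern.findall(html):
...     print(each)
...
JDICOM
Thu Sep 16 10:13:34 CDT 2010
>>>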
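The regex only works because the sample markup is so tidy. A minimal sketch of where it breaks, using the standard library's `html.parser.HTMLParser` as a stand-in for BeautifulSoup so it runs without third-party packages: the `messy` input, the `CellCollector` class and the `cells_with_parser` helper are hypothetical illustrations, and the collector assumes cells are not nested (no tables inside cells). Given one cell with an attribute and one containing a `<b>` tag, the regex skips the first cell and returns raw markup for the second, while the parser recovers both texts.

```python
import re
from html.parser import HTMLParser

# Same pattern as the regex example above.
TD_PATTERN = re.compile(r'<td>(.*?)</td>', re.I | re.S)


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell (assumes cells are not nested)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag names and handles attributes separately,
        # so <TD> and <td class="name"> are both recognised as cells.
        if tag == 'td':
            self._in_td = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False

    def handle_data(self, data):
        # Text inside child tags such as <b> is still appended to the cell.
        if self._in_td:
            self.cells[-1] += data


def cells_with_parser(html):
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    return collector.cells


# Hypothetical input: an attribute on one cell, a <b> tag inside the other.
messy = ('<table><tr><td class="name">JDICOM</td>'
         '<TD><b>Thu Sep 16</b> 10:13:34 CDT 2010</TD></tr></table>')

print(TD_PATTERN.findall(messy))
# → ['<b>Thu Sep 16</b> 10:13:34 CDT 2010']
print(cells_with_parser(messy))
# → ['JDICOM', 'Thu Sep 16 10:13:34 CDT 2010']
```

BeautifulSoup gives the same robustness with far less code (`get_text()` on each cell flattens child tags), which is why a parser is the safer default.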