OK, so this code works:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
title = soup.find('p', {'class': 'deal-title should-truncate'}).getText()
print "Title: " + str(title)
But the code above only gives me the first result. I want the find to loop through the whole site and pick up every occurrence. To do that, I tried wrapping everything in a loop over each occurrence of the figure tag (since that paragraph tag always sits inside figure tags). That way I only ever look at what is inside a figure. However, when I try the following:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY WEBSITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = [figure for figure in soup.findAll('figure')]
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
    print "Title: " + str(title)
I get this error:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 11, in <module>
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
AttributeError: 'NoneType' object has no attribute 'getText'
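
As far as I can tell, find() returns None whenever a figure has no matching paragraph inside it, and calling .getText() on None is what raises this. A minimal standalone sketch of that failure mode (the markup below is made up; only the class name matches my page):

from bs4 import BeautifulSoup

# Made-up markup: the second <figure> has no deal-title paragraph.
html = """
<figure><p class="deal-title should-truncate">First deal</p></figure>
<figure><img src="banner.jpg"/></figure>
"""
soup = BeautifulSoup(html)

for figure in soup.findAll('figure'):
    p = figure.find('p', {'class': 'deal-title should-truncate'})
    # Prints the tag for the first figure and None for the second;
    # calling .getText() on that None raises the AttributeError above.
    print repr(p)
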
So now I am trying this:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if (title == None):
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + str(title)
And now the error is:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 16, in <module>
    print "Title: " + str(title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 12: ordinal not in range(128)
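
From what I can tell, getText() returns a unicode string and str() on Python 2 implicitly encodes it with the ascii codec, which fails on the en dash. A minimal sketch of the same failure and one workaround (assuming the console accepts UTF-8, which may not be true on my machine):

# -*- coding: utf-8 -*-
# Stand-in title containing the same en dash character the real page has.
title = u"Half price \u2013 today only"

# This reproduces the error: str() implicitly encodes with the ascii codec.
# print "Title: " + str(title)

# Keeping everything unicode and encoding explicitly avoids the implicit
# ascii step (utf-8 is an assumption about the terminal encoding):
print (u"Title: " + title).encode('utf-8')
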