URLError in BeautifulSoup

ubuntu-12.04 beautifulsoup python-2.7 python

Rudy · Tech Community · 11 years ago

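I am trying to use BeautifulSoup to get the PID and price data from Craigslist. I wrote separate code that gives me the file CLallsites.txt. In this code, I am trying to take each site from the txt file and get the PIDs of all entries on the first 10 pages. My code is: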

  from bs4 import BeautifulSoup       
  from urllib2 import urlopen 
  readfile = open("CLallsites.txt")
  product = "mcy"
  while 1:
    u = ""
    count = 0
    line = readfile.readline()
    commaposition = line.find(',')
    site = line[0:commaposition]
    location = line[commaposition+1:]
    site_filename = location + '.txt'
    f = open(site_filename, "a")
    while (count < 10):
       sitenow = site + "\\" + product + "\\" + str(u)
       html = urlopen(str(sitenow))                      
       soup = BeautifulSoup(html)                
       postings = soup('p',{"class":"row"})
       for post in postings:
            y = post['data-pid']
            print y
       count = count +1
       index = count*100
       u = "index" + str(index) + ".html"
    if not line:
        break
    pass

My CLallsites.txt looks like this:

craigslist site, location (Stack Overflow does not allow posts containing craigslist links, so I cannot show the text; I can try to attach the text file if that helps.)

When I run the code, I get the following error:

  Traceback (most recent call last):
    File "reading.py", line 16, in <module>
      html = urlopen(str(sitenow))
    File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 400, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 418, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
      return self.do_open(httplib.HTTPConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
      raise URLError(err)
  urllib2.URLError:

Do you have any idea what I am doing wrong?

1 answer | 11 years ago

A. Rodas · 11 years ago

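I don't know what the content of sitenow is, but it looks like an invalid URL. Note that URLs use forward slashes, not backslashes, so the statement should be something like sitenow = site + "/" + product + "/" + str(u).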
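To illustrate, here is a minimal sketch of how the page URLs could be built with forward slashes. The helper name page_urls and the host http://example.org are placeholders of mine, not anything from the question, and the "index100.html" suffix pattern is simply copied from the loop in the question, so treat it as an assumption about how the listing pages are numbered.

```python
def page_urls(site, product, pages=10):
    """Return the listing URLs for the first `pages` pages of one site.

    `page_urls` is a hypothetical helper, not part of any library.
    """
    # Strip whitespace and any trailing slash so the URL is joined with
    # exactly one forward slash between the parts.
    base = site.strip().rstrip("/") + "/" + product + "/"
    urls = []
    for page in range(pages):
        # Page 0 has no suffix; later pages follow the question's
        # "index100.html", "index200.html", ... pattern.
        suffix = "" if page == 0 else "index%d.html" % (page * 100)
        urls.append(base + suffix)
    return urls


# "http://example.org" is a placeholder host, not a real Craigslist site.
for url in page_urls("http://example.org", "mcy", pages=3):
    print(url)
```

Printing each URL before passing it to urlopen is also a quick way to check what sitenow actually contains for every line read from the file.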
