
Ruby gem mechanize throws error: undefined method `<=>'

  •  1
  • Josh  · 14 years ago

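    I am using the Ruby gem mechanize to scrape some HTML. The first time I load my page, it displays the expected results and everything works. After a reload, I get this error when executing "search_results = @agent.submit(search_form)":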

    undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem
    

    Does this ring any bells before I post any code?

    Thanks.

    Code:

        start = Time.now
    
        # initial set up
        @agent = Mechanize.new
        Mechanize.html_parser = Hpricot
        page = @agent.get("http://www.google.com/")
        search_form = page.forms.first
    
        # conduct initial search
        @search_term = search_form.q = params[:search].to_s
        search_results = @agent.submit(search_form)
    
        # helper variables
        search_qs = ""; @page_number = 1; i = 0; @flag = false;
    
        # get the query string structure
        search_results.links.each { |li| search_qs = li.href if li.href.match(/.*search\?q=.*start=.*/) }
    
        # search through all paginated pages
        while (i < 500)
          search_qs = search_qs.gsub(/start=\d+/,"start=#{i}")
          @search_url = "http://google.com#{search_qs}"
          search_results = @agent.get(@search_url)
          search_results.links.each { |li| @flag = true if li.text.match("All Bout Texas Tailgating") }
          break if @flag
          i+=10; @page_number+=1
        end
    
        @execution_time = Time.now-start
    
        render :layout => false
    

    <h2>Query results for "<%= @search_term %>" on Google</h2>
    
    <% if @flag %>
        <p>What page is this keyword found: <b><%= @page_number %></b></p>
        <p><%= link_to  "Click to see page", "#{@search_url}", {:target => "_blank"} %></p>
        <p>How long did this query take to run?: <%= @execution_time %> seconds</p>
    <% else %>
        <p>Keyword not found in Google search results</p>
    <% end %>
    

    Stack trace:

     NoMethodError (undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem):
      mechanize (1.0.0) lib/mechanize/form/field.rb:30:in `<=>'
      mechanize (1.0.0) lib/mechanize/form.rb:171:in `sort'
      mechanize (1.0.0) lib/mechanize/form.rb:171:in `build_query'
      mechanize (1.0.0) lib/mechanize.rb:373:in `submit'
      app/controllers/admin/importer_controller.rb:24:in `check_page_rank'
      /opt/local/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
      /opt/local/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
      /opt/local/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
      /opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start'
      /opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
      /opt/local/lib/ruby/1.8/webrick/server.rb:95:in `start'
      /opt/local/lib/ruby/1.8/webrick/server.rb:92:in `each'
      /opt/local/lib/ruby/1.8/webrick/server.rb:92:in `start'
      /opt/local/lib/ruby/1.8/webrick/server.rb:23:in `start'
      /opt/local/lib/ruby/1.8/webrick/server.rb:82:in `start'
    
    Rendered rescues/_trace (98.4ms)
    Rendered rescues/_request_and_response (1.2ms)
    Rendering rescues/layout (internal_server_error)
    
    1 answer  |  as of 14 years ago
  •  0
  •   Brett Bender    14 years ago

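    If you look at the source for mechanize, in form.rb, submitting a form calls a method named build_query, which sorts the form's fields. sort relies on the <=> operator, and your stack trace shows the field comparison at field.rb:30 ending up as a <=> call on the underlying Hpricot::Elem, which does not define that operator. That is why you get a NoMethodError.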
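
    To illustrate the failure mode in isolation, here is a minimal sketch that uses neither mechanize nor Hpricot. FakeNode and FakeField are hypothetical stand-ins, not mechanize classes. FakeNode deliberately inherits from BasicObject so that it has no <=> on any Ruby version, and FakeField delegates its comparison to the node, which is roughly what the stack trace suggests happens at field.rb:30.

```ruby
# Hypothetical stand-in for a parsed element with no ordering defined.
# BasicObject does not define <=>, so calling it raises NoMethodError.
class FakeNode < BasicObject
  def initialize(name)
    @name = name
  end

  def name
    @name
  end
end

# Hypothetical stand-in for a form field that delegates comparison
# to its underlying node.
class FakeField
  attr_reader :node

  def initialize(node)
    @node = node
  end

  def <=>(other)
    node <=> other.node
  end
end

fields = [FakeField.new(FakeNode.new("q")), FakeField.new(FakeNode.new("hl"))]

# Array#sort calls FakeField#<=>, which calls <=> on the node and fails.
error = nil
begin
  fields.sort
rescue NoMethodError => e
  error = e
end
puts error.class

# Sorting by a comparable key never calls <=> on the node.
sorted_names = fields.sort_by { |f| f.node.name }.map { |f| f.node.name }
puts sorted_names.inspect
```

    Sorting by a key that is itself comparable (here the node's name) avoids calling <=> on the node at all, which is one direction a patch could take. Whether that fits mechanize's internals is untested here.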

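    It looks like mechanize was built to work with Nokogiri, so it may have unfixed bugs with other parser implementations. I have not dug deep into the mechanize source and do not mean to point fingers, but you might want to try switching to Nokogiri for this project, if that is possible. From this small snippet, it does not look like you rely heavily on Hpricot. It seems odd to me that mechanize would throw an exception on a hidden form field parsed by Hpricot, but the stack trace is quite clear on that point.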

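    Your other main option is to dive into the mechanize source and see whether you can fix it yourself (or file a bug on the mechanize GitHub project and hope someone picks it up).

    Good luck.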