
Submit form with POST and the g-recaptcha-response parameter

  •  8
  • Mislav  · Tech Community  · 6 years ago

    I want to submit the form on the following web page: http://www.hzzo-net.hr/statos_OIB.htm

    First, I use the 2captcha service to get past the reCAPTCHA:

    library(httr)
    library(xml2)
    
    # parameters
    api_key <- "c+++"
    api_url <- "http://2captcha.com/in.php"
    site_key <- "6Lc3SAgUAAAAALFnYxUbXlcJ8I9grvAPC6LFTKQs"
    hzzo_url <- "http://www.hzzo-net.hr/statos_OIB.htm"
    
    # submit the captcha to 2captcha (all parameters go in the query string)
    req_url <- paste0(api_url, "?key=", api_key, "&method=userrecaptcha&googlekey=",
                      site_key, "&pageurl=", hzzo_url)
    get_response <- POST(req_url)
    hzzo_content <- content(get_response)
    hzzo_content <- xml_text(hzzo_content)
    captcha_id <- stringr::str_extract_all(hzzo_content[[1]], "\\d+")[[1]]
    
    # solve captcha
    Sys.sleep(16L)
    captcha2_solve <- function(apiKey, captchaID){
      # use the function arguments rather than the globals
      req_url <- paste0("http://2captcha.com/res.php?key=", apiKey, "&action=get&id=", captchaID)
      result <- GET(req_url)
      captcha_content <- content(result)
      hzzo_response <- xml_text(captcha_content)
      # the reply is pipe-delimited; the token is the second field
      hzzo_response <- strsplit(hzzo_response, "\\|")
      return(hzzo_response)
    }
    hzzo_response <- captcha2_solve(api_key, captcha_id)
    # poll until the reply is no longer CAPCHA_NOT_READY
    while(hzzo_response[[1]][1] == "CAPCHA_NOT_READY"){
      Sys.sleep(16L)
      hzzo_response <- captcha2_solve(api_key, captcha_id)
    }
    hzzo_response <- hzzo_response[[1]][[2]]
    

    "03AHqfIOmo9BlCsCKyg-lDes4oW-U3PWgCtATRUqXFcEV032acDgGoOzrV8GiZNDzCF4TbCVLcY8HZ8hR1JqO11YdRExvgPDL0EUsjCZdI0rUm_LnBRRifyb66X7V6r4n8CIm1si3EKmw36XIcZK7MGrHSNWRrj2aGzWAYO8ceobViOICOhkYe9Bsfv64tUHWvHSqNIoesD_FHplbWG3B0eMag5341NyycjpNLxgNCwVzA8mhCU3oQUcloze-mIclFMZ7J_nbVhXdy8-qipF5ZFH4xIhSQXHH-TqxyaGQFjKdgLch7MuDEQVRcQGo1o4QuSEoeCTjlPn3Mah5vC8zKrnqfbMgiOVOIDJFGvFY4KOivbBzYTz5nW9g"
    

    After that, I should submit the form. This is the part I can't get to work.

    I tried adding all the parameters to the POST:

    parameters <- list(
      'upoib' = "93335620125", # example of number to enter
      'g-recaptcha-response' = hzzo_response
    )
    
    test <- POST(
      "http://www.hzzo-net.hr/statos_OIB.htm",
      body = toJSON(parameters), 
      encode = "json",
      verbose()
    )
    

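    Given that I have the reCAPTCHA response variable, how do I submit the form? Can it be done with the httr package, or do I have to use Selenium? The code can be in R or Python (I only need the last part, the POST call).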

    1 answer  |  up to 5 years ago
        1
  •  3
  •   t.m.adam    6 years ago

    If you inspect the HTML, you'll see that the form's action is ../cgi-bin/statos_OIB.cgi, which resolves to http://www.hzzo-net.hr/cgi-bin/statos_OIB.cgi, so that is the URL you have to post to.
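
    As a rough illustration of that resolution step, the sketch below pulls the action attribute out of a form tag and joins it with the page URL using only the Python standard library. The HTML string is a hypothetical one-line stand-in for the real page source, and FormActionParser is a helper name invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> tag."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action is not None:
                self.actions.append(action)


page_url = "http://www.hzzo-net.hr/statos_OIB.htm"

# Hypothetical excerpt standing in for the real page source.
sample_html = '<form method="post" action="../cgi-bin/statos_OIB.cgi"></form>'

parser = FormActionParser()
parser.feed(sample_html)

# Resolve the relative action against the page URL, as a browser would.
submit_url = urljoin(page_url, parser.actions[0])
print(submit_url)
```

    In real use the sample string would be replaced by the downloaded page source; the joined result is the URL to post to.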

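    Also, after some testing, I found that the server returns a 500 response unless a valid Referer ( http://www.hzzo-net.hr/statos_OIB.htm ) is sent.

    I'm not familiar with R, but I can provide an example in Python, using the requests library.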

    import requests
    
    url = "http://www.hzzo-net.hr/cgi-bin/statos_OIB.cgi"
    hzzo_response = 'your token'
    data = {
        'upoib': '93335620125', 
        'g-recaptcha-response': hzzo_response
    }
    headers = {'referer': 'http://www.hzzo-net.hr/statos_OIB.htm'}
    r = requests.post(url, data=data, headers=headers)
    html = r.text
    
    print(html)
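
    One more difference from the attempt in the question seems worth spelling out: that attempt sent a JSON body, whereas an HTML form submits URL-encoded key=value pairs. The sketch below uses only the standard library to build the same request without sending it, so both encodings can be compared side by side. It is an illustration under the assumption that the CGI script expects ordinary form encoding; the token is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

url = "http://www.hzzo-net.hr/cgi-bin/statos_OIB.cgi"
fields = {
    "upoib": "93335620125",
    "g-recaptcha-response": "your token",  # placeholder, as above
}

# What an HTML form submission carries: key=value pairs joined by '&'.
form_body = urlencode(fields)

# What the attempt in the question sent instead: a JSON document.
json_body = json.dumps(fields)

# Build (but do not send) the request to inspect what would be transmitted.
req = Request(
    url,
    data=form_body.encode("ascii"),
    headers={"Referer": "http://www.hzzo-net.hr/statos_OIB.htm"},
    method="POST",
)

print(form_body)
print(json_body)
print(req.get_method(), req.full_url)
print(req.get_header("Referer"))
```

    Passing a dict as data= to requests.post produces the same form-encoded body, which is why the example above does not need any explicit encoding step.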
    

    I managed to "translate" the above code to R using httr. If a valid token is provided, the code produces the correct result.

    library(httr)
    
    url <- "http://www.hzzo-net.hr/cgi-bin/statos_OIB.cgi"
    hzzo_response <- "your token"
    parameters <- list(
      'upoib' = "93335620125", 
      'g-recaptcha-response' = hzzo_response
    )
    test <- POST(
      url,
      body = parameters, 
      add_headers(Referer = 'http://www.hzzo-net.hr/statos_OIB.htm'),
      encode = "form",
      verbose()
    )
    html <- content(test, 'text', encoding = 'UTF-8')
    
    print(html)
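
    For anyone who prefers to do the token-polling step from the question in Python as well, here is a minimal sketch of a bounded polling loop. It assumes the reply format implied by the question's code (a pipe-delimited string whose second field is the token, or CAPCHA_NOT_READY while the solution is pending). The names parse_2captcha_reply, wait_for_token and fetch_reply are hypothetical; fetch_reply is injected so the loop can be tried without network access.

```python
import time


def parse_2captcha_reply(text):
    """Split a pipe-delimited reply such as 'OK|<token>' into (status, payload)."""
    status, _, payload = text.strip().partition("|")
    return status, payload


def wait_for_token(fetch_reply, interval=16, max_attempts=10, sleep=time.sleep):
    """Call fetch_reply() until it stops reporting CAPCHA_NOT_READY.

    fetch_reply is a hypothetical zero-argument callable that returns the raw
    text of the res.php response.
    """
    for _ in range(max_attempts):
        status, payload = parse_2captcha_reply(fetch_reply())
        if status == "OK":
            return payload
        if status != "CAPCHA_NOT_READY":
            raise RuntimeError(f"unexpected reply: {status}")
        sleep(interval)
    raise TimeoutError("captcha was not solved in time")


# Usage with canned replies instead of real HTTP calls.
replies = iter(["CAPCHA_NOT_READY", "CAPCHA_NOT_READY", "OK|example-token"])
token = wait_for_token(lambda: next(replies), sleep=lambda seconds: None)
print(token)
```

    Capping the number of attempts keeps the loop from spinning forever if the service never returns a solution, and raising on any other status surfaces error replies instead of treating them as tokens.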